Automatic Differentiation Step by Step

Mark Saroufim
14 min readNov 12, 2019

Automatic Differentiation lets you compute exact derivatives in constant time

Differentiation shows up everywhere from the backprop algorithm in deep neural networks to the equations of motion in physics and to pretty much any field that needs to quantify a rate of change.

Automatic Differentiation is the secret sauce that powers all the hottest and latest Machine Learning frameworks from Flux.jl to Pytorch to Tensorflow. Differentiation in general is becoming a first class citizen in programming languages with early work started by Chris Lattner of LLVM fame — see the Differentiable Programming Manifesto for more detail.

Those frameworks essentially offer you a mini programming language embedded in a larger one where computing the derivative of a function takes as much time as evaluating the function. In the case of Deep Learning, you define a network with a loss function and get a gradient for free.

Automatic Differentiation != Numeric Differentiation

To make the above point clearer let’s go over Symbolic Differentiation and Numerical Differentiation and then introduce Automatic Differentiation and what makes it so great.

But just know that it doesn’t have to be complicated, in fact you can have a working implementation of Automatic Differentiation fit in a single Tweet.

Symbolic Differentiation

Symbolic differentiation works by breaking apart a complex expression into a bunch of simpler expressions by using various rules — very similar to a compiler.

Examples of some rules

Sum rule

Constant rule

Derivatives of powers rule

And then you would use those rules to solve a differential expression, as an example we’d like to find the derivative of f where f