Automatic Differentiation Step by Step

Mark Saroufim
14 min read · Nov 12, 2019

Automatic Differentiation lets you compute exact derivatives at a small constant multiple of the cost of evaluating the function

Differentiation shows up everywhere, from the backprop algorithm in deep neural networks to the equations of motion in physics, and in pretty much any field that needs to quantify a rate of change.

Automatic Differentiation is the secret sauce that powers the hottest Machine Learning frameworks, from Flux.jl to PyTorch to TensorFlow. Differentiation in general is becoming a first-class citizen in programming languages, with early work started by Chris Lattner of LLVM fame; see the Differentiable Programming Manifesto for more detail.

Those frameworks essentially offer you a mini programming language embedded in a larger one, where computing the derivative of a function takes about as much time as evaluating the function itself (within a small constant factor). In the case of Deep Learning, you define a network with a loss function and get a gradient for free, with no derivatives worked out by hand.
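To make the "derivative for the price of an evaluation" idea concrete, here is a minimal sketch in plain Python. It is not how Flux.jl, PyTorch, or TensorFlow are implemented; the `Dual` class and the `derivative` helper are hypothetical names invented for this illustration, and only `+` and `*` are supported. Each number carries its derivative along with it, so a single pass through the function yields both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (illustrative helper, not a framework API)."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, i.e. with derivative 0."""
    return x if isinstance(x, Dual) else Dual(float(x))


def derivative(f, x):
    """Evaluate f once on an input seeded with dx/dx = 1 and read off the derivative."""
    return f(Dual(float(x), 1.0)).deriv


def f(x):
    return 3 * x * x + 2 * x + 1


print(derivative(f, 2.0))  # f'(x) = 6x + 2, so this prints 14.0
```

Notice that `f` is written as ordinary code and is called exactly once; the derivative falls out of the same arithmetic that computes the value, which is why the extra cost stays within a constant factor.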

Automatic Differentiation != Numerical Differentiation

To make the above point clearer, let’s go over Symbolic Differentiation and Numerical Differentiation, and then introduce Automatic Differentiation and what makes it so great.