I’m trying to teach a lesson on gradient descent from a more statistical and theoretical perspective, and I need a good example to show its usefulness.
What is the simplest algebraic function that is impossible, or at least rather difficult, to optimize by setting its first derivative to 0, but easy to optimize with gradient descent? I would preferably like to demonstrate this in the context of linear regression or some extremely simple machine learning model.
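One candidate that fits this, sketched under the assumption that a toy logistic regression counts as an "extremely simple" model: setting the gradient of the logistic log-loss to zero gives a transcendental equation with no closed-form solution (unlike ordinary least squares), yet plain gradient descent solves it easily. The data, learning rate, and iteration count below are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical toy data: a 1-D feature whose sign (plus noise)
# determines the binary label.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = (x + rng.normal(scale=0.5, size=100) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent on the mean logistic log-loss. The stationarity
# conditions (mean((p - y) * x) = 0, mean(p - y) = 0) involve sums of
# sigmoids and cannot be solved for (w, b) in closed form.
w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    p = sigmoid(w * x + b)       # predicted probabilities
    grad_w = np.mean((p - y) * x)  # d(loss)/dw
    grad_b = np.mean(p - y)        # d(loss)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)
```

Since the label is positively associated with the feature, the fitted slope `w` comes out positive, which is an easy sanity check for students.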
I do not understand most comments here.
Gradient descent is essentially the same idea as the tangent method. It is used ubiquitously, e.g. to find the minimum of any polynomial of degree >= 4.
Calculating the derivative and finding its zeros is still the same problem: you look for a numerical solution, e.g. with gradient descent. Bisection is slower and does not extend well to multivariate functions.
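A minimal sketch of this point: gradient descent is itself a numerical way of locating a zero of f'. The quartic below, its starting point, and the step size are arbitrary choices for illustration.

```python
# f has a derivative that is a cubic; rather than solving the cubic
# analytically, we let gradient descent walk to a stationary point.
def f(x):
    return x**4 - 3 * x**2 + x

def fprime(x):
    return 4 * x**3 - 6 * x + 1

x = 2.0     # arbitrary starting point
lr = 0.01   # step size
for _ in range(5000):
    x -= lr * fprime(x)

# At convergence the gradient is numerically zero, i.e. we have found
# a root of f' (here, a local minimum of f) without solving f' = 0.
print(x, fprime(x))
```

The same loop works unchanged in many dimensions, which is where bisection-style methods break down.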
I would say the opposite: there are more optimisation problems where gradient descent is used than not (excluding everything that can be solved by linear systems).
This reminds me that I still haven’t read the paper on the forward-forward algorithm, so I’m not even sure whether it still uses gradient descent.