Linear Algebra Foundation
Gradients and Hessians
Recall that a matrix $ A \in \mathbb{R}^{n \times n} $ is symmetric if $ A^T = A $, that is, $ A_{ij} = A_{ji} $ for all $ i, j $. Also recall the gradient $ \nabla f(x) $ of a function $ f : \mathbb{R}^n \rightarrow \mathbb{R} $, which is the n-vector of partial derivatives
\[\nabla f(x) = \begin{bmatrix} \frac{\partial}{\partial x_1} f(x) \\ \vdots \\ \frac{\partial}{\partial x_n} f(x) \end{bmatrix}, \qquad \text{where} \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.\]
The Hessian $ \nabla^2 f(x) $ of a function $ f : \mathbb{R}^n \rightarrow \mathbb{R} $ is the $ n \times n $ symmetric matrix of second-order partial derivatives,
\[\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2}{\partial x_1^2} f(x) & \frac{\partial^2}{\partial x_1 \partial x_2} f(x) & \cdots & \frac{\partial^2}{\partial x_1 \partial x_n} f(x) \\ \frac{\partial^2}{\partial x_2 \partial x_1} f(x) & \frac{\partial^2}{\partial x_2^2} f(x) & \cdots & \frac{\partial^2}{\partial x_2 \partial x_n} f(x) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2}{\partial x_n \partial x_1} f(x) & \frac{\partial^2}{\partial x_n \partial x_2} f(x) & \cdots & \frac{\partial^2}{\partial x_n^2} f(x) \end{bmatrix}\]
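To make these definitions concrete, here is a minimal numerical sketch (assuming NumPy is available; the toy function `f` below is an arbitrary choice of mine, not part of the problem set) that approximates the gradient and Hessian by central finite differences and confirms the Hessian comes out symmetric:

```python
import numpy as np

def num_grad(f, x, eps=1e-5):
    """Approximate the gradient of f at x by central differences."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def num_hess(f, x, eps=1e-4):
    """Approximate the Hessian of f at x by differencing num_grad."""
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        H[:, j] = (num_grad(f, x + e) - num_grad(f, x - e)) / (2 * eps)
    return H

# Toy function (illustration only): f(x) = x1^2 * x2 + sin(x2)
f = lambda x: x[0] ** 2 * x[1] + np.sin(x[1])
x0 = np.array([1.0, 2.0])

print(num_grad(f, x0))                 # ~ [2*x1*x2, x1^2 + cos(x2)] = [4.0, 1 + cos(2)]
H = num_hess(f, x0)
print(np.allclose(H, H.T, atol=1e-4))  # the Hessian is (numerically) symmetric
```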
(a) Let $ f(x) = \frac{1}{2} x^T A x + b^T x $, where $ A $ is a symmetric matrix and $ b \in \mathbb{R}^n $ is a vector. What is $ \nabla f(x) $?
(b) Let $ f(x) = g(h(x)) $, where $ g : \mathbb{R} \rightarrow \mathbb{R} $ is differentiable and $ h : \mathbb{R}^n \rightarrow \mathbb{R} $ is differentiable. What is $ \nabla f(x) $?
(c) Let $ f(x) = \frac{1}{2} x^T A x + b^T x $, where $ A $ is symmetric and $ b \in \mathbb{R}^n $ is a vector. What is $ \nabla^2 f(x) $?
(d) Let $ f(x) = g(a^T x) $, where $ g : \mathbb{R} \rightarrow \mathbb{R} $ is continuously differentiable and $ a \in \mathbb{R}^n $ is a vector. What are $ \nabla f(x) $ and $ \nabla^2 f(x) $? (Hint: your expression for $ \nabla^2 f(x) $ may have as few as 11 symbols, including $'$ and parentheses.)
Solutions
Problem 1(a) Solution
We want to find the gradient $ \nabla f(x) $ of the function $ f(x) = \frac{1}{2} x^T A x + b^T x $, where $ A $ is a symmetric matrix and $ b \in \mathbb{R}^n $ is a vector.
Differentiate $ \frac{1}{2} x^T A x $:
The derivative of the quadratic form $ x^T A x $ with respect to $ x $ is $ Ax + A^T x $. Since $ A $ is symmetric ($ A = A^T $), this simplifies to $ 2Ax $. The coefficient $ \frac{1}{2} $ in front of $ x^T A x $ will cancel the 2 from the derivative, resulting in:
\[\frac{\partial}{\partial x} \left(\frac{1}{2} x^T A x\right) = Ax\]
Differentiate $ b^T x $:
The gradient of the linear form $ b^T x $ with respect to $ x $ is $ b $, because the derivative of each component $ b_i x_i $ with respect to $ x_i $ is just $ b_i $:
\[\frac{\partial}{\partial x} (b^T x) = b\]
Combine the results:
The gradient $ \nabla f(x) $ is the sum of the gradients of $ \frac{1}{2} x^T A x $ and $ b^T x $:
\[\nabla f(x) = Ax + b\]
Thus, the gradient $ \nabla f(x) $ of the function $ f(x) $ is $ Ax + b $.
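As a sanity check, here is a short numerical sketch (assuming NumPy, with a randomly generated symmetric $ A $ and vector $ b $ of my own choosing) that compares $ Ax + b $ against a central-difference gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                 # symmetrize so that A = A^T
b = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x + b @ x

# Central-difference gradient along each coordinate direction
eps = 1e-6
g_num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                  for e in np.eye(n)])

print(np.allclose(g_num, A @ x + b, atol=1e-6))  # True: the gradient is Ax + b
```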
Problem 1(b) Solution
Find the gradient $ \nabla f(x) $ of the function $ f(x) = g(h(x)) $, where $ g: \mathbb{R} \rightarrow \mathbb{R} $ is differentiable and $ h: \mathbb{R}^n \rightarrow \mathbb{R} $ is differentiable.
By the chain rule for gradients, the gradient of $ f $ with respect to $ x $ is the product of the derivative of $ g $ with respect to $ h(x) $ and the gradient of $ h $ with respect to $ x $:
\[\nabla f(x) = g'(h(x)) \cdot \nabla h(x)\]where $ g'(h(x)) $ is a scalar and $ \nabla h(x) $ is a vector.
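To see the formula in action, here is a small sketch (picking $ g(u) = \sin u $ and $ h(x) = \|x\|^2 $ purely for illustration, so $ \nabla h(x) = 2x $):

```python
import numpy as np

g, dg = np.sin, np.cos            # g : R -> R and its derivative g'
h = lambda x: x @ x               # h : R^n -> R, h(x) = ||x||^2
grad_h = lambda x: 2 * x          # gradient of h

f = lambda x: g(h(x))

x = np.array([0.3, -1.2, 0.5])
analytic = dg(h(x)) * grad_h(x)   # g'(h(x)) * grad h(x)

# Central-difference gradient for comparison
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(x.size)])

print(np.allclose(analytic, numeric, atol=1e-8))  # True
```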
Problem 1(c) Solution
Given $ f(x) = \frac{1}{2} x^T A x + b^T x $, where $ A $ is symmetric and $ b \in \mathbb{R}^n $ is a vector, find the Hessian $ \nabla^2 f(x) $.
- We have already calculated $ \nabla f(x) = Ax + b $ in problem 1(a).
- Now, to find the Hessian, we differentiate the gradient $ \nabla f(x) $ with respect to $ x $ again.
- The derivative of $ Ax $ with respect to $ x $ is $ A $ since $ A $ is constant with respect to $ x $.
- The derivative of $ b $ with respect to $ x $ is zero since $ b $ does not depend on $ x $.
Combining these results, we get:
\[\nabla^2 f(x) = A\]
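Since $ \nabla f(x) = Ax + b $ is affine in $ x $, a central difference of the gradient recovers $ A $ up to floating-point error. A minimal sketch, with the same kind of hypothetical random $ A $ and $ b $ as in part (a):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                 # symmetric A, as in part (a)
b = rng.standard_normal(n)
x = rng.standard_normal(n)

grad_f = lambda x: A @ x + b      # gradient from part (a)

# Difference the gradient column by column to build the Hessian
eps = 1e-6
H = np.column_stack([(grad_f(x + eps * e) - grad_f(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

print(np.allclose(H, A))          # True: the Hessian is A, independent of x
```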
Problem 1(d) Solution
To solve problem 1(d), we need to find the gradient $ \nabla f(\mathbf{x}) $ and the Hessian $ \nabla^2 f(\mathbf{x}) $ of the function $ f(\mathbf{x}) = g(a^T \mathbf{x}) $, where $ g: \mathbb{R} \rightarrow \mathbb{R} $ is continuously differentiable and $ a \in \mathbb{R}^n $ is a vector.
Finding the Gradient $ \nabla f(\mathbf{x}) $:
The gradient of a scalar function is a vector of its first partial derivatives. Here, we use the chain rule:
- Apply the chain rule: $ f(\mathbf{x}) = g(u) $ with $ u = a^T \mathbf{x} $. The derivative of $ f $ with respect to $ x_i $ is \(\frac{\partial f}{\partial x_i} = \frac{\partial g}{\partial u} \cdot \frac{\partial u}{\partial x_i},\) where $ \frac{\partial u}{\partial x_i} = a_i $.
- Compute the gradient: \(\nabla f(\mathbf{x}) = \left[ \begin{array}{c} \frac{\partial g}{\partial u} a_1 \\ \vdots \\ \frac{\partial g}{\partial u} a_n \end{array} \right].\) Factoring out $ \frac{\partial g}{\partial u} $: \(\nabla f(\mathbf{x}) = g'(a^T \mathbf{x}) \cdot a\)
Finding the Hessian $ \nabla^2 f(\mathbf{x}) $:
The Hessian is a square matrix of second partial derivatives.
- Apply the chain rule again: for the second derivatives, we differentiate $ g'(a^T \mathbf{x}) \cdot a $ with respect to $ x $; since $ a $ is a constant vector, only the scalar factor $ g'(a^T \mathbf{x}) $ depends on $ x $.
- Compute the Hessian: each element $ (i, j) $ of the Hessian is \(\frac{\partial^2 f}{\partial x_i \partial x_j} = a_i \cdot g''(a^T \mathbf{x}) \cdot a_j.\) Thus, the Hessian matrix is \(\nabla^2 f(\mathbf{x}) = g''(a^T \mathbf{x}) \cdot a a^T\)
In summary, the gradient is the vector $ a $ scaled by $ g'(a^T \mathbf{x}) $, and the Hessian is the outer product $ a a^T $ scaled by $ g''(a^T \mathbf{x}) $.
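Finally, a sketch verifying both formulas at once (choosing $ g(u) = e^u $ purely for convenience, since then $ g' = g'' = g $):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
a = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda x: np.exp(a @ x)            # g(u) = e^u
grad_f = lambda x: np.exp(a @ x) * a   # derived gradient: g'(a^T x) a

s = np.exp(a @ x)                      # g'(a^T x) = g''(a^T x) = e^{a^T x}
grad_analytic = s * a                  # g'(a^T x) a
hess_analytic = s * np.outer(a, a)     # g''(a^T x) a a^T

# Check the gradient against f, and the Hessian against the derived gradient
eps = 1e-6
grad_num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])
hess_num = np.column_stack([(grad_f(x + eps * e) - grad_f(x - eps * e)) / (2 * eps)
                            for e in np.eye(n)])

print(np.allclose(grad_num, grad_analytic, atol=1e-7))  # True
print(np.allclose(hess_num, hess_analytic, atol=1e-6))  # True
```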