Gradient descent with a dash of Linear Algebra #2

11 Jan 2017 » deeplearning, math

This is #2 of a series where I am trying to digest Ian Goodfellow's presentation.

We left off, having defined the cost function, \(J(\theta)\), the gradient, \(\nabla_{\theta}J(\theta)\), and the Hessian, \(\textbf{H}\).
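
To make the recap concrete, here is a minimal numpy sketch under my own assumptions: a toy quadratic cost \(J(\theta)=\tfrac{1}{2}\theta^T A\theta - b^T\theta\) with a hand-picked \(A\) and \(b\), not anything taken from the slides, just so the gradient and Hessian have obvious closed forms.

```python
import numpy as np

# Toy quadratic cost J(theta) = 1/2 theta^T A theta - b^T theta.
# A and b are arbitrary illustrative choices, not values from the slides.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0])

def J(theta):
    return 0.5 * theta @ A @ theta - b @ theta

def grad_J(theta):
    # For symmetric A, the gradient is A theta - b.
    return A @ theta - b

# The Hessian of a quadratic cost is constant: H = A.
H = A
```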

\(\textbf{H}\) can be diagonalized because it is symmetric (no condition on \(\det\textbf{H}\) is needed for this). \[\textbf{H}=\textbf{Q}\Lambda\textbf{Q}^T\] where \(\textbf{Q}\) is the matrix whose columns are the eigenvectors, \(v_i\), and \(\Lambda\) is the diagonal matrix containing the eigenvalues, \(\lambda_i\). For example, for a \(2\times2\) Hessian matrix with distinct eigenvalues \(\lambda_1,\lambda_2\), the expansion would be:

\[\textbf{H}=\begin{bmatrix}v_1 & v_2 \end{bmatrix}\begin{bmatrix}\lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix}v_1^T \\ v_2^T \end{bmatrix}\]
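
As a quick sanity check, here is a small numpy sketch (the \(2\times2\) matrix is an arbitrary symmetric example of my own, not from the presentation) that eigendecomposes a symmetric matrix and reconstructs it as \(\textbf{Q}\Lambda\textbf{Q}^T\):

```python
import numpy as np

# Arbitrary symmetric 2x2 "Hessian" for illustration.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# eigh is meant for symmetric matrices: eigenvalues come back in
# ascending order, eigenvectors v_i as the columns of Q.
eigvals, Q = np.linalg.eigh(H)
Lam = np.diag(eigvals)

# Reconstruct H = Q Lambda Q^T and check it matches the original.
assert np.allclose(H, Q @ Lam @ Q.T)
```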

Further, he defines \(\textbf{H}\) applied in a unit direction \(d\) as \(d^T\textbf{H}d=\sum_i\lambda_i\cos^2(\theta_i)\), where \(\theta_i\) in this case is the angle between the eigenvector \(v_i\) and the direction vector \(d\). \(\textit{This is important, as it is used in the simplification to follow}\). Just remember that if the directions are aligned, so that \(d\) is parallel to \(v_i\), the angle between them is \(0\) and \(\cos^2(\theta_i) = 1\).
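
The identity is easy to verify numerically. The sketch below reuses my toy symmetric Hessian from above and an arbitrary unit direction \(d\) (both my own choices) to compare \(d^T\textbf{H}d\) against \(\sum_i\lambda_i\cos^2(\theta_i)\):

```python
import numpy as np

# Same toy symmetric Hessian as above (arbitrary example).
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
eigvals, Q = np.linalg.eigh(H)

# An arbitrary direction d, normalised to unit length.
d = np.array([2.0, 1.0])
d = d / np.linalg.norm(d)

# Left side: the quadratic form d^T H d.
lhs = d @ H @ d

# Right side: sum_i lambda_i * cos^2(theta_i), where cos(theta_i) is the
# dot product of unit-length d with the unit-length eigenvector v_i.
cosines = Q.T @ d
rhs = np.sum(eigvals * cosines**2)

assert np.isclose(lhs, rhs)
```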

I will continue in the next post… All mistakes are mine; if you spot any, please point them out and I will amend.