Derivatives of Generalized Eigenvalues and Eigenvectors

By David K. Zhang



Consider the generalized eigenvalue problem

(1)
\[H \vv = E O \vv \]

where $H$ and $O$ are $n \times n$ matrices, $E$ is an eigenvalue, and $\vv$ is the corresponding eigenvector. For simplicity, we will restrict our attention to the special case of a symmetric-definite eigenvalue problem, in which $H$ and $O$ are real symmetric matrices and $O$ is positive-definite.

Suppose $H$ and $O$ depend smoothly upon a real parameter $\alpha$. That is to say, the entries $\{H_{ij}(\alpha)\}$ and $\{O_{ij}(\alpha)\}$ are smooth (infinitely differentiable) functions of $\alpha$. Suppose further that this dependence is explicitly known, in the sense that the derivatives $\pdv{H_{ij}}{\alpha}$ and $\pdv{O_{ij}}{\alpha}$ can be calculated by a known procedure. Then we can ask how the eigenvalues $E$ and eigenvectors $\vv$ vary in response to changes in $\alpha$.

By differentiating equation (1) with respect to $\alpha$ and applying the product rule, we have

(2)
\[\pdv{H}{\alpha} \vv + H \pdv{\vv}{\alpha} = \pdv{E}{\alpha} O \vv + E \pdv{O}{\alpha} \vv + EO \pdv{\vv}{\alpha} \]

which, after moving every term to the left-hand side and grouping the terms that act on $\vv$, can be rewritten as

(3)
\[\qty(\pdv{H}{\alpha} - \pdv{E}{\alpha} O - E \pdv{O}{\alpha}) \vv + (H - EO) \pdv{\vv}{\alpha} = \vo. \]

Next, we take the inner product of both sides of this equation with the original eigenvector $\vv$; that is, we multiply on the left by $\vv^T$:

(4)
\[\vv^T \qty(\pdv{H}{\alpha} - \pdv{E}{\alpha} O - E \pdv{O}{\alpha}) \vv + \vv^T (H - EO) \pdv{\vv}{\alpha} = 0 \]

Because $H$ and $O$ are symmetric, $\vv^T (H - EO) = \qty[(H - EO) \vv]^T = \vo^T$ by the original eigenvalue equation (1). Thus, $\vv^T (H - EO) \pdv{\vv}{\alpha} = \vo^T \pdv{\vv}{\alpha} = 0$, and only the first term of equation (4) remains.

(5)
\[\vv^T \qty(\pdv{H}{\alpha} - \pdv{E}{\alpha} O - E \pdv{O}{\alpha}) \vv + \cancelto{0}{\vv^T (H - EO) \pdv{\vv}{\alpha}} = 0 \]
(6)
\[\vv^T \qty(\pdv{H}{\alpha} - \pdv{E}{\alpha} O - E \pdv{O}{\alpha}) \vv = 0 \]

Now, by solving for $\pdv{E}{\alpha}$, we obtain the result

(7)
\[\boxed{\pdv{E}{\alpha} = \frac{\vv^T \pdv{H}{\alpha} \vv}{\vv^T O \vv} - E \frac{\vv^T \pdv{O}{\alpha} \vv}{\vv^T O \vv}.} \]
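As a quick consistency check (a special case added here for illustration, not a new result), suppose that $O$ is the identity matrix and does not depend on $\alpha$. Then $\pdv{O}{\alpha}$ vanishes, the second term of equation (7) drops out, and we recover the familiar first-order perturbation formula for an ordinary symmetric eigenvalue problem:

\[\pdv{E}{\alpha} = \frac{\vv^T \pdv{H}{\alpha} \vv}{\vv^T \vv}. \]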

Remark: Many symmetric-definite eigen-solvers (for example, xSYGVX in LAPACK) follow the particularly convenient convention of returning eigenvectors $\vv$ normalized so that $\vv^T O \vv = 1$. In this case, a few CPU cycles can be saved by omitting division by $\vv^T O \vv$ from equation (7).
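Equation (7) is easy to check numerically. The following is a minimal sketch in Python with NumPy, not a reference implementation. The helper `gen_eigh`, the function `eigenvalue_derivative`, and the matrices `H0`, `H1`, `O0`, `O1` (which define a linear family $H(\alpha) = H_0 + \alpha H_1$, $O(\alpha) = O_0 + \alpha O_1$) are hypothetical names and values chosen only for illustration. The sketch compares equation (7) with a central finite difference of the computed eigenvalues.

```python
import numpy as np

def gen_eigh(H, O):
    """Solve H v = E O v (H symmetric, O symmetric positive-definite).

    Uses the Cholesky factorization O = L L^T to reduce to a standard
    symmetric problem. Eigenvalues are ascending; eigenvectors are the
    columns of the second return value, normalized so that v^T O v = 1.
    """
    L = np.linalg.cholesky(O)
    Linv = np.linalg.inv(L)
    E, Y = np.linalg.eigh(Linv @ H @ Linv.T)
    return E, Linv.T @ Y

def eigenvalue_derivative(E, v, O, dH, dO):
    """Equation (7) for a single eigenpair (E, v)."""
    return (v @ dH @ v - E * (v @ dO @ v)) / (v @ O @ v)

# Hypothetical one-parameter family (illustrative values only).
H0 = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]])
H1 = np.array([[1.0, 0.0, 1.0], [0.0, -1.0, 0.0], [1.0, 0.0, 2.0]])
O0 = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])
O1 = np.array([[0.2, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.3]])

def pencil(a):
    """Return (H(a), O(a)); here dH/da = H1 and dO/da = O1."""
    return H0 + a * H1, O0 + a * O1

a = 0.3
H, O = pencil(a)
E, V = gen_eigh(H, O)

# Analytic derivatives of all three eigenvalues from equation (7).
analytic = np.array(
    [eigenvalue_derivative(E[k], V[:, k], O, H1, O1) for k in range(3)]
)

# Central finite difference of the (sorted) eigenvalues.
h = 1e-6
E_plus, _ = gen_eigh(*pencil(a + h))
E_minus, _ = gen_eigh(*pencil(a - h))
numeric = (E_plus - E_minus) / (2 * h)

print("largest discrepancy:", np.max(np.abs(analytic - numeric)))
```

Because `gen_eigh` returns eigenvectors with $\vv^T O \vv = 1$, the division in `eigenvalue_derivative` could be dropped here; it is kept so that the function also works for other normalizations. The finite-difference comparison assumes the eigenvalues do not cross near the chosen $\alpha$, so that sorting pairs them consistently.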

To obtain the corresponding formula for $\pdv{\vv}{\alpha}$, we return to equation (3), which can be rewritten as

(8)
\[(H - EO) \pdv{\vv}{\alpha} = -\qty(\pdv{H}{\alpha} - \pdv{E}{\alpha} O - E \pdv{O}{\alpha}) \vv. \]

It is tempting to multiply both sides of this equation by $(H - EO)^{-1}$, but the eigenvalue equation (1) guarantees that $\vv \in \ker(H-EO)$. Thus, the matrix $H - EO$, having nontrivial kernel, is necessarily singular, and its inverse does not exist. This means that equation (8) fails to uniquely specify the vector $\pdv{\vv}{\alpha}$: adding any element of $\ker(H - EO)$ to a solution yields another solution.

However, we can still derive a meaningful result if we multiply not by the inverse $(H - EO)^{-1}$, but by the Moore-Penrose pseudoinverse $(H - EO)^+.$ In this case, we obtain the equation

(9)
\[(H - EO)^+ (H - EO) \pdv{\vv}{\alpha} = -(H - EO)^+ \qty(\pdv{H}{\alpha} - \pdv{E}{\alpha} O - E \pdv{O}{\alpha}) \vv \]

where $(H - EO)^+ (H - EO)$ is not the identity matrix, but the projection matrix onto the orthogonal complement of $\ker(H - EO).$
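The projection property of $(H - EO)^+ (H - EO)$ can be confirmed numerically. The sketch below uses Python with NumPy; the matrices `H` and `O` are hypothetical values chosen for illustration, and the eigenvalue is assumed to be non-degenerate, so that $\ker(H - EO)$ is spanned by $\vv$ alone. The explicit cutoff `rcond=1e-10` is also an assumption of this sketch: in floating point the zero singular value of $H - EO$ is only approximately zero, and it must be truncated rather than inverted.

```python
import numpy as np

# Hypothetical symmetric-definite pencil (illustrative values only).
H = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]])
O = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])

# Solve H v = E O v by reducing to a standard problem with O = L L^T.
L = np.linalg.cholesky(O)
Linv = np.linalg.inv(L)
E, Y = np.linalg.eigh(Linv @ H @ Linv.T)
V = Linv.T @ Y                      # columns satisfy v^T O v = 1

E0, v = E[0], V[:, 0]               # lowest eigenpair
A = H - E0 * O                      # singular: A v = 0 up to rounding

# Pseudoinverse with an explicit relative cutoff for singular values.
A_pinv = np.linalg.pinv(A, rcond=1e-10)
P = A_pinv @ A

# Orthogonal (Euclidean) projector onto the complement of span{v}.
P_expected = np.eye(3) - np.outer(v, v) / (v @ v)

print("max |P - P_expected|:", np.max(np.abs(P - P_expected)))
print("max |P v|:", np.max(np.abs(P @ v)))
```

Note that the projector is orthogonal with respect to the ordinary Euclidean inner product, not the inner product induced by $O$.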

This has a clear geometric interpretation if we recall that an eigenvector is never uniquely determined. In the non-degenerate case, an eigenvector is only determined up to a nonzero multiplicative constant, and in general, a set of $k$ degenerate eigenvectors is determined only up to a choice of basis of the eigenspace $\ker(H - EO).$ Thus, any change in $\vv$ occurring inside $\ker(H - EO)$ is geometrically inconsequential. The meaningful change is that which occurs orthogonal to $\ker(H - EO),$ and this is precisely what is computed by the Moore-Penrose pseudoinverse.

We will therefore adopt the convention that the derivative $\pdv{\vv}{\alpha}$ should always be assumed orthogonal to the eigenspace $\ker(H - EO)$. With this convention in place, equation (9) reduces to the final result

(10)
\[\boxed{\pdv{\vv}{\alpha} = -(H - EO)^+ \qty(\pdv{H}{\alpha} - \pdv{E}{\alpha} O - E \pdv{O}{\alpha}) \vv.} \]

Remark: It is possible to evaluate the RHS of equation (10) without directly computing the pseudoinverse of $H-EO,$ thanks to the following fact: if $A$ is a rank-deficient matrix and the linear system $Ax = b$ is consistent, then $A^+ b$ is its unique minimum (Euclidean) norm solution. (If the system is inconsistent, $A^+ b$ is instead the minimum-norm least-squares solution.) Minimum-norm solvers for rank-deficient linear systems are widely available (for example, xGELSD in LAPACK) and may be preferable depending on the efficiency of the provided implementation.
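The minimum-norm approach in the preceding remark can be sketched as follows, in Python with NumPy. This is an illustration under stated assumptions rather than a definitive implementation: `gen_eigh`, `eigenpair_derivative`, `pencil`, and the matrices `H0`, `H1`, `O0`, `O1` are hypothetical, the eigenvalue is assumed non-degenerate, and `numpy.linalg.lstsq` (which returns a minimum-norm least-squares solution) stands in for a direct LAPACK call. The result is compared with the explicit pseudoinverse and with a finite difference of the eigenvector, projected orthogonally to $\vv$ to match the convention behind equation (10).

```python
import numpy as np

def gen_eigh(H, O):
    """Solve H v = E O v via O = L L^T; columns of V satisfy v^T O v = 1."""
    L = np.linalg.cholesky(O)
    Linv = np.linalg.inv(L)
    E, Y = np.linalg.eigh(Linv @ H @ Linv.T)
    return E, Linv.T @ Y

def eigenpair_derivative(E, v, H, O, dH, dO, rcond=1e-10):
    """Equations (7) and (10); (10) is evaluated as a minimum-norm solve of (8)."""
    dE = (v @ dH @ v - E * (v @ dO @ v)) / (v @ O @ v)
    rhs = -(dH - dE * O - E * dO) @ v
    # rcond truncates the numerically tiny singular value of H - E O.
    dv = np.linalg.lstsq(H - E * O, rhs, rcond=rcond)[0]
    return dE, dv

# Hypothetical one-parameter family (illustrative values only).
H0 = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]])
H1 = np.array([[1.0, 0.0, 1.0], [0.0, -1.0, 0.0], [1.0, 0.0, 2.0]])
O0 = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])
O1 = np.array([[0.2, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.3]])

def pencil(a):
    return H0 + a * H1, O0 + a * O1

a = 0.3
H, O = pencil(a)
E, V = gen_eigh(H, O)
E0, v = E[0], V[:, 0]
dE, dv = eigenpair_derivative(E0, v, H, O, H1, O1)

# Same quantity using an explicitly formed pseudoinverse.
A_pinv = np.linalg.pinv(H - E0 * O, rcond=1e-10)
dv_pinv = -A_pinv @ ((H1 - dE * O - E0 * O1) @ v)

def lowest_vector(a, ref):
    """Lowest eigenvector at parameter a, with its sign aligned to ref."""
    w = gen_eigh(*pencil(a))[1][:, 0]
    return w if w @ ref > 0 else -w

# Central finite difference, then remove the component along v.
h = 1e-5
fd = (lowest_vector(a + h, v) - lowest_vector(a - h, v)) / (2 * h)
fd_perp = fd - v * (v @ fd) / (v @ v)

print("lstsq vs pinv:", np.max(np.abs(dv - dv_pinv)))
print("lstsq vs projected finite difference:", np.max(np.abs(dv - fd_perp)))
```

The raw finite difference `fd` generally has a component along $\vv$ that depends on how the solver normalizes its eigenvectors; only the projected part `fd_perp` is expected to agree with equation (10).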

Suppose now that the matrices $H$ and $O$ depend smoothly upon two real parameters $\alpha$ and $\beta$. By differentiating equation (2) with respect to $\beta$, we see that

(11)
\[\begin{aligned} \pdv{H}{\alpha}{\beta} & \vv + \pdv{H}{\alpha} \pdv{\vv}{\beta} + \pdv{H}{\beta} \pdv{\vv}{\alpha} + H \pdv{\vv}{\alpha}{\beta} \\ &= \pdv{E}{\alpha}{\beta} O \vv + \pdv{E}{\alpha} \pdv{O}{\beta} \vv + \pdv{E}{\alpha} O \pdv{\vv}{\beta} \\ &\pe + \pdv{E}{\beta} \pdv{O}{\alpha} \vv + E \pdv{O}{\alpha}{\beta} \vv + E \pdv{O}{\alpha} \pdv{\vv}{\beta} \\ &\pe + \pdv{E}{\beta} O \pdv{\vv}{\alpha} + E \pdv{O}{\beta} \pdv{\vv}{\alpha} + EO \pdv{\vv}{\alpha}{\beta}. \end{aligned} \]

As before, we take the inner product of both sides with $\vv$ to obtain

(12)
\[\begin{aligned} \vv^T \pdv{H}{\alpha}{\beta} & \vv + \vv^T \pdv{H}{\alpha} \pdv{\vv}{\beta} + \vv^T \pdv{H}{\beta} \pdv{\vv}{\alpha} + \vv^T H \pdv{\vv}{\alpha}{\beta} \\ &= \pdv{E}{\alpha}{\beta} \vv^T O \vv + \pdv{E}{\alpha} \vv^T \pdv{O}{\beta} \vv + \pdv{E}{\alpha} \vv^T O \pdv{\vv}{\beta} \\ &\pe + \pdv{E}{\beta} \vv^T \pdv{O}{\alpha} \vv + E \vv^T \pdv{O}{\alpha}{\beta} \vv + E \vv^T \pdv{O}{\alpha} \pdv{\vv}{\beta} \\ &\pe + \pdv{E}{\beta} \vv^T O \pdv{\vv}{\alpha} + E \vv^T \pdv{O}{\beta} \pdv{\vv}{\alpha} + E \vv^T O \pdv{\vv}{\alpha}{\beta} \end{aligned} \]

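and use the original eigenvalue equation to eliminate a pair of terms. Since $H$ and $O$ are symmetric, equation (1) gives $\vv^T H = E \vv^T O$, so the term $\vv^T H \pdv{\vv}{\alpha}{\beta}$ on the left cancels against $E \vv^T O \pdv{\vv}{\alpha}{\beta}$ on the right: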

(13)
\[\begin{aligned} \vv^T \pdv{H}{\alpha}{\beta} & \vv + \vv^T \pdv{H}{\alpha} \pdv{\vv}{\beta} + \vv^T \pdv{H}{\beta} \pdv{\vv}{\alpha} + \cancel{\vv^T H \pdv{\vv}{\alpha}{\beta}} \\ &= \pdv{E}{\alpha}{\beta} \vv^T O \vv + \pdv{E}{\alpha} \vv^T \pdv{O}{\beta} \vv + \pdv{E}{\alpha} \vv^T O \pdv{\vv}{\beta} \\ &\pe + \pdv{E}{\beta} \vv^T \pdv{O}{\alpha} \vv + E \vv^T \pdv{O}{\alpha}{\beta} \vv + E \vv^T \pdv{O}{\alpha} \pdv{\vv}{\beta} \\ &\pe + \pdv{E}{\beta} \vv^T O \pdv{\vv}{\alpha} + E \vv^T \pdv{O}{\beta} \pdv{\vv}{\alpha} + \cancel{E \vv^T O \pdv{\vv}{\alpha}{\beta}} \end{aligned} \]

By solving for $\pdv{E}{\alpha}{\beta}$, we obtain the result

(14)
\[\boxed{ \begin{aligned} \pdv{E}{\alpha}{\beta} &= \frac{1}{\vv^T O \vv} \bigg[ \vv^T \pdv{H}{\alpha}{\beta} \vv + \vv^T \pdv{H}{\alpha} \pdv{\vv}{\beta} + \vv^T \pdv{H}{\beta} \pdv{\vv}{\alpha} \\ &\pe - \pdv{E}{\alpha} \vv^T \pdv{O}{\beta} \vv - \pdv{E}{\alpha} \vv^T O \pdv{\vv}{\beta} - \pdv{E}{\beta} \vv^T \pdv{O}{\alpha} \vv - \pdv{E}{\beta} \vv^T O \pdv{\vv}{\alpha} \\ &\pe - E \vv^T \pdv{O}{\alpha}{\beta} \vv - E \vv^T \pdv{O}{\alpha} \pdv{\vv}{\beta} - E \vv^T \pdv{O}{\beta} \pdv{\vv}{\alpha} \bigg]. \end{aligned} } \]
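Equation (14) can be checked against a finite-difference approximation of the mixed derivative. The following Python/NumPy sketch does this for the lowest eigenvalue of a hypothetical two-parameter family $H = H_0 + \alpha H_A + \beta H_B + \alpha\beta H_{AB}$ (and likewise for $O$). All function names and matrix values are assumptions made for illustration, and the eigenvalue is assumed non-degenerate. The first derivatives of $\vv$ are taken from equation (10), evaluated with a minimum-norm solve.

```python
import numpy as np

def gen_eigh(H, O):
    """Solve H v = E O v via O = L L^T; columns of V satisfy v^T O v = 1."""
    L = np.linalg.cholesky(O)
    Linv = np.linalg.inv(L)
    E, Y = np.linalg.eigh(Linv @ H @ Linv.T)
    return E, Linv.T @ Y

def first_derivatives(E, v, A, O, dH, dO):
    """Equations (7) and (10) for one parameter, with A = H - E O."""
    dE = (v @ dH @ v - E * (v @ dO @ v)) / (v @ O @ v)
    dv = np.linalg.lstsq(A, -(dH - dE * O - E * dO) @ v, rcond=1e-10)[0]
    return dE, dv

def mixed_second_derivative(E, v, H, O, Ha, Hb, Hab, Oa, Ob, Oab):
    """Equation (14) for one eigenpair; Ha, Hb, Hab, ... are derivative matrices."""
    A = H - E * O
    Ea, va = first_derivatives(E, v, A, O, Ha, Oa)
    Eb, vb = first_derivatives(E, v, A, O, Hb, Ob)
    bracket = (
        v @ Hab @ v + v @ Ha @ vb + v @ Hb @ va
        - Ea * (v @ Ob @ v) - Ea * (v @ O @ vb)
        - Eb * (v @ Oa @ v) - Eb * (v @ O @ va)
        - E * (v @ Oab @ v) - E * (v @ Oa @ vb) - E * (v @ Ob @ va)
    )
    return bracket / (v @ O @ v)

# Hypothetical coefficient matrices (illustrative values only).
H0 = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]])
HA = np.array([[1.0, 0.0, 1.0], [0.0, -1.0, 0.0], [1.0, 0.0, 2.0]])
HB = np.array([[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, -1.0]])
HAB = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.5, 0.0]])
O0 = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])
OA = np.array([[0.2, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.3]])
OB = np.array([[0.1, 0.0, 0.1], [0.0, 0.2, 0.0], [0.1, 0.0, 0.1]])
OAB = np.array([[0.0, 0.1, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.1]])

def matrices(a, b):
    H = H0 + a * HA + b * HB + a * b * HAB
    O = O0 + a * OA + b * OB + a * b * OAB
    return H, O

def lowest_eigenvalue(a, b):
    return gen_eigh(*matrices(a, b))[0][0]

a, b = 0.3, 0.2
H, O = matrices(a, b)
E, V = gen_eigh(H, O)
analytic = mixed_second_derivative(
    E[0], V[:, 0], H, O,
    HA + b * HAB, HB + a * HAB, HAB,   # dH/da, dH/db, d2H/dadb at (a, b)
    OA + b * OAB, OB + a * OAB, OAB,   # dO/da, dO/db, d2O/dadb at (a, b)
)

# Four-point central difference for the mixed derivative.
h = 1e-4
numeric = (
    lowest_eigenvalue(a + h, b + h) - lowest_eigenvalue(a + h, b - h)
    - lowest_eigenvalue(a - h, b + h) + lowest_eigenvalue(a - h, b - h)
) / (4 * h * h)

print("analytic:", analytic, "finite difference:", numeric)
```

The value of equation (14) does not change if a multiple of $\vv$ is added to $\pdv{\vv}{\alpha}$ or $\pdv{\vv}{\beta}$, because the extra contribution is proportional to the left-hand side of equation (6). This is why the pseudoinverse convention for the first derivatives can be used here without affecting $\pdv{E}{\alpha}{\beta}$.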

With the second derivative of the eigenvalue in hand, we return to equation (11) and isolate terms containing $\pdv{\vv}{\alpha}{\beta}$ to obtain

(15)
\[\begin{aligned} (H - EO) \pdv{\vv}{\alpha}{\beta} &= - \pdv{H}{\alpha}{\beta} \vv - \pdv{H}{\alpha} \pdv{\vv}{\beta} - \pdv{H}{\beta} \pdv{\vv}{\alpha} \\ &\pe + \pdv{E}{\alpha}{\beta} O \vv + \pdv{E}{\alpha} \pdv{O}{\beta} \vv + \pdv{E}{\alpha} O \pdv{\vv}{\beta} + \pdv{E}{\beta} \pdv{O}{\alpha} \vv \\ &\pe + E \pdv{O}{\alpha}{\beta} \vv + E \pdv{O}{\alpha} \pdv{\vv}{\beta} + \pdv{E}{\beta} O \pdv{\vv}{\alpha} + E \pdv{O}{\beta} \pdv{\vv}{\alpha}. \end{aligned} \]

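Applying the same technique as before, we multiply both sides by the pseudoinverse $(H - EO)^+$ and adopt the same convention for the second derivative, namely that $\pdv{\vv}{\alpha}{\beta}$ is taken orthogonal to $\ker(H - EO)$, to obtain the result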

(16)
\[\boxed{ \begin{aligned} \pdv{\vv}{\alpha}{\beta} &= (H - EO)^+ \bigg[ - \pdv{H}{\alpha}{\beta} \vv - \pdv{H}{\alpha} \pdv{\vv}{\beta} - \pdv{H}{\beta} \pdv{\vv}{\alpha} \\ &\pe + \pdv{E}{\alpha}{\beta} O \vv + \pdv{E}{\alpha} \pdv{O}{\beta} \vv + \pdv{E}{\alpha} O \pdv{\vv}{\beta} + \pdv{E}{\beta} \pdv{O}{\alpha} \vv \\ &\pe + E \pdv{O}{\alpha}{\beta} \vv + E \pdv{O}{\alpha} \pdv{\vv}{\beta} + \pdv{E}{\beta} O \pdv{\vv}{\alpha} + E \pdv{O}{\beta} \pdv{\vv}{\alpha} \bigg]. \end{aligned} } \]

In the special case $\alpha = \beta,$ these results reduce to the following:

(17)
\[\boxed{ \begin{aligned} \pdv[2]{E}{\alpha} &= \frac{1}{\vv^T O \vv} \bigg[ \vv^T \pdv[2]{H}{\alpha} \vv + 2 \vv^T \pdv{H}{\alpha} \pdv{\vv}{\alpha} - E \vv^T \pdv[2]{O}{\alpha} \vv \\ &\pe - 2 \pdv{E}{\alpha} \vv^T \pdv{O}{\alpha} \vv - 2 \pdv{E}{\alpha} \vv^T O \pdv{\vv}{\alpha} - 2 E \vv^T \pdv{O}{\alpha} \pdv{\vv}{\alpha} \bigg] \\ \pdv[2]{\vv}{\alpha} &= (H - EO)^+ \bigg[ - \pdv[2]{H}{\alpha} \vv - 2 \pdv{H}{\alpha} \pdv{\vv}{\alpha} + \pdv[2]{E}{\alpha} O \vv \\ &\pe + E \pdv[2]{O}{\alpha} \vv + 2 \pdv{E}{\alpha} \pdv{O}{\alpha} \vv + 2 \pdv{E}{\alpha} O \pdv{\vv}{\alpha} + 2 E \pdv{O}{\alpha} \pdv{\vv}{\alpha} \bigg] \end{aligned} } \]
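The single-parameter formulas in equation (17) can be sketched in the same way. The Python/NumPy code below is an illustration under stated assumptions: the function names, the quadratic family $H(\alpha) = H_0 + \alpha H_1 + \tfrac{1}{2}\alpha^2 H_2$ (and likewise for $O$), and all matrix values are hypothetical, and the eigenvalue is assumed non-degenerate. It checks $\pdv[2]{E}{\alpha}$ against a finite difference and verifies that the bracketed right-hand side for $\pdv[2]{\vv}{\alpha}$ is orthogonal to $\vv$, which is the condition for the underlying singular system to be solvable.

```python
import numpy as np

def gen_eigh(H, O):
    """Solve H v = E O v via O = L L^T; columns of V satisfy v^T O v = 1."""
    L = np.linalg.cholesky(O)
    Linv = np.linalg.inv(L)
    E, Y = np.linalg.eigh(Linv @ H @ Linv.T)
    return E, Linv.T @ Y

def second_derivatives(E, v, H, O, dH, dO, d2H, d2O, rcond=1e-10):
    """Equations (7), (10), and (17) for one eigenpair."""
    A = H - E * O
    n = v @ O @ v
    dE = (v @ dH @ v - E * (v @ dO @ v)) / n
    dv = np.linalg.lstsq(A, -(dH - dE * O - E * dO) @ v, rcond=rcond)[0]
    d2E = (
        v @ d2H @ v + 2 * (v @ dH @ dv) - E * (v @ d2O @ v)
        - 2 * dE * (v @ dO @ v) - 2 * dE * (v @ O @ dv)
        - 2 * E * (v @ dO @ dv)
    ) / n
    rhs = (
        -(d2H @ v) - 2 * (dH @ dv) + d2E * (O @ v) + E * (d2O @ v)
        + 2 * dE * (dO @ v) + 2 * dE * (O @ dv) + 2 * E * (dO @ dv)
    )
    d2v = np.linalg.lstsq(A, rhs, rcond=rcond)[0]
    return dE, dv, d2E, d2v, rhs

# Hypothetical coefficient matrices (illustrative values only).
H0 = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]])
H1 = np.array([[1.0, 0.0, 1.0], [0.0, -1.0, 0.0], [1.0, 0.0, 2.0]])
H2 = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.5, 0.0]])
O0 = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])
O1 = np.array([[0.2, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.3]])
O2 = np.array([[0.0, 0.1, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.1]])

def pencil(a):
    return H0 + a * H1 + 0.5 * a * a * H2, O0 + a * O1 + 0.5 * a * a * O2

def lowest_eigenvalue(a):
    return gen_eigh(*pencil(a))[0][0]

a = 0.3
H, O = pencil(a)
E, V = gen_eigh(H, O)
E0, v = E[0], V[:, 0]
dE, dv, d2E, d2v, rhs = second_derivatives(
    E0, v, H, O,
    H1 + a * H2, O1 + a * O2,   # first derivatives of H and O at a
    H2, O2,                     # second derivatives of H and O
)

# Three-point central difference for the second derivative of E.
h = 1e-4
numeric = (
    lowest_eigenvalue(a + h) - 2 * lowest_eigenvalue(a) + lowest_eigenvalue(a - h)
) / (h * h)

print("d2E analytic:", d2E, "finite difference:", numeric)
print("solvability check v . rhs:", v @ rhs)
```

No finite-difference check is attempted for $\pdv[2]{\vv}{\alpha}$ here, since its value depends on the orthogonality convention adopted for the eigenvector derivatives; the sketch only confirms that the singular linear system it solves is consistent.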