If it's your day job to push proteins in silico then you will one day have to calculate the RMSD of a protein. For example, you've just simulated the protein GinormousA for a whole micro-second, but you don't even know if GinormousA is stable. Sure, you could load up a protein viewer and eyeball the stability of the protein throughout the simulation. But eye-balling is just not going to cut it with your boss -- you need to slap a number on it. This is where RMSD comes in. RMSD is a quantitative measurement of difference between two atomic structures.
A molecule's conformation is basically a set of 3-dimensional coordinates. This means that a conformation is a set of vectors, , where each has three components. As the molecular conformation changes, therefore, the result is a new set of vectors . We can capture the the differences between conformations using these two sets of vectors.
To make life easier, we first shift the center of mass of and to the origin of the coordinate system (we can always shift them back afterwards). Once "centered," the differences between the vector sets can be captured by the total squared distance between them:
The root mean-square deviation (RMSD) is then:
But hola, you say...this mean-square measure doesn't measure similarity very well! After all, any rotation of the set of (which doesn't change the internal arrangement of ), would distort the RMSD! What we really need, then, is to find the best rotation of with respect to before taking the RMSD. To rotate the vectors , we apply a rotation matrix to get . We can then express as a function of the rotation :
Thus, to find the RMSD, we must first find the rotation that minimizes -- the optimal superposition of the two structures.
First, we expand (which is now a function of ):
The first part, , is invariant with respect to a rotation . The variation resides in the last term, . The goal, therefore, is to find the matrix that gives the largest possible value of , from which we obtain:
where is a matrix and is an matrix. Juggling the matrices inside the trace, we get:
Now, we still don't know what is, but we can study the matrix , which is the correlation matrix between and . Furthermore, by invoking the powerful Singular Value Decomposition theorem, we know that we can always write R as:
and obtain the matrices and (which are orthogonal), and (which is positive diagonal, with elements ). We can then substitute these matrices into the expression for :
Because is diagonal, we can get a simple expression for in terms of the :
Given this, the largest possible value for any element of is 1, or . And since is a product of terms linear in the diagonal elements of , the maximum value of occurs when the :
Plugging this into the equation for RMSD, we get
From this analysis, we can also determine the rotation that yields the optimal superposition of the vectors. When , then , the identity matrix. This immediately gives us an equation for the optimal rotation in terms of the left and right orthogonal vectors of the matrix R:
we can see that the singular value term of is . In other words, the singular values of () are related to the singular values of () by the relationship . And also, the right singular vectors are the same for both and .
Furthermore, since is a symmetric real matrix, its eigenvalues are identical to its singular values . Thus, we can use any standard method (e.g. the Jacobi transformation) to find the eigenvalues of , and by substituting into the RMSD formula:
When does this happen? To answer this question, take a look at your hands. Assuming that you're a primate, you have two of them, both alike, but reflected across the plane of symmetry that runs down the center of your body. Now, try to superimpose your hands. Go ahead. Try. If you're observant, you'll notice that you simply can't superimpose your hands. Even if you cut them off of your arms, the best you can hope to do is to align them so that their outlines are coincident -- but the palm of one hand will still be touching the other. You can never perfectly superimpose your hands, because your hands are chiral.
The same thing happens in molecular systems -- molecules are frequently chiral, and when two molecules are related by chirality, it sometimes happens that their optimal superposition involves a rotation, followed by a reflection. Thus, the "improper" rotation. Unfortunately, a reflection in will result in an incorrect RMSD value using the stated algorithm, so we need to detect this situation before it becomes a problem.
To restrict the solutions of to proper rotations, we check the determinants of and :
If then there is a reflection in , and we must eliminate the reflection. The easiest way to do this is to reflect the smallest principal axis in , by reversing the sign of the third (smallest) eignevalue, which yields the final Kabsch formula for RMSD in terms of and :
In addition to these methods, a get_trivial_rmsd() function is provided to calculate the RMSD value between two structures without attempting to superimpose their atoms.