Theory
Recall that the goal of refinement is to lower the crystallographic residual while maintaining reasonable protein geometry. The R-factor decreases as refinement proceeds, owing to the similarity between the residual (line 1 of Eq. 12.1) and the R-factor (Eq. 12.2). Crystallographers (particularly advisors) sometimes worship the R-factor: the lower the better. But is a lower R-factor always an indication of a higher-quality structure?
The answer is no. In fact, one can force the R-factor to be as low as desired by any number of underhanded schemes: increasing the value of WA, not enforcing non-crystallographic symmetry during refinement, adding more and more solvent atoms to the model, using anisotropic B-factors, and refining occupancies. These schemes are acceptable only if you have high enough resolution data. If you don't have the data, you might be guilty of "overfitting your data": the R-factor goes down, but the model is not getting any better.
We can avoid overfitting the data by monitoring the cross-validation R-factor, which is sometimes called the free-R.
I won't say much about the theory of cross validation. I will just point you to a few MUST READ papers. MUST READ means you MUST READ these papers or risk being embarrassed during group meetings or at conferences. Worse still, you might be accosted by the Rfree police.
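To make the idea concrete, here is a minimal sketch of how a working R-factor and a free-R could be computed from lists of observed and calculated structure-factor amplitudes. It is not the code any refinement program actually uses. The function names, the single overall scale factor, and the 5% test-set fraction are all assumptions chosen for illustration. The only point is that the free-R uses the same formula as the R-factor, evaluated on reflections that were set aside and never used in refinement.

```python
import random


def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional R-factor: sum(||Fobs| - k|Fcalc||) / sum(|Fobs|).

    `scale` (k) is an assumed single overall scale factor.
    """
    numerator = sum(abs(abs(fo) - scale * abs(fc))
                    for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("no reflections (or all amplitudes are zero)")
    return numerator / denominator


def flag_test_set(n_reflections, test_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflections as the test (free) set.

    The 5% default is an illustrative assumption, not a rule.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_test = max(1, round(n_reflections * test_fraction))
    test = set(indices[:n_test])
    return [i in test for i in range(n_reflections)]


def r_work_and_r_free(f_obs, f_calc, is_test, scale=1.0):
    """Return (R-work, R-free) for one set of amplitudes and test flags."""
    work_obs = [fo for fo, t in zip(f_obs, is_test) if not t]
    work_calc = [fc for fc, t in zip(f_calc, is_test) if not t]
    free_obs = [fo for fo, t in zip(f_obs, is_test) if t]
    free_calc = [fc for fc, t in zip(f_calc, is_test) if t]
    return (r_factor(work_obs, work_calc, scale),
            r_factor(free_obs, free_calc, scale))


if __name__ == "__main__":
    # Toy amplitudes, made up for illustration only.
    f_obs = [10, 20, 30, 40]
    f_calc = [9, 22, 30, 36]
    is_test = [False, False, False, True]  # last reflection is "free"
    r_work, r_free = r_work_and_r_free(f_obs, f_calc, is_test)
    print(r_work, r_free)  # 0.05 0.1
```

The flags must be assigned once, before any refinement, and then left alone. If the working R-factor keeps dropping while the free-R stalls or rises, that is the signature of overfitting described above.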
Practice
Required files and constants:
Exercises