Refinement: Getting Started

Theory

Refinement is an iterative process in which the atomic model is modified, structure factor amplitudes are calculated from the modified model, and the agreement between these calculated structure factor amplitudes (|Fc|) and the experimental or observed ones (|Fobs|) is determined. The goal is to find the model that produces the best agreement between |Fobs| and |Fc|.

It is useful to think of refinement as the problem of finding the minimum of a function that mathematically expresses the agreement between |Fobs| and |Fc|. This function is called a target function, or Exref. A commonly used target function is the crystallographic residual: SUM {(|Fobs| - k|Fc|)^2}, where the sum runs over all the reflections in your data set, and k is a scale factor needed to put |Fc| on the same scale as |Fobs|.
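To make the target function concrete, here is a minimal Python sketch of the crystallographic residual. The function names and the toy amplitudes are made up for illustration. The scale factor k is taken to be the least-squares value, which is an assumption; X-plor's own scaling may differ.

```python
def scale_factor(f_obs, f_calc):
    """Least-squares k that minimizes SUM (|Fobs| - k|Fc|)^2 (an assumed choice)."""
    numerator = sum(fo * fc for fo, fc in zip(f_obs, f_calc))
    denominator = sum(fc * fc for fc in f_calc)
    return numerator / denominator


def residual(f_obs, f_calc, k=None):
    """Crystallographic residual SUM (|Fobs| - k|Fc|)^2 over all reflections."""
    if k is None:
        k = scale_factor(f_obs, f_calc)
    return sum((fo - k * fc) ** 2 for fo, fc in zip(f_obs, f_calc))


# Toy amplitudes (hypothetical, not real data).
f_obs = [100.0, 50.0, 25.0]
good_model = [50.0, 25.0, 12.5]   # proportional to f_obs, so the residual is zero
poor_model = [50.0, 30.0, 10.0]   # not proportional, so the residual is positive

print(residual(f_obs, good_model))
print(residual(f_obs, poor_model))
```

Refinement amounts to changing the model parameters so that a number like this one gets smaller.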

A model typically consists of five parameters for each atom: x, y, z, B, and Q. The triplet (x,y,z) specifies the position of each atom in an orthogonal coordinate system. B is the B-factor, or temperature factor, of each atom, and it is related to the thermal motion of the atom. But beware that the B-factor can also contain information about other types of "disorder", including errors that you, the crystallographer, made while constructing and refining your model. Q is the occupancy, and it is the fraction of time that the atom spends at position (x,y,z). Typically, Q=1. If one has data better than about 1.8 Å, then occupancies between zero and one are sometimes used. We won't worry too much about occupancies except to say two things. First, setting Q=0 removes an atom's contribution to the calculated structure factors. Verify this fact by examining Eq. 12.9. Second, some files from the Protein Data Bank contain atoms with Q=0. This is an indication that the crystallographer did not know where to place the atom, probably because of poor electron density in that volume of the structure. For completeness, the atoms were included in the model, but the user should beware that the (x,y,z) position of any atom with Q=0 is probably highly speculative.
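The effect of Q=0 can also be checked numerically. The sketch below assumes the usual form of the structure factor sum, in which each atom contributes Q * f * exp(-B sin^2(theta)/lambda^2) * exp(2 pi i (hx + ky + lz)). For simplicity it uses fractional coordinates and a constant scattering factor f for each atom; the function name, the dictionary keys, and the numbers are all hypothetical.

```python
import cmath
import math


def structure_factor(hkl, atoms, stol2=0.0):
    """Sum the atomic contributions to F(hkl).

    stol2 is sin^2(theta)/lambda^2 for this reflection.
    Each atom is a dict with fractional 'xyz', scattering factor 'f',
    B-factor 'B', and occupancy 'Q' (a simplified, assumed layout).
    """
    h, k, l = hkl
    total = 0j
    for atom in atoms:
        x, y, z = atom["xyz"]
        phase = 2.0 * math.pi * (h * x + k * y + l * z)
        damping = math.exp(-atom["B"] * stol2)   # B-factor falloff
        total += atom["Q"] * atom["f"] * damping * cmath.exp(1j * phase)
    return total


atoms = [
    {"xyz": (0.10, 0.20, 0.30), "f": 6.0, "B": 20.0, "Q": 1.0},
    {"xyz": (0.40, 0.15, 0.75), "f": 8.0, "B": 35.0, "Q": 1.0},
    {"xyz": (0.62, 0.58, 0.05), "f": 7.0, "B": 60.0, "Q": 0.0},  # Q=0 atom
]

with_q0_atom = structure_factor((1, 2, 3), atoms, stol2=0.02)
without_atom = structure_factor((1, 2, 3), atoms[:2], stol2=0.02)
print(abs(with_q0_atom - without_atom))
```

The two results agree because the Q=0 atom is multiplied out of the sum, wherever it happens to sit.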

For this module, we'll be concerned with two types of crystallographic refinement: rigid body and overall B-factor. In rigid body refinement, big sections of the protein, such as subunits, move as rigid bodies. In the simplest case, the entire protein is treated as one rigid body, which results in 6 degrees of freedom (three rotations and three translations). GAPDH is a homotetramer, so a natural rigid body scheme would be to treat each subunit as a separate body. Rigid body refinement is useful in the early stages of structure determination and it is usually done with low resolution data (15-3 Å). Overall B-factor refinement is useful for finding a ballpark estimate of the average B-factor of your structure. Like rigid body refinement, it is done at the early stages of the refinement process. Refinement of the atomic B-factors is a bit tricky and we will reserve it for the later stages of refinement.
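The 6 degrees of freedom of a rigid body can be pictured as three rotation angles plus three translations applied to every atom in the body at once. The Python sketch below is only an illustration of that idea. The angle convention (rotate about x, then y, then z, about the origin) and the function names are assumptions, and they are not necessarily what X-plor uses internally.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(rx, ry, rz):
    """Rotation Rz * Ry * Rx, angles in radians (an assumed convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    rot_y = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rot_z = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return matmul(rot_z, matmul(rot_y, rot_x))


def move_rigid_body(coords, params):
    """Apply the same 6 parameters (rx, ry, rz, tx, ty, tz) to every atom."""
    rx, ry, rz, tx, ty, tz = params
    r = rotation_matrix(rx, ry, rz)
    shift = (tx, ty, tz)
    moved = []
    for atom in coords:
        moved.append(tuple(
            sum(r[i][j] * atom[j] for j in range(3)) + shift[i]
            for i in range(3)
        ))
    return moved


# Two atoms of a hypothetical subunit, orthogonal coordinates in Angstroms.
subunit = [(1.0, 0.0, 0.0), (4.0, 4.0, 0.0)]
moved = move_rigid_body(subunit, (0.0, 0.0, math.pi / 2, 0.0, 0.0, 5.0))
print(moved)
print(math.dist(subunit[0], subunit[1]), math.dist(moved[0], moved[1]))
```

However the body is moved, the distances between atoms inside it do not change, which is why so few parameters are needed. Treating each GAPDH subunit as its own body would simply mean one such parameter set per subunit.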
 

Practice

Required files and constants:

  1. X-plor coordinate file (.x)
  2. X-plor protein structure file (.psf)
  3. Diffraction data file for cross validation that contains h, k, l, Fobs, σF, test (.cv)
  4. Unit cell information: a, b, c, α, β, γ, space group
  5. WA value (usual range: 100,000 to 300,000)
  6. File containing non-crystallographic symmetry.
  7. File containing atoms to be omitted from model during refinement.
  8. X-plor refinement input file (refi.inp)

How to run the refinement program, X-plor

Add these aliases to your .cshrc file:

alias xplor '/du/xplor/object_library/xplor_small_dxml.exe'

alias xplorbig '/du/bin/xploron3851_alpha_osf_exe'

The small version of xplor is dimensioned for 20000 atoms; the big version accommodates 96000 atoms. Fill in the question marks (???) in refi.inp and select the desired refinement options. Read refi.inp and make sure that you know what each line is doing. You will need the X-plor manual, which is available online or in hard copy.

To run X-plor, type

xplor < refi.inp > refi.out &

Hint: type this to see whether errors have occurred:

grep ERR refi.out

Exercises

  1. What is the completeness of your data set according to X-plor?  Does it agree with Scalepack's estimate?  If not, why not?
  2. Calculate the R-factor by setting all the refinement options to zero.
  3. Change the space group to something ridiculous like P6(5)22 and calculate the R-factor.
  4. Do overall B-factor refinement.  Does the overall B go up or down?  Why?
  5. Make an omit file that omits the 8 NAD molecules.  Recalculate R.
  6. Make an omit file that omits the ABCD tetramer.  Recalculate R.
  7. Do rigid body refinement.  Does it improve the model?
  8. Modify refi.inp so that all 8 subunits are treated as rigid bodies and repeat rigid body refinement.
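
When comparing the R-factors from these exercises, it may help to recall how R is usually defined: R = SUM ||Fobs| - k|Fc|| / SUM |Fobs|. The Python sketch below computes R for the working reflections and for the test (cross validation) reflections separately. It assumes that the test flag is 1 for test reflections and 0 otherwise, and that k is the least-squares scale from the working set; both are assumptions for illustration, and the record layout is hypothetical rather than the actual .cv format.

```python
def least_squares_scale(pairs):
    """k that minimizes SUM (|Fobs| - k|Fc|)^2 over (f_obs, f_calc) pairs."""
    numerator = sum(fo * fc for fo, fc in pairs)
    denominator = sum(fc * fc for fo, fc in pairs)
    return numerator / denominator


def r_factor(pairs, k):
    """R = SUM | |Fobs| - k|Fc| | / SUM |Fobs|."""
    return (sum(abs(fo - k * fc) for fo, fc in pairs)
            / sum(fo for fo, fc in pairs))


def r_work_and_free(records):
    """records: (f_obs, f_calc, test) tuples; test == 1 marks the test set (assumed)."""
    work = [(fo, fc) for fo, fc, test in records if test == 0]
    free = [(fo, fc) for fo, fc, test in records if test == 1]
    k = least_squares_scale(work)   # scale from working reflections only
    return r_factor(work, k), r_factor(free, k)


# Toy records (hypothetical numbers, not real data).
records = [
    (100.0, 100.0, 0),
    (50.0, 50.0, 0),
    (30.0, 20.0, 1),
]
r_work, r_free = r_work_and_free(records)
print(r_work, r_free)
```

Because the test reflections are left out of the refinement, their R-factor is a less biased check on whether a change really improved the model.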