Quick start instructions (the short
version):
The purpose of this guide is to aid users in using AutoLink
to automatically obtain backbone resonance assignments as easily as possible. Just
download the program from this website and unpack it. From CARA’s terminal
window select run from file and indicate the AutoLink executable in the
dialogue box. For most cases, AutoLink’s controls will start with reasonable
settings, so you can just activate the main process by pressing the AutoLink button. The program will run
for anywhere between a few minutes and a few hours (possibly days for unusual
projects and settings). When it has reached completion, it will display its
results in its main program window. Red is bad, yellow good. Dark blue
indicates a poor sequence match, but red means even worse. The blue and green
line in the fragment display corresponds to Cα delta deltas. Green indicates
higher probability for α-helices, blue for β-sheets, and in between for loops, coil, etc.
Right click functions on the fragment display will give more
specific information. Note: AutoLink is driven by “relative certainty”. This
means that the display will have red markings for spin system links that either
don’t match very well, or, even if the spin systems do match, if there is not
enough data to be relatively certain of the spin system link (other reasonable
alternative links exist).
For first-time AutoLink users: it is recommended that you
just click on the AutoLink button and start the main process as soon as your
CARA repository is ready. You can familiarize yourself with the information in
this manual while the program runs, if you like, but the best way to learn how
to use AutoLink is to allow it to create some output. You can then just inspect
the output (which will probably be somewhat intuitive to you) with the right
click functions and get a “feel” for what the output indicators mean. Remember
“Red=dead”. In other words, if you see red in your output, it means that
AutoLink is not very certain about that part of the assignments/spin system
links. The spin systems involved are probably the ones you need to inspect in
CARA to verify that the spins are picked properly. If the relative certainty is
low, you may want to consider acquiring a bit more data to help in assigning
your molecule.
A final note to all users: AutoLink will consider all
assignments provided by its user in the CARA repository as unchangeable, so it’s
a good idea to unassign any spin systems that you are not sure of before
running the program.
Good luck and remember—you can always just e-mail for
support if you have any questions or suggestions (mailto:jmasse@mol.biol.ethz.ch?subject=AutoLink
question).
Getting started (the long version
for the really dedicated):
In order to use AutoLink, one must
first download and unpack CARA. Then click on the link on the preceding page and download
the AutoLink binary to a file. Once the user has obtained both programs,
AutoLink can be executed from CARA. Just go to CARA’s “terminal window”, right
click in the script window, and select “Run from File”. This will bring up a
dialog box which will allow you to select the appropriate file for execution.
In order for AutoLink to function, the user must have a
properly defined CARA repository with pre-defined “residue types” and
previously created (though unlinked and unassigned) “spin systems” and “spins”.
The simplest way to obtain a proper CARA repository is to simply download an
appropriate template from the CARA web site, and read in a sequence file that
contains the sequence of the molecule of interest (see the CARA user’s manual
for details). Creation of spin systems and spins can either be done by hand,
which for most NMR assignment problems can be done fairly quickly, or by
downloading a peak/spin system-picking script from the CARA website and executing
it.
Once the spin systems and spins have been created in CARA,
the user is ready for AutoLink. AutoLink’s primary goal is to link the CARA spin systems, much as the user would by a semi-automated
approach. By doing so, AutoLink builds spin system fragments which it then
compares to empirically-derived chemical shift values in order to assign the
fragments to specific residues within the macromolecule.
Like CARA, AutoLink does not have any special requirements
as to what type of data must be provided. It can work from any set of NMR
spectra. The program uses an RHP-(Relative-Hypothesis-Prioritization)-based
approach to decision making. This means that the program can determine the
“uniqueness” of the spin system links and assignments it makes. It is designed
to avoid over-assigning molecules where the data may not be sufficient to
determine a solution. This means that the program will avoid making assignments
that cannot be known with relative certainty within the context of the data it
is given. Thus AutoLink works best with data sets where more data is available.
The program can also work with restricted amounts of data, however, for which
it will determine what parts of the sequence can be
determined. The undeterminable parts of the sequence will remain unassigned.
AutoLink is generally capable of returning good results on minimal spectra. In
fact it is theoretically possible to assign a molecule with no sequential data
at all, relying on the observed chemical shifts (from a 15N-HSQC,
for example) and nearest-neighbor analysis to determine assignments. This is
only likely to return a reasonable result, however, for very short sequences
(<20 residues?) which contain no repetitive elements.
In general, for most molecules of <150 residues, three
spectra are sufficient to obtain a reasonable set of assignments. Typically,
for proteins, these spectra are a 15N-NOESY-HMQC, an HNCA, and
either a CBCA(CO)NH or an HNCACB.
AutoLink is capable of making use of both short-range and
medium-range NOE data, as well as supplied secondary structure information
(e.g. from sequence alignment to a molecule of known structure, or
sequence-based secondary structure prediction algorithms).
Use of AutoLink is through its graphical interface, which
makes use of strategic color-coding to aid both in operating the program and
interpreting its output. Once assignments have been obtained, it generally only
takes a few minutes to assess their quality.
The remaining instructions below describe the controls and
functions of AutoLink in some detail. This manual is not comprehensive as it is
still under construction. However, from skimming through it the user can get a
general idea of the program’s capabilities, and how to best apply it to
overcome problems with overlap in the input data, work with data where some
spectra contain a more complete set of resonances than others, and compensate
for other common problems in NMR data. AutoLink can be classed as an “expert
system” type of program which is best used, as usual, by an “expert user”,
which one can become with only a little practice. It should be noted that the
control settings that are already present in the downloaded binary have been
set to pre-optimized defaults which are good for most NMR problems of <150
residues, so even the inexperienced user can get off to a running start.
As a general rule of thumb, most sequences of <150
residues can be assigned by AutoLink in a few hours, if all of the spin systems
have been identified correctly or the program has been given sufficient
instructions to compensate for possible errors. Longer sequences, or more complex
analytical procedures (such as medium-range NOE usage) can cause the program to
take considerably longer to run, especially in cases where the amount of
corroborating input data is low (e.g. assigning a molecule solely from NOE
data). Work is in progress to increase AutoLink’s speed through the use of
multiple coordinated copies of the program working together on a network.
About this guide:
This guide is crude and still under construction. The
following information is not comprehensive and will be updated later. Still,
this page contains enough information to aid the user in general applications
of the program. A more complete description of the algorithm and functions of
AutoLink will soon be published. Please feel free to contact us if you have any
questions regarding the use of this program; we’re here to help (mailto:jmasse@mol.biol.ethz.ch?subject=AutoLink
question).
Program Guide:
This program is for the correlation of related spin systems
across several spectra and linking of spin systems into assigned fragments. In
order to use it, the spin systems and spins in each spectrum should be
previously picked.
Assignment
of spin systems proceeds in two basic steps. The first step involves comparison
of spin systems to determine the relative probability that they are adjacent in
the protein sequence. A pair of potentially linked spin systems is called a
spin system “link hypothesis”. In general, spin systems with more peaks in
common score higher than those with fewer peaks.
The second
step of the assignment process involves assessing how compatible each link
hypothesis is with the protein sequence. As a general organizational feature,
the spectrum-based link hypothesis scoring controls are on the left in the
AutoLink control window, while the right window contains controls more
concerned with sequence comparison and non-spectrum-based biasing parameters.
Scoring of Link Hypotheses:
Each spectrum (or atom type) can be scored separately. The
individual spin system pair scores from each spectrum are then combined to get
an overall score. The output from the individual spectrum scores and the
overall scores are printed to files. After the program has been run these files
will be found in the same directory as the CARA repository file in files called
"*.out". Which file contains which output will be obvious.
The output files contain lists of spin systems and their relative scores. The
lists are sorted such that the best matches are at the top of the list. Use of
the lists to aid assigning is simple. You can just start at the top of the list
(the TOTAL list is the best start point) and inspect this spin system to see if
it is a match.
The sub-sections below describe parameter settings
that affect the evaluation and relative importance of each spectrum or atom
type for spin system pair (link hypothesis) scoring.
xpeaks (on/off):
This parameter controls whether the relevant scoring function is used in the
analysis.
spectral density bias: During the comparison of spin systems within the spectrum, it is
possible to weigh against spins that commonly occur in many spin systems, and
weigh in favor of spins that are rare. Rare peaks are often more
significant in considering spin system links. The range of acceptable values is
from 0 to 1. 0 means that no spectral density compensation is used. A 1 means that a "perfect" spectral density compensation is
used, exactly proportional to the spectral densities. A value
between 0 and 1 means that partial compensation will be used, proportional to
the spectral density, but weighted by the input value. This function is
particularly useful for high-density spectra such as NOESYs.
Spin-spin function: The comparison of peaks within
spectra is governed by three components: peak match factor, peak match
exponent, and peak match slope. The scoring function is:
score=a*((a-(a*chemical_shift_delta)^b)/a)-(c-1)
where
a=peak match factor, b=peak match exponent, c=peak match slope and
chemical_shift_delta=the difference in ppm of the two spins being considered.
The parameters can be set either by selecting one of the pre-set functions in
the spin-spin control box, or selecting 0 and setting the individual components
yourself. The following is a list of the preset function values and the
corresponding settings for peak match factor, peak match exponent, and peak
match slope.
nucleus | spin-spin function | fact | exp | slope | most often used
1H | 1 | 100 | 4 | 1 | ←
1H | 2 | 400 | 1.9 | 1 |
1H | 3 | 1 | 1 | 20 | ←
1H | 4 | 1 | 1 | 0.9 |
13C | 1 | 120 | 2 | 1 |
13C | 2 | 10 | 3 | 1 | ←
13C | 3 | 2 | 4 | 1 |
13C | 4 | 3 | 1.5 | 1 |
13C | 5 | 7 | 20 | 20 |
13C | 6 | 1 | 1 | 12 |
13C | 7 | 1 | 1 | 8 | ←
13C | 8 | 1 | 1 | 4 | ←
13C | 9 | 1 | 1 | 2 |
13C | 10 | 1 | 1 | 1.5 |
Score vs. ppm delta curves are plotted in the main viewer
window whenever a parameter is changed, so selection/design of an appropriate
matching function is straightforward. Testing has shown that the most useful
settings appear to be setting 1 for 1Hs and 8 for 13Cs.
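As a rough illustration of how the three components interact, the sketch below evaluates the scoring function exactly as printed in this guide, read with balanced parentheses as a*((a-(a*delta)^b)/a)-(c-1). The function name spin_spin_score is hypothetical, taking the absolute value of the shift difference is an assumption, and any rescaling AutoLink applies to the result is not modeled. It is not AutoLink's own code.

```python
def spin_spin_score(delta_ppm, fact, exp, slope):
    """Literal evaluation of the spin-spin formula as printed in this guide.

    a = fact (peak match factor), b = exp (peak match exponent),
    c = slope (peak match slope). Assumption: the shift difference is
    taken as an absolute value. Later rescaling by AutoLink is not modeled.
    """
    a, b, c = fact, exp, slope
    return a * ((a - (a * abs(delta_ppm)) ** b) / a) - (c - 1)


# Preset 1 for 1H: fact=100, exp=4, slope=1.
for delta in (0.0, 0.01, 0.02, 0.03):
    print(delta, spin_spin_score(delta, fact=100, exp=4, slope=1))
```

With this preset the raw value falls quickly as the 1H shift difference grows beyond about 0.02 ppm, which is the kind of behavior the score vs. ppm delta curves in the main viewer window let you check for any setting.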
offset bias:
If this parameter is set to 1, only matches between spins whose offsets (i.e. the "-1"
in CA-1) differ will be considered. In other words, spin systems that have a match of
HA→HA-1 or HA→HB-1 will score positively, while
spin systems that have a HA→HA or HA-1→HA-1
match will score poorly. If this is set to 0, then the offset is ignored and all of
the above examples would score the same, if their chemical shifts matched
equally. Values between 0 and 1 cause proportional weighting.
Set this closer to 1 for spectra or atom types where the offsets are certain
(for example when an HN(CO)CA spectrum has been used
to identify the CA-1 peaks in an HNCA spectrum).
Atom label bias: If this parameter is set to 1,
only spins matching in atom type will be considered (i.e. Cαs with Cαs regardless of offset, and Cβs with Cβs, also regardless of offset). A 0
setting will cause the atom type to be irrelevant in the scoring. This is
useful for spectra such as HNCACB where the Cαs and the Cβs have different signs, and can be
easily separated.
Labeled spin bias: When picking peaks, it is
sometimes impossible to be sure which spin is, for example, Cα and which is Cα-1. Therefore, it is possible to
leave the picked spin unmarked. If this parameter is set to 0, all spin system
matching will be independent of atom label. This is especially important for
NOESYs, where the peaks are rarely assigned before the sequential assignments
are finished. A 1 will cause all spins for which there is no label to be
ignored.
Minimum fitness threshold: All scores in a given spectrum for
each spin system that are below this value are rounded to 0. This is just for
convenience. Useful values are between 0 and 1, since all individual spectrum
scores are scaled to that range. A higher setting will result in using less
computer memory, shorter output lists, and faster calculations. It also may
mean a loss in accuracy if it is set too high.
Previous run file: A previous output file can be used
as a basic interface to give very specific commands to AutoLink. In order to
use a previous output file as an input file, the file
must simply be renamed to “previous_run.inp”. This input file can be used to
make notes for yourself while assigning your molecule which will be propagated
through successive rounds of assignment into future output files. Comments in
the data file are expected to consist of a "#" character followed by
anything up to the end of the line. Note that the comments can also contain
commands for the AutoLink algorithm. Four commands are currently supported:
"<link>", "<unlink>", "<exclude_1>",
and "<exclude_2>". The <link> and <unlink> commands
link and unlink the spin system pair on whose line they occur in the data, and
force AutoLink not to change their status. AutoLink can also be told to not
change any links already established at the start of the algorithm (see below).
The <exclude_1> and <exclude_2> commands force one spin system of
the pair on the line to be excluded from possible linking/unlinking. The
<exclude_1> command affects the first spin system’s ability to change its
successor link or link status. Likewise, the <exclude_2> command affects
the ability of the second spin system to change its predecessor link or link
status. These commands are generally useful for excluding spin systems from
linking if they may be considered artifacts in the spectrum rather than actual spin
systems.
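To check which commands a hand-edited previous_run.inp will pass to AutoLink, the comments can be scanned with a few lines of Python. This is only a sketch: it relies on nothing but the "#" comment convention and the four command names given above, the helper names are hypothetical, and the data columns in the example line are placeholders because the layout of the output file is not described in this guide.

```python
import re

# The four commands named in this guide.
COMMANDS = ("link", "unlink", "exclude_1", "exclude_2")
_COMMAND_RE = re.compile("<(" + "|".join(COMMANDS) + ")>")


def split_comment(line):
    """Split a line into (data, comment) at the first '#' character."""
    data, _, comment = line.partition("#")
    return data.rstrip(), comment.strip()


def commands_in_line(line):
    """Return the AutoLink commands found in the comment part of a line."""
    _, comment = split_comment(line)
    return _COMMAND_RE.findall(comment)


# Placeholder data columns; only the comment part matters here.
example = "12 13 0.93  # looks solid <link>"
print(commands_in_line(example))
```

Running such a scan before starting AutoLink is a cheap way to catch a mistyped command, which would otherwise be treated as an ordinary note.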
score combination equation: This equation defines how the individual spin
pairing scores are combined to form the total. Each score type is addressed by
its common name. For example, the equation CA*CB defines the total overall score
for any given spin system pair as the score from comparing the Cαs times the score from comparing the
Cβs. The
individual scores are pre-standardized before combining them, so each score
type ranges from 0 to 1. This is important when balancing the relative
importance of each score type. For example, if one wanted to add 1H
data to Cαs and Cβs, but wanted the Cαs to be weighted twice as heavily as
the other two, the equation "H+CB+2*CA" would suffice. This type of
scoring combination allows a great deal of flexibility in
"fuzzy-logical" spin system pair scoring. A "+" is
comparable to a "fuzzy-OR" function. A "*" functions as a
"fuzzy-AND" operation. The equation can be as complex as necessary,
using parentheses ({, [, and ( are all
syntactically synonymous, so the user can mix them for clarity).
Exponentiation is symbolized by either the "**" or the "^"
operators. Numbers preceding ('s are assumed to be coefficients of whatever is
in the parenthesis. To aid in complex fuzzy score combinations, three new
operators have been defined, "&&", "||", and
"&|". || is defined to be the average of the two operands (i.e.
1||0.5=0.75). This can be used as a quasi-OR whose product maintains the
original value ranges of the operands. && is defined by the equation
x&&y=(x*y)^0.5. This is a quasi-AND which also
maintains the original value range of the operands. &| is a quasi AND/OR
function defined by x&|y=((x+y)/2+(x*y)^0.5)/2.
This function is half-way between a quasi-AND and a quasi-OR and is remarkably
useful for analyzing spin system pairs. For example, the equation
(H&|CB)*CA could be interpreted as "requiring >0 score in the NOESY
or >0 score from the Cβs, much better if both are >0, but still marginally OK if just one of
the two is >0, and in any case requiring a >0 score in the Cαs." Of course all of the
operators give gradual results, with better scoring individual scores leading
to higher overall scores.
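The three special operators are simple enough to try out by hand. The sketch below implements them from the definitions given above and evaluates the example equation (H&|CB)*CA for a few score combinations. The function names are hypothetical and this is not AutoLink's equation parser, only the arithmetic.

```python
import math


def quasi_or(x, y):
    """'||': the average of the two operands."""
    return (x + y) / 2


def quasi_and(x, y):
    """'&&': the geometric mean, (x*y)^0.5."""
    return math.sqrt(x * y)


def quasi_and_or(x, y):
    """'&|': half-way between the quasi-OR and the quasi-AND."""
    return (quasi_or(x, y) + quasi_and(x, y)) / 2


def example_total(h, cb, ca):
    """The example equation (H&|CB)*CA from the text."""
    return quasi_and_or(h, cb) * ca


print(quasi_or(1, 0.5))               # the 1||0.5 example from the text
print(example_total(1.0, 0.0, 0.8))   # only the H score is nonzero
print(example_total(1.0, 1.0, 0.8))   # both H and CB match
print(example_total(1.0, 1.0, 0.0))   # no CA match at all
```

Comparing the second and third lines of output shows the "marginally OK if just one, much better if both" behavior, and the last line shows that a zero CA score vetoes the pair regardless of the other scores.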
It is often best to start out with a very stringent equation,
using stringent parameters for the individual pairing scores as well,
and then loosening things up later on, maintaining the spin-system links
obtained from stringent rounds of autolinking during the later less stringent
rounds.
Non-spectrum-based Biasing controls:
This
section is primarily concerned with the control parameters of AutoLink that are
involved in both spin-system-to-sequence comparison and also with those that
control the RHP-based “decision-making” process.
Hold existing links: If this is on, then AutoLink will
not change any of the links currently existing at the time it is started. It
will instead try to finish the assignments in consideration of the current
links. This is useful if the user already has some links formed, either from
previous AutoLink runs or by manual analysis which are considered certain. If
the user has some links but is not absolutely sure they are correct, then this
parameter is best set to off. In this case, AutoLink will still try to assign
the molecule with the current assignments in mind, but will be free to change
the starting links if the need arises.
It is important at this point to highlight the difference in
AutoLink’s behavior toward pre-defined links and pre-defined assignments. As
described in the previous paragraph, AutoLink can be either directed to
consider previously existing spin system links as automatically correct, or
alternatively to consider them only as suggestions. This is in contrast to the
way AutoLink views pre-existing residue assignments. AutoLink will always
consider all existing assignments as 100% correct. This is because assignment
is the goal of the program. If the user is uncertain of any existing
assignments, it is best to remove these assignments prior to running the
program. In fact there is little purpose for most applications in including any
pre-existing assignments, since AutoLink will automatically reassign all of the
spin systems that can be unambiguously assigned at the user-designated points
in its analysis (see “Assigned Positions” below).
Max. # num
of rounds: This
is the maximum number of AutoLink cycles that will be executed. A 0 value means
that there is no maximum. Note that AutoLink may stop at an earlier number of
rounds if it can't make any more progress with the current settings. In
general, it’s a good idea to set this to 0 and let the program self-terminate.
This is because only at the end of the process is AutoLink relatively certain
of its links.
# links/round: AutoLink works by doing a sort of
"cha-cha". It accepts a certain maximum number of new links per
cycle. It then removes some of the links. How many it removes depends on these
control settings and how many links were formed in the same round (The program
will try to avoid having a net loss of links in any given round). # links/round
is the maximum number of links that will be formed per round. New links may
require that old links be broken, however, so it is possible for fewer “net”
links to be made.
# unlinks/round: This is the maximum number of
backward steps (unlinks) per round AutoLink will take. Fewer unlinks may occur
in order to ensure forward progress for any given round.
link threshold:
This is the relative fitness score that a spin system link must match or beat in
order to be considered. Lower values will cause the program to take longer to
run, while higher values may restrict the number of links that can be made.
Unlink threshold: This is the relative fitness score
that a spin system link must be below in order to be considered for unlinking.
Note that a spin system link that is above the threshold can still be broken if
it is necessary to do so in order to form an even more favorable link.
Score delta bias: AutoLink is capable of evaluating
the relative importance of hypothetical links. A spin system may only have one
reasonable partner for a link, while another spin system link in absolute terms
may appear more favorable. If score_delta_bias is set closer to 1, AutoLink
will tend to give precedence to spin system links that are relatively fit
compared to other spin system links involving the same spin systems and other
spin systems. In general it is a good idea to always set this parameter to 1
(fully on). Testing shows that it just makes the program “smarter”.
Random factor bias: If this value is set >0 then a
random factor will be added to the evaluation of the fitness scores to
determine links and unlinks. The relative importance of the random factor is
controlled by this factor. Values closer to 0 will result in less importance of
the random factor. Values closer to 1 will result in more importance of the
random factor. The magnitude and the distribution of the random factor are controlled by the following factors:
center_weighting_factor, number_of_centered_random_numbers,
off_center_weighting_factor, number_of_off_center_random_numbers, exponent,
max_amplitude. Currently these parameters have been preset to reasonable
values. So far, for the NMR assignment problems we
have tested, random factor bias has been unnecessary. It may be useful in
order to use stochastic effects to speed up the processing of larger problems
in the future. Currently it is recommended that it be left set to 0 (fully
off), or at least close to 0 (0.01).
juggernaut_mode: This is a binary switch (off or
on). If juggernaut_mode is turned on, then AutoLink will be forced to make a
new link each round, if possible, even at the expense of breaking several other
links. This mode is only used to experiment with new link hypotheses in rare cases
when the user believes that something is wrong with
the current assignments and wants AutoLink to suggest something new. It is
highly experimental and the recommended setting for most applications is off.
Assigned positions: This control affects the way
AutoLink handles spin system fragment to protein sequence matching involving
spin systems and residues that have been assigned during previous AutoLink
rounds. The program can be instructed to consider already existing assignments
at three stages—when it attempts to make new links, when it considers which
links to break, and/or after all linking/unlinking cycles have been completed.
Testing shows that letting AutoLink consider assignments at all three stages is
best. The program will generally run faster if it is told not to consider
assignments during the cycles (the first two points listed above), but may be
less accurate for assignment problems with much ambiguity. If AutoLink is
directed to consider assignments at the end of the cycles, it will assign
whichever fragments can be unambiguously assigned before terminating.
Otherwise, the fragments will be left unassigned.
Link repeat bias: This control is 1 minus the “link
repeat penalty” described in the paper. The control setting has been reversed merely
to make it more consistent with the other control biases (0=off, 1=on). Link
repeat penalty is a factor which affects repeated attempts to form a link
involving the same spin system. AutoLink remembers how many times a spin system
is linked (both as a predecessor and as a successor). All link hypotheses are
biased by multiplication by link_repeat_penalty^(number
of repeated attempts to find match). This is important for avoiding getting
stuck in a loop. Sometimes once a link is formed, it
causes a later re-evaluation of its own score such that it becomes disfavored.
Then a new link forms that again makes the original link look OK again.
Sometimes a cycle forms where the same set of links are repeatedly accepted and
rejected. This is where link repeat penalty is especially important because it
forces AutoLink to break the cycle and consider other links. The links involved
in the cycle are then re-evaluated in the context of the new links. This
process is repeated until either the new links determine which of the links
involved in the cycle should be accepted, or until all of the links involved in the cycle are
disfavored below the link threshold, in which case, none of the links will be
accepted, being regarded as indeterminate. The bottom line is that AutoLink
"saves for later" ambiguous link hypotheses until they are no longer
ambiguous. For this reason, it is important that link_repeat_penalty be set to
a value <1 (link repeat bias >0). The best values are those that are
nearly 1, but not quite (e.g. 0.90-0.99) (link repeat bias 0.01-0.1). Values in
this range are sufficiently "not 1" enough to prevent perpetual
cycling, but not enough of a penalty to significantly affect non-cyclic
hypothesis re-evaluation.
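The effect of the penalty on a repeatedly attempted link can be seen with a short calculation. The sketch below applies the relation stated above, score times link_repeat_penalty raised to the number of repeated attempts, with link_repeat_penalty = 1 - link repeat bias. The function names and the starting score of 0.9 are illustrative assumptions, not values taken from AutoLink.

```python
def link_repeat_penalty(link_repeat_bias):
    """Convert the control setting to the penalty: penalty = 1 - bias."""
    return 1.0 - link_repeat_bias


def biased_score(score, link_repeat_bias, repeats):
    """Multiply a link hypothesis score by penalty ** repeats."""
    return score * link_repeat_penalty(link_repeat_bias) ** repeats


# A hypothetical link with raw score 0.9 and link repeat bias 0.05.
for repeats in (0, 1, 10, 50):
    print(repeats, round(biased_score(0.9, 0.05, repeats), 4))
```

A link that is tried once or twice is barely affected, while one that keeps being accepted and rejected is steadily pushed down toward the link threshold, which is what breaks the cycle.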
AutoLink log file: This is
the name of the file where AutoLink will report what it is doing during the
linking/unlinking cycles. See below for some notes about the output. The
progress of the program can be evaluated by viewing this file.
Sequence fit controls: These
controls affect how AutoLink matches spin system fragments to protein
sequences.
stan dev factor: This factor affects the penalty for chemical shift differences between
the observed and the expected (predicted) chemical shifts when comparing
observed spin systems to template residue types. A higher value means a lower
penalty for larger deviations. As a general rule this value should be set to
1.0 or near 1.0 for most standard CARA repositories (where the predicted
chemical shifts are already associated with a wide standard deviation). If one
is using nearest-neighbor predicted chemical shifts, however, a value of ~1.5
is more appropriate. This allows for the possibility of somewhat deviant
chemical shift matches while still biasing moderately heavily in favor of close
chemical shift matches.
sequence fit threshold: This value is the minimum fitness score that a fragment to sequence
match must have in order to be considered as a possible match. Not only does
the overall score of each fragment have to be higher than the sequence fit
threshold, but so also must be the individual sequence fits of each spin system
within the fragment to their corresponding positions within the protein sequence.
In general good values for this parameter are between 0.2 and 0.4. Correct
sequence matches in this range are extremely improbable (most good matches are
at least 0.7 or greater), but a low value is chosen to allow for the
possibility of unusual chemical shift values. In general, since AutoLink relies
on only relative criteria for its decisions, a high value for sequence fit
threshold is unnecessary.
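The two-level test described above (the fragment as a whole and every spin system in it must beat the threshold) can be written as a small check. This is a sketch only: the function name is hypothetical, the default of 0.3 is simply a value inside the suggested 0.2 to 0.4 range, and how AutoLink computes the overall fragment score is not modeled, so both kinds of score are passed in.

```python
def passes_sequence_fit(overall_score, per_spin_system_scores, threshold=0.3):
    """True if the fragment and every spin system in it beat the threshold."""
    if overall_score <= threshold:
        return False
    return all(score > threshold for score in per_spin_system_scores)


# A fragment with a good overall score but one poorly fitting spin system
# is rejected at this sequence position.
print(passes_sequence_fit(0.75, [0.9, 0.8, 0.7]))
print(passes_sequence_fit(0.75, [0.9, 0.1, 0.95]))
```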
Nearest neighbors: This is a toggle switch that
controls whether AutoLink uses the empirical chemical shifts found in the
repository’s residue types or uses nearest-neighbor-predicted chemical shifts
for spin system to sequence matching. For nearest-neighbor-based chemical shift
prediction, AutoLink uses the parameters published by Wang and Jardetzky
(2002)* which can take into account secondary structure information.
* Wang Y, Jardetzky O. Investigation of the neighboring residue effects on protein
chemical shifts. J Am Chem Soc. 2002 Nov 27;124(47):14075-84.
non-matching spin bias:
This bias controls how AutoLink considers matches of spin systems that have
“inappropriate” spin labels to residue positions in the protein sequence. An
inappropriate spin label is, for example, a “Cb” in a spin system that is currently
being considered for fitting to a GLY in the protein sequence. If non-matching
spin bias is set to 1, then an inappropriate spin will cause the spin
system-sequence match to be considered impossible (fitness score 0). A setting
of 0, on the other hand, will cause the non-matching spins to be completely ignored.
Intermediate values cause intermediate penalties. Settings other than 1 allow
AutoLink to consider the possibility of user error in defining the spin labels
of spin systems during sequence matching.
Secondary CA bias: This bias allows AutoLink to bias
its links and assignments toward putative secondary structure elements, based
on the Cα chemical
shifts, without prior knowledge (either from the user, or from predictions)
about the secondary structure. The program checks for fluctuations of the Cαs between above and below the average
value and biases against fragment to sequence matches with more fluctuations
compared to matches with fewer. Since the direction of the Cα deviation from the average values
is secondary structure dependent, a high value for secondary CA bias causes
AutoLink to bias in favor of chemical shift assignments that are consistent
with some kind of secondary structure elements. The penalty is actually not
assessed for fluctuations across the average value itself, but rather for fluctuations
around a “neutral zone” which is at a set distance from the average chemical
shift values. This is necessary because the magnitude of the average deviation in
the Cαs of the α-helices from the overall average is different from
the magnitude of the average deviation of the β-sheets. Empirical testing shows
that the neutral zone should be centered (neutral zone parameter in AutoLink’s
user interface) at about -1 ppm from the average values. Of course minor
fluctuations should be ignored, so the neutral zone must have a width >0
ppm. Testing shows that a reasonable value for the width of the neutral zone
is ~±1 ppm. As with most of AutoLink’s biases, the strength of the bias
ranges from 0-1, controlled by the slider in the user interface.
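One way to picture the neutral zone is to count how often a run of Cα deviations swings from one side of the zone to the other. The sketch below does this under stated assumptions: the zone is centered at -1 ppm, the "±1 ppm" width is read as a half-width of 1 ppm, values inside the zone are ignored, and the function name is hypothetical. How AutoLink turns such a count into a penalty is not described in this guide and is not modeled.

```python
def count_zone_crossings(ca_deviations, zone_center=-1.0, zone_half_width=1.0):
    """Count swings from one side of the neutral zone to the other.

    ca_deviations: observed minus average C-alpha shift (ppm), in
    sequence order. Values inside the neutral zone are ignored.
    """
    lower = zone_center - zone_half_width
    upper = zone_center + zone_half_width
    crossings = 0
    last_side = 0  # +1 above the zone, -1 below it, 0 nothing seen yet
    for deviation in ca_deviations:
        if deviation > upper:
            side = 1
        elif deviation < lower:
            side = -1
        else:
            continue  # inside the neutral zone: treated as a minor fluctuation
        if last_side and side != last_side:
            crossings += 1
        last_side = side
    return crossings


helix_like = [2.5, 3.0, 2.8, 3.1]      # consistently above the zone
scrambled = [2.5, -2.5, 2.5, -2.5]     # jumps across the zone every residue
print(count_zone_crossings(helix_like), count_zone_crossings(scrambled))
```

A fragment placement that yields the first pattern is consistent with a stretch of regular secondary structure, while the second pattern is the kind of match the bias is meant to disfavor.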
Sequence controls: Clicking this button (middle of the menu bar at the top of
the main program window) brings up another window containing controls that
affect specific positions within the protein sequence. One of them is the
ability to include or exclude specific residue positions from the analysis.
This is useful for assignment problems where not every spin system is evident
in the spectra and the user has some idea which segments might be absent. When
told to exclude certain positions from consideration, AutoLink looks only for
links that form fragments that fit into the remaining sequence positions. This
function is also useful if the user thinks there may be problems in the input
data (e.g. incorrectly classified peaks in spin systems). In these cases,
AutoLink will generally have difficulty finding an assignment at some
positions within the protein sequence, specifically those for which the spin
system is incorrect. To deal with this, the user can loosen the stringency of
the input parameters (see the scoring biases and scoring equation sections
above) and direct AutoLink to look for only specific links.
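The effect of excluding positions can be pictured with a minimal Python
sketch. The function allowed_starts and its arguments are hypothetical and
only illustrate the idea that a fragment may be placed only where it avoids
every excluded residue; AutoLink's own search is not described at this level
of detail.

```python
def allowed_starts(sequence_length, fragment_length, excluded):
    """Return the 1-based start positions at which a fragment of the given
    length covers no excluded residue position (illustrative only)."""
    excluded = set(excluded)
    starts = []
    for start in range(1, sequence_length - fragment_length + 2):
        span = range(start, start + fragment_length)
        if not any(pos in excluded for pos in span):
            starts.append(start)
    return starts


# A 10-residue sequence, a 3-system fragment, residues 4 and 5 excluded
print(allowed_starts(10, 3, excluded=[4, 5]))
```

Excluding residues shrinks the set of candidate placements, which is why it
helps when some spin systems are known to be missing from the spectra.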
The sequence control window can also be
used to input secondary structure data. This data might be from sequence alignments
with a known protein, or from secondary structure prediction algorithms or even
just a guess. If secondary structure information is included, AutoLink will
only look for results that are consistent with the specified secondary
structure.
Alternatively, secondary structure prediction information from the program
YASPIN* can be directly included in AutoLink’s analysis, allowing the
uncertainties in the secondary structure determination to be propagated
through AutoLink’s analysis. To enter the output from YASPIN into AutoLink,
first run YASPIN (this can be done from its website,
http://ibivu.cs.vu.nl/programs/yaspinwww/) and save the results into a file.
This file can simply be a saved copy of the e-mail response sent by the YASPIN
server. Then right click somewhere on the secondary structure prediction
panels in the sequence control window and select “read from file”. This will
bring up a new window from which the user can select the YASPIN output file.
After the file is selected, the results will be displayed in the secondary
structure prediction boxes. The user can control which parts of the prediction
are included in AutoLink’s analysis by straightforward point-and-click
controls. Testing shows that the best results are obtained by including the
helix and sheet predictions, but excluding the coil-predicted segments. This
is because the chemical shift prediction parameters of Wang and Jardetzky are
heavily biased in favor of actual random coil elements, while the secondary
structure prediction from YASPIN classifies everything that is not helix or
sheet (including loops and turns) together with random coil elements.
*Lin K, Simossis VA, Taylor WR, Heringa J. A simple and fast secondary
structure prediction method using hidden neural networks. Bioinformatics.
2004 Sep 17.
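The recommendation to keep the helix and sheet predictions but drop the coil
predictions can be sketched in Python as follows. The one-letter state codes
(H, E, C), the "?" placeholder, and the function mask_prediction are
assumptions chosen for illustration. They do not reproduce YASPIN's file
format or AutoLink's internal representation.

```python
def mask_prediction(prediction, keep=("H", "E"), unknown="?"):
    """Keep the helix (H) and sheet (E) states of a per-residue prediction
    and replace every other state with an 'unknown' placeholder."""
    return "".join(state if state in keep else unknown for state in prediction)


# Hypothetical per-residue prediction string: C = coil, H = helix, E = sheet
print(mask_prediction("CCHHHHHCCEEEECC"))
```

Residues marked as unknown would then place no secondary structure constraint
on the analysis, so loops and turns are not scored as if they were true random
coil.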
AutoLink command buttons (top of right scroll window):
Display fragments button: Pressing this button brings up AutoLink’s “fragment
display” in the main viewer window. The fragment display contains one entry
for each currently existing CARA spin system fragment. Each entry consists of
several elements. The first element is the overall relative fitness score for
either the currently assigned position within the protein sequence or the best
available position if the fragment is currently unassigned. The next element
is the individual spin system to sequence fit for each spin system of the
fragment (assuming a match to the same position as the overall reported
score). Right clicking near one of the spin system to residue matching scores
displays a list of the spins of the spin system, their chemical shifts, and
the average value and standard deviation for each spin. The next element
contains a list of the spin systems (by spin system ID) in the fragment and
their corresponding assignments. Right clicking between two spin systems
displays the overall fitness score of the link hypothesis in the spectral data
and the sub-scores that were combined to calculate the overall score. The next
element of the fragment entries is the deviation of the observed Cα from the
average Cα value. This is useful for assessing potential secondary structure
elements. These latter three elements are displayed in color-coded lines to
aid the user in rapidly interpreting the display. The subsequent elements of
the fragment entries display the best available sequence match, the number of
available sequence matches, the number of sequence matches (available or
currently assigned to other spin systems), and lastly up to two possible
sequence positions where the fragment might match (also irrespective of other
assignments). The fragment display thus contains most of the information the
user needs to rapidly assess the quality of the current spin system links and
assignments.
Assign fragments button: Pressing this button causes
AutoLink to assign all currently existing fragments whose assignments can be unambiguously
determined.
Unassign all systems button: Pressing this button and then
selecting “Yes” in the resulting dialogue box unassigns all of the spin systems
without affecting the links between the systems.
Unlink current links button: If this button is pressed and the user then
clicks “Yes” in the resulting dialogue box, all of the current spin system
links between unassigned spin systems will be deleted. Links between currently
assigned spin systems will be unaffected. This function is useful for starting
a new AutoLink run that is unbiased by unassigned spin system fragments.
Assign Candidates button: Pressing this button brings up two selectable
options: 1) “Assign candidates for Glycine” and 2) “Assign candidates for
Glycine/not Glycine”. If the first option is chosen, AutoLink will set the
candidate list (see CARA user guide) to glycine for every spin system that it
considers a potential match to glycine only. If the second option is chosen,
AutoLink will (in addition to marking all of the glycines) also mark all of
the other spin systems as potentially matching any residue type except
glycine. This will not affect the program’s results in any way, but can save a
little CPU time. It is actually more useful as a shortcut for marking the spin
system candidates appropriately, which helps the user when inspecting spin
systems and spectra in CARA.
AutoLink button: Pressing this button starts the
main AutoLink algorithm, so it will cause AutoLink to evaluate and examine the
spin system link hypotheses, form appropriate links, and assign fragments.
Some notes on the output: The following is a sample output segment from a log
file:
line #:
      ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______
1  __/0.5164\__/0.3555\__/0.3945\__/0.4377\__/0.4523\__/0.8423\__/0.4660\__/0.4995\__/0.5649\__/0.4877\__/0.4320\__/0.3986\__/0.5020\__
2  [14] |0.7741| 39 ------ 42 ------ 46 ------ 63 ------ 44 ------ 53 ------ 43 ------ 65 ------ 66 ------ 74 ------ 48 ------ 79 ------ 75 ------ 51
3  dCA from mean: -0.44 -3.92 0.90 0.06 0.28 1.48 1.11 0.48 -0.98 1.33 -1.18 -0.76 -2.10 -5.80
4  best available sequence match: THR54-->LYS67
5  # of available sequence matches: 1
6  assigned to: THR54-->LYS67
7  # of sequence matches: 1
8  top sequence matches:
9  position: THR54-->LYS67 score: 0.7741
Each round of AutoLink will display one such block of output for each fragment
that is at least 2 spin systems long. Line 1 shows the scores of each spin
system link (based on spectra only). Line 2 shows the length of the fragment,
followed by the best sequence fit score for the fragment (accounting for
current assignments), followed by the spin systems of the fragment. Line 3
shows the difference of the Cα chemical shifts of the above spin systems from
their theoretical means based on the proposed sequence alignment. Line 4 shows
the sequence position at which the fragment gets the score in line 2. Line 5
shows how many positions the fragment matches above sequence_fit_threshold,
taking prior assignments into account. Line 6 shows the currently assigned
position of the fragment in the protein sequence. Lines 7+ show data
comparable to lines 2, 5, and 6, except that the scores and positions are
those ignoring spin system assignments. It is relatively easy to assign a
protein from this data. Simply accept all assignments for which there is no
alternative. Then run the program again, holding all starting links unchanged,
and review the fragment list again. Repeat these steps until no unassigned
spin systems are left that match only one position in the protein sequence.
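The acceptance rule described above (accept only assignments that have no
alternative) can be written as a small Python sketch. The FragmentReport class
and the unambiguous function are hypothetical stand-ins for the information on
lines 2, 5, and 6 of the log output. They are not part of AutoLink or CARA,
and the second fragment in the example is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FragmentReport:
    """Illustrative summary of one fragment's block in the log output."""
    spin_systems: List[int]
    available_matches: List[str]        # positions not taken by other assignments
    assigned_to: Optional[str] = None   # None while the fragment is unassigned


def unambiguous(reports):
    """Return the unassigned fragments that match exactly one available position."""
    return [r for r in reports
            if r.assigned_to is None and len(r.available_matches) == 1]


reports = [
    # First fragment modeled on the sample log output, before it is assigned
    FragmentReport([39, 42, 46], ["THR54-->LYS67"]),
    # Invented fragment with two candidate positions: leave it for a later round
    FragmentReport([10, 11], ["ALA3-->GLY4", "ALA20-->GLY21"]),
]
for report in unambiguous(reports):
    print(report.spin_systems, "->", report.available_matches[0])
```

After accepting these assignments, AutoLink would be run again and the same
filter applied to the new output, until the filter returns nothing.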