AutoLink Instructions:

Quick start instructions (the short version):

The purpose of this guide is to help users obtain backbone resonance assignments automatically with AutoLink, as easily as possible. Just download the program from this website and unpack it. From CARA’s terminal window, select “Run from File” and indicate the AutoLink executable in the dialog box. In most cases, AutoLink’s controls start with reasonable settings, so you can activate the main process simply by pressing the AutoLink button. The program will run for anywhere between a few minutes and a few hours (possibly days for unusual projects and settings). When it has finished, it displays its results in its main program window. Red is bad, yellow is good. Dark blue indicates a poor sequence match, but red is even worse. The blue and green line in the fragment display corresponds to the Ca delta deltas. Green indicates a higher probability for α-helices, blue for β-sheets, and colors in between for loops, coil, etc.

Right click functions on the fragment display will give more specific information. Note: AutoLink is driven by “relative certainty”. This means that the display will have red markings for spin system links that either don’t match very well, or, even if the spin systems do match, if there is not enough data to be relatively certain of the spin system link (other reasonable alternative links exist).

For first-time AutoLink users: it is recommended that you just click on the AutoLink button and start the main process as soon as your CARA repository is ready. You can familiarize yourself with the information in this manual while the program runs, if you like, but the best way to learn how to use AutoLink is to allow it to create some output. You can then just inspect the output (which will probably be somewhat intuitive to you) with the right click functions and get a “feel” for what the output indicators mean. Remember “Red=dead”. In other words, if you see red in your output, it means that AutoLink is not very certain about that part of the assignments/spin system links. The spin systems involved are probably the ones you need to inspect in CARA to verify that the spins are picked properly. If the relative certainty is low, you may want to consider acquiring a bit more data to help in assigning your molecule.

A final note to all users: AutoLink will consider all assignments provided by the user in the CARA repository as unchangeable, so it’s a good idea to unassign any spin systems that you are not sure of before running the program.

Good luck and remember—you can always just e-mail for support if you have any questions or suggestions (mailto:jmasse@mol.biol.ethz.ch?subject=AutoLink question).

Getting started (the long version for the really dedicated):

In order to use AutoLink, one must first download and unpack CARA. Then click on the link on the preceding page and download the AutoLink binary to a file. Once the user has obtained both programs, AutoLink can be executed from CARA. Just go to CARA’s “terminal window”, right click in the script window, and select “Run from File”. This will bring up a dialog box which will allow you to select the appropriate file for execution.

In order for AutoLink to function, the user must have a properly defined CARA repository with pre-defined “residue types” and previously created (though unlinked and unassigned) “spin systems” and “spins”. The simplest way to obtain a proper CARA repository is to download an appropriate template from the CARA web site and read in a sequence file that contains the sequence of the molecule of interest (see the CARA user’s manual for details). Spin systems and spins can be created either by hand, which for most NMR assignment problems can be done fairly quickly, or by downloading a peak/spin system-picking script from the CARA website and executing it.

Once the spin systems and spins have been created in CARA, the user is ready for AutoLink. AutoLink’s primary goal is to link the CARA spin systems, much as the user would by a semi-automated approach. By doing so, AutoLink builds spin system fragments which it then compares to empirically-derived chemical shift values in order to assign the fragments to specific residues within the macromolecule.

Like CARA, AutoLink does not have any special requirements as to what type of data must be provided. It can work from any set of NMR spectra. The program uses an RHP (Relative-Hypothesis-Prioritization) based approach to decision making. This means that the program can determine the “uniqueness” of the spin system links and assignments it makes. It is designed to avoid over-assigning molecules where the data may not be sufficient to determine a solution; in other words, it avoids making assignments that cannot be known with relative certainty within the context of the data it is given. Thus AutoLink works best with data sets where more data is available. The program can also work with restricted amounts of data, however, in which case it will establish which parts of the sequence can be assigned and leave the undeterminable parts unassigned. AutoLink is generally capable of returning good results on minimal spectra. In fact it is theoretically possible to assign a molecule with no sequential data at all, relying on the observed chemical shifts (from a ¹⁵N-HSQC, for example) and nearest-neighbor analysis to determine assignments. This is only likely to return a reasonable result, however, for very short sequences (<20 residues?) that contain no repetitive elements.

In general, for most molecules of <150 residues, three spectra are sufficient to obtain a reasonable set of assignments. Typically, for proteins, these spectra are a ¹⁵N-NOESY-HMQC, an HNCA, and either a CBCA(CO)NH or an HNCACB.

AutoLink is capable of making use of both short-range and medium-range NOE data, as well as supplied secondary structure information (e.g. from sequence alignment to a molecule of known structure, or from sequence-based secondary structure prediction algorithms).

Use of AutoLink is through its graphical interface, which makes use of strategic color-coding to aid both in operating the program and interpreting its output. Once assignments have been obtained, it generally only takes a few minutes to assess their quality.

The remaining instructions below describe the controls and functions of AutoLink in some detail. This manual is not comprehensive as it is still under construction. However, by skimming through it the user can get a general idea of the program’s capabilities, and of how best to apply it to overcome problems with overlap in the input data, work with data where some spectra contain a more complete set of resonances than others, and compensate for other common problems in NMR data. AutoLink can be classed as an “expert system” type of program which is best used, as usual, by an “expert user”, which one can become with only a little practice. It should be noted that the control settings already present in the downloaded binary have been set to pre-optimized defaults which are good for most NMR problems of <150 residues, so even the inexperienced user can get off to a running start.

As a general rule of thumb, most sequences of <150 residues can be assigned by AutoLink in a few hours, if all of the spin systems have been identified correctly or the program has been given sufficient instructions to compensate for possible errors. Longer sequences, or more complex analytical procedures (such as medium-range NOE usage) can cause the program to take considerably longer to run, especially in cases where the amount of corroborating input data is low (i.e. assigning a molecule solely from NOE data). Work is in progress to increase AutoLink’s speed through the use of multiple coordinated copies of the program working together on a network.

About this guide:

This guide is crude and still under construction. The following information is not comprehensive and will be updated later. Still, this page contains enough information to aid the user in general applications of the program. A more complete description of the algorithm and functions of AutoLink will soon be published. Please feel free to contact us if you have any questions regarding the use of this program; we’re here to help (mailto:jmasse@mol.biol.ethz.ch?subject=AutoLink question).

Program Guide:

This program is for the correlation of related spin systems across several spectra and the linking of spin systems into assigned fragments. In order to use it, the spin systems and spins in each spectrum should be picked beforehand.

Assignment of spin systems proceeds in two basic steps. The first step involves comparison of spin systems to determine the relative probability that they are adjacent in the protein sequence. A pair of potentially linked spin systems is called a spin system “link hypothesis”. In general, spin systems with more peaks in common score higher than those with fewer peaks in common.

The second step of the assignment process involves assessing how compatible each link hypothesis is with the protein sequence. As a general organizational feature, the spectrum-based link hypothesis scoring controls are on the left in the AutoLink control window, while the right window contains controls more concerned with sequence comparison and non-spectrum-based biasing parameters.

Scoring of Link Hypotheses:

Each spectrum (or atom type) can be scored separately. The individual spin system pair scores from each spectrum are then combined to get an overall score. The output from the individual spectrum scores and the overall scores are printed to files. After the program has been run these files will be found in the same directory as the CARA repository file in files called "*.out". Which file contains which output will be obvious. The output files contain lists of spin systems and their relative scores. The lists are sorted such that the best matches are at the top of the list. Use of the lists to aid assigning is simple. You can just start at the top of the list (the TOTAL list is the best start point) and inspect this spin system to see if it is a match.

The sub-sections below describe parameter settings that affect the evaluation and relative importance of each spectrum or atom type for spin system pair (link hypothesis) scoring.

xpeaks (on/off): This parameter controls whether the relevant scoring function is used in the analysis.

spectral density bias: During the comparison of spin systems within the spectrum, it is possible to weigh against spins that commonly occur in many spin systems, and weigh in favor of spins that are rare. Rare peaks are often more significant in considering spin system links. The range of acceptable values is from 0 to 1. A 0 means that no spectral density compensation is used. A 1 means that a "perfect" spectral density compensation is used, exactly proportional to the spectral densities. A value between 0 and 1 means that partial compensation will be used, proportional to the spectral density, but weighted by the input value. This function is particularly useful for high-density spectra such as NOESYs.

Spin-spin function: The comparison of peaks within spectra is governed by three components: peak match factor, peak match exponent, and peak match slope. The scoring function is:

score = a*((a - (a*chemical_shift_delta)^b)/a) - (c - 1)

where a=peak match factor, b=peak match exponent, c=peak match slope and chemical_shift_delta=the difference in ppm of the two spins being considered. The parameters can be set either by selecting one of the pre-set functions in the spin-spin control box, or selecting 0 and setting the individual components yourself. The following is a list of the preset function values and the corresponding settings for peak match factor, peak match exponent, and peak match slope.

nucleus	spin-spin function	fact	exp	slope	most often used
¹H	1	100	4	1	yes
¹H	2	400	1.9	1
¹H	3	1	1	20	yes
¹H	4	1	1	0.9
¹³C	1	120	2	1
¹³C	2	10	3	1	yes
¹³C	3	2	4	1
¹³C	4	3	1.5	1
¹³C	5	7	20	20
¹³C	6	1	1	12
¹³C	7	1	1	8	yes
¹³C	8	1	1	4	yes
¹³C	9	1	1	2
¹³C	10	1	1	1.5

Score vs. ppm delta curves are plotted in the main viewer window whenever a parameter is changed, so selection/design of an appropriate matching function is straightforward. Testing has shown that the most useful settings appear to be setting 1 for ¹Hs and 8 for ¹³Cs.

offset bias: If this parameter is set to 1, only spin pairs whose offsets (i.e. the "-1" in CA-1) are consistent with a sequential link will be considered. In other words, spin systems that have a match of HA→HA-1 or HA→HB-1 will score positively, while spin systems that have a match of HA→HA or HA-1→HA-1 will score poorly. If this is set to 0, then the offset is ignored and all of the above examples would score the same, if their chemical shifts matched equally. Values between 0 and 1 cause proportional weighting. Set this closer to 1 for spectra or atom types where the offsets are certain (for example when an HN(CO)CA spectrum has been used to identify the CA-1 peaks in an HNCA spectrum).

Atom label bias: If this parameter is set to 1, only spins matching in atom type will be considered (i.e. Cas with Cas regardless of offset, and Cbs with Cbs, also regardless of offset). A 0 setting will cause the atom type to be irrelevant in the scoring. This is useful for spectra such as HNCACB where the Cas and the Cbs have different signs, and can be easily separated.

Labeled spin bias: When picking peaks, it is sometimes impossible to be sure which spin is, for example, Ca and which is Ca-1. Therefore, it is possible to leave the picked spin unmarked. If this parameter is set to 0, all spin system matching will be independent of atom label. This is especially important for NOESYs, where the peaks are rarely assigned before the sequential assignments are finished. A 1 will cause all spins for which there is no label to be ignored.

Minimum fitness threshold: All scores in a given spectrum for each spin system that are below this value are rounded to 0. This is just for convenience. Useful values are between 0 and 1, since all individual spectrum scores are scaled to that range. A higher setting will result in using less computer memory, shorter output lists, and faster calculations. It also may mean a loss in accuracy if it is set too high.
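The thresholding described above can be sketched in a few lines of Python. This is only an illustration of the rule "scores below the threshold are rounded to 0", not AutoLink's own code; the function name, the dictionary layout, and the spin system IDs are hypothetical.

```python
def apply_min_fitness(scores, threshold):
    """Return a copy of `scores` in which every value below `threshold` is set to 0.0.

    `scores` maps a (hypothetical) pair of spin system IDs to a per-spectrum
    score that has already been scaled to the range 0..1.
    """
    return {pair: (s if s >= threshold else 0.0) for pair, s in scores.items()}


# Hypothetical per-spectrum scores for three spin system pairs.
scores = {(12, 13): 0.82, (12, 40): 0.07, (13, 14): 0.35}
filtered = apply_min_fitness(scores, 0.1)

# Only the non-zero entries would need to be kept in memory or written out.
kept = {pair: s for pair, s in filtered.items() if s > 0.0}
```

A higher threshold shrinks `kept`, which is the source of the memory and speed savings mentioned above, at the risk of discarding a weak but correct match.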

Previous run file: A previous output file can be used as a basic interface to give very specific commands to AutoLink. In order to use a previous output file as an input file, the file must simply be renamed to “previous_run.inp”. This input file can be used to make notes for yourself while assigning your molecule; these notes will be propagated through successive rounds of assignment into future output files. Comments in the data file are expected to consist of a "#" character followed by anything up to the end of the line. Note that the comments can also contain commands for the AutoLink algorithm. Four commands are currently supported: "<link>", "<unlink>", "<exclude_1>", and "<exclude_2>". The <link> and <unlink> commands link and unlink the spin system pair on whose line they occur in the data, and force AutoLink not to change their status. AutoLink can also be told not to change any links already established at the start of the algorithm (see below). The <exclude_1> and <exclude_2> commands force one spin system of the pair on the line to be excluded from possible linking/unlinking. The <exclude_1> command affects the first spin system's ability to change its successor link or link status. Likewise, the <exclude_2> command affects the ability of the second spin system to change its predecessor link or link status. These commands are generally useful for excluding spin systems from linking if they may be considered artifacts in the spectrum rather than actual spin systems.
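As an illustration of the comment convention described above, the following Python sketch separates the "#" comment from the rest of a line and picks out any of the four supported commands. It is not AutoLink's parser: the function names are hypothetical, and the layout of the data part of the line (shown here as two IDs and a score) is an assumption made only for the example.

```python
import re

# The four commands named in this manual; anything else in a comment is a plain note.
_COMMAND_RE = re.compile(r"<(link|unlink|exclude_1|exclude_2)>")


def split_comment(line):
    """Split a line into (data, comment) at the first '#' character."""
    data, _, comment = line.partition("#")
    return data.rstrip(), comment.strip()


def find_commands(line):
    """Return the commands found in the comment part of `line`, in order."""
    _, comment = split_comment(line)
    return _COMMAND_RE.findall(comment)


# Hypothetical data line: the column layout before '#' is assumed, not documented.
line = "12 13 0.82  # checked by hand in the HNCA <link>"
data, comment = split_comment(line)
commands = find_commands(line)
```

Only text after the "#" is searched, so a personal note and a command can share the same comment.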

score combination equation: This equation defines how the individual spin pairing scores are combined to form the total. Each score type is addressed by its common name. For example, the equation CA*CB defines the total overall score for any given spin system pair as the score from comparing the Cas times the score from comparing the Cbs. The individual scores are pre-standardized before combining them, so each score type ranges from 0 to 1. This is important when balancing the relative importance of each score type. For example, if one wanted to add ¹H data to Cas and Cbs, but wanted the Cas to be weighted twice as heavily as the other two, the equation "H+CB+2*CA" would suffice. This type of scoring combination allows a great deal of flexibility in "fuzzy-logical" spin system pair scoring. A "+" is comparable to a "fuzzy-OR" function. A "*" functions as a "fuzzy-AND" operation. The equation can be as complex as necessary, using parentheses ({, [, and ( are all syntactically synonymous, so the different bracket types can be mixed for clarity). Exponentiation is symbolized by either the "**" or the "^" operator. A number preceding an opening parenthesis is treated as a coefficient of whatever is inside the parentheses. To aid in complex fuzzy score combinations, three new operators have been defined: "&&", "||", and "&|". || is defined to be the average of the two operands (i.e. 1||0.5=0.75). This can be used as a quasi-OR whose result stays within the original value range of the operands. && is defined by the equation x&&y=(x*y)^0.5. This is a quasi-AND which also maintains the original value range of the operands. &| is a quasi AND/OR function defined by x&|y=((x+y)/2+(x*y)^0.5)/2. This function is half-way between a quasi-AND and a quasi-OR and is remarkably useful for analyzing spin system pairs.
For example, the equation (H&|CB)*CA could be interpreted as "requiring >0 score in the noesy or >0 score from the Cbs, much better if both are >0, but still marginally OK if just one of the two is >0, but still requiring a >0 score in the Cas." Of course all of the operators give gradual results, with better scoring individual scores leading to higher overall scores.
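The three operators can be written out directly from the definitions given above. The Python sketch below is only meant to make their behavior easy to try out; the function names are hypothetical, and AutoLink itself evaluates these operators inside its own equation parser, which is not reproduced here.

```python
import math


def quasi_or(x, y):
    """The "||" operator: the average of the two operands."""
    return (x + y) / 2


def quasi_and(x, y):
    """The "&&" operator: the square root of the product, (x*y)^0.5."""
    return math.sqrt(x * y)


def quasi_and_or(x, y):
    """The "&|" operator: ((x+y)/2 + (x*y)^0.5)/2, half-way between the two above."""
    return (quasi_or(x, y) + quasi_and(x, y)) / 2


def example_total(h, cb, ca):
    """The example equation (H&|CB)*CA, with all inputs scaled to 0..1."""
    return quasi_and_or(h, cb) * ca


both = example_total(1.0, 1.0, 0.8)      # both H and CB match
only_h = example_total(1.0, 0.0, 0.8)    # only H matches: reduced but non-zero
no_ca = example_total(1.0, 1.0, 0.0)     # no CA match: total is zero
```

With inputs between 0 and 1, all three operators return values between 0 and 1, which is why they can be nested freely without rescaling.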

It is often best to start out with a very stringent equation, using stringent parameters for the individual pairing scores as well, and then to loosen things up later on, maintaining the spin-system links obtained from the stringent rounds of autolinking during the later, less stringent rounds.

Non-spectrum-based Biasing controls:

This section is primarily concerned with the control parameters of AutoLink that are involved in spin-system-to-sequence comparison and with those that control the RHP-based “decision-making” process.

Hold existing links: If this is on, then AutoLink will not change any of the links currently existing at the time it is started. It will instead try to finish the assignments in consideration of the current links. This is useful if the user already has some links formed, either from previous AutoLink runs or by manual analysis which are considered certain. If the user has some links but is not absolutely sure they are correct, then this parameter is best set to off. In this case, AutoLink will still try to assign the molecule with the current assignments in mind, but will be free to change the starting links if the need arises.

It is important at this point to highlight the difference in AutoLink’s behavior toward pre-defined links and pre-defined assignments. As described in the previous paragraph, AutoLink can be either directed to consider previously existing spin system links as automatically correct, or alternatively to consider them only as suggestions. This is in contrast to the way AutoLink views pre-existing residue assignments. AutoLink will always consider all existing assignments as 100% correct. This is because assignment is the goal of the program. If the user is uncertain of any existing assignments, it is best to remove these assignments prior to running the program. In fact there is little purpose for most applications in including any pre-existing assignments, since AutoLink will automatically reassign all of the spin systems that can be unambiguously assigned at the user-designated points in its analysis (see “Assigned Positions” below).

Max. # num of rounds: This is the maximum number of AutoLink cycles that will be executed. A 0 value means that there is no maximum. Note that AutoLink may stop at an earlier number of rounds if it can't make any more progress with the current settings. In general, it’s a good idea to set this to 0 and let the program self-terminate. This is because only at the end of the process is AutoLink relatively certain of its links.

# links/round: AutoLink works by doing a sort of "cha-cha". It accepts a certain maximum number of new links per cycle. It then removes some of the links. How many it removes depends on these control settings and on how many links were formed in the same round (the program will try to avoid having a net loss of links in any given round). # links/round is the maximum number of links that will be formed per round. New links may require that old links be broken, however, so it is possible for fewer “net” links to be made.

# unlinks/round: This is the maximum number of backward steps (unlinks) per round AutoLink will take. Fewer unlinks may occur in order to ensure forward progress for any given round.

link threshold: This is the relative fitness score that a spin system link must match or beat in order to be considered. Lower values will cause the program to take longer to run, while higher values may restrict the number of links that can be made.

Unlink threshold: This is the relative fitness score that a spin system link must be below in order to be considered for unlinking. Note that a spin system link that is above the threshold can still be broken if it is necessary to do so in order to form an even more favorable link.

Score delta bias: AutoLink is capable of evaluating the relative importance of hypothetical links. A spin system may only have one reasonable partner for a link, while another spin system link may appear more favorable in absolute terms. If score_delta_bias is set closer to 1, AutoLink will tend to give precedence to spin system links that are relatively fit compared to the other candidate links involving the same spin systems. In general it is a good idea to always set this parameter to 1 (fully on). Testing shows that it just makes the program “smarter”.

Random factor bias: If this value is set >0 then a random factor will be added to the evaluation of the fitness scores to determine links and unlinks. The relative importance of the random factor is controlled by this value. Values closer to 0 will result in less importance of the random factor. Values closer to 1 will result in more importance of the random factor. The magnitude and the distribution of the random factor are controlled by the following factors: center_weighting_factor, number_of_centered_random_numbers, off_center_weighting_factor, number_of_off_center_random_numbers, exponent, max_amplitude. Currently these parameters have been preset to reasonable values. So far, for the NMR assignment problems we have tested, random factor bias has been unnecessary. It may be useful in the future for speeding up the processing of larger problems through stochastic effects. Currently it is recommended that it be left set to 0 (fully off), or at least close to 0 (0.01).

juggernaut_mode: This is a binary switch (off or on). If juggernaut_mode is turned on, then AutoLink will be forced to make a new link each round, if possible, even at the expense of breaking several other links. This mode is only used to experiment with new link hypotheses in rare cases when the user believes that something is wrong with the current assignments and wants AutoLink to suggest something new. It is highly experimental and the recommended setting for most applications is off.

Assigned positions: This control affects the way AutoLink handles spin system fragment to protein sequence matching involving spin systems and residues that have been assigned during previous AutoLink rounds. The program can be instructed to consider already existing assignments at three stages—when it attempts to make new links, when it considers which links to break, and/or after all linking/unlinking cycles have been completed. Testing shows that letting AutoLink consider assignments at all three stages is best. The program will generally run faster if it is told not to consider assignments during the cycles (the first two points listed above), but may be less accurate for assignment problems with much ambiguity. If AutoLink is directed to consider assignments at the end of the cycles, it will assign whichever fragments can be unambiguously assigned before terminating. Otherwise, the fragments will be left unassigned.

Link repeat bias: This control is 1 minus the “link repeat penalty” described in the paper. The control setting has been reversed merely to make it more consistent with the other control biases (0=off, 1=on). Link repeat penalty is a factor which affects repeated attempts to form a link involving the same spin system. AutoLink remembers how many times a spin system is linked (both as a predecessor and as a successor). All link hypotheses are biased by multiplication by link_repeat_penalty^(number of repeated attempts to find match). This is important for avoiding getting stuck in a loop. Sometimes once a link is formed, it causes a later re-evaluation of its own score such that it becomes disfavored. Then a new link forms that makes the original link look OK again. Sometimes a cycle forms where the same set of links is repeatedly accepted and rejected. This is where link repeat penalty is especially important because it forces AutoLink to break the cycle and consider other links. The links involved in the cycle are then re-evaluated in the context of the new links. This process is repeated either until the new links determine which of the links involved in the cycle should be kept, or until all of the links involved in the cycle are disfavored below the link threshold, in which case none of the links will be accepted, being regarded as indeterminate. The bottom line is that AutoLink "saves for later" ambiguous link hypotheses until they are no longer ambiguous. For this reason, it is important that link_repeat_penalty be set to a value <1 (link repeat bias >0). The best values are those that are nearly 1, but not quite (i.e. 0.90-0.99) (link repeat bias 0.01-0.1). Values in this range are sufficiently "not 1" to prevent perpetual cycling, but not enough of a penalty to significantly affect non-cyclic hypothesis re-evaluation.
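The effect of the repeat penalty is easy to see numerically. The Python sketch below applies the rule stated above (multiply the score by link_repeat_penalty raised to the number of repeated attempts); the function names and the example numbers are hypothetical, and how AutoLink counts "repeated attempts" internally is not modelled here.

```python
def link_repeat_penalty(link_repeat_bias):
    """Convert the user-interface control (0=off, 1=on) into the penalty factor."""
    return 1.0 - link_repeat_bias


def biased_score(score, link_repeat_bias, repeats):
    """Score of a link hypothesis after `repeats` repeated attempts to form it."""
    return score * link_repeat_penalty(link_repeat_bias) ** repeats


def attempts_until_below(score, link_repeat_bias, link_threshold, max_attempts=1000):
    """Number of repeats after which the biased score first drops below the threshold.

    Returns None if that does not happen within `max_attempts` (for example
    when the bias is 0 and the score never decays).
    """
    for repeats in range(max_attempts + 1):
        if biased_score(score, link_repeat_bias, repeats) < link_threshold:
            return repeats
    return None


# Hypothetical numbers: a good link (0.9) that keeps being formed and broken.
gentle = attempts_until_below(0.9, 0.01, 0.5)   # penalty 0.99: decays slowly
strong = attempts_until_below(0.9, 0.10, 0.5)   # penalty 0.90: decays quickly
never = attempts_until_below(0.9, 0.0, 0.5)     # penalty 1.00: cycles forever
```

A bias near 0.01 leaves a hypothesis almost untouched for the first few re-evaluations, while still guaranteeing that a link caught in a cycle eventually falls below the link threshold.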

AutoLink log file: This is the name of the file where AutoLink will report what it is doing during the linking/unlinking cycles. See below for some notes about the output. The progress of the program can be evaluated by viewing this file.

Sequence fit controls: These controls affect how AutoLink matches spin system fragments to protein sequences.

stan dev factor: This factor affects the penalty for chemical shift differences between the observed and the expected (predicted) chemical shifts when comparing observed spin systems to template residue types. A higher value means a lower penalty for larger deviations. As a general rule this value should be set to 1.0 or near 1.0 for most standard CARA repositories (where the predicted chemical shifts are already associated with a wide standard deviation). If one is using nearest-neighbor predicted chemical shifts, however, a value of ~1.5 is more appropriate. This allows for the possibility of somewhat deviant chemical shift matches while still biasing moderately heavily in favor of close chemical shift matches.

sequence fit threshold: This value is the minimum fitness score that a fragment to sequence match must have in order to be considered as a possible match. Not only does the overall score of each fragment have to be higher than the sequence fit threshold, but so also must the individual sequence fits of each spin system within the fragment to their corresponding positions within the protein sequence. In general good values for this parameter are between 0.2 and 0.4. Correct sequence matches in this range are extremely improbable (most good matches are at least 0.7 or greater), but a low value is chosen to allow for the possibility of unusual chemical shift values. In general, since AutoLink relies on only relative criteria for its decisions, a high value for sequence fit threshold is unnecessary.
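The double condition described above (the overall fragment score and every individual spin system score must clear the threshold) can be expressed as a short Python check. This is an illustrative sketch only; the function name and the example scores are hypothetical, and a strict "higher than" comparison is assumed.

```python
def passes_sequence_fit(overall, per_spin_system, threshold):
    """True if the overall fit and every per-spin-system fit exceed `threshold`."""
    return overall > threshold and all(s > threshold for s in per_spin_system)


# Hypothetical fragment of three spin systems placed at one sequence position.
good = passes_sequence_fit(0.80, [0.90, 0.75, 0.60], threshold=0.3)

# Same overall score, but one spin system fits its residue poorly.
bad = passes_sequence_fit(0.80, [0.90, 0.10, 0.85], threshold=0.3)
```

The second case shows why the per-spin-system check matters: a single badly fitting residue rules out the placement even when the fragment as a whole scores well.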

Nearest neighbors: This is a toggle switch that controls whether AutoLink uses the empirical chemical shifts found in the repository’s residue types or uses nearest-neighbor-predicted chemical shifts for spin system to sequence matching. For nearest-neighbor-based chemical shift prediction, AutoLink uses the parameters published by Wang and Jardetzky (2002)* which can take into account secondary structure information.

* Wang Y, Jardetzky O. Investigation of the neighboring residue effects on protein chemical shifts. J Am Chem Soc. 2002 Nov 27;124(47):14075-84

non-matching spin bias: This bias controls how AutoLink considers matches of spin systems that have “inappropriate” spin labels to residue positions in the protein sequence. An inappropriate spin label is, for example, a “Cb” in a spin system that is currently being considered for fitting to a GLY in the protein sequence. If non-matching spin bias is set to 1, then an inappropriate spin will cause the spin system-sequence match to be considered impossible (fitness score 0). A setting of 0, on the other hand, will cause the non-matching spins to be completely ignored. Intermediate values cause intermediate penalties. Settings other than 1 allow AutoLink to consider the possibility of user error in defining the spin labels of spin systems during sequence matching.

Secondary CA bias: This bias allows AutoLink to bias its links and assignments toward putative secondary structure elements inferred from the Ca chemical shifts, without prior knowledge (either from the user, or from predictions) about the secondary structure. The program checks for fluctuations of the Cas from above to below the average value and back, and biases against fragment to sequence matches with more fluctuations compared to matches with fewer. Since the direction of the Ca deviation from the average values is secondary structure dependent, a high value for secondary CA bias causes AutoLink to bias in favor of chemical shift assignments that are consistent with some kind of secondary structure elements. The penalty is actually not assessed for fluctuations across the average value itself, but rather for fluctuations around a “neutral zone” which is at a set distance from the average chemical shift values. This is necessary because the magnitude of the average deviation in the Cas of the α-helices from the overall average is different from the magnitude of the average deviation of the β-sheets. Empirical testing shows that the neutral zone should be centered (neutral zone parameter in AutoLink’s user interface) at about -1 ppm from the average values. Of course minor fluctuations should be ignored, so the neutral zone must have a width >0 ppm. Testing shows that a reasonable value for the width of the neutral zone is ~±1 ppm. As with most of AutoLink’s biases, the strength of the bias ranges from 0 to 1, controlled by the slider in the user interface.
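One way to picture the neutral zone is to count how often a run of Ca deviations jumps from one side of the zone to the other. The Python sketch below is an interpretation of the description above, not AutoLink's algorithm: the function names are hypothetical, values inside the zone are simply skipped, and the way AutoLink turns the count into a penalty is not documented here and is not modelled.

```python
def zone_side(delta, center=-1.0, half_width=1.0):
    """Classify a Ca deviation (observed minus average, in ppm).

    Returns +1 above the neutral zone, -1 below it, and 0 inside it.
    The defaults follow the values suggested in the text (-1 ppm, +/- 1 ppm).
    """
    if delta > center + half_width:
        return 1
    if delta < center - half_width:
        return -1
    return 0


def count_fluctuations(ca_deltas, center=-1.0, half_width=1.0):
    """Count changes of side along a fragment, ignoring values inside the zone."""
    crossings = 0
    last_side = 0
    for delta in ca_deltas:
        side = zone_side(delta, center, half_width)
        if side == 0:
            continue  # minor fluctuation: neither helix-like nor sheet-like
        if last_side != 0 and side != last_side:
            crossings += 1
        last_side = side
    return crossings


# Hypothetical Ca deviations (ppm) for three candidate placements of a fragment.
helix_like = count_fluctuations([2.5, 3.0, 2.8, 3.1])
with_minor_dip = count_fluctuations([2.5, -0.5, 3.0])
erratic = count_fluctuations([2.5, -3.0, 2.8, -2.5])
```

Under this reading, the placement with the erratic pattern would be the one penalized most strongly when secondary CA bias is set close to 1.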

Sequence controls: By clicking this button (middle of the menu bar at the top of the main program window), the user can bring up another window which contains controls that affect specific positions within the protein sequence. One of them is the ability to include/exclude specific residue positions from the analysis. This is useful for assignment problems where not every spin system is evident in the spectra and the user has some idea which segments might be absent. When told to exclude certain positions from consideration, the program looks only for links that form fragments that fit into the remaining sequence positions. This function is also useful if the user thinks there may be problems in the input data (i.e. incorrectly classified peaks in spin systems). For these cases, AutoLink will generally have difficulty finding an assignment at some positions within the protein sequence, specifically those for which the spin system is incorrect. In order to deal with this, the user can loosen the stringency of the input parameters (see the scoring biases and scoring equation sections above) and direct AutoLink to look for only specific links.

The sequence control window can also be used to input secondary structure data. This data might be from sequence alignments with a known protein, or from secondary structure prediction algorithms or even just a guess. If secondary structure information is included, AutoLink will only look for results that are consistent with the specified secondary structure.

Alternatively, secondary structure prediction information from the program YASPIN* can be directly included in AutoLink’s analysis, allowing the uncertainties in the secondary structure determination to be propagated through AutoLink’s analysis. To enter the output from YASPIN into AutoLink, one must first run YASPIN (this can be done from their website, http://ibivu.cs.vu.nl/programs/yaspinwww/) and save the results into a file. This file can simply be a saved copy of the e-mail response sent by the YASPIN server. Just right click somewhere on the secondary structure prediction panels in the sequence control window and select “read from file”. This will bring up a new window from which the user can select the YASPIN output file. After the file is selected, the results will be displayed in the secondary structure prediction boxes. The user can control which parts of the prediction are included in AutoLink’s analysis by straightforward point-and-click controls. Testing shows that the best results are obtained by including the helix and sheet predictions, but excluding the coil-predicted segments. This is because the chemical shift prediction parameters of Wang and Jardetzky are heavily biased in favor of actual random coil elements, while the secondary structure prediction from YASPIN classifies everything else (including loops and turns) together with random coil elements.
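
The recommendation to keep the helix and sheet predictions while leaving out the coil predictions can be illustrated with a short Python sketch. The one-letter-per-residue string (H = helix, E = sheet, C = coil) is an assumed representation chosen for the example; it is not the YASPIN file format, and the function name is hypothetical.

```python
def mask_prediction(prediction, keep=("H", "E")):
    """Keep only the selected predicted states.

    Returns one entry per residue: the predicted state if it is in `keep`,
    otherwise None, meaning no secondary structure constraint is applied
    at that position.
    """
    return [state if state in keep else None for state in prediction]


# Assumed example prediction for a 14-residue segment.
constraints = mask_prediction("CCHHHHHCCEEEEC")
```

Positions predicted as coil come back as None, so loops and turns that were lumped together with random coil impose no constraint.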

*Lin K, Simossis VA, Taylor WR, Heringa J. A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics. 2004 Sep 17

AutoLink command buttons (top of right scroll window):

Display fragments button: Pressing this button brings up AutoLink’s “fragment display” in the main viewer window. The fragment display contains one entry for each currently existing CARA spin system fragment. Each entry consists of several elements. The first element is the overall relative fitness score for either the fragment’s current position within the protein sequence or, if the fragment is currently unassigned, the best available position. The next element is the individual spin system to sequence fits for each spin system of the fragment (assuming a match to the same position as the overall reported score). Right clicking near one of the spin system to residue matching scores displays a list of the spins of the spin system, their chemical shifts, and the average value and standard deviation for each spin. The next element contains a list of the spin systems (by spin system ID) in the fragment and their corresponding assignments. Right clicking between two spin systems displays the overall fitness score of the link hypothesis in the spectral data and the sub-scores that were combined to calculate the overall score. The next element of the fragment entries is the delta of the observed Ca from the average Ca value. This is useful for assessing potential secondary structure elements. These latter three elements are displayed in color-coded lines to aid the user in rapidly interpreting the display. The subsequent elements of the fragment entries display the best available sequence match, the number of available sequence matches, the number of sequence matches (available or currently assigned to other spin systems), and lastly up to two possible sequence positions where the fragment might match (also irrespective of other assignments). The fragment display thus contains most of the necessary information for the user to rapidly assess the quality of the current spin system links and assignments.

Assign fragments button: Pressing this button causes AutoLink to assign all currently existing fragments whose assignments can be unambiguously determined.

Unassign all systems button: Pressing this button and then selecting “Yes” in the resulting dialogue box unassigns all of the spin systems without affecting the links between the systems.

Unlink current links button: If this button is pressed and then the user clicks “Yes” in the resulting dialogue box, all of the current spin system links between unassigned spin systems will be deleted. Links between currently assigned spin systems will be unaffected. This is useful for starting a new AutoLink run that is unbiased by unassigned spin system fragments.

Assign Candidates button: Pressing this button brings up two selectable options: 1) “Assign candidates for Glycine” and 2) “Assign candidates for Glycine/not Glycine”. If the first option is chosen, then AutoLink will set the candidate list (see CARA user guide) appropriately for all spin systems that it will consider only as a potential match to glycine. If the second option is chosen, then AutoLink will (in addition to marking all of the glycines) also mark all of the other spin systems as potentially matching any residue type except glycine. This will not affect the program’s results in any way, but can save a little CPU time. It is actually more useful as a shortcut for marking the spin system candidates appropriately, which helps the user in inspecting spin systems and spectra in CARA.

AutoLink button: Pressing this button starts the main AutoLink algorithm, so it will cause AutoLink to evaluate and examine the spin system link hypotheses, form appropriate links, and assign fragments.

Some notes on the output: The following is a sample output segment from a log file:

line #:

______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______

1 __/0.5164\__/0.3555\__/0.3945\__/0.4377\__/0.4523\__/0.8423\__/0.4660\__/0.4995\__/0.5649\__/0.4877\__/0.4320\__/0.3986\__/0.5020\__

2 [14] |0.7741| 39 ------ 42 ------ 46 ------ 63 ------ 44 ------ 53 ------ 43 ------ 65 ------ 66 ------ 74 ------ 48 ------ 79 ------ 75 ------ 51

3 dCA from mean: -0.44 -3.92 0.90 0.06 0.28 1.48 1.11 0.48 -0.98 1.33 -1.18 -0.76 -2.10 -5.80

4 best available sequence match: THR54-->LYS67

5 # of available sequence matches: 1

6 assigned to: THR54-->LYS67

7 # of sequence matches: 1

8 top sequence matches:

9 position: THR54-->LYS67 score: 0.7741

Each round of AutoLink will display one such set of output for each fragment that is at least 2 spin systems long. Line 1 shows the scores of each spin system link (based on spectra only). Line 2 shows the length of the fragment followed by the best sequence fit score for the fragment (accounting for current assignments), followed by the spin systems of the fragment. Line 3 shows the difference of the Ca chemical shifts of the above spin systems from their theoretical means based on the proposed sequence alignment. Line 4 shows the sequence position at which the fragment gets the score in line 2. Line 5 shows how many positions the fragment matches above sequence_fit_threshold, taking prior assignments into account. Line 6 shows the currently assigned position of the fragment in the protein sequence. Lines 7+ show data comparable to lines 2, 5, and 6, except that the scores and positions are those obtained when spin system assignments are ignored. It is relatively easy to assign a protein from this data. Simply accept all assignments for which there is no alternative. Then run the program again, holding all starting links unchanged, and review the fragment list again. Repeat these steps until no unassigned spin systems are left that match only one position in the protein sequence.
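
For users who want to review a log file outside of AutoLink, the block layout described above can be read with a short script. The following Python sketch is an illustration only: it is based solely on the sample segment shown above, the function names are hypothetical, and the real log file may contain variations that this sketch does not handle. It tolerates the leading line numbers used in the sample, whether or not they are present in the file.

```python
import re

# A link score looks like /0.5164\ on line 1 of a block.
LINK_SCORE_RE = re.compile(r"/(\d+\.\d+)\\")
# Line 2 looks like: [14] |0.7741| 39 ------ 42 ------ 46 ...
FRAGMENT_RE = re.compile(r"\[(\d+)\]\s*\|(\d+\.\d+)\|(.*)")


def parse_fragment_block(lines):
    """Collect the fields of one fragment's log block into a dict."""
    frag = {}
    for line in lines:
        fragment = FRAGMENT_RE.search(line)
        if fragment:
            frag["length"] = int(fragment.group(1))
            frag["score"] = float(fragment.group(2))
            frag["spin_systems"] = [
                int(tok) for tok in re.findall(r"\d+", fragment.group(3))
            ]
        elif LINK_SCORE_RE.search(line):
            frag["link_scores"] = [float(s) for s in LINK_SCORE_RE.findall(line)]
        elif "dCA from mean:" in line:
            frag["dca"] = [float(tok) for tok in line.split(":", 1)[1].split()]
        elif "best available sequence match:" in line:
            frag["best_match"] = line.split(":", 1)[1].strip()
        elif "# of available sequence matches:" in line:
            frag["n_available"] = int(line.split(":", 1)[1])
    return frag


def unambiguous(fragments):
    """Return the fragments that match exactly one available sequence position."""
    return [frag for frag in fragments if frag.get("n_available") == 1]
```

A simple consistency check on a parsed block is that the number of link scores equals the fragment length minus one, and that the number of spin systems equals the fragment length. The fragments returned by unambiguous() are the ones whose assignments can be accepted before the next run.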