Matrix Science

Micromass MassLynx

What if you don't have ProteinLynx?

In MassLynx 3.5, the tools for creating peak lists for database searching are part of ProteinLynx. Without this option, it is difficult to create good quality peak lists. For example, if we use the following Mass Measure options:

mass measure option dialog

to process a typical MALDI spectrum (a combination of 13 continuum spectra), we are likely to get something similar to this:

mass measure result

The "real" peaks have been detected, but so have a large number of noise peaks. Increasing the smoothing or using a higher order polynomial for baseline subtraction may help in some cases. But, in general, there will still be ten times as many noise peaks as real peaks, which means that a Mascot peptide mass fingerprint search is unlikely to produce a useful result.

Short of peak picking by hand, the best course of action is to select a subset of peaks based on intensity. Make sure you are displaying the full spectrum and that the peak labels have a reasonable number of decimal places. Copy the mass and intensity values to the clipboard (Edit; Copy spectrum list). Then paste the values into a spreadsheet, sort by descending intensity, and select the more intense peaks. The optimum number of peaks to select will have to be determined by trial and error, but is likely to be in the range 50 to 150.
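If you would rather script the spreadsheet step, the same selection can be sketched in Python. This is a minimal illustration, assuming the pasted text holds one mass and one intensity per line separated by whitespace. The function name and the example values are hypothetical and are not part of MassLynx or Mascot.

```python
def select_most_intense(spectrum_text, n_peaks=100):
    """Return the n_peaks most intense (mass, intensity) pairs, sorted by mass."""
    peaks = []
    for line in spectrum_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        try:
            peaks.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # skip any header text that is not numeric
    # Sort by descending intensity and keep the strongest n_peaks
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n_peaks]
    # Return the selection in ascending mass order
    return sorted(strongest)


# Made-up values, standing in for text pasted from the clipboard
example = """842.51\t1200
1045.56\t300
1179.60\t4500
2211.10\t90
"""

for mass, intensity in select_most_intense(example, n_peaks=2):
    print(f"{mass:.2f} {intensity:.0f}")
```

The value of n_peaks still has to be found by trial and error, as described above.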

Copy and paste the selected values into the Mascot search form, and you have every chance of getting a decent result. However, a simple intensity threshold can never do a perfect job because there are real peaks at the high mass end that are weaker than the noise peaks at the low mass end. Also, an ideal peak list needs to be de-isotoped. That is, it should only include the monoisotopic peak from each isotope cluster.
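To make the idea of de-isotoping concrete, here is a deliberately naive sketch. It assumes singly charged ions, so that isotope peaks are spaced by roughly 1.00335 Da, and it keeps the lowest-mass peak of each run of evenly spaced peaks. It ignores intensity patterns and overlapping clusters, and it is not the method used by ProteinLynx. The function name and tolerance are hypothetical.

```python
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da (1+ ions assumed)


def naive_deisotope(masses, tolerance=0.05):
    """Keep the lowest-mass peak of each run of peaks spaced about 1 Da apart."""
    monoisotopic = []
    previous = None
    for mass in sorted(masses):
        in_cluster = (
            previous is not None
            and abs(mass - previous - ISOTOPE_SPACING) <= tolerance
        )
        if not in_cluster:
            monoisotopic.append(mass)  # first peak of a new cluster
        previous = mass
    return monoisotopic


# Made-up example: two isotope clusters and one isolated peak
peaks = [1045.56, 1046.56, 1047.57, 1179.60, 1180.60, 2211.10]
print(naive_deisotope(peaks))
```

A production-quality routine would also compare the relative intensities within each cluster against the expected isotope distribution.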

Fortunately, ProteinLynx includes tools for selecting monoisotopic peaks, and also enables peak list files to be created automatically from a sample list. The remainder of this article illustrates peak detection using MassLynx 3.5 (change note 367) with ProteinLynx, BioLynx, and MaxEnt 3 options.

M@LDI™ peptide mass fingerprint

This simple example is for post-acquisition processing of a single data file. First, we create a sample list in which the Process is PeptideAuto:

sample list

A suitable Parameter File is created or modified using the ProteinLynx Setup wizard.

setup 1

Click on Process Setup. This example is for a M@LDI™:

setup 2

Choice of combine method obviously depends on the dataset. For Centre, this example uses Mass Measure rather than MaxEnt 3, because de-charging of MALDI peptide mass fingerprint data is not usually necessary. Make sure that the Text Export checkbox, Determine monoisotopic peaks, is checked.

setup 3

The options for Auto Select are not critical.

setup 4

Background subtract is essential for MALDI data. The recommended settings of 5th order and 40% seem to work well. Smoothing reduces the chance of a peak being split on noise, but oversmoothing can increase the number of baseline noise peaks that pass the Mass Measure minimum peak width criterion. Ideally, you should experiment with typical data to determine the optimum settings for smoothing. Minimum peak width and centroid top settings don't seem critical.

setup 5

In the Monoisotopic Peak Selection dialog, set the low mass threshold to something appropriate for peptide mass fingerprinting, e.g. 500 to 800 Da. Make sure that the Peaks to Search radio button is set to All monoisotopic peaks.

setup 6

In the select output dialog, check Generate text output, and (naturally) choose one of the two Mascot options. The default location for the peak list file is the current temp directory. It is better to specify an explicit path, either on a project basis or a sample list basis, because this will simplify setting up Mascot Daemon in real-time monitor mode to search the peak lists automatically.

setup 7

Press finish and save the Parameter file. You are now ready to Run the sample list. The peak list quality can be excellent. When the data file used for the earlier illustration was processed using these parameters, the peak list contained 66 mass values. A Mascot peptide mass fingerprint search showed one excellent match and two probable matches, accounting for 44 of the 66 values (results).

QTOF™ LC-MS/MS

Processing LC-MS/MS data into a peak list is a more complex task because the spectra need to be combined in the time domain according to precursor mass. This is handled automatically by ProteinLynx when the instrument type is Electrospray.

setup 8
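To illustrate what combining in the time domain according to precursor mass means, the sketch below groups MS/MS scans that share a precursor m/z and are close together in time. It is only an illustration of the idea, not a description of how ProteinLynx does it. The function name, the tolerance, and the maximum time gap are all hypothetical.

```python
def group_scans(scans, mz_tolerance=0.5, max_gap_s=30.0):
    """Group (time_s, precursor_mz) scans with matching precursor and nearby times."""
    groups = []
    for time_s, precursor_mz in sorted(scans):
        for group in groups:
            last_time, last_mz = group[-1]
            same_precursor = abs(precursor_mz - last_mz) <= mz_tolerance
            close_in_time = time_s - last_time <= max_gap_s
            if same_precursor and close_in_time:
                group.append((time_s, precursor_mz))
                break
        else:
            # No existing group matched, so this scan starts a new one
            groups.append([(time_s, precursor_mz)])
    return groups


# Made-up scan headers: (retention time in seconds, precursor m/z)
scans = [(100.0, 650.3), (102.0, 650.4), (103.0, 820.9), (104.0, 650.3), (400.0, 650.3)]
for group in group_scans(scans):
    print(group)
```

The spectra within each group would then be summed before peak detection. The scan at 400 s forms its own group because it is too far in time from the earlier scans of the same precursor.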

The default setting for the QA filter does a good job of removing spectra that contain nothing but noise. For time domain processing, choose either Mass Measure or MaxEnt 3.

setup 9

Although QTOF electrospray data don't exhibit the high sloping background found in MALDI spectra, there are usually large numbers of single count spikes. After smoothing and centering, each spike can end up being output to the peak list. This is not a fundamental problem, because Mascot discriminates against low level noise in MS/MS data. However, it can be an inconvenience because it significantly increases the size of the peak list. If you are searching on our public web site, there is a 5 MB limit on the size of an upload, and you may hit this limit with only a modest number of spectra. In addition, Mascot has a limit of 10,000 peaks in any individual MS/MS spectrum. Normally, any spectrum that approaches this limit is not a peak list but profile data, and the limit serves as a warning that the results will not be as good as if a well-processed peak list were being used.

Fortunately, the majority of these single count spikes can easily be removed by using a zero order polynomial for background subtraction. The only time you might choose not to do this is when the spectra are very weak, and 1 or 2 count spikes represent real peaks. Otherwise, the Mass Measure parameters are very similar to the MALDI case.

setup 10
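The two size limits mentioned above (a 5 megabyte upload and 10,000 peaks per MS/MS spectrum) can be checked before submitting a search. The following is a minimal sketch that works on peak lists already held in memory. The function name is hypothetical, and treating a megabyte as 1024 × 1024 bytes is an assumption.

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MB upload limit (binary megabyte assumed)
MAX_PEAKS_PER_SPECTRUM = 10_000     # per-spectrum peak limit quoted in the text


def check_peak_lists(spectra, file_size_bytes):
    """Return a list of warning strings; an empty list means no limit was hit."""
    warnings = []
    if file_size_bytes > MAX_UPLOAD_BYTES:
        warnings.append(
            f"file is {file_size_bytes} bytes, above the upload limit"
        )
    for index, peaks in enumerate(spectra, start=1):
        if len(peaks) > MAX_PEAKS_PER_SPECTRUM:
            warnings.append(
                f"spectrum {index} has {len(peaks)} peaks; is it profile data?"
            )
    return warnings


# Made-up example: one oversized spectrum and one normal one
spectra = [[(500.0, 1.0)] * 10_001, [(500.0, 1.0)] * 50]
for message in check_peak_lists(spectra, file_size_bytes=6 * 1024 * 1024):
    print(message)
```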

In the select output dialog, check Generate text output, and choose Mascot compatible .pkl format. Check to append all peak lists to a single file. As before, it is better to specify an explicit path for the peak lists, either on a project basis or a sample list basis, because this will simplify setting up Mascot Daemon in real-time monitor mode to perform searches automatically.

setup 11

Press finish and save the Parameter file. You are now ready to run the sample list and reduce a complete LC-MS/MS dataset to a single .pkl file.

Low intensity data

If the data are weak, then the above processing can be too harsh. In such cases, it may be better to use a lower QA threshold, such as 10, and not to attempt background subtraction. Smoothing could be reduced to one or two passes of 3-channel Savitzky-Golay.
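For a sense of how light this smoothing is, here is a sketch of a 3-channel smooth. It assumes a zero or first order fit, in which case a 3-point Savitzky-Golay filter reduces to a 3-point moving average. The polynomial order used by MassLynx is not stated here, so treat this as an illustration only. The function name is hypothetical.

```python
def smooth_3_channel(intensities, passes=1):
    """Apply a 3-point moving average the given number of times."""
    data = list(intensities)
    for _ in range(passes):
        smoothed = data[:]  # the two end channels are left unchanged
        for i in range(1, len(data) - 1):
            smoothed[i] = (data[i - 1] + data[i] + data[i + 1]) / 3.0
        data = smoothed
    return data


# A made-up 3-count spike on a flat baseline
print(smooth_3_channel([0, 0, 3, 0, 0], passes=1))
print(smooth_3_channel([0, 0, 3, 0, 0], passes=2))
```

Each pass spreads an isolated spike over neighbouring channels and lowers its height, which is why only one or two passes are suggested for weak data.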

MaxEnt 3

In some cases, MaxEnt 3 can produce a significantly better peak list from MS/MS data. The main drawback is processing time. This depends on data quality and processing parameters, but a ballpark figure for a 1 GHz Pentium 4 system is >4 hours to reduce a ~500 MB RAW file to ~200 MS/MS peak lists.

Compare the results from Mascot searches of a data set processed by MassLynx Mass Measure and MaxEnt 3. For some queries, the score improvement using MaxEnt is dramatic, e.g. the score for query 200 increases from 28 to 90. In other cases, conventional processing produces the higher score, e.g. 108 versus 77 for query 76. The general trend (in red) indicates that MaxEnt is a benefit for the larger peptides, where multiply charged fragments are more abundant.

score differences

Automation using Mascot Daemon

Option 1: Real-time monitor

ProteinLynx allows peak lists to be created automatically during data acquisition. By running Mascot Daemon in real-time monitor mode, each peak list can also be searched automatically, as soon as it appears. First, create a suitable parameter set for the task:

daemon parameter tab

Second, create a real-time monitor task to monitor the directory where the .pkl files are being created. Mascot Daemon tasks run in parallel, but the searches within a task run serially. If you have multiple MassLynx projects, you may find it helps to direct the .pkl files for each project into project specific directories, with a distinct Daemon task assigned to each.

daemon task tab

Option 2: MassLynx Sample List

Alternatively, Daemon can perform a batch task to search the .pkl files from a specific MassLynx sample list. The advantage of this approach is that information from the MassLynx sample list can be transferred to Mascot and used in the results reports. For example, if the sample description is in the 'File Text' field, this can be placed into the Mascot search title by using the <FILE_TEXT> tag:

daemon parameter tab

To specify that peak list filenames are to be taken from a MassLynx sample list, choose this option from the data import filters. The location of the .pkl file directory must be specified in the filter options. Then, add the sample list name(s) to the data file list box.

daemon task tab

If the Mascot task is to be run post-acquisition, then the MassLynx sample list is best opened as a batch task. If you want to start the task running before data are acquired, or during acquisition, then you should create the task as a real-time monitor. This ensures that Daemon will wait until each .pkl file appears. In a batch task, by contrast, a file that is 'missing' when the task is started is dropped from the list.
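The difference between the two task types can be sketched as follows. This is only an illustration of the behaviour described above, not Mascot Daemon's implementation. The function names and file names are hypothetical.

```python
import os


def batch_file_list(pkl_paths, exists=os.path.exists):
    """Batch behaviour: files that are missing at start-up are dropped."""
    return [path for path in pkl_paths if exists(path)]


def monitor_poll(pending, exists=os.path.exists):
    """Real-time behaviour: one polling pass, returning (ready, still_waiting)."""
    ready = [path for path in pending if exists(path)]
    still_waiting = [path for path in pending if not exists(path)]
    return ready, still_waiting


# Made-up example: only the first file has been written so far
on_disk = {"sample01.pkl"}
expected = ["sample01.pkl", "sample02.pkl"]

print(batch_file_list(expected, exists=on_disk.__contains__))
print(monitor_poll(expected, exists=on_disk.__contains__))
```

In the batch case sample02.pkl is never searched, whereas the monitor keeps it in the waiting list and picks it up on a later pass.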

Real-time Monitor Mode with MaxEnt 3

In real-time monitor mode, it is important that Mascot Daemon waits until a file is complete before submitting it to Mascot. To avoid taking a file that is still being written, it checks the file size at intervals, and waits until it has stopped increasing. The default interval is 60 seconds, which works fine for conventional Mass Measure, but may not be long enough for MaxEnt 3, where the file size grows only slowly. To use real-time monitor mode with MaxEnt 3, increase the interval by going to the Timer Settings tab of the Preferences dialog. Increase the value of 'Delay after failing to open read-locked file' from 60 seconds to (say) 600 seconds.

daemon preferences
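The idea of waiting until a file has stopped growing can be sketched as below. This is a minimal illustration under stated assumptions, not Mascot Daemon's actual code. The function name is hypothetical, and the getsize and sleep parameters exist only so the logic can be exercised without real files or real delays.

```python
import os
import time


def wait_until_stable(path, interval_s=60.0,
                      getsize=os.path.getsize, sleep=time.sleep):
    """Block until the file size is unchanged across one polling interval."""
    previous = getsize(path)
    while True:
        sleep(interval_s)
        current = getsize(path)
        if current == previous:
            return current  # no growth during the interval: treat as complete
        previous = current


# Simulated run: the file grows once, then stops
sizes = iter([10, 20, 20])
delays = []
final_size = wait_until_stable(
    "example.pkl",
    interval_s=600.0,
    getsize=lambda _path: next(sizes),
    sleep=delays.append,
)
print(final_size, delays)
```

If the interval is shorter than the time MaxEnt 3 takes to append the next spectrum, two successive size checks can agree while the file is still incomplete, which is why a longer interval is recommended in that case.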

Acknowledgements

MassLynx, ProteinLynx, BioLynx, M@LDI, QTOF, and MaxEnt are trademarks of Micromass Ltd. We are most grateful to Darryl Pappin of Imperial College London for the example of M@LDI data and to Gavain Sweetman of Cellzome Ltd. for the example of QTOF data.
 
 
Copyright © 2012 Matrix Science Ltd. All Rights Reserved.