Result Report Overview

At the completion of a search, a summary report is displayed that provides an overview of the results. There will often be a choice of report formats and each report contains links to more detailed views of the experimental and calculated data.

Types of Summary Report

The default summary report for peptide mass fingerprint results is the Concise Protein Summary. Proteins that match the same set or a sub-set of mass values are grouped into a single hit. The intention is to provide a one page summary of the search results. You can use the format controls to switch to the original Protein Summary, where each protein hit is listed separately, together with details of individual mass value matches.

For MS/MS searches of fewer than 300 spectra, the default summary report is the Peptide Summary. This provides a clear picture of the peptide matches, grouped into protein hits using a simple parsimony algorithm. If there are 300 or more spectra, the default summary report is the Protein Family Summary. This groups the proteins into families based on a novel hierarchical clustering algorithm and presents these results one page at a time, initially with 20 families per page. This report is ideally suited to very large and complex MS/MS searches, where it is not practical to display all the results on a single HTML page.

For MS/MS results, you can use the format controls to switch to a Select Summary, which is similar to a Peptide Summary, but provides a more compact view of the results. The Select Summary splits the peptide matches assigned to protein hits into a separate report from the unassigned peptide matches. For searches of fewer than 1000 MS/MS spectra, you can also choose a Protein Summary, but this is not recommended unless you are viewing the results of a combination search. If the sample is a mixture, using one of the Protein Summary reports to view MS/MS search results can give a very misleading picture.

If you are submitting MS/MS searches to an in-house Mascot server, you will also have the option to create an Archive Report. This is simply an edited version of the Peptide Summary report that includes only the protein hits you have selected. If there are no peptide sequence matches at all from a search of MS/MS data, only molecular weight matches, then a Protein Summary report will be displayed. This indicates that the search has failed. Possibly the spectra are nothing but noise, or possibly the search parameters are incorrect in some way.
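
The default report rules described above can be summarised in a few lines. This is only an illustrative sketch: the function name, its arguments and the "PMF"/"MSMS" labels are hypothetical and not part of Mascot; the threshold of 300 spectra is the default quoted in the text.

```python
def default_summary_report(search_type, n_spectra=0, has_sequence_matches=True,
                           family_switch=300):
    """Return the name of the default summary report for a search.

    search_type: "PMF" for a peptide mass fingerprint, "MSMS" for MS/MS data
    (labels chosen for this sketch only).
    family_switch: number of spectra at which the Protein Family Summary
    becomes the default.
    """
    if search_type == "PMF":
        return "Concise Protein Summary"
    if not has_sequence_matches:
        # Only molecular weight matches: the search has effectively failed.
        return "Protein Summary"
    if n_spectra < family_switch:
        return "Peptide Summary"
    return "Protein Family Summary"


print(default_summary_report("MSMS", n_spectra=120))
print(default_summary_report("MSMS", n_spectra=5000))
```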

In summary reports for MS/MS results, if the database was nucleic acid and one or more UniGene indexes have been configured for the database being searched, there will be the option to generate a report in which the protein matches are clustered into UniGene families.

The final choice on the list of report formats is always Export Search Results. This enables the results to be exported in a number of "machine readable" formats, including mzIdentML, the standard interchange format for search results.

Protein View

The protein view of an entry on the hit list can be displayed by clicking on an accession number in a summary report.

Information about the protein, the enzyme (if any), and any modifications are printed at the top of the page. This is followed by the formatted sequence of the protein in 1-letter code with matched peptides highlighted in bold, red type.

If the sequence database was nucleic acid, and the matches all came from a single frame, the report will be very similar to that for a protein database entry. If the matches come from multiple frames, because of a frame shift or splice, then only one frame at a time will be displayed. A drop down list can be used to switch between frames.

The sequence block is followed by a detailed table of the peptide matches. For an enzyme digest, you can also choose to display all the calculated peptides, whether matched or not, including all partials up to the limit specified by the Missed Cleavages parameter. The matched peptides are shown in bold, red type, together with a link to the corresponding peptide view. If no enzyme or a semi-specific enzyme was used, this option is not available, and the table contains only the matched peptides.
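
To make the idea of calculated peptides and partials concrete, here is a minimal sketch of an in-silico digest. It assumes a trypsin-like rule (cleave after K or R, but not before P), which the text above does not specify, and the function name is hypothetical. Peptides are returned in start residue order, with up to the given number of missed cleavages.

```python
import re


def digest(sequence, missed_cleavages=1):
    """Return (start, end, peptide) tuples with 1-based inclusive positions."""
    # Assumed trypsin-like rule: cleave after K or R unless followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Fragment boundaries: sequence start, internal cleavage sites, sequence end.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    last = len(bounds) - 1
    for i in range(last):
        # Join up to missed_cleavages additional adjacent fragments (partials).
        for j in range(i + 1, min(i + 1 + missed_cleavages, last) + 1):
            peptides.append((bounds[i] + 1, bounds[j], sequence[bounds[i]:bounds[j]]))
    return peptides


for start, end, peptide in digest("AKRPGKC", missed_cleavages=1):
    print(start, end, peptide)
```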

If the enzyme was a mixture of independent enzymes, and you choose to display calculated peptides, these will be shown for one enzyme component at a time. A drop down list can be used to switch between enzymes. The formatted protein sequence shows highlights for all matches at all times.

The default sort order is start residue order. Controls are provided to re-display the table sorted by increasing or decreasing peptide molecular weight.

A graph displays the mass differences between the calculated and experimental mass values for the protein match in the same units as were used to specify the peptide mass error tolerance. There is also a figure for the RMS error of the set of matched mass values in ppm.

If available, at the bottom of the page, the full text of the sequence annotations is reproduced.

Genomic sequences

If the match is to a very long nucleic acid sequence (greater than 30,000 bases by default), the conventional Protein View is impractical. In this case, Mascot will automatically generate a DDBJ/EMBL/GenBank format feature table. For example:


BLASTCDS        422..469
               /label=Q103
               /colour=2
               /note="Mascot match, ... sequence=GLGTDEDTLIEILASR"
               /blastp_file="../data/20001016/FTGrCfc.dat"
               /mass=1701.88
               /score=82
               /rank=1
               /translation="GLGTDEDTLIEILASR"
BLASTCDS        603..650
               /label=Q105
               /colour=2
               /note="Mascot match, ... sequence=SEDFGVNEDLGDSDAR"
               /blastp_file="../data/20001016/FTGrCfc.dat"
               /mass=1738.73
               /score=82
               /rank=2
               /translation="SEDFGVNEDLGDSDAR"

By default, only matches with significant scores (p < 0.05) are output. A different score threshold can be specified by appending &_featuretableminscore=X to the protein view URL, where X is the score threshold.

The feature table can be saved to a text file and read into a genome browser such as Artemis from the Sanger Centre. This provides a very flexible and powerful way to view Mascot peptide matches in genomic sequence data.
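
If you want to post-process a saved feature table yourself, the layout shown above is easy to read with a short script. The following is a minimal sketch, not a full DDBJ/EMBL/GenBank parser: it assumes every feature line has a simple start..end location and every qualifier line begins with a slash, as in the example. The function name is hypothetical.

```python
def parse_feature_table(text):
    """Parse feature table text into a list of dicts, one per feature."""
    features = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            if not features:
                continue  # qualifier with no preceding feature line; skip it
            # Split on the first "=" only, so values may contain "=" themselves.
            key, _, value = line[1:].partition("=")
            features[-1]["qualifiers"][key] = value.strip('"')
        else:
            # Feature line: key followed by a simple start..end location.
            kind, location = line.split(None, 1)
            start, _, end = location.partition("..")
            features.append({"kind": kind, "start": int(start), "end": int(end),
                             "qualifiers": {}})
    return features


sample = """BLASTCDS        422..469
               /label=Q103
               /score=82
               /translation="GLGTDEDTLIEILASR"
"""
for feature in parse_feature_table(sample):
    print(feature["start"], feature["end"], feature["qualifiers"]["translation"])
```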

Peptide View

The Peptide View of a matched peptide can be loaded by clicking on a query number hyperlink in a summary report or an ions score hyperlink in Protein View.

The name of the protein and the 1-letter sequence of the peptide are printed at the top of the page, followed by the query title, if any. Below this is a mass spectrum labelled with fragment ions, e.g. b(6). Note that a small interval around the peptide molecular ion (±2 Da by default) is omitted from the spectrum, reflecting the suppression of these data points in the Mascot search.

Clicking within the spectrum zooms in by a factor of 2, so as to show greater detail in crowded regions. Alternatively, controls above the spectrum can be used to specify the plotted mass range directly or reset the mass scale.

In the spectrum and the table that follows, you can choose whether to label all possible matches or just the matches used for scoring.

Mascot begins by selecting a small number of experimental peaks on the basis of normalised intensity. It calculates a probability based score according to the number of matches. It then increases the number of selected peaks, re-calculates the score, and continues to iterate until it is clear that the score can only get worse. It then reports the best score it found, which should correspond to an optimum selection, taking mostly real peaks and leaving behind mostly noise.
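
The iteration described above can be illustrated with a toy example. This is emphatically not the Mascot scoring function, which is not given here: the binomial chance model, the tolerance, the step size and the stopping rule (give up after a few consecutive selections that fail to improve the score) are all assumptions made for the sketch, and the function names are hypothetical.

```python
import math


def chance_probability(n_theoretical, n_matched, p_hit):
    """Binomial tail: probability of at least n_matched hits by chance."""
    return sum(math.comb(n_theoretical, k) * p_hit ** k
               * (1 - p_hit) ** (n_theoretical - k)
               for k in range(n_matched, n_theoretical + 1))


def best_peak_selection(peaks, theoretical, tol=0.5, start=4, step=4, patience=3):
    """peaks: list of (mz, intensity). Returns (best_score, n_peaks_used)."""
    if not peaks:
        return 0.0, 0
    # Most intense peaks first, so each iteration adds weaker peaks.
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    mz_values = [mz for mz, _ in peaks]
    mass_range = (max(mz_values) - min(mz_values)) or 1.0
    best_score, best_n, worse = 0.0, 0, 0
    n = start
    # Stop when all peaks have been tried or the score keeps failing to improve.
    while n - step < len(ranked) and worse < patience:
        selected = [mz for mz, _ in ranked[:n]]
        matched = sum(any(abs(t - mz) <= tol for mz in selected)
                      for t in theoretical)
        # Assumed chance that one theoretical mass lands on a selected peak.
        p_hit = min(1.0, len(selected) * 2 * tol / mass_range)
        p = chance_probability(len(theoretical), matched, p_hit)
        score = -10 * math.log10(max(p, 1e-300))
        if score > best_score:
            best_score, best_n, worse = score, len(selected), 0
        else:
            worse += 1
        n += step
    return best_score, best_n


# Eight intense "real" peaks plus twenty weak noise peaks.
theoretical = [100.0 * i for i in range(1, 9)]
peaks = [(m, 100.0) for m in theoretical] + [(110 + i * 37.3, 1.0) for i in range(20)]
score, n_used = best_peak_selection(peaks, theoretical)
print(round(score, 1), n_used)
```

In this toy model, adding noise peaks beyond the real ones raises the chance of a random match without adding matches, so the score falls and the earlier selection is reported.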

If you choose to label all possible matches, remember that many spectra have "peak at every mass" noise, and can match any ion series from any sequence if there is no intensity discrimination.

The matched fragment ions are shown in tabular format below the spectrum. The ion series are those specified by the INSTRUMENT search parameter. If you choose to label the matches used for scoring, bold italic red means the series contributed to the score. Bold red means that the number of matches in the ion series is greater than would be expected by chance, indicating that the ion series is present. Non-bold red means that the number of matches in the ion series is no greater than would be expected by chance, so that the matches themselves may be by chance.

A graph displays the mass differences between the calculated and experimental fragment ion mass values in the units used to specify the error tolerance. A second graph shows the same points but with an axis in ppm. The root mean square (RMS) error of the set of matched mass values is given in ppm.
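
For reference, the conversion from an absolute error to ppm and the RMS figure can be sketched as follows. The sign convention (experimental minus calculated) and the function names are assumptions made for this example.

```python
import math


def ppm_errors(pairs):
    """pairs: iterable of (experimental, calculated) mass values."""
    return [(exp - calc) / calc * 1e6 for exp, calc in pairs]


def rms(values):
    """Root mean square of a non-empty list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


matches = [(1000.01, 1000.00), (2000.00, 2000.02), (1500.00, 1500.00)]
errors = ppm_errors(matches)
print([round(e, 2) for e in errors])
print(round(rms(errors), 2))
```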

If any residues in the matched peptide have modifications with multiple neutral losses, the table shows the values corresponding to the dominant neutral loss(es). The text immediately above the table gives details. The labels in the spectrum are for all peaks that were selected and matched to obtain the best score, and any neutral losses form part of the label. So, for example, the spectrum might contain peaks labelled y(9) and also y(9)-98. The table will list just one of these values in the y column.

A link is provided to perform a BLAST search of the matched peptide sequence at NCBI. If NCBI is busy, then copy the sequence to the clipboard and follow the final link to a list of alternative BLAST engines.

Finally, the alternative matches to the same MS/MS spectrum are tabulated, allowing you to load Peptide View reports for other matches. If the top rank match is significant and contains one or more variable modifications for which alternative arrangements are possible, site analysis information is displayed.

UniGene

One of the drawbacks of searching an EST database is that there are very few long sequences, so that extended groupings of peptide matches into protein matches are rare. This can be rectified with UniGene, an index created by automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster is a list of the GenBank sequences, including ESTs, which represent a unique gene. It is not an attempt to produce a consensus sequence.

If one or more UniGene indexes have been configured for the database being searched, there will be a format control to generate a species based UniGene report.

Following a Protein View link from a UniGene report will display a list of UniGene family members in place of the standard Protein View.

URL Switches

There are a number of switches to modify the format of the result reports. Many of these have a global default, set by a parameter in the Options section of mascot.dat. These defaults can be changed in an individual report using the format controls, or by appending the relevant switch to the report URL. Switches take the form label=value and the delimiter between switches is an ampersand (&). For example, if the report URL was:

http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat

The type of report could be changed by appending "REPTYPE=protein":

http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat&REPTYPE=protein

Labels and values are not case sensitive. Note that many labels begin with an underscore character. Values that are not literal strings, such as N in the tables below, are placeholders for a value that you supply.
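
If you generate report links from a script, appending switches is simple string handling. The helper below is a hypothetical sketch, not part of Mascot; it assumes the values contain no characters that need URL escaping, and it leaves the existing query string untouched.

```python
def add_switches(report_url, **switches):
    """Append label=value switches to a report URL, separated by ampersands."""
    extra = "&".join(f"{label}={value}" for label, value in switches.items())
    if not extra:
        return report_url
    # Use "&" if the URL already has a query string, otherwise start one.
    separator = "&" if "?" in report_url else "?"
    return f"{report_url}{separator}{extra}"


base = "http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat"
print(add_switches(base, REPTYPE="protein"))
print(add_switches(base, reptype="peptide", _sigthreshold=0.01))
```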

URL arguments relating to quantitation are described separately.

master_results.pl and master_results_2.pl

URL	mascot.dat	Value	Description
reptype		peptide	Peptide Summary
		archive	Archive Report
		concise	Concise Protein Summary
		protein	Full Protein Summary
		select	Select Summary (hits)
		unassigned	Select Summary (unassigned)
report		auto	Report all significant hits
report		N	Report N hits
_showsubsets	ShowSubSets	1	For a Peptide Summary, set the value to 1 to report all hits that match a subset of peptides. Default is 0 for no sub-set hits. Intermediate values set a threshold on the difference in protein score between the primary hit and the sub-set hit expressed as a fraction.
_requireboldred	RequireBoldRed	1	Set value to 1 to report Peptide Summary hits only if they contain at least one "bold red" peptide, (default 0).
_showallfromerrortolerant	ShowAllFromErrorTolerant	1	Set value to 1 to report all matches from an error tolerant search, including the garbage, (default 0)
_onlyerrortolerant		1	Set value to 1 to report only error tolerant matches from an automatic error tolerant search, (default 0)
_noerrortolerant		1	Set value to 1 to suppress error tolerant matches from an automatic error tolerant search, (default 0)
_show_decoy_report		1	Set value to 1 to display the report for an automatic decoy database search, (default 0)
_sigthreshold	SigThreshold	N	Probability to use for the significance threshold. Range is 0.99 to 1E-18, (default 0.05).
_sortunassigned	SortUnassigned	scoredown	Sort unassigned matches by descending score, (default)
		queryup	Sort unassigned matches by ascending query number
		intdown	Sort unassigned matches by descending intensity
_ignoreionsscorebelow	IgnoreIonsScoreBelow	N	Values greater than 0 and less than 1 act as an expect value threshold, and the scores for any peptide matches with higher expect values are set to 0, so that they disappear from the report. Values of 1 or more act as a score threshold, and any peptide matches with lower scores are suppressed. Floating point number, (default 0.0).
_showpopups		true	Show top 10 peptide matches for each query in JavaScript pop-up, (default)
_showpopups		false	Suppress JavaScript pop-ups.
_alwaysgettitle		1	Set to 1 to force reports to fetch Fasta titles from database when they are not included in the result file, (default 0 in master_results.pl, 1 in master_results_2.pl).
_server_mudpit_switch	MudpitSwitch	N	Protein score calculation switches to large search mode when the ratio between the number of queries and the number of database entries, (after any taxonomy filter), exceeds this value, (default 0.001).
percolate	Percolator	1	Set value to 1 to re-rank results using Percolator, (default 0).
percolate_rt	PercolatorUseRT	1	Set value to 1 to include retention time feature when using Percolator, (default 0).
_proteinfamilyswitch	ProteinFamilySwitch	0	The number of MS/MS spectra required for displaying the Protein Family Summary report. Set to 0 to force results to be always displayed as Protein Family Summary, (default 300).
_prefertaxonomy		N	1-based integer index into the list of taxonomies in the Mascot `taxonomy` file. 0 means no preference.
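
The two modes of _ignoreionsscorebelow are easy to confuse, so here is a small sketch of how a single value is interpreted according to the description in the table. The function name and arguments are hypothetical; only the threshold rules are taken from the table.

```python
def keep_match(ions_score, expect_value, ignore_ions_score_below=0.0):
    """Return True if a peptide match would remain in the report."""
    threshold = ignore_ions_score_below
    if threshold <= 0:
        return True  # default 0.0: nothing is filtered
    if threshold < 1:
        # Between 0 and 1: expect value threshold, higher expect values drop out.
        return expect_value <= threshold
    # 1 or more: score threshold, lower scores are suppressed.
    return ions_score >= threshold


print(keep_match(ions_score=18, expect_value=0.2, ignore_ions_score_below=0.05))
print(keep_match(ions_score=18, expect_value=0.2, ignore_ions_score_below=15))
```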

protein_view.pl

URL	mascot.dat	Value	Description
sort		startup	Sort table of peptides by ascending start residue number, (default)
		massup	Sort table of peptides by ascending mass
		massdown	Sort table of peptides by descending mass
showall		true	Show all calculated peptides, not just matched peptides
showall		false	Show just matched peptides, (default)
_showallfromerrortolerant	ShowAllFromErrorTolerant	1	Set value to 1 to report all matches from an error tolerant search, including the garbage, (default 0)
_onlyerrortolerant		1	Set value to 1 to report only error tolerant matches from an automatic error tolerant search, (default 0)
_noerrortolerant		1	Set value to 1 to suppress error tolerant matches from an automatic error tolerant search, (default 0)
_show_decoy_report		1	Set value to 1 to display the report for an automatic decoy database search, (default 0)
_sigthreshold	SigThreshold	N	Probability to use for the significance threshold. Range is 0.99 to 1E-18. Default is 0.05.
_ignoreionsscorebelow	IgnoreIonsScoreBelow	N	Values greater than 0 and less than 1 act as an expect value threshold, and the scores for any peptide matches with higher expect values are set to 0, so that they disappear from the report. Values of 1 or more act as a score threshold, and any peptide matches with lower scores are suppressed. Floating point number, (default 0.0).
_server_mudpit_switch	MudpitSwitch	N	Protein score calculation switches to large search mode when the ratio between the number of queries and the number of database entries, (after any taxonomy filter), exceeds this value, (default 0.001).
_featuretablelength	FeatureTableLength	N	Length of database entry in bases at which protein view switches to GenBank output. Default 30000
_featuretableminscore	FeatureTableMinScore	N	Score threshold for inclusion in GenBank feature table format, if undefined then report includes matches that exceed lower of homology or identity threshold
indyenzyme		N	If enzyme was independent, display cleavage products for this specificity index
frame		N	For a nucleic acid database, display matches in this frame number
percolate	Percolator	1	Set value to 1 to re-rank results using Percolator, (default 0).
percolate_rt	PercolatorUseRT	1	Set value to 1 to include retention time feature when using Percolator, (default 0).