Summary Reports for PMF
At the completion of a search, a summary report is displayed that
provides an overview of the results. There is a choice of
report formats, and each contains links to more detailed
views of the experimental and calculated data.
The default summary report for peptide mass fingerprint results is the
Concise Protein Summary. Proteins
that match the same set or a sub-set of mass values are grouped into a single hit. The
intention is to provide a one-page summary of the search results. You can use the format controls to switch
to the original Protein Summary, where each protein hit is listed
separately, together with details of individual mass value matches.
The sections of the Concise Protein Summary are described below in the order in which they appear.
At the top of the report are a few lines to identify the search uniquely:
search title, date, user name, etc. The database version is identified by either a release
number or an ISO datestamp. The score, accession, and description of the top-scoring protein
hit are also listed.
If the search included the auto-decoy option,
false discovery rate information is displayed at this location.
Following the header, a histogram illustrates the protein score distribution. The
50 best-matching proteins are divided into 16 bins according to their score,
and the height of each bar shows the number of matches in that bin.
The protein score is a measure of
the statistical significance of a PMF match. The region in which random matches may be expected
is shaded green. This region extends up to
the significance threshold, which has a default setting of 5%. If a score falls in the green
shaded area, there is greater than a 5% probability that the match was a random event, of no
significance. Conversely, a match in the unshaded part of the histogram has less
than a 5% probability of being a random event. It is quite common to see several proteins receive
the same high score: even when the protein sequences in the database are non-identical,
the same group of matched mass values may occur in multiple proteins.
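The binning and the shaded region can be illustrated with a short sketch. It is not Mascot's code: it assumes 16 equal-width bins spanning the observed score range (Mascot's own bin edges may be chosen differently), and the scores and the threshold score of 56 are hypothetical values.

```python
def score_histogram(scores, n_bins=16):
    """Count protein scores in n_bins equal-width bins.

    Assumption: bins span the observed score range evenly; Mascot's own
    bin edges may be chosen differently.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all scores are equal
    counts = [0] * n_bins
    for s in scores:
        # The top score lands exactly on the upper edge, so clamp it into the last bin.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts, lo, width


def above_threshold(scores, threshold_score):
    """Scores outside the shaded region, i.e. greater than the threshold score."""
    return [s for s in scores if s > threshold_score]


# Hypothetical scores for 50 hits: 48 random-looking matches and two strong ones.
scores = [20 + (i % 10) for i in range(48)] + [75, 82]
counts, lo, width = score_histogram(scores)
print(counts)
print(above_threshold(scores, threshold_score=56))  # 56 is a made-up threshold
```

Most of the 50 hits pile up in the lowest bins, inside the shaded region, while only the two strong matches lie beyond the threshold.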
These controls enable the report format to be modified. After making changes, press the
"Format As" button to reload the report using the new settings.
For a peptide mass fingerprint search, there are just three controls:
- Report format: choose from the list of available formats.
- Significance threshold: the default significance threshold is p < 0.05. You can
change this to any value in the range 0.99 to 1E-18.
- Maximum number of hits: this value was initially chosen when the search was submitted.
Enter a positive integer if you wish to re-specify the number of protein hits to report.
Of course, the total number of hits actually found by the search may be fewer. The maximum
number of hits saved to the result file is 50.
Entering the word AUTO or a value of 0 will display all of the hits
that have a protein score exceeding the significance threshold, plus one extra hit.
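The rules for the threshold and the number of hits can be summarised in a small function. This is a hypothetical sketch: the function names, the representation of hits as (accession, score) pairs sorted best first, and the threshold score (assumed to be already known for the chosen significance level) are all illustrative assumptions.

```python
MAX_SAVED_HITS = 50  # the result file holds at most 50 protein hits


def validate_threshold(p):
    """Accept a significance threshold in the documented range 1E-18 to 0.99."""
    if not 1e-18 <= p <= 0.99:
        raise ValueError(f"significance threshold {p} outside 1E-18..0.99")
    return p


def hits_to_display(hits, max_hits, threshold_score):
    """Choose which hits to show.

    hits: list of (accession, score) pairs, best score first (assumed).
    max_hits: a positive integer, 0, or the string "AUTO".
    threshold_score: protein score corresponding to the significance threshold.
    """
    if max_hits == 0 or str(max_hits).upper() == "AUTO":
        # All hits scoring above the threshold, plus one extra hit.
        n_significant = sum(1 for _, score in hits if score > threshold_score)
        return hits[: n_significant + 1]
    if isinstance(max_hits, int) and max_hits > 0:
        # Never more than were saved; slicing also copes with fewer hits found.
        return hits[: min(max_hits, MAX_SAVED_HITS)]
    raise ValueError("max_hits must be a positive integer, 0, or AUTO")


hits = [("PROT_A", 95.0), ("PROT_B", 61.0), ("PROT_C", 40.0), ("PROT_D", 38.0)]
print(hits_to_display(hits, "AUTO", threshold_score=56))  # two significant + one extra
print(hits_to_display(hits, 2, threshold_score=56))
```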
A search can easily be repeated to investigate the effect of changes in search
parameters. Choose Re-Search All to repeat the search with all mass values, or
Search Unmatched to repeat the search with only the mass values that did not
get a match in the top hit. This could be a way of investigating whether the sample was
a protein mixture, although Mascot has a built-in
PMF mixture mode: if there is statistically significant
evidence for a second or even a third protein, this will
appear in the result report.
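Conceptually, Search Unmatched resubmits the experimental mass list minus the values matched in the top hit. The sketch below assumes a fixed tolerance of 0.1 Da and uses hypothetical mass lists; Mascot already knows which values matched, so this only illustrates the idea. It also produces the comma-separated list of unmatched values shown in the Protein Summary.

```python
from bisect import bisect_left


def has_match(mass, sorted_calc, tol):
    """True if any calculated mass lies within +/- tol of `mass`."""
    i = bisect_left(sorted_calc, mass - tol)  # first candidate >= mass - tol
    return i < len(sorted_calc) and sorted_calc[i] <= mass + tol


def unmatched_masses(experimental, calculated, tol=0.1):
    """Experimental values, in original order, with no match in the top hit."""
    sorted_calc = sorted(calculated)
    return [m for m in experimental if not has_match(m, sorted_calc, tol)]


experimental = [842.51, 1045.56, 1179.60, 2211.10]  # hypothetical peak list
top_hit_calc = [842.509, 1179.601, 1500.70]          # hypothetical top-hit peptides
leftover = unmatched_masses(experimental, top_hit_calc)
# The unmatched values, formatted as a comma-separated string.
print(", ".join(f"{m:.2f}" for m in leftover))
```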
The body of the report contains a tabular summary of the
best matching proteins. The number of proteins shown is specified in the
search form, up to a maximum of 50. Proteins
that match the same set of mass values, or a sub-set, are grouped into a single hit.
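The grouping rule can be sketched as follows. This is an illustration under stated assumptions, not Mascot's grouping code: each protein is represented by the set of experimental mass values it matches, proteins are visited best score first, and a protein joins an existing hit when its matched set equals, or is a sub-set of, that hit's lead protein. The accessions and masses are hypothetical.

```python
def group_hits(proteins):
    """Group proteins into hits.

    proteins: list of (accession, score, matched_masses) tuples.
    Returns a list of hits; each hit is a list of accessions, lead protein first.
    """
    # sorted() is stable, so equal scores keep their input order.
    ordered = sorted(proteins, key=lambda p: p[1], reverse=True)
    hits = []  # each entry: (lead protein's match set, [accessions])
    for accession, _score, matches in ordered:
        matches = frozenset(matches)
        for lead_set, members in hits:
            if matches <= lead_set:  # same set or a sub-set
                members.append(accession)
                break
        else:
            hits.append((matches, [accession]))
    return [members for _, members in hits]


proteins = [
    ("P1", 120, {1001.5, 1203.6, 1500.8, 1822.9}),
    ("P2", 120, {1001.5, 1203.6, 1500.8, 1822.9}),  # same set as P1
    ("P3", 85, {1001.5, 1500.8}),                   # sub-set of P1
    ("P4", 60, {1001.5, 2011.0}),                   # not a sub-set
]
print(group_hits(proteins))
```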
For each protein, the first line contains the accession string
(linked to the corresponding Protein View), the
protein molecular mass, and the protein
score. Expect is the number of times we would expect to obtain an equal or
higher score purely by chance. The lower this expectation value, the more
significant the result. The number of mass values matched to the protein
completes the first line. The second line is the protein description taken from the
FASTA entry.
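As a rough guide to how Expect relates to the score, the sketch below uses the Mascot convention that a score S corresponds to a probability P through S = -10 log10(P). It additionally assumes that a score equal to the significance threshold score has Expect equal to the threshold p-value, so that every 10 score units changes Expect by a factor of 10. The function names and the threshold score of 56 are hypothetical.

```python
import math


def score_from_probability(p):
    """Mascot-style score: S = -10 * log10(P)."""
    return -10 * math.log10(p)


def expect_value(score, threshold_score, p_threshold=0.05):
    """Expected number of chance matches with a score >= `score`.

    Assumes a score equal to threshold_score has Expect == p_threshold and that
    every 10 score units is a factor of 10 in probability.
    """
    return p_threshold * 10 ** ((threshold_score - score) / 10)


# Hypothetical threshold score of 56 at p < 0.05.
for s in (46, 56, 76):
    print(s, expect_value(s, threshold_score=56))
```

Under these assumptions, a hit 20 points above the threshold has an Expect 100 times smaller than the threshold p-value.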
The Concise Protein Summary is intended to be brief, including only the most important
information. If you want to see details of individual mass matches for all proteins, use the format controls
to switch to the Protein Summary. Or, for a selected protein, click on
the accession string link to load a Protein View.
At the foot of the report, the search parameters are summarised. Descriptions
of individual search parameters can be found here.
The sections of the original Protein Summary are described below in the order in which they appear.
At the top of the report are a few lines to identify the search uniquely:
search title, date, user name, etc. The database version is identified by either a release
number or an ISO datestamp. The score, accession, and description of the top-scoring protein
hit are also listed.
If the search included the auto-decoy option,
false discovery rate information is displayed at this location.
Following the header, a histogram illustrates the protein score distribution. The
50 best-matching proteins are divided into 16 bins according to their score,
and the height of each bar shows the number of matches in that bin.
The protein score is a measure of
the statistical significance of a PMF match. The region in which random matches may be expected
is shaded green. This region extends up to
the significance threshold, which has a default setting of 5%. If a score falls in the green
shaded area, there is greater than a 5% probability that the match was a random event, of no
significance. Conversely, a match in the unshaded part of the histogram has less
than a 5% probability of being a random event. It is quite common to see several proteins receive
the same high score: even when the protein sequences in the database are non-identical,
the same group of matched mass values may occur in multiple proteins.
These controls enable the report format to be modified. After making changes, press the
"Format As" button to reload the report using the new settings.
For a peptide mass fingerprint search, there are just three controls:
- Report format: choose from the list of available formats.
- Significance threshold: the default significance threshold is p < 0.05. You can
change this to any value in the range 0.99 to 1E-18.
- Maximum number of hits: this value was initially chosen when the search was submitted.
Enter a positive integer if you wish to re-specify the number of protein hits to report.
Of course, the total number of hits actually found by the search may be fewer. The maximum
number of hits saved to the result file is 50.
Entering the word AUTO or a value of 0 will display all of the hits
that have a protein score exceeding the significance threshold, plus one extra hit.
(An overview table may appear at this position.)
A search can easily be repeated to investigate the effect of changes in search
parameters. Choose Re-Search All to repeat the search with all mass values, or
Search Unmatched to repeat the search with only the mass values that did not
get a match in the top hit. This could be a way of investigating whether the sample was
a protein mixture, although Mascot has a built-in
PMF mixture mode: if there is statistically significant
evidence for a second or even a third protein, this will
appear in the result report.
Each accession string is a hyperlink that jumps down to the
corresponding protein hit in the body of the report.
The body of the report contains a tabular summary of the
best matching proteins. The number of proteins shown is specified in the
search form, up to a maximum of 50.
For each protein, the first line contains the accession string
(linked to the corresponding Protein View), the
protein molecular mass, and the protein
score. Expect is the number of times we would expect to obtain an equal or
higher score purely by chance. The lower this expectation value, the more
significant the result. The number of mass values matched to the protein
completes the first line. The second line is the protein description taken from the
FASTA entry. This is followed by a table
summarising the matched peptide masses. The table columns contain:
- Experimental m/z value
- Experimental m/z transformed to a relative molecular mass
- Relative molecular mass calculated from the matched peptide sequence
- Difference (error) between the experimental and calculated masses
- Inclusive numbering of the residues, starting with 1 for the
N-terminal residue of the intact protein
- Number of missed cleavage sites
- Sequence of the peptide in 1-letter code. The residues that
bracket the peptide sequence in the protein are also shown, delimited
by periods. If the peptide forms the protein terminus, then a dash
is shown instead.
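The values in these columns can be reproduced for a single match with the sketch below. It assumes singly protonated ions (typical for PMF), monoisotopic residue masses with no modifications (so cysteine is unmodified), and the trypsin rule (cleave after K or R, but not before P) for counting missed cleavages. The protein sequence, the observed m/z, and the helper names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton, Da
WATER = 18.010565  # monoisotopic H2O, Da
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def mr_from_mz(mz, charge=1):
    """Experimental m/z -> relative molecular mass, assuming protonated ions."""
    return mz * charge - charge * PROTON


def peptide_mr(seq):
    """Calculated Mr of an unmodified peptide: residue masses plus water."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def missed_cleavages(seq):
    """Count internal K/R not followed by P (trypsin rule assumed)."""
    return sum(1 for i, aa in enumerate(seq[:-1]) if aa in "KR" and seq[i + 1] != "P")


def match_row(protein, start, end, mz):
    """One table row; start and end are 1-based, inclusive residue numbers."""
    seq = protein[start - 1 : end]
    before = protein[start - 2] if start > 1 else "-"  # dash at the N-terminus
    after = protein[end] if end < len(protein) else "-"  # dash at the C-terminus
    expt, calc = mr_from_mz(mz), peptide_mr(seq)
    return {
        "m/z": mz,
        "Mr(expt)": round(expt, 4),
        "Mr(calc)": round(calc, 4),
        "Delta": round(expt - calc, 4),
        "Start-End": f"{start}-{end}",
        "Miss": missed_cleavages(seq),
        "Peptide": f"{before}.{seq}.{after}",
    }


protein = "MAGKLVNELTEFAKTCVADESHAGCEK"  # hypothetical protein sequence
print(match_row(protein, 5, 14, 1163.631))
```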
If any variable modifications were used to get the mass match, these are listed
after the sequence string. Note that you should not take this as evidence for the
presence of any post-translational modification. Individual mass matches in a PMF can
be chance events.
Underneath the table, any unmatched mass values are listed as a comma-separated
string.
Unless you particularly want to see details of the individual mass matches
for every protein in the hit list, the default
Concise Protein Summary may be a better choice.
At the foot of the report, the search parameters are summarised. Descriptions
of individual search parameters can be found here.
The (optional) overview table provides an animated summary of the results. This feature is
deprecated and cannot be selected in the search form. You are unlikely to
see it unless using older client software that requests this feature.
Each row of the overview table represents a peptide, while each
column represents a protein.
Where a protein contains a mass match, the table cell contains an LED style
indicator. This indicator will light up when it is under the mouse cursor, along with
all the other indicators in the row that correspond to the same peptide. Even when the
sequence database is non-identical, there may still be extensive homology between
entries, and the overview table indicators provide a rapid means of identifying which peptides are common
to which proteins.
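The overview table is essentially a peptide-by-protein incidence matrix. The sketch below assumes each protein hit is described by the set of query m/z values it matches (hypothetical data). It builds the matrix and, for one query, lists every protein whose indicator would light up in the same row.

```python
def overview_matrix(queries, hits):
    """Rows = queries (peptide m/z values), columns = protein accessions.

    hits: dict mapping accession -> set of matched query m/z values.
    Returns (accessions, rows), where rows[i][j] is True if protein j matches query i.
    """
    accessions = list(hits)
    rows = [[q in hits[acc] for acc in accessions] for q in queries]
    return accessions, rows


def proteins_sharing(query, queries, accessions, rows):
    """Accessions whose indicators light up together for one query row."""
    row = rows[queries.index(query)]
    return [acc for acc, lit in zip(accessions, row) if lit]


queries = [842.51, 1045.56, 1179.60]
hits = {"P1": {842.51, 1179.60}, "P2": {842.51}, "P3": {1045.56}}
accessions, rows = overview_matrix(queries, hits)
for q, row in zip(queries, rows):
    print(f"{q:>8.2f} " + " ".join("o" if lit else "." for lit in row))
print(proteins_sharing(842.51, queries, accessions, rows))
```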
In addition to lighting up the indicators, moving the mouse cursor over a cell
displays the query title (if any), the protein accession number, and the peptide
sequence in the three text fields above the table. Clicking on one of the indicators
will load a Protein View for
the selected protein. Clicking on a column header cell will jump down the page to the corresponding
protein hit in the body of the report.
The cells in the first column of the overview table identify each
query by the experimental m/z value of the peptide. When the mouse cursor is
moved over these cells, the query title (if any) is displayed in one of the text fields above
the table. Each cell also contains a check box, which can be used to select a sub-set
of the mass values for a repeat search. The repeat search buttons are modified accordingly,
offering the choices Select All, Select None, and Search Selected.
Although it is essential to use MS/MS when dealing with a
complex mixture or looking for a minor component, it is sometimes possible to
detect simple mixtures using PMF.
Mascot PMF searches automatically test for the possibility that the
sample is a mixture of proteins, and any
statistically significant protein mixture
will be reported.
Mascot scores the match for the complete set of experimental mass values to the
in silico digest products of the putative protein mixture. It isn't a subtractive
approach, where the strongest match to a single protein is found, the matched values are
removed, and the remainder used to search for the next protein. It
would be very difficult to provide a true probability-based score for the subtractive
approach. The subtractive approach is also less sensitive because, in large data sets, there are likely to be
several shared mass values that match more than one of the proteins in the mixture.
In theory, the algorithm can detect a six-component mixture, but we have never seen a real-life
example. You are very unlikely to see more than three components in real data,
even with excellent signal-to-noise, coverage, and mass accuracy.
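The effect of shared mass values on the subtractive approach can be illustrated by counting matches. This is not Mascot's scoring, which is probability based. The sketch only counts which experimental masses are explained, using hypothetical calculated digest masses and an assumed tolerance of 0.1 Da. In the subtractive approach, the mass shared by both proteins is consumed by the first protein, so the second protein gets credit for only two of its three peptides. Matching the complete mass list against the pooled digest of the putative mixture does not discard any mass values.

```python
def matched(masses, digest, tol=0.1):
    """Experimental masses within tol Da of any calculated digest mass."""
    return {m for m in masses if any(abs(m - d) <= tol for d in digest)}


def combined_matches(masses, digests, tol=0.1):
    """Match the complete mass list against the pooled digest of all components."""
    pooled = [d for digest in digests for d in digest]
    return matched(masses, pooled, tol)


def subtractive_matches(masses, digests, tol=0.1):
    """For contrast: match each protein in turn, removing masses already assigned."""
    remaining = set(masses)
    per_protein = []
    for digest in digests:
        hit = matched(remaining, digest, tol)
        per_protein.append(hit)
        remaining -= hit
    return per_protein


masses = [842.51, 1045.56, 1179.60, 1500.70, 2211.10]  # hypothetical peak list
protein_a = [842.51, 1179.60, 1500.70]                 # hypothetical digest
protein_b = [1045.56, 1500.70, 2211.10]                # shares 1500.70 with A
print(sorted(combined_matches(masses, [protein_a, protein_b])))
print([sorted(s) for s in subtractive_matches(masses, [protein_a, protein_b])])
```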
Switch to the Protein Summary report to see which masses match to which protein.
Searching for mixtures is disabled if an intact protein mass is specified,
because this can create artefacts.
In a combination search, the data include both MS/MS spectra and
molecular mass values. If the results from such a search
are viewed using a Protein Summary report, the protein scores will contain
contributions from both the matching of MS/MS spectra to peptide sequences and the matching
of peptide molecular masses to proteins.
Typically, a peptide mass fingerprint has a similar information content to a single
MS/MS spectrum. If you have good coverage for a particular protein, chances
are you will also have several good MS/MS spectra from the protein,
so the score contribution from the PMF matching is not critical.
On the other hand, if coverage for a particular protein is low, the
peptide mass fingerprint score will also be poor, so it is of little use.
One situation where a combined search can be useful is when you have
high coverage PMF data, plus very limited amounts of low quality MS/MS
data. Then, the PMF score contribution may equal or exceed that of the
ions score. However, including poor MS/MS data in a PMF search can work
against you. Imagine that an MS/MS spectrum has a precursor mass match to
a protein, but the MS/MS spectrum is nothing but noise, and gets
random matches to peptides from other proteins. We must then
say that this mass does not 'belong' to the protein, so there should be
no contribution from the peptide mass to the PMF score. In other words,
bad MS/MS data can degrade a PMF match. It is usually safer to discard
the poor-quality MS/MS data and do a conventional PMF search.
Difficulty also arises when the sample contains more than one or
two proteins. A Protein Summary is limited to 50 proteins because Mascot only
saves PMF scores for the top 50. If the major components in the mixture are
well represented in the database, the whole hit list could be occupied by
variants of these proteins, excluding all the minor components. So, even
if you have a good MS/MS match to a peptide from a minor protein, it may
not appear in the report.
Combination searches are useful when you are trying to do something unusual,
like locate exon-intron boundaries or splice variants. In such cases,
you aren't interested in the scores, just in whether a particular mass
match distinguishes between two possibilities.
Note: By default, the information required to create a Protein Summary report
is only saved to the result file for searches of 1000 queries or fewer. This is
more than adequate for a PMF. It may not be sufficient for certain combination
searches. If you need to increase this limit, search for SplitNumberOfQueries in
the Setup & Installation manual. Increasing this limit will cause searches
to use more memory and may restrict the size of standard MS/MS searches
or the number of simultaneous searches that can be run on your server.