Mascot Search Fields
On the free, public Mascot server, your name and email address
must be entered in these fields.
This information will not be used by us or anyone else to send you
"spam" or junk mail.
The reason for requiring this information is to allow the results of
a search to be returned by email. Usually, search results are returned
promptly to your browser window. However, if your connection to the web
site is broken before the search is complete, they will be emailed to
the supplied address.
If you become disconnected from the site after submitting a search,
please do not resubmit the search; just check your email. This facility
also means that you don't have to wait for search results if you don't
want to, particularly during peak hours when the response may be slower
than normal.
To save you having to type in this information for every search, your
browser will attempt to save it as a local "cookie". If you refuse to accept
this cookie, or your browser doesn't support cookies, the information cannot
be saved and you will have to type it in for every search. If you change
the contents of either of these fields, the new values will be saved when
the search is submitted.
With an in-house Mascot Server, use of these fields is optional.
A text string which will be printed at the top of results report pages.
Can be left blank.
Select the sequence database to be searched.
The databases available on the free, public Mascot server are:
| Database | Comment |
| --- | --- |
| EST | EST divisions of EMBL (Environmental_EST, Fungi_EST, Human_EST, Invertebrates_EST, Mammals_EST, Mus_EST, Plants_EST, Prokaryotes_EST, Rodents_EST, Vertebrates_EST) |
| NCBInr | Comprehensive, non-identical protein database |
| SwissProt | High quality, curated protein database |
| contaminants | Common contaminants compiled by Max Planck Institute of Biochemistry, Martinsried |
| cRAP | Common contaminants compiled by the Global Proteome Machine Organization |
For a Peptide Mass Fingerprint, the EST databases are not available. It
makes no sense to search a set of peptide masses against EST because the
entries are just short stretches of sequence, not complete proteins.
For a Sequence Query or an MS/MS Ions Search, on the free, public Mascot server,
you must search one of the protein
databases before searching an EST database. A search of a non-identical protein database takes
only a fraction of the time of an EST search.
If the protein database search fails to produce a positive match, the master
results page will allow you to repeat the search against an EST database.
You can multi-select more than one database for a search. This is useful when you want to
search a single organism database and include the sequences of common contaminants in the search, such as BSA
and trypsin. One restriction is that all selected databases must be of the same type. That is,
all protein or all nucleic acid.
The Taxonomy parameter allows searches to be limited to entries from particular
species or groups of species. This can speed up a search, and ensures that the hit
list will only contain entries from the selected species. If the search data are
marginal, and you are completely confident of the origin of the protein, this can
help bring a weak match to the top of the list.
The top level classification, All entries, is self-explanatory. Beneath this are
a number of classifications representing taxa or species, such as Rodentia (Rodents).
The three classifications below Rodentia are Mus, Rattus, and Other
rodentia. Selecting Other rodentia would limit a search to Rodentia excluding
Mus and Rattus.
The unclassified level contains database entries for which the species is undefined or is
a species which doesn't fit into any current classification. There are about 1500 such
sequences in the NCBInr database.
The Species information unavailable level contains those database entries from which
Mascot was unable to extract taxonomy information. Taxonomy information may be present
in the entry, but Mascot was unable to find it. Thus, if a search limited to a more
selective classification than All entries fails to give a result, it may be a wise
precaution to repeat it against Species information unavailable.
For non-redundant databases, a single entry may represent identical sequences
from multiple species. The accession string and title text from the FASTA entry, listed
on the master results page, will usually describe just one of these entries. To see the
equivalent entries, and to explore their taxonomy, follow the accession number link in
the results list to the Protein View. If the hit is from a non-redundant database, and
represents multiple entries with identical sequences, the Protein View will include links to
NCBI Entrez and the
NCBI Taxonomy Browser
for all equivalent entries.
Specify whether the experimental mass values are average or monoisotopic.
If you are unsure which to choose, refer to the mass
accuracy help page.
Select any known or suspected modifications.
Mascot supports two types of modification. Fixed modifications are applied
universally, to every instance of the specified residue(s) or terminus.
There is no computational
overhead associated with a fixed modification; it is simply equivalent to using
a different mass for the modified residue(s) or terminus. For example, selecting
Carboxymethyl (C) means that all calculations will use 161 Da as the mass
of cysteine.
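As a rough illustration of why a fixed modification costs nothing extra, the sketch below folds the modification delta into the residue mass table once, before any peptide masses are calculated. The function name `residue_masses` is hypothetical, and the monoisotopic masses are standard published values rather than figures taken from this page.

```python
# Monoisotopic masses in Da. These are standard values, not taken from this page.
CYS_RESIDUE = 103.00919    # unmodified cysteine residue (C3H5NOS)
CARBOXYMETHYL = 58.00548   # delta mass of Carboxymethyl (C2H2O2)


def residue_masses(fixed_mods):
    """Return a residue mass table with fixed modification deltas folded in.

    fixed_mods maps a one-letter residue code to a delta mass in Da.
    Only cysteine is listed here; a real table would cover every residue.
    """
    table = {"C": CYS_RESIDUE}
    for residue, delta in fixed_mods.items():
        table[residue] = table[residue] + delta
    return table


masses = residue_masses({"C": CARBOXYMETHYL})
print(round(masses["C"], 2))  # 161.01
```

Every later calculation simply looks up the modified value, which is why the search space does not grow.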
Variable modifications are those which may or may not be present. Mascot tests
all possible arrangements of variable modifications to find the best match. For example,
if Oxidation (M) is selected, and a peptide contains 3 methionines, Mascot will
test for a match with the experimental data for that peptide containing 0, 1, 2, or
3 oxidised methionine residues.
Variable modifications can be a very powerful means of finding a match, but there
are also dangers to be aware of. Even a single variable modification will generate many
possible additional peptides to be tested. More than one variable modification causes
the number of
arrangements to increase geometrically. This means that a search can take dramatically
longer than the same search with fixed modifications. More importantly, testing
all possible arrangements of modifications generates many more random matches, so that
discrimination can be sharply reduced.
The best advice is to use variable
modifications sparingly; never select a large number "just in case".
Mascot allows up to 9 variable modifications to be specified but, in most cases,
a better approach is to do a first pass search with a small number of variable modifications
followed by an error tolerant second pass search to
pick up additional matches to peptides containing unusual modifications.
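To make the growth in search space concrete, here is a minimal sketch that enumerates the site arrangements for one variable modification on one peptide. It assumes that each occurrence of the residue is either modified or not; the function name and the example peptide are illustrative only, and this is not a description of Mascot's internal algorithm.

```python
from itertools import combinations


def modification_arrangements(peptide, residue="M"):
    """List every subset of modifiable sites, from none modified to all modified."""
    sites = [i for i, aa in enumerate(peptide) if aa == residue]
    return [
        chosen
        for k in range(len(sites) + 1)
        for chosen in combinations(sites, k)
    ]


# Hypothetical peptide with 3 methionines.
arrangements = modification_arrangements("MAMGMK")
print(len(arrangements))                        # 8 site arrangements
print(sorted({len(a) for a in arrangements}))   # [0, 1, 2, 3] oxidised residues
```

Each additional variable modification multiplies this count by its own number of arrangements, which is the growth the text warns about.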
If chemically inconsistent fixed modifications are combined, an error message
will be generated by the search engine.
The 'Show all mods.' checkbox switches between a short list of the most
common modifications and a complete list of all available modifications. The
default state for this checkbox, and all search form fields, is set using the
search form defaults page.
Certain data file formats, SCIEX API III, PerSeptive (.PKS), and Bruker (.XML), do not include m/z
information for the precursor peptide. For these formats only, the Precursor field is
used to specify the m/z value of the parent peptide. The charge state is defined by
the setting of the Peptide Charge field.
The mass of the intact protein, in Da, applied as a sliding window.
That is, the limit applies to the mass of the contiguous stretch of sequence which contains
all of the matched peptide mass values. This will generally be less than
the mass of the entire sequence entry.
If this field is left blank, there is no restriction on protein mass. Follow
this link for a discussion on the correct
usage of this parameter.
The error window on experimental peptide mass values (not the error
window for MS/MS fragment ion mass values, which is set using the MS/MS
tol. ± parameter).
Units can be selected from:
| Unit | Meaning |
| --- | --- |
| % | fraction expressed as a percentage |
| mmu | absolute milli-mass units, i.e. units of 0.001 Da |
| ppm | fraction expressed as parts per million |
| Da | absolute units of Da |
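The four units can be compared by converting each to an absolute window in Da. The sketch below assumes that the fractional units (% and ppm) are taken relative to the mass being compared; the function name `tolerance_in_da` is hypothetical.

```python
def tolerance_in_da(tol, unit, mass):
    """Convert a tolerance in one of the four listed units to a window in Da.

    mass is the peptide mass that fractional units are measured against
    (an assumption of this sketch).
    """
    if unit == "Da":
        return tol
    if unit == "mmu":
        return tol * 0.001           # 1 mmu = 0.001 Da
    if unit == "%":
        return mass * tol / 100.0    # fraction expressed as a percentage
    if unit == "ppm":
        return mass * tol / 1e6      # fraction expressed as parts per million
    raise ValueError(f"unknown unit: {unit!r}")


# For a 2000 Da peptide, 50 ppm and 100 mmu describe the same 0.1 Da window.
print(round(tolerance_in_da(50, "ppm", 2000.0), 6))
print(round(tolerance_in_da(100, "mmu", 2000.0), 6))
```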
Sometimes, peak detection chooses the 13C peak rather than the
12C. In extreme cases, it may pick the 13C2
peak. The normal test for a precursor match is:
TOL > absolute(exp - calc)
Assuming the mass values and tolerance are in Da, if this field is set to 1,
the test will also succeed for
TOL > absolute(exp - calc - 1)
If this field is set to 2, the test will succeed for the above two conditions, plus:
TOL > absolute(exp - calc - 2)
This means that you can use a tight mass tolerance and still get a
match to a 13C peak. If you are using a very high accuracy instrument,
note that the precise shifts are the carbon isotope spacings of 1.00335 and 2.00670, rather than 1 and 2.
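The three tests above can be written as a single loop over the allowed isotope offsets. This is a minimal sketch with hypothetical names; the `spacing` argument defaults to the nominal 1 Da used in the text and can be set to 1.00335 for high accuracy data.

```python
C13_SPACING = 1.00335  # carbon isotope spacing in Da, as quoted in the text


def precursor_matches(exp, calc, tol, max_13c=0, spacing=1.0):
    """Return True if exp matches calc within tol, allowing 0..max_13c isotope shifts.

    exp, calc and tol are all assumed to be in Da.
    """
    return any(
        abs(exp - calc - n * spacing) < tol
        for n in range(max_13c + 1)
    )


# A peak picked on the 13C isotope fails a tight window unless the field is set to 1.
print(precursor_matches(1001.01, 1000.0, 0.05, max_13c=0))  # False
print(precursor_matches(1001.01, 1000.0, 0.05, max_13c=1))  # True
```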
Error window for MS/MS fragment ion mass values. Units can be either Da or mmu,
as above.
Specifies whether experimental peptide mass values in a peptide mass fingerprint
search include the mass of
the charge carrier, MH+ or M-H-, or whether they correspond to neutral,
Mr values.
Used to specify the precursor peptide charge state in a sequence query or an MS/MS ions
search. The peptide mass value supplied in an MS/MS data file is usually an observed m/z value.
The charge state field is used to calculate the
relative molecular mass (Mr) of the precursor from the observed m/z unless
the data file explicitly specifies a different charge state.
N.B. The notation "1+", "2+", etc. is used to save space and
because some HTML form fields do not support the use of superscripts and subscripts.
"1+" always means MH+,
"1-" always means M-H-, "2+" always means
MH2++, etc.
For electrospray data, select "2+" if the peptide m/z data are known to be doubly
charged. If the charge state is uncertain, select "2+ and 3+" to include
both charge states in the search and see which
most clearly discriminates the score of the top matched protein.
For MALDI-PSD, the precursor peptides will generally be MH+, so the
charge state should be set to "1+".
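The conversion from an observed m/z value to Mr can be sketched as follows. It assumes that the charge carrier is a proton with a mass of 1.007276 Da, gained in positive mode and lost in negative mode; the function name is hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da (assumed charge carrier)


def neutral_mass(mz, charge):
    """Relative molecular mass (Mr) from an observed m/z and a signed charge.

    charge is +1, +2, ... for MH+, MH2++, ... and -1, -2, ... for M-H-, etc.
    """
    if charge == 0:
        raise ValueError("charge must be non-zero")
    z = abs(charge)
    if charge > 0:
        return (mz - PROTON) * z   # remove the added protons
    return (mz + PROTON) * z       # restore the lost protons


# With "2+ and 3+" selected, one observed m/z gives two candidate Mr values.
for z in (2, 3):
    print(z, round(neutral_mass(800.0, z), 3))
```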
Setting the number of allowed missed cleavage sites to zero simulates a limit digest.
If you are confident that your digest is perfect, with no partial fragments present,
this will give maximum discrimination and the highest score.
If experience shows that your digest mixtures usually include some partials,
that is, peptides with missed cleavage sites, you should choose a setting of 1, or
maybe 2 missed cleavage sites. Don't specify a higher number without good reason,
because each additional level of missed cleavages increases the number
of calculated peptide masses to be matched against the experimental data.
If the actual digest does not contain extended
partials, this simply increases the number of random matches, and so reduces discrimination.
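The effect of the missed cleavage setting on the number of candidate peptides can be seen with a small in-silico digest. The sketch below assumes a trypsin-like rule (cleave after K or R unless the next residue is P), which is a common convention rather than something stated in this section; the function names and the example sequence are illustrative.

```python
def cleavage_sites(sequence):
    """Positions after K or R where the following residue is not P."""
    return [
        i + 1
        for i, aa in enumerate(sequence[:-1])
        if aa in "KR" and sequence[i + 1] != "P"
    ]


def digest(sequence, missed_cleavages=0):
    """All peptides spanning up to missed_cleavages internal cleavage sites."""
    if not sequence:
        return []
    bounds = [0] + cleavage_sites(sequence) + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


for n in (0, 1, 2):
    print(n, digest("AAKGGRLL", n))
```

For this short sequence the list grows from 3 peptides to 5 and then 6 as the setting goes from 0 to 2; for a real protein with many cleavage sites the growth is correspondingly larger.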
Enter the path to a data file containing
mass data. Data for MS/MS ion searches must be supplied as an ASCII file.
Data for a Peptide Mass Fingerprint can be typed or pasted into the Query
window or supplied as an ASCII file. Details of the file format can be
found here.
N.B. If a file name is present in this field, any contents in the Query
window are ignored.
The contents of the query window are only used when no data file has
been specified.
For a Peptide Mass Fingerprint, the query window must contain a list
of peptide mass values, one per line. An intensity value after the mass value is optional.
Anything after the second numeric value on each line is ignored.
If intensity information is available, values will be selected according to their
intensity so as to get the best score. This can be disabled by setting IteratePMFIntensities
to 0 in mascot.dat.
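The line format for a Peptide Mass Fingerprint query can be read with a few lines of code. This is a minimal sketch, not Mascot's own parser: it assumes whitespace-separated fields and skips any line that does not start with a number.

```python
def parse_pmf_query(text):
    """Parse query window text into (mass, intensity or None) pairs."""
    peaks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            mass = float(fields[0])
        except ValueError:
            continue  # not a mass line; skipped in this sketch
        intensity = None
        if len(fields) > 1:
            try:
                intensity = float(fields[1])
            except ValueError:
                intensity = None
        # Anything after the second numeric value is ignored.
        peaks.append((mass, intensity))
    return peaks


print(parse_pmf_query("1234.56 100 ignored text\n987.65\n"))
# [(1234.56, 100.0), (987.65, None)]
```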
For a Sequence Query, each line entered into the query window must consist of one
experimental peptide mass value, optionally followed by
qualifiers for that peptide:
M seq(...) comp(...) ions(...) tag(...) etag(...)
M is an experimental mass value, seq(...) is AA sequence information,
comp(...) is AA composition information, ions(...) contains
MS/MS fragment mass and (optionally) intensity values,
tag(...) is a sequence tag, and etag(...) is an error tolerant sequence tag.
A line may contain zero, one, or many qualifiers. If there are multiple sequence tag
qualifiers, and one or more is error tolerant, then all tags are treated as
error tolerant.
N.B. ions(...), tag(...), and etag(...) qualifiers are scored probabilistically.
That is, the more qualifiers that match, the higher the score, but not all qualifiers are
required to match. In contrast, seq(...) and comp(...) are treated as filters.
If a seq(...) or comp(...) qualifier fails to match, then the entire query is discarded.
Hence, only include seq(...) or comp(...) qualifiers which are known with a
high degree of confidence. Note that using a seq(...) qualifier in a Mascot search is not
equivalent to performing a BLAST search.
If you re-search a Sequence Query from the results page, you may notice
two additional qualifiers which are used by Mascot internally:
from(...) and title(...).
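The structure of a Sequence Query line, and the rule that one error tolerant tag makes every tag error tolerant, can be sketched as below. The parser is hypothetical: it assumes qualifier bodies contain no nested parentheses, and the qualifier contents in the example are placeholders rather than verified Mascot syntax.

```python
import re

QUALIFIER = re.compile(r"(\w+)\(([^)]*)\)")
FILTERS = {"seq", "comp"}          # a failed match discards the whole query
SCORED = {"ions", "tag", "etag"}   # scored probabilistically


def parse_sequence_query(line):
    """Split one query line into its mass value and a list of (name, body) qualifiers."""
    parts = line.split(None, 1)
    mass = float(parts[0])
    rest = parts[1] if len(parts) > 1 else ""
    qualifiers = [(name, body.strip()) for name, body in QUALIFIER.findall(rest)]
    # If any tag is error tolerant, all tags are treated as error tolerant.
    if any(name == "etag" for name, _ in qualifiers):
        qualifiers = [
            ("etag" if name == "tag" else name, body)
            for name, body in qualifiers
        ]
    return mass, qualifiers


# Placeholder qualifier bodies, for illustration only.
mass, quals = parse_sequence_query("1234.5 seq(ACDK) tag(first) etag(second)")
print(mass, quals)
print([name for name, _ in quals if name in FILTERS])  # ['seq']
```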
This parameter determines the maximum number of hits displayed in a search
results report. If your connection to the internet is slow,
selecting a low number of hits will reduce the time taken to load and display
a search report.
Choose AUTO to display only protein hits with significant scores. In a protein
summary report, one additional hit is reported after the cutoff at the significant score.
This is to ensure that the report provides some feedback, even though there may
be no significant matches.
The precursor peak can often have very high intensity relative to the fragment peaks,
which may give rise to spurious fragment ion matches. It is usually best if the
precursor is removed before the search.
With the default arguments of -1,-1, a smart filter is created.
This removes peaks within the fragment ion tolerance window about each of the
precursor isotope peaks. The number of isotopes is assumed to be as follows:
| Mr | Number |
| --- | --- |
| < 1000 | 3 |
| 1000 - 1999 | 4 |
| 2000 - 2999 | 5 |
| 3000 - 3999 | 6 |
| 4000 - 4999 | 7 |
| 5000 - 5999 | 8 |
| 6000 - 6999 | 9 |
| > 7000 | 10 |
So, if the precursor m/z was 800 and the charge was 2 (an Mr of about 1598, hence
4 isotope peaks), and the fragment ion tolerance was +/- 0.1 Da, the filter would
remove 4 notches:
m/z 800.0 +/- 0.1
m/z 800.5 +/- 0.1
m/z 801.0 +/- 0.1
m/z 801.5 +/- 0.1
At first sight, this may seem a strange mix of m/z and Da. The reason is that we
need to avoid matches from 1+ fragment ions, whatever the charge on the precursor.
If the arguments are anything other than -1,-1, a single notch is used where the first
argument is the mass offset of the beginning of the notch and the second value is the
mass offset of the end of the notch. For the precursor in the last example, if the
arguments were -1,4 then the notch would run from m/z 799.5 to m/z 802.0. However,
if the precursor charge was 1, then the notch would be from m/z 799 to m/z 804.
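Both filter modes can be reproduced in a short sketch. It is inferred from the worked examples above, not taken from Mascot's source: isotope peaks are assumed to be spaced 1/charge apart in m/z, explicit offsets are assumed to be mass offsets divided by the charge, and an Mr of exactly 7000 is assumed to fall in the top band. All names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def isotope_count(mr):
    """Number of precursor isotope peaks: 3 below 1000 Da, one more per 1000 Da, at most 10."""
    return min(3 + int(mr // 1000), 10)


def removal_notches(mz, charge, frag_tol, start=-1.0, end=-1.0):
    """Return the (low, high) m/z ranges removed around the precursor.

    With the default -1,-1 arguments, one notch of +/- frag_tol is placed on
    each isotope peak. Otherwise a single notch runs between the two mass
    offsets, converted to m/z by dividing by the charge.
    """
    if start == -1 and end == -1:
        mr = (mz - PROTON) * charge
        centres = [mz + i / charge for i in range(isotope_count(mr))]
        return [(c - frag_tol, c + frag_tol) for c in centres]
    return [(mz + start / charge, mz + end / charge)]


print(len(removal_notches(800.0, 2, 0.1)))   # 4
print(removal_notches(800.0, 2, 0.1, -1, 4))  # [(799.5, 802.0)]
print(removal_notches(800.0, 1, 0.1, -1, 4))  # [(799.0, 804.0)]
```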
For an MS/MS Ions Search, choose the description which best matches the type of
instrument used to acquire the data. This setting determines which fragment ion series will
be used for scoring, according to the following table. "Default" corresponds to
the configuration used in Mascot version 1.7 and earlier.
|  | Default | ESI QUAD TOF | MALDI TOF PSD | ESI TRAP | ESI QUAD | ESI FTICR | MALDI TOF TOF | ESI 4 SECT | FTMS ECD | ETD TRAP | MALDI QUAD TOF | MALDI QIT TOF | MALDI ISD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1+ fragments | X | X | X | X | X | X | X | X | X | X | X | X | X |
| 2+ fragments if precursor 2+ or higher | X | X |  | X | X | X |  | X | X | X | X |  |  |
| 2+ fragments if precursor 3+ or higher |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Immonium ions |  |  | X |  |  |  | X | X |  |  | X | X |  |
| a series ions | X |  | X |  |  |  | X | X |  |  |  | X | X |
| a-NH3 if fragment includes RKNQ | X |  | X |  |  |  | X |  |  |  |  | X |  |
| a-H2O if fragment includes STED |  |  | X |  |  |  | X |  |  |  |  | X |  |
| b series ions | X | X | X | X | X | X | X | X |  |  | X | X |  |
| b-NH3 if fragment includes RKNQ | X | X | X | X | X | X | X | X |  |  | X | X |  |
| b-H2O if fragment includes STED |  | X | X | X | X | X | X | X |  |  | X | X |  |
| c series ions |  |  |  |  |  |  |  |  | X | X |  |  | X |
| x series ions |  |  |  |  |  |  |  |  |  |  |  |  |  |
| y series ions | X | X | X | X | X | X | X | X | X | X | X | X | X |
| y-NH3 if fragment includes RKNQ | X | X |  | X | X | X | X |  |  |  | X | X |  |
| y-H2O if fragment includes STED |  | X |  | X | X | X | X |  |  |  | X | X |  |
| z series ions |  |  |  |  |  |  |  | X |  |  |  |  |  |
| z+H series ions |  |  |  |  |  |  |  |  | X | X |  |  |  |
| z+2H series ions |  |  |  |  |  |  |  |  | X | X |  |  | X |
| internal yb < 700 Da |  |  |  |  |  |  | X | X |  |  | X | X |  |
| internal ya < 700 Da |  |  |  |  |  |  | X | X |  |  | X | X |  |
| y or y++ must be significant |  |  |  |  |  |  |  |  |  |  |  |  |  |
| y or y++ must be top scoring series |  |  |  |  |  |  |  |  |  |  |  |  |  |
| d or d' series ions |  |  |  |  |  |  | X |  |  |  |  |  |  |
| v series ions |  |  |  |  |  |  | X |  |  |  |  |  |  |
| w or w' series ions |  |  |  |  |  |  | X |  |  |  |  |  |  |
Other Parameters
There are a number of other search parameters, but their default settings
should not be changed under normal circumstances.
For this reason, they are not accessible from the browser interface. The
defaults can be overridden by using
embedded parameters, either in a data file or in
the query window. Be warned, however, that you change them at your own risk!