Mascot Search Overview
Mascot is a powerful search engine which uses mass spectrometry
data to identify proteins from primary sequence databases.
While a number of similar programs are available, Mascot is unique
in that it integrates all of the proven methods of
searching. These different search methods can be categorised as
follows:
- Peptide Mass Fingerprint,
in which the only experimental data are peptide mass values.
- Sequence Query, in
which peptide mass data are combined with amino acid sequence
and composition information. This is a super-set of a sequence tag query.
- MS/MS Ion Search,
using uninterpreted MS/MS data from one or more peptides.
The general approach for all types of search is to take a small
sample of the protein of interest and digest it with a proteolytic
enzyme, such as trypsin. The resulting digest mixture is analysed
by mass spectrometry.
Different types of mass spectrometer have different capabilities.
A simple instrument will measure a set of molecular weights for
the intact mixture of peptides. An instrument with MS/MS capability
can additionally provide structural information by recording the
fragment ion spectrum of a peptide. Usually, the digest mixture
will be separated by chromatography prior to MS/MS analysis, so
that MS/MS spectra from individual peptides can be measured.
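To make "fragment ion spectrum" concrete, the sketch below computes the singly charged b and y ion m/z values that one peptide could produce. It is an illustration only, assuming monoisotopic residue masses, no modifications, and charge 1+. The function name fragment_ions is hypothetical and is not part of Mascot.

```python
# Monoisotopic residue masses in daltons (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of the charge-carrying proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i holds the first i residues (N-terminal fragment);
    y_i holds the last i residues plus water (C-terminal fragment).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions
```

For a peptide of n residues this gives n - 1 ions in each series. A real spectrum usually contains only some of them, along with other ion types and noise.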
The experimental mass values are then compared with calculated
peptide mass or fragment ion mass values, obtained by applying
cleavage rules to the entries in a comprehensive primary
sequence database. By using an appropriate scoring algorithm,
the closest match or matches can be identified. If the "unknown"
protein is present in the sequence database, then the aim is
to pull out that precise entry. If the sequence database does
not contain the unknown protein, then the aim is to pull out those
entries which exhibit the closest homology, often equivalent proteins
from related species.
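The comparison described above can be sketched in a few lines. The example assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and unmodified monoisotopic masses. Its score is a plain count of matched masses, which is a stand-in chosen for illustration and not Mascot's scoring algorithm. All function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tolerance=0.5):
    """Count observed masses lying within `tolerance` Da of a calculated one."""
    calculated = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(mass - calc) <= tolerance for calc in calculated)
        for mass in observed
    )


def rank_entries(observed, database, tolerance=0.5):
    """Rank database entries (name -> sequence) by number of matched masses."""
    scores = [
        (name, count_matches(observed, seq, tolerance))
        for name, seq in database.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Calling rank_entries with a list of measured masses and a dictionary of entry names and sequences returns the entries ordered from most to fewest matches. A raw count like this favours long proteins, which is one reason real search engines use more careful scoring.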
The sequence databases that can be searched on the Matrix Science free, public Mascot server
are:
- SwissProt is a high-quality,
curated protein database. Sequences are non-redundant, rather than non-identical, so
you may get fewer matches for an MS/MS search than you would from a comprehensive database,
such as NCBInr. SwissProt is ideal for peptide mass fingerprint searches and MS/MS searches
of well characterised organisms where it isn't essential to match every single spectrum.
- NCBInr
is a comprehensive, non-identical protein database maintained by NCBI
for use with their search tools BLAST and Entrez. The entries
have been compiled from GenBank CDS translations,
PIR, SWISS-PROT, PRF, and PDB.
- EMBL EST divisions
contain "single-pass" cDNA sequences,
or Expressed Sequence Tags, from a number of organisms. During a Mascot search, the
nucleic acid sequences are translated in all six reading
frames. There are 10 divisions: Environmental_EST, Fungi_EST, Human_EST, Invertebrates_EST,
Mammals_EST, Mus_EST, Plants_EST, Prokaryotes_EST, Rodents_EST, and Vertebrates_EST.
- contaminants
is a database of common contaminants compiled by the Max Planck Institute of Biochemistry, Martinsried.
- cRAP
is a database of common contaminants compiled by the Global Proteome Machine Organization.
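The six-frame translation mentioned for the EST divisions can be illustrated as follows. This is a minimal sketch assuming the standard genetic code, with stop codons shown as "*" and any codon containing an ambiguous base shown as "X". The function names are hypothetical, and how Mascot itself handles stops and ambiguity codes is not stated here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate three forward frames (+1..+3) and three reverse frames (-1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Each of the six translated strings can then be treated like a protein sequence for the purpose of matching peptide masses.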