Of Mice and Men
A Brief History
One of the first programs for identifying proteins by peptide mass
fingerprinting, MOWSE, developed out of a collaboration between Darryl
Pappin (Imperial Cancer Research Fund, UK) and Alan Bleasby (SERC
Daresbury Laboratory, UK) [Pappin, 1993].
The name chosen was an acronym of Molecular
Weight Search. The MOWSE databases were fully indexed so as to allow
very rapid searching and retrieval of sequence data. This approach
was inspired by the speed of Alan Bleasby's DELPHOS package, a
structured query language developed alongside the OWL non-redundant
protein sequence database [Bleasby, 1990].
A second important feature of MOWSE
was the scoring algorithm. Although several groups independently
developed peptide mass fingerprint packages at around the same time,
MOWSE was the first to take account of the non-uniform distribution
of peptide sizes that results from digestion by an enzyme. The
original method of submitting a search to MOWSE was by means of an
email server.
The next major development, MOWSE II, was the addition of
amino acid sequence and composition qualifiers
[Pappin, DJC, Rahman, D, Hansen, HF, Bartlet-Jones, M, Jeffery, W and Bleasby, AJ,
Chemistry, mass spectrometry and peptide-mass databases: Evolution of methods
for the rapid identification and mapping of cellular proteins, Mass Spectrom. Biol. Sci., 135-150 (1996)].
This was made available in 1994 as a CGI program on the
UK Human Genome Mapping Project web server, but is no longer available there.
MOWSE II was fast and offered some unique functionality. However, it still
required indexed molecular weight databases to be constructed prior
to searching. The drawback of this approach was that a database had
to be built for each new enzyme and for each set of amino acid
residue masses. This made it difficult to support searching proteins
in which residues had been chemically or post-translationally
modified, since a new database was required for each combination of
modifications.
In 1997, the decision was made to restructure MOWSE so as to
compute mass values directly from FASTA sequence databases, "on the
fly". This removed the limitations on modified residues, but
necessitated a complete rewrite of the computer code. David Perkins,
working in Darryl's group at ICRF, started this work in late 1997.
Like Darryl and Alan, David came from Professor John Findlay's group
at Leeds University, where he worked on the continuing development of
OWL and new methods for protein sequence analysis and structure
visualisation.
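
The idea of computing mass values "on the fly" can be illustrated with a minimal Python sketch. This is not the MOWSE or MASCOT code: the function names (`read_fasta`, `digest_trypsin`, `peptide_mass`) are hypothetical, the cleavage rule is a simplified trypsin rule with no missed cleavages, and the residue masses are standard monoisotopic values. The point is only that, because masses are derived from the sequence at search time, a modification becomes a parameter rather than a reason to rebuild a database.

```python
import re

# Standard monoisotopic residue masses (Da); water is added once per peptide.
WATER = 18.01056
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def digest_trypsin(sequence):
    """Cleave after K or R unless followed by P (assumed simplified rule).

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide, fixed_mods=None):
    """Neutral monoisotopic mass; fixed_mods maps residue -> mass delta."""
    mods = fixed_mods or {}
    return WATER + sum(RESIDUE_MASS[aa] + mods.get(aa, 0.0) for aa in peptide)


if __name__ == "__main__":
    fasta = ">example hypothetical protein\nAAKGGRP\nCK\n"
    carbamidomethyl = {"C": 57.02146}  # example fixed modification on Cys
    for header, seq in read_fasta(fasta):
        for pep in digest_trypsin(seq):
            print(header, pep,
                  round(peptide_mass(pep), 4),
                  round(peptide_mass(pep, carbamidomethyl), 4))
```

Changing the enzyme or the set of modifications here means passing a different rule or dictionary, whereas a pre-indexed approach would need a separate database for each combination.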
David Perkins (left) and Darryl Pappin (right) outside ICRF in May 1999
From the outset, the new algorithms were coded for parallel
execution on multiprocessor platforms. One additional feature of the
new code was the facility to specify selected MS/MS fragment ion
masses as an "ions" qualifier to a peptide mass value. This turned
out to work very well, and it became clear that it was only a small
additional step to support the searching of raw MS/MS peak lists,
something which had previously been possible only using the Sequest
program from John Yates and Jimmy Eng [Eng, 1994].
At this stage, MOWSE III
was only available within ICRF. It supported all the proven methods
of protein identification: peptide-mass fingerprint, MS/MS fragment
ion search, and searches which combined mass data with amino acid
sequence or composition. It performed all the necessary calculations
on the fly, so that it could search any FASTA format database. It also
used a powerful scoring algorithm based on true enzyme kinetics.
In mid-1998 it was decided that a collaboration with an
external bioinformatics company was the fastest route to distributing
MOWSE to a wider audience. Matrix Science secured a licence from
ICRF to develop and distribute MOWSE, although a name change was
suggested to avoid confusion with the earlier public domain
versions. The name chosen was MASCOT, and considerable work went
into porting MASCOT to a variety of platforms (SGI, SUN, DEC, and Windows NT),
structuring for fully automated, high throughput protein
identification, and documentation. Free and unrestricted access to
MASCOT has been available on the Matrix Science web site since early
1999. The company only seeks a licence fee from users who want to run
MASCOT on their own server. Matrix Science works closely with Darryl
and David to continue adding new functionality to MASCOT. Recent
developments include transforming the MOWSE score into a measure of
absolute probability and a powerful implementation of variable
(non-quantitative) modifications.