SWISS-PROT release 38 available

Elisabeth Gasteiger Elisabeth.Gasteiger at isb-sib.ch
Wed Aug 18 19:52:37 EST 1999


(I) DATABASE AVAILABILITY ANNOUNCEMENT


Name        : SWISS-PROT 
Description : Protein sequence database.
Release     : 38.0 of July 1999
Statistics  : 80'000 fully annotated sequences, 29'085'965 amino acids,
              64'965 references.
Citation    : Bairoch A., Apweiler R.;
              Nucleic Acids Res. 27:49-54(1999).
Availability: FTP: ftp://ftp.expasy.ch/databases/swiss-prot
                   ftp://ftp.ebi.ac.uk/pub/databases/swissprot
              WWW: http://www.expasy.ch/sprot/
                   http://www.ebi.ac.uk/sprot/


Name        : ENZYME
Description : Enzyme nomenclature database.
Release     : 25.0 of July 1999
Statistics  : 3'704 enzymes described.
Citation    : Bairoch A.;
              Nucleic Acids Res. 27:310-311(1999).
Availability: FTP: ftp://ftp.expasy.ch/databases/enzyme
                   ftp://ftp.ebi.ac.uk/pub/databases/enzyme
              WWW: http://www.expasy.ch/enzyme/


Name        : PROSITE
Description : Protein domains and families database.
Release     : 16.0 of July 1999
Statistics  : 1'034 documentation entries;
              1'374 patterns, rules and profiles/matrices.
Citation    : Hofmann K., Bucher P., Falquet L., Bairoch A.;
              Nucleic Acids Res. 27:215-219(1999).
Availability: FTP: ftp://ftp.expasy.ch/databases/prosite
                   ftp://ftp.ebi.ac.uk/pub/databases/prosite
              WWW: http://www.expasy.ch/prosite/

---------------------------------------------------------------------

(II) SUMMARY OF CURRENT CHANGES AND FUTURE DEVELOPMENTS IN SWISS-PROT,
     PROSITE AND ENZYME

Note: a much more  complete  description  of  the  changes  and  future
developments that are listed below is available from the release notes.
The release notes can be accessed from the WWW at the address:

            http://www.expasy.ch/cgi-bin/lists?relnotes.txt

or downloaded by FTP from:

    ftp://ftp.expasy.ch/databases/swiss-prot/release/relnotes.txt
    ftp://ftp.ebi.ac.uk/pub/databases/swissprot/release/relnotes.txt


A) Summary of the changes in SWISS-PROT release 38, ENZYME release 25
   and PROSITE release 16.

In SWISS-PROT:

- 2'106 sequences have been  added, 400 sequences have been updated and
  12'576 entries have been the target of annotation updates;
- We have  gradually started  the conversion of SWISS-PROT entries from
  all 'UPPER  CASE' to  'MiXeD CaSe'. The  line-types  that  have  been
  converted between  release 37  and 38  are: DT  (DaTe), OS  (Organism
  Species), OC (Organism Classification), OG (OrGanelle), RL (Reference
  Location) and  KW (KeyWord).  The RT  (Reference  Title)  lines  were
  already introduced in mixed-case at release 37;
- We have  introduced a  unique identifier for all VARIANT feature keys
  in human  sequence entries.  This change  is the  first step  towards
  providing a  unique identifier  to  all  SWISS-PROT  features.  Human
  sequence variants  were  chosen  as  a  prototype  for  this  planned
  improvement because these identifiers make it possible to link specific
  sequence variants directly to the relevant entries in disease mutation
  databases, and  give those  databases a  way to  implement reciprocal
  links;
- We have  introduced a  new 'topic'  for the  comments (CC) line type:
  'MISCELLANEOUS'. This  topic is  used for  all comments  which do not
  belong to any other already defined topic;
- Cross-references have been added to the Zebrafish Information Network
  (ZFIN) database  and to  the CarbBank  Complex Carbohydrate Structure
  Database (CCSD);
- We have  switched from  'pID' to  'protein_ID' in cross-references to
  the   DNA    sequence   databases.   The   DNA   sequence   databases
  (EMBL/GenBank/DDBJ) recently changed their referencing system for CDS
  (CoDing Sequence).  They used  to associate every CDS in the database
  with what  was called  a pID. pIDs have been replaced by what is  now
  called 'protein_ID' (protein  sequence  IDentifier).  The  protein_ID
  consists of  a stable ID portion (8 characters: 3 letters followed by
  5 digits)  plus a  version number  after a  decimal point  (example:
  AAA03208.1).  The  version  number  only  changes  when  the  protein
  sequence coded  by the  CDS changes,  while the  stable part  remains
  unchanged;
- We  have   continued  to  overhaul  the  information  stored  in  the
  'SIMILARITY' comment  topic so that it can be used to select specific
  protein families  in a way complementing and supplementing the cross-
  references to the PROSITE and Pfam databases;
- The format  of RL  lines for  submissions to  the DNA  databases  was
  slightly changed  to follow  more closely the format used by the EMBL
  nucleotide sequence database;
- Two  new   documents  were  added  to  the  long  list  of  documents
  distributed  at   each  release   of   the   database.   These   are:
  "ANNBIOCH.TXT": SWISS-PROT annotation: how is biochemical information
  assigned to  sequence entries  and "HUMCHR16.TXT"  Index  of  protein
  sequence entries encoded on human chromosome 16;
- Many improvements  were carried  out on  the ExPASy  server. The most
  noteworthy in the context of SWISS-PROT are:

  o  We have  switched our  default view of SWISS-PROT entries to  that
     provided by  the NiceProt  tool. NiceProt  offers a  user-friendly
     tabular view  of SWISS-PROT entries. Access to the original SWISS-
     PROT format  is maintained  and is  directly  available  from  the
     NiceProt view;
  o  We have  revised the ExPASy file and directory structure, in order
     to have the vast amount of data that has accumulated on the server
     since September 1993 available in a more structured manner, and to
     facilitate replication  on  our  mirror  sites.  This  has  caused
     certain  changes  in  HTML  links,  and  you  should  update  your
     bookmarks and links accordingly;
  o  WWW links  have been  implemented between SWISS-PROT and CarbBank,
     EcoGene and ZFIN.
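
The protein_ID layout described in the cross-reference item above (a
stable portion of 3 letters and 5 digits, then a version number after a
decimal point) can be checked mechanically. The following is only a
minimal sketch: the function names are hypothetical, and it assumes the
letters are upper case, as in the example AAA03208.1.

```python
import re

# Assumed layout, taken from the announcement text: 3 letters, 5 digits,
# a decimal point, then a version number. Upper-case letters are an
# assumption based on the example "AAA03208.1".
PROTEIN_ID_RE = re.compile(r"([A-Z]{3}[0-9]{5})\.([0-9]+)")


def split_protein_id(protein_id):
    """Return (stable_id, version) or raise ValueError if malformed."""
    match = PROTEIN_ID_RE.fullmatch(protein_id)
    if match is None:
        raise ValueError(f"not a protein_ID: {protein_id!r}")
    return match.group(1), int(match.group(2))


def same_cds(id_a, id_b):
    """True when both identifiers share the same stable portion.

    The version number changes only when the protein sequence changes,
    so two IDs that differ only in version refer to the same CDS.
    """
    return split_protein_id(id_a)[0] == split_protein_id(id_b)[0]


if __name__ == "__main__":
    print(split_protein_id("AAA03208.1"))
    print(same_cds("AAA03208.1", "AAA03208.2"))
```

Comparing only the stable portion is useful when a cross-reference
should survive a sequence revision; comparing the full string is the
right choice when the exact sequence version matters.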

In ENZYME:

- A significant number of synonyms (AN lines) were added to a number of
  entries;
- The WWW version of ENZYME on ExPASy now provides a more user-friendly
  tabular view  of enzyme  entries through  a new tool called NiceZyme.
  NiceZyme also  provides direct  links, through Medline, to literature
  references relevant to a specific enzyme.

In PROSITE:

- 20 documentation entries have been added and 180 have been updated;
- The WWW  version of  PROSITE on  ExPASy now  provides  a  more  user-
  friendly tabular  view of  PROSITE documentation  and patterns/matrix
  entries through a set of new tools called NiceDoc and NiceSite.


B) Future developments

Here is what was announced as planned changes for release 39:

- We will continue the conversion of SWISS-PROT entries from all 'UPPER
  CASE' to  'MiXeD CaSe'.  In release 39 we are planning to convert the
  RA (Reference  Author) and RC (Reference Comment) line types. We will
  also convert  the  gene  designations  in  the  DR  (Database  cross-
  Reference) lines  for MGD, EcoGene, StyGene, SubtiList and DictyDb to
  mixed case;
- We will  introduce in the next release a new 'topic' for the comments
  (CC) line-type: 'PHARMACEUTICAL'; this topic will describe the use of
  a specific protein as a pharmaceutical drug. The information provided
  by such  a topic will include the brand name(s) under which a protein
  is available, the name(s) of the company(ies) that produce it as well
  as a short description of the therapeutic usage of the protein;
- Introduction of  a new  FT key  'SE_CYS'. Selenocysteine  is the 21st
  natural amino acid. Very recently the joint nomenclature committee of
  the IUPAC/IUBMB  officially recommended  a three-letter  and  a  one-
  letter symbol  for selenocysteine, namely 'Sec' and 'U'. We recognize
  that introducing  a new one-letter code in the sequence records would
  disrupt most,  if not  all, sequence  analysis software. We therefore
  decided to  change the  rules used  in  SWISS-PROT  to  annotate  the
  presence of  Sec residues  in sequence  entries by  using a  specific
  feature key  (SE_CYS) to  indicate the presence of a Sec residue in a
  specific position.  In  the  sequence record, the Sec  residues  will
  continue to be represented by 'C' (Cys);
- Starting with  release 39,  there can be more than one AC (ACcession)
  line per  SWISS-PROT entry.  Strictly speaking  this is  not a format
  change and the user manual of SWISS-PROT has always indicated that there
  could be more than one AC line per entry;
- Extension of  the accession  number system.  We have  now used up all
  possible numbers with 'O', 'P' and  'Q'. As  announced over  the last
  18 months,  we have  decided to keep a system of accession numbers
  based  on  a  six-character  code,  but  with  the  following  format
  extension:

    1        2       3          4            5            6
    [O,P,Q]  [0-9]  [A-Z, 0-9]  [A-Z, 0-9]   [A-Z, 0-9]   [0-9]

  This means  that we will keep a six-character code, but that positions
  3, 4 and 5  of this code can hold  any combination  of letters  and
  digits;
- Change in  the syntax  of the  SQ line. The SQ (SeQuence header) line
  marks the beginning of the sequence data and gives a quick summary of
  its content. The last information item in the SQ line is a 32-bit CRC
  (Cyclic Redundancy  Check) value which is computed from the sequence.
  In the  next release we will replace the 32-bit CRC value by a 64-bit
  CRC.
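
The extended accession number format listed above translates directly
into a pattern check. This is a sketch under the stated layout only
(position 1 is O, P or Q; positions 2 and 6 are digits; positions 3 to
5 are letters or digits); the names used are hypothetical and upper-case
letters are assumed.

```python
import re

# One character class per position of the six-character code:
#   1: [OPQ]  2: [0-9]  3-5: [A-Z0-9]  6: [0-9]
ACCESSION_RE = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]")

# Number of distinct codes the layout allows:
# 3 letters * 10 digits * 36^3 alphanumerics * 10 digits.
CAPACITY = 3 * 10 * 36 ** 3 * 10


def is_valid_accession(accession):
    """True if the string matches the extended six-character layout."""
    return ACCESSION_RE.fullmatch(accession) is not None


if __name__ == "__main__":
    for candidate in ("P12345", "Q9XYZ1", "A12345", "P1234"):
        print(candidate, is_valid_accession(candidate))
    print("capacity:", CAPACITY)
```

Purely numeric codes such as P12345 remain valid under this pattern, so
a parser written for the extended layout also accepts existing
accession numbers.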

There are  many things  we are  planning to  do on a longer time scale;
here is a quick overview of some of these improvements:

- Further line types will be converted to mixed case in release 40, and
  this process will probably be completed by January 1, 2000;
- To cater  for titin  and related giant proteins, we will increase the
  maximum sequence size from 9999 to 32767 amino acids;
- We want  to distribute SWISS-PROT in a relational format (in addition
  to the flat file format);
- We are  planning to  change the  length of  the entry  names from  10
  characters (4_5) to 11 (5_5).
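
Software that stores entry names in fixed-width fields may need
adjusting for the planned 5_5 layout. The sketch below compares the two
layouts; it assumes that each part of an entry name is upper-case
alphanumeric and may be shorter than the maximum, and that the
underscore is always present. The helper names and the test name
ABCDE_HUMAN are hypothetical.

```python
import re


def entry_name_pattern(max_prefix, max_suffix=5):
    """Build a pattern for names of the form PREFIX_SUFFIX.

    Assumption: both parts are upper-case alphanumeric and at least one
    character long; only the maximum lengths come from the announcement.
    """
    return re.compile(
        rf"[A-Z0-9]{{1,{max_prefix}}}_[A-Z0-9]{{1,{max_suffix}}}"
    )


CURRENT_LAYOUT = entry_name_pattern(4)   # 4_5: at most 10 characters
PLANNED_LAYOUT = entry_name_pattern(5)   # 5_5: at most 11 characters


def fits(name, pattern):
    """True if the entry name matches the given layout."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ("INS_HUMAN", "ABCDE_HUMAN"):
        print(name, fits(name, CURRENT_LAYOUT), fits(name, PLANNED_LAYOUT))
```

Every name valid under the 4_5 layout is also valid under 5_5, so
switching a validator to the planned pattern ahead of time should not
reject existing entries.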

Of course  the above  list is  far from being definitive; we await your
suggestions!

----------------------------------------------------------------------------
SWISS-PROT is copyright.  It is produced through a collaboration between
the Swiss Institute of Bioinformatics and the EMBL Outstation - the
European Bioinformatics Institute. There are no restrictions on its use
by non-profit institutions as long as its content is in no way modified.
Usage by and for commercial entities requires a license agreement. For
information about the licensing scheme see: http://www.isb-sib.ch/announce/
or send an email to license at isb-sib.ch.
----------------------------------------------------------------------------

-- 
-----------------------------------------------------------------
Elisabeth Gasteiger
Swiss Institute of Bioinformatics
c/o Department of Medical Biochemistry
1, rue Michel Servet                      Tel. (+41 22) 702 54 79
CH - 1211 Geneva 4 Switzerland            Fax  (+41 22) 702 55 02
Elisabeth.Gasteiger at isb-sib.ch            http://www.expasy.ch/ 
-----------------------------------------------------------------



