Download Area |
STRING uses a relational database system (PostgreSQL) to store primary data and precomputed predictions. For convenience, we provide selected data items as flatfiles below. Please note: the complete STRING dataset is also available, but obtaining it requires signing a license agreement (free for academic users).
Files that do not require a separate license agreement are published under a Creative Commons Attribution 3.0 License or a
Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.
|
| Protein mode (flatfiles) |
| | |
| - File - | - Description - | - Access - | |
| protein.sequences.v8.2.fa.gz (583.1 Mb) | sequences of all proteins in STRING |  | |
| protein.links.v8.2.txt.gz (1.1 Gb) | protein network data (scored links between proteins) |  | |
| protein.links.detailed.v8.2.txt.gz (1.6 Gb) | protein network data (incl. subscores per channel); commercial entities require a license. |  | |
| protein.actions.v8.2.txt.gz (63.1 Mb) | interaction types for protein links |  | |
| protein.actions.detailed.v8.2.txt.gz (69.7 Mb) | interaction types for protein links (incl. subscores per type); commercial entities require a license. |  | |
| protein.links.full.v8.2.txt.gz (1.8 Gb) | protein network data (incl. distinction: direct vs. interologs); all users require a license | license required | |
| |
| COG mode (flatfiles) |
| | |
| - File - | - Description - | - Access - | |
| COG.mappings.v8.2.txt.gz (43.5 Mb) | orthologous groups (COGs,NOGs,KOGs,...) and their proteins |  | |
| protein.sequences.v8.2.fa.gz (583.1 Mb) | sequences of all proteins in STRING (can be used as a blast db) |  | |
| species.mappings.v8.2.txt.gz (5.6 Mb) | presence / absence of orthologous groups in species |  | |
| COG.links.v8.2.txt.gz (64.6 Mb) | association scores between orthologous groups |  | |
| COG.links.detailed.v8.2.txt.gz (88.7 Mb) | association scores (incl. subscores per channel); commercial entities require a license. |  | |
| |
| General flatfiles & full database dumps |
| | |
| - File - | - Description - | - Access - | |
| species.v8.2.txt (44.8 Kb) | organisms in STRING |  | |
| species.tree.v8.2.txt (14.9 Kb) | STRING tree of species |  | |
| database.schema.v8.2.pdf (171.7 Kb) | STRING database schema |  | |
| protein.aliases.v8.2.txt.gz (136.6 Mb) | aliases for STRING proteins: locus names, accessions, descriptions... |  | |
| items_schema.v8.2.sql.gz (1.2 Gb) | full database, part I: the players (proteins, species, COGs,...) | license required | |
| network_schema.v8.2.sql.gz (2 Gb) | full database, part II: the networks (nodes, edges, scores,...) | license required | |
| evidence_schema.v8.2.sql.gz (3.7 Gb) FTP | full database, part III: interaction evidence (datasets, abstracts, predictions, ...) | license required | |
| homology_schema.v8.2.sql.gz (38.9 Gb) FTP | full database, part IV: homology data (all-against-all BLAST searches) | license required | |
| |
| Please note: STRING is updated periodically, so check back on this page whenever you need the latest associations. Each protein identifier in the above files consists of two substrings separated by a dot: 'NNNNN.aaaaaa'. The first substring is the NCBI taxonomy identifier of the species, and the second substring is the RefSeq/Ensembl identifier of the protein. Some of the files are very large, and depending on your browser and/or operating system you may experience problems downloading them. Files larger than 2 GB are best downloaded from our FTP server, using a Unix system and its command-line 'ftp' utility. |
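
The 'NNNNN.aaaaaa' identifier layout described above can be split with a few lines of Python. This is a minimal sketch, not part of STRING itself: the names `StringProteinId` and `parse_string_id` are hypothetical, and the sketch assumes that the taxonomy part is purely numeric and that only the first dot separates the two parts (so a protein identifier that itself contains dots is kept whole). The identifier in the usage line is for illustration only.

```python
from typing import NamedTuple


class StringProteinId(NamedTuple):
    """Hypothetical container for the two parts of a STRING protein identifier."""
    taxon_id: int    # NCBI taxonomy species identifier (the 'NNNNN' part)
    protein_id: str  # RefSeq/Ensembl protein identifier (the 'aaaaaa' part)


def parse_string_id(identifier: str) -> StringProteinId:
    """Split 'NNNNN.aaaaaa' at the first dot into taxon and protein parts."""
    # Assumption: only the first dot is the separator; any later dots
    # are treated as part of the protein identifier.
    taxon, sep, protein = identifier.strip().partition(".")
    if not sep or not taxon.isdigit() or not protein:
        raise ValueError(f"not of the form 'NNNNN.aaaaaa': {identifier!r}")
    return StringProteinId(int(taxon), protein)


# Illustrative usage: 9606 is the NCBI taxonomy identifier for Homo sapiens.
example = parse_string_id("9606.ENSP00000269305")
print(example.taxon_id, example.protein_id)
```

Splitting at the first dot only is a deliberate choice here, so that the species part can be used to filter the large flatfiles by organism without assumptions about the format of the protein part.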