SLiMAN - Documentation

Description

SLiMAN - Short Linear Motif Analysis

SLiMAN is a web server devoted to the analysis of interactomics results. It suggests possible ELM/PFam pairs within the submitted list of putative interactants.

From a simple list of proteins (UniProt accessions or IDs), SLiMAN processes the data through three successive levels of analysis: SLiMIP, SLiMID and SLiMIM. These three steps are described below.
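As an illustration of what such an input list may contain, the sketch below classifies each line as a UniProt accession or an entry name (ID). The accession pattern follows the format published in the UniProt help pages. The looser entry-name pattern, the one-identifier-per-line layout, and the function names are assumptions made for this example; they do not describe SLiMAN's own parser.

```python
import re

# Accession format as documented by UniProt (e.g. P04637, A0A024R161).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Entry names look like P53_HUMAN; this looser pattern is an assumption.
ENTRY_NAME_RE = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify_identifier(token: str) -> str:
    """Return 'accession', 'entry_name' or 'unknown' for one identifier."""
    token = token.strip().upper()
    if ACCESSION_RE.match(token):
        return "accession"
    if ENTRY_NAME_RE.match(token):
        return "entry_name"
    return "unknown"


def read_protein_list(text: str) -> list[tuple[str, str]]:
    """Return (identifier, kind) pairs, one per non-empty input line."""
    result = []
    for line in text.splitlines():
        token = line.strip()
        if token:
            result.append((token, classify_identifier(token)))
    return result
```

Identifiers reported as "unknown" would typically be checked by hand before submission.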

SLiMIP - Short Linear Motif Interaction Prediction

For each input protein, the regular expressions of the linear motifs referenced in the ELM database are used to scan the corresponding sequence. In parallel, PFam domains are matched against the same set of sequences. SLiMIP gathers complementary information from UniProt, ELM, IUPred2, BioGRID and PhosphoSitePlus®. Filtering parameters (e.g. the ELM E-value) can be modified interactively to filter the output lists of putative pairs.
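The regular-expression scan described above can be sketched as follows. This is a minimal example, not SLiMAN's implementation: the pattern is an illustrative proline-rich "PxxP" core rather than an actual ELM class definition, the sequence is made up, and the function name is hypothetical. The pattern is wrapped in a lookahead so that overlapping matches are reported too.

```python
import re


def find_motif_matches(sequence: str, pattern: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) with 1-based inclusive coordinates.

    The pattern is wrapped in a zero-width lookahead so that matches
    starting inside a previous match are also found.
    """
    lookahead = re.compile(f"(?=({pattern}))")
    hits = []
    for m in lookahead.finditer(sequence):
        text = m.group(1)
        start = m.start() + 1
        hits.append((start, start + len(text) - 1, text))
    return hits


# Illustrative pattern and sequence only; neither comes from ELM or UniProt.
EXAMPLE_PATTERN = r"P..P"
EXAMPLE_SEQUENCE = "MAPPLPPRNRPRL"
matches = find_motif_matches(EXAMPLE_SEQUENCE, EXAMPLE_PATTERN)
print(matches)  # [(3, 6, 'PPLP'), (4, 7, 'PLPP')]
```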

SLiMIP draws a table of possible pairings, with the following information for each hit:
-> Associated PFam ID
-> ELM class name
-> ELM motif class E-value
-> Experimentally validated ELM matches (Verified Instances)
-> IUPred2 disorder scores of the motif
-> BioGRID data for the association of the two proteins
-> PhosphoSitePlus® PTMs
-> Count of available SLiMID templates
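
To make the interactive filtering more concrete, the sketch below represents a hit as a small record and keeps only the hits that pass two cut-offs. The field names, the cut-off values and the example records are all hypothetical placeholders; they mirror some of the columns listed above but are not SLiMAN's internal data model or default parameters.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical field names mirroring some of the table columns.
    pfam_id: str
    elm_class: str
    elm_evalue: float
    verified_instances: int
    disorder_score: float
    template_count: int


def filter_hits(hits, max_evalue=0.01, min_disorder=0.0):
    """Keep hits passing both cut-offs, sorted by increasing E-value."""
    kept = [
        h for h in hits
        if h.elm_evalue <= max_evalue and h.disorder_score >= min_disorder
    ]
    return sorted(kept, key=lambda h: h.elm_evalue)


# Placeholder records, invented for the example.
example_hits = [
    Hit("PF_A", "CLASS_A", 0.002, 1, 0.8, 3),
    Hit("PF_B", "CLASS_B", 0.05, 0, 0.9, 0),
    Hit("PF_C", "CLASS_C", 0.0005, 0, 0.3, 1),
]
strict = filter_hits(example_hits, max_evalue=0.01, min_disorder=0.5)
print([h.pfam_id for h in strict])  # ['PF_A']
```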

In addition, SLiMIP provides direct links to the corresponding information pages of the PFam, ELM, and PhosphoSitePlus web services. When possible, a link to potential SLiMID templates for an ELM/PFam pair is provided.

SLiMID - Short Linear Motif Interacting with Domain

The current database contains 5064 3D structures of protein-peptide complexes to serve as templates to model hits sharing the same ELM/PFam elements.
The boundaries of both the ELM motif and the PFam domain can be edited. The sequence identity and sequence coverage of each alignment are provided to help choose a template. Residues belonging to the protein-peptide interface are colored (red/orange/green for contact distances of 4/5.5/7 Angstroms).
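For readers unfamiliar with these two alignment statistics, here is a minimal sketch of how identity and coverage can be computed from a gapped pairwise alignment. The definitions used (identity over gap-free columns, coverage relative to the ungapped template length) are assumptions chosen for the example; the exact conventions used by SLiMID are not stated here.

```python
def identity_and_coverage(aligned_query: str, aligned_template: str, gap: str = "-"):
    """Return (percent_identity, percent_coverage) for two gapped strings.

    Identity: identical residues / gap-free columns.
    Coverage: gap-free columns / ungapped template length.
    Both definitions are assumptions; other conventions exist.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q != gap and t != gap:
            aligned += 1
            if q.upper() == t.upper():
                identical += 1
    template_len = sum(1 for t in aligned_template if t != gap)
    identity = 100.0 * identical / aligned if aligned else 0.0
    coverage = 100.0 * aligned / template_len if template_len else 0.0
    return identity, coverage


# Toy alignment: 5 gap-free columns, 4 identical, template length 6.
identity, coverage = identity_and_coverage("ACD-FG", "ACEHFG")
```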
Each alignment can be selected for modeling the 3D complex using SCWRL 3.0. SLiMIP hits can be visualized as sequences, and they are aligned with the corresponding SLiMID templates.
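The interface coloring by contact distance (4/5.5/7 Angstroms) can be sketched as below. The example assumes that the contact distance of a domain residue is its minimum atom-atom distance to the peptide and that the thresholds are inclusive; both points, as well as the function names and the input layout, are assumptions made for illustration.

```python
import math

# Thresholds in Angstroms and colors as given in the text.
CONTACT_CLASSES = [(4.0, "red"), (5.5, "orange"), (7.0, "green")]


def min_distance(atoms_a, atoms_b):
    """Smallest distance between two lists of (x, y, z) coordinates."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def contact_color(distance):
    """Return the color of the tightest threshold satisfied, or None."""
    for cutoff, color in CONTACT_CLASSES:
        if distance <= cutoff:
            return color
    return None


def color_interface(domain_residues, peptide_atoms):
    """Map residue label -> color for residues within 7 A of the peptide.

    domain_residues: dict of residue label -> list of atom coordinates.
    peptide_atoms: list of atom coordinates for the whole peptide.
    """
    colors = {}
    for label, atoms in domain_residues.items():
        color = contact_color(min_distance(atoms, peptide_atoms))
        if color is not None:
            colors[label] = color
    return colors
```

Residues farther than 7 Angstroms from every peptide atom are left out of the returned mapping, that is, uncolored.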

SLiMIM - Short Linear Motif Interaction Modeling

From the alignments selected in SLiMID, models of the complex are built in two steps.
First, the variable side chains of the PFam domain are modeled using SCWRL 3.0 in the presence of the peptide found in the template.
Then, the new query peptide sequence is modeled, also with SCWRL 3.0, in the presence of the newly modeled PFam domain. The models are derived from the alignments produced by BLAST® or MAFFT.
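
One way to derive the set of "variable" side chains from an alignment is sketched below. It relies on the SCWRL sequence-file convention in which uppercase letters mark side chains to be rebuilt and lowercase letters mark residues whose coordinates are kept. How SLiMIM actually prepares its SCWRL input is not described here, so the gap handling and the function name are assumptions.

```python
def scwrl_sequence_string(aligned_target: str, aligned_template: str, gap: str = "-") -> str:
    """Build a one-letter sequence matching the template backbone.

    Conserved positions are written in lowercase (keep the template
    side chain); substituted positions are written in uppercase (side
    chain to be rebuilt). Columns with a gap in the template have no
    backbone and are skipped; a gap in the target keeps the template
    residue. The gap handling is an assumption of this sketch.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    out = []
    for tgt, tpl in zip(aligned_target.upper(), aligned_template.upper()):
        if tpl == gap:
            continue  # no template backbone for this target residue
        if tgt == gap or tgt == tpl:
            out.append(tpl.lower())
        else:
            out.append(tgt)
    return "".join(out)


# Toy alignment: one substitution (N -> D) and one insertion in the target.
print(scwrl_sequence_string("ACDEFG", "ACNE-G"))  # acDeg
```

The same helper could in principle be applied twice, once for the domain and once for the peptide, to mirror the two-step procedure described above.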

The 3D models of the complexes can be visualized with the JSmol applet or downloaded in PDB format.
In addition, each model can be selected or discarded, and this selection is forwarded to the SLiMIP results display.

Tutorials

Videos

I. Work with SLiMAN

Inputs: 0:00 -> 1:36
SLiMIP: 1:36 -> 2:50
SLiMID: 2:50 -> 3:38
SLiMIM: 3:38 -> 4:27
Modify domain segmentation: 4:27 -> 6:55
Retrieve results: 6:55 -> 7:51
Play with parameters: 7:51 -> 10:37
BioGRID extension: 10:37 -> 11:55

Textual

Download the SLiMAN_MANUAL.pdf here.

I. SLiMAN input

Input File example 1
Input File example 2
Input File example 3
Input File example 4
Input File example 5

II. Running query

III. SLiMIP results

IV. SLiMIP parameters

V. SLiMID segmentation

VI. SLiMID alignments

VII. SLiMIM results

VIII. BioGRID extension input

IX. BioGRID extension results

Databases

ELM - The Eukaryotic Linear Motif

“ ELM is a computational biology resource for investigating candidate functional sites in eukaryotic proteins. Functional sites which fit to the description "linear motif" are currently specified as patterns using Regular Expression rules. To improve the predictive power, context-based rules and logical filters are being developed and applied to reduce the amount of false positives.
The current version of the ELM server provides core functionality including filtering by cell compartment, phylogeny, globular domain clash (using the SMART/Pfam databases) and structure. In addition, both the known ELM instances and any positionally conserved matches in sequences similar to ELM instance sequences are identified and displayed (see ELM instance mapper). Although the ELM resource contains a large collection of functional site motifs, the current set of motifs is not exhaustive. ”

More about ELM

PFam - Protein Families

“ The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function.
Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM.
The data presented for each entry is based on the UniProt Reference Proteomes but information on individual UniProtKB sequences can still be found by entering the protein accession. Pfam full alignments are available from searching a variety of databases, either to provide different accessions (e.g. all UniProt and NCBI GI) or different levels of redundancy. ”

More about PFam

UniProtKB

“ The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt consortium and host institutions EMBL-EBI, SIB and PIR are committed to the long-term preservation of the UniProt databases.
UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR). Across the three institutes more than 100 people are involved through different tasks such as database curation, software development and support.
EMBL-EBI and SIB together used to produce Swiss-Prot and TrEMBL, while PIR produced the Protein Sequence Database (PIR-PSD). These two data sets coexisted with different protein sequence coverage and annotation priorities. TrEMBL (Translated EMBL Nucleotide Sequence Data Library) was originally created because sequence data was being generated at a pace that exceeded Swiss-Prot's ability to keep up. Meanwhile, PIR maintained the PIR-PSD and related databases, including iProClass, a database of protein sequences and curated families. In 2002 the three institutes decided to pool their resources and expertise and formed the UniProt consortium. ”

More about UniProtKB

PDB - Protein Data Bank

“ The Protein Data Bank (PDB) was established as the 1st open access digital data resource in all of biology and medicine (Historical Timeline). It is today a leading global resource for experimental data central to scientific discovery.
Through an internet information portal and downloadable data archive, the PDB provides access to 3D structure data for large biological molecules (proteins, DNA, and RNA). These are the molecules of life, found in all organisms on the planet.
Knowing the 3D structure of a biological macromolecule is essential for understanding its role in human and animal health and disease, its function in plants and food and energy production, and its importance to other topics related to global prosperity and sustainability. ”

More about PDB

BioGRID

“ The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (thebiogrid.org). BioGRID currently holds over 1,740,000 interactions curated from both high-throughput datasets and individual focused studies, as derived from over 70,000+ publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (S. cerevisiae), fission yeast (S. pombe) and thale cress (A. thaliana), and efforts to expand curation across multiple metazoan species are underway. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. ”

More about BioGRID

PhosphoSitePlus®

“ PhosphoSitePlus® provides comprehensive information and tools for the study of protein post-translational modifications (PTMs) including phosphorylation, acetylation, and more. The web use is free for everyone including commercial. ”

More about PhosphoSitePlus®

Software

IUPred2

“ Intrinsically disordered proteins (IDPs) have no single well-defined tertiary structure under native conditions. IUPred2A is a combined web interface that allows to identify disordered protein regions using IUPred2 and disordered binding regions using ANCHOR2. IUPred2A is also capable of identifying protein regions that do or do not adopt a stable structure depending on the redox state of their environment. IUPred2A supersedes the previous IUPred and ANCHOR servers. ”

More about IUPred2A

MAFFT

“ MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <~200 sequences), FFT-NS-2 (fast; for alignment of <~30,000 sequences), etc. ”

More about MAFFT

BLAST®

“ In bioinformatics, BLAST (basic local alignment search tool) is an algorithm and program for comparing primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences. A BLAST search enables a researcher to compare a subject protein or nucleotide sequence (called a query) with a library or database of sequences, and identify database sequences that resemble the query sequence above a certain threshold. For example, following the discovery of a previously unknown gene in the mouse, a scientist will typically perform a BLAST search of the human genome to see if humans carry a similar gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on similarity of sequence. ”

More about BLAST® : wiki
More about BLAST® : server

SCWRL 3.0

“ SCWRL4 is based on a new algorithm and new potential function that results in improved accuracy at reasonable speed. This has been achieved through:

a new backbone-dependent rotamer library based on kernel density estimates
averaging over samples of conformations about the positions in the rotamer library
a fast anisotropic hydrogen bonding function
a short-range, soft van der Waals atom-atom interaction potential
fast collision detection using k-discrete oriented polytopes
a tree decomposition algorithm to solve the combinatorial problem;
and optimization of all parameters by determining the interaction graph within the crystal environment using symmetry operators of the crystallographic space group. ”

More about SCWRL

References

ELM - the eukaryotic linear motif resource in 2020. Nucleic Acids Research (2020) (PMID:31680160)
Pfam: The protein families database in 2021. J. Mistry, S. Chuguransky, L. Williams, M. Qureshi, G.A. Salazar, E.L.L. Sonnhammer, S.C.E. Tosatto, L. Paladin, S. Raj, L.J. Richardson, R.D. Finn, A. Bateman. Nucleic Acids Research (2020)
RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Research (2021)
BioGRID: a general repository for interaction datasets. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. Nucleic Acids Research (2006)
PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E. Nucleic Acids Research (2015)
UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research (2021)
IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Bálint Mészáros, Gábor Erdős, Zsuzsanna Dosztányi. Nucleic Acids Research (2018)
BLAST - Basic local alignment search tool. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Journal of Molecular Biology (1990)
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Katoh, K., Misawa, K., Kuma, K., and Miyata, T. Nucleic Acids Research (2002)
SCWRL and MolIDE: computer programs for side-chain conformation prediction and homology modeling. Wang, Qiang et al. Nature protocols (2008)
Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/
Python 3 Reference Manual. Van Rossum, G., & Drake, F. L. Scotts Valley, CA: CreateSpace. (2009)
