- Input Fields
- Output Files and Interpretation
- Who do I contact if there are any problems or if I want to give feedback?
- I uploaded my Data/ID/Phospho-site maps but it keeps giving me an error. What can I do?
- My phosphoproteomics data is from a species that is not supported by SELPHI. Is there something I can do to be able to use your tool?
- I added a kinase inhibitor to my samples and even though I can see peptides going down in response, these are not correlated with the kinase. Is something wrong? What should I do?
- How does SELPHI deal with missing peptides?
- How many data points/conditions/samples do I need to have? How are replicates treated?
- Does SELPHI depend on a specific version of Uniprot?
- I can't figure out how to format my files to submit to SELPHI. What should I do?
- If kinase A phosphorylates kinase B, and kinase B phosphorylates substrate C: If only kinase A and substrate C are identified: will the correlation analysis assign kinase A as regulator of substrate C?
- SELPH-Convert is having trouble finding all my columns. Why is that and what can I do?
In case you are struggling to make SELPHI-compatible files, we have created SELPH-Convert to help you generate SELPHI-compatible files! It is still in beta testing, but give it a shot and let us know what you think or if you are having any trouble.
Please note that it is obligatory to provide items 1 and 2, and either items 6 AND 7, OR item 8. If you provide only item 8, SELPHI will map and identify your proteins for you. If your peptides cannot be mapped onto the sequences provided, they will get an nf (not found) tag, e.g. P53_Snf_Tnf. IMPORTANT: Please do not include spaces in your file names.
Required Fields for setting up the analysis
1. Jobname: Give a name to your project or run; it will be used to generate your results folders and page. Please avoid spaces in the name. The first character cannot be a number.
2. Input Data (Please do not include spaces in your file names.) Upload the files containing your phosphoproteomics data in tab-delimited text format or Excel format (xls for Excel 95-2003, xlsx for Excel 2007 or newer).
For SELPHI to recognize which columns represent the data, you need to have the following column titles:
- OBLIGATORY Protein (Also recognizes: Proteins, Gene, Genes, Name, Names, protein, proteins): This column contains the hit protein names exactly as shown in either the database searched or the id map file provided (see below), in order to recover their UniprotID or GeneID
- OBLIGATORY Peptide (Also recognizes any header that has the word Modified in it): The phosphopeptide sequence
- OPTIONAL Intensity AND/OR Score: These columns will be used to weight the ratios of the peptides when merging the same peptides found multiple times in the screen
- OBLIGATORY At least one data column
- OPTIONAL Sample: This column contains the name by which you want this sample to be displayed
Other examples for input files can be found in the Downloads page.
3. Merge Samples File: This file lists, as comma-separated lists with one group per line, the samples that you wish to merge in the analysis and representation of your results, e.g. if you have multiple replicates of your data or some datasets share a similar stimulation. Let's say you have 3 sample types indicated 'Replicate1', 'Replicate2' and 'Replicate3' and the columns in those samples are M/L, H/L and H/M, this is how you would make a merge file for them:
Here is another example
4. Log Transformation: Please click this box if your phospho-data has already been log-transformed. Default is NOT transformed.
5. Analyze Motifs: Select if you wish SELPHI to find position-specific over-represented residues in the peptides that are correlated with the kinases in your dataset. Default is NO motif analysis.
6. Map of Phospho Sites: This file contains a map of the locations of your phosphosites. The first column MUST have the Protein name as listed in the input file (see 2), separated by an underscore from the phosphopeptide. The second (tab-separated) column must have the name of the protein as you want it represented in the results and the phosphorylated residue with its location, separated by an underscore (e.g. TOP2B_T1397). Example: example 1, example 2
8. Fasta Database: This file contains the database against which the mass spec spectra were searched to map the peptides and the proteins in fasta format e.g. the Human RefSeq.
9. Clustering method: Choose whether you would like your peptides to be clustered and displayed into sub-groups. GO term analysis on the clusters is also performed. You can choose to define the number of subclusters using principal component analysis or mclust.
10. STRING evidence: When SELPHI maps your network to the known STRING network, choose whether you want the overall and/or experimental and/or database score to be >800.
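Before uploading, it can help to check your files against the column titles listed under item 2 and the site label format described under item 6. The following is a minimal sketch of such a check, not SELPHI's own parser. The function names are hypothetical, and several details are assumptions: Protein aliases are matched exactly, only S, T and Y are accepted as residues, and a label may carry several sites joined by underscores (inferred from the P53_Snf_Tnf example).

```python
import re

# Header names taken from the column list under item 2.
PROTEIN_HEADERS = {"Protein", "Proteins", "Gene", "Genes", "Name", "Names",
                   "protein", "proteins"}
OPTIONAL_HEADERS = {"Intensity", "Score", "Sample"}

# Assumed label pattern: display name, then one or more "_<residue><position>"
# parts. "nf" stands in for a position that could not be found.
SITE_LABEL = re.compile(r"^(?P<name>.+?)(?P<sites>(?:_[STY](?:\d+|nf))+)$")


def check_header(header_line, delimiter="\t"):
    """Classify the columns of a header line; raise if one is missing."""
    columns = [c.strip() for c in header_line.rstrip("\r\n").split(delimiter)]
    protein = [c for c in columns if c in PROTEIN_HEADERS]
    peptide = [c for c in columns if c == "Peptide" or "Modified" in c]
    data = [c for c in columns
            if c not in protein and c not in peptide
            and c not in OPTIONAL_HEADERS]
    if not protein:
        raise ValueError("no Protein column found")
    if not peptide:
        raise ValueError("no Peptide column found")
    if not data:
        raise ValueError("at least one data column is required")
    return {"protein": protein[0], "peptide": peptide[0], "data": data}


def parse_site_label(label):
    """Split a label such as 'TOP2B_T1397' into (name, [sites])."""
    match = SITE_LABEL.match(label)
    if match is None:
        raise ValueError(f"not a valid site label: {label!r}")
    return match.group("name"), match.group("sites").strip("_").split("_")
```

For example, `check_header("Protein\tModified Sequence\tIntensity\t5min")` treats `5min` as the only data column, and `parse_site_label("TOP2B_T1397")` separates the display name from the site.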
Other Fields to customize your analysis
11. Ratio Cutoff: Indicate the cutoff for your ratio (or log of ratio) that you want to use. Default is 3.
12. Kinase Cutoff: Indicate the cutoff for your ratio (or log of ratio) for the kinases in your set that are going to be used for correlation calculation. Default is same as general cutoff.
13. Correlation Cutoff: Indicate the correlation value below which you do not want SELPHI to consider the correlations, regardless of p-value (default is 0.8 for Spearman and Kendall correlations and 0.9 for Pearson).
14. Correlation P-value Cutoff: Indicate the cutoff for the p-value of the correlation above which you do not want SELPHI to consider the correlations (default is 0.05).
15. Minimum Samples: Indicate the minimum number of samples in which a phosphopeptide must appear in order to be taken into account. Default is 3. This includes the different conditions, e.g. M/L, H/L and H/M. For example, if peptide A appears in 6 of your conditions and peptide B in 2, and this value is 3, SELPHI will ignore peptide B.
16. Merge Method: Decide how you want peptides that represent the same phosphosite to be merged. Select between MAX and AVG. Default is AVG.
17. Method: Method you want SELPHI to use to calculate the correlations. Select between Pearson, Spearman and Kendall correlation coefficients. Default is Spearman. Suggested methods (a very rough guide if you don't know what to choose): If you have few sample points use Pearson (though keep in mind that the fewer the sample points the less meaningful SELPHI's analysis will be). If you have many data points and your matrix is well covered by the majority of phosphopeptides use Pearson. If you have several data points but your matrix is sparse, use Spearman.
18. Motif File: This file includes additional motifs that you want SELPHI to search for in your sequences. The first column is the name of your motif and the second is your motif represented as a regular expression (in Perl syntax). The default motifs it searches for are found here. They are taken from ELM and ProteoConnections.
19. Minimum Paths: Minimum number of overlapping proteins that you want your set to have with a pathway and GO term to consider them for enrichment analysis. Default is 5.
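The Minimum Samples and Merge Method options (items 15 and 16) can be illustrated with a small sketch. This is not SELPHI's implementation: the function names are hypothetical, missing values are represented as `None`, MAX is assumed to keep the value with the largest absolute magnitude, and the optional Intensity/Score weighting is left out.

```python
from statistics import mean


def filter_by_min_samples(table, min_samples=3):
    """Keep peptides quantified in at least `min_samples` conditions.

    `table` maps a peptide to a list of values, with None for a condition
    in which the peptide was not observed.
    """
    return {pep: vals for pep, vals in table.items()
            if sum(v is not None for v in vals) >= min_samples}


def merge_values(values, method="AVG"):
    """Merge values reported for the same phosphosite in one condition.

    AVG takes the plain mean. MAX keeps the value with the largest
    absolute magnitude (an assumption about how MAX is defined).
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    if method == "AVG":
        return mean(present)
    if method == "MAX":
        return max(present, key=abs)
    raise ValueError(f"unknown merge method: {method}")
```

With a threshold of 3, a peptide seen in 6 conditions is kept and one seen in only 2 is dropped, which matches the example given for item 15.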
Output Files and Interpretation
An example of the output files is given here.
1. idsmapped: IDs mapped to UniprotID, GeneID, GeneName, KEGG_ID, KEGG pathways, SMART domains, Pfam domains, Transcription Factor families, GO terms, Ensembl Gene and Protein (if an id map was not provided)
This file can be used to translate all your protein ids to a range of ids from different databases, and to identify the protein domains in your sequences, the pathways in which your proteins are involved, their GO terms, and whether they are transcription factors and which family they belong to (information taken from AnimalTFDB, nTFdb and YEASTRACT). You can check this mapping, and if you don't agree with the id your protein was mapped to, you can modify it and resubmit your job, uploading your corrected id map file. We would appreciate it if you also informed us of the correction so that we can make sure the error doesn't happen again.
2. idsnotmapped: IDs that we were not able to match to a Uniprot ID
These IDs were not mapped to a known GeneID or UniprotID and were therefore ignored in the GO term and pathway enrichment analysis. They were included in the correlation analysis, but because SELPHI was not able to get any additional information for them, they were kept only if the correlation was very high. If you know which GeneID or UniprotID your protein should match, add it to the idsmapped file and resubmit your job, uploading your corrected id map file.
3. phosmapped: Maps of phosphosites (if phosphosite map was not provided)
This provides the information on the location of your identified phosphosites on your sequences.
4. sitesnotfound.tsv: phosphosites that we were not able to map to a sequence and therefore tagged nf (not found)
If a lot of the sites are tagged 'nf', i.e. not found, it means that the particular peptide was not mapped onto the sequence that you provided for the specific proteins. Perhaps the database you provided was not the correct one. Please consider re-running your job with the correct database or provide a file with the mapping. If your database is correct and SELPHI still can't map your phospho-peptides please let us know.
5. datatable.tsv: text file containing filtered table with the sites mapped to gene name and phospho location
Pathway Enrichment Analysis
An example can be found here.
1. pathways.pvalues.tsv: pvalues, odds ratio and corrected pvalue of pathways.
2. pathmtxoddsratio.tsv: table of odds ratio of pathways that made the pvalue cutoff (0.05).
3. pathmtx.tsv: table of cumulative log change from input data of pathways that made the pvalue cutoff (0.05). Note only values according to the cutoff provided were used.
4. pathdata.pdf: plot of clustered pathway enrichment with the pathmtx values represented in color and the oddsratio in size of cells
6. pathmtx0304oddsratio.tsv: table of odds ratio of signaling pathways that made the pvalue cutoff (0.05).
7. pathmtx0304.tsv: table of cumulative log change from input data of signaling pathways that made the pvalue cutoff (0.05). Note only values according to the cutoff provided were used.
8. pathdata0304.pdf: plot of clustered signaling pathway enrichment with the pathmtx values represented in color and the oddsratio, i.e. the effect of the enrichment is represented in size of the boxes.
9. pathmtx05oddsratio.tsv: table of odds ratio of disease pathways that made the pvalue cutoff (0.05).
10. pathmtx05.tsv: table of cumulative log change from input data of disease pathways that made the pvalue cutoff (0.05). Note only values according to the cutoff provided were used.
11. pathdata05.pdf: plot of clustered disease pathway enrichment with the pathmtx values represented in color and the oddsratio in size of cells.
These files are the result of the pathway enrichment analysis. This was done using Fisher's exact test, and the p-values and odds ratios are provided. For the significantly enriched pathways, SELPHI creates two matrices: one representing the additive log of the fold change of the phosphopeptides belonging to proteins of each pathway, and one with the odds ratios from the enrichment analysis.
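To make the enrichment statistics concrete, here is a minimal sketch of Fisher's exact test on a 2x2 table, returning an odds ratio and a p-value. It is an illustration only: the one-sided (over-representation) alternative, the choice of background, and the function name are assumptions, and SELPHI's own settings may differ.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for over-representation.

    a: proteins in your set that belong to the pathway
    b: proteins in your set that do not belong to the pathway
    c: background proteins in the pathway but not in your set
    d: background proteins in neither
    Returns (odds_ratio, p_value).
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    # Hypergeometric tail: probability of seeing a or more pathway members.
    p_value = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    ) / comb(total, row1)
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, p_value
```

The odds ratio is the effect size of the enrichment, while the p-value says how surprising the overlap is under the assumed background.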
SELPHI displays a) the full matrix, b) the matrix representing only signaling pathways and c) the matrix showing only disease pathways. These matrices are clustered using 1-(Pearson correlation) of the additive log(fold change) as the distance, with average linkage. They are displayed with that number shown as a color, while the size of each box represents the odds ratio, i.e. the effect size of the enrichment.
Therefore pathways that are clustered together represent pathways that have proteins with similarly modulated phosphopeptides. You can focus on those with a bigger box size to select pathways that are not only significantly enriched but whose enrichment also has a bigger effect size. If you don't like this representation, the raw p-values list and the matrices used to make these plots are provided.
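The distance used for this clustering, 1-(Pearson correlation) with average linkage, can be sketched in a few lines. The function names are hypothetical and this is not the code SELPHI runs. It only shows how the distance between two rows, and between two groups of rows, would be computed under that definition.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant row")
    return cov / sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Distance between two pathway rows: 1 - Pearson correlation."""
    return 1.0 - pearson(x, y)


def average_linkage(cluster_a, cluster_b):
    """Mean pairwise distance between the rows of two clusters."""
    distances = [correlation_distance(x, y)
                 for x in cluster_a for y in cluster_b]
    return sum(distances) / len(distances)
```

Rows that change in the same direction across samples get a distance near 0, and rows that change in opposite directions get a distance near 2.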
GO term Enrichment Analysis
An example can be found here.
1. goterms.pvalues.tsv: pvalues, odds ratio and corrected pvalue of GO terms.
2. gotermsbpmtxoddsratio.tsv: table of odds ratio of Biological Process GO terms that made the pvalue cutoff (0.05).
3. gotermsbpmtx.tsv: table of cumulative log change from input data of Biological Process GO terms that made the pvalue cutoff (0.05). Note only values according to the cutoff provided were used.
4. gotermsbp.pdf: plot of clustered Biological Process GO terms enrichment with the gotermsbpmtx values represented in color and the oddsratio in size of cells.
5. gotermsmfmtxoddsratio.tsv: table of odds ratio of Molecular Function GO terms that made the pvalue cutoff (0.05).
6. gotermsmfmtx.tsv: table of cumulative log change from input data of Molecular Function GO terms that made the pvalue cutoff (0.05). Note only values according to the cutoff provided were used.
7. gotermsmf.pdf: plot of clustered Molecular Function GO terms enrichment with the gotermsmfmtx values represented in color and the oddsratio in size of cells.
8. gotermsccmtxoddsratio.tsv: table of odds ratio of Cellular Component GO terms that made the pvalue cutoff (0.05).
9. gotermsccmtx.tsv: table of cumulative log change from input data of Cellular Component GO terms that made the pvalue cutoff (0.05). Note only values according to the cutoff provided were used.
10. gotermscc.pdf: plot of clustered Cellular Component GO terms enrichment with the gotermsccmtx values represented in color and the oddsratio in size of cells.
These files are the result of the GO term enrichment analysis which is done using FuncAssociate. The results of this analysis are displayed in a way similar to those of the pathway enrichment analysis, and the raw files are also provided.
Correlation Analysis and Annotation
Example output files can be found here.
1. correlations.tsv: All correlations between kinase/phosphatases-phosphopeptides
This file contains the correlation (as defined by the user; default is Spearman correlation) between all peptides belonging to kinases or phosphatases and the full set of phosphopeptides. The first column (of numbers) gives the correlation calculated using the entire vector of samples; the second column gives the correlation calculated using only overlapping points. If the pair is indeed co-changing, we expect the correlation to improve when we use only the part of the vector that represents the samples where both peptides appear. The third column gives the p-value of the correlation. If you want to visualize the co-changing peptides as a network, you can import the file into Cytoscape. We recommend that you decide on a filter (e.g. p-value less than 0.01 and correlation greater than 0.85) rather than importing the entire file, because it can be rather large depending on the number of peptides in your dataset.
2. allinfo.tsv: Identified associations between kinase/phosphatases-phosphopeptides and annotations for these connections.
This represents the entire 'network' identified through the SELPHI pipeline. It represents kinases and phosphatases that have peptides that are co-changing with particular phosphopeptides. This is a good place to start looking for
a) substrates of kinase/phosphatases,
b) sets of proteins involved in the same function for the particular system.
This file further contains a wealth of information on each pair extracted from:
- GeneMania (CONFUNCTIONING)
- A p-value (Fisher's exact test) showing if the specific peptide is significantly similar to a position specific matrix created from known substrates of each kinase (substrates were extracted from the PhosphoSitePlus Database)
- Which samples your peptide was found in
- Whether it is a known kinase-substrate pair/site
- All known kinases that phosphorylate this protein and the specific sites (extracted from the PhosphoSitePlus Database and PhosphoELM)
- Known motifs that map to this site
- Whether it is a transcription factor and its family
- Whether the kinase/phosphatase is known to be induced or inhibited by this phosphorylation and what global effect this is known to have on the cell (extracted from the PhosphoSitePlus Database).
3. stringoverlap.tsv: Known interactions (STRING database cutoff>=800 for overall score, experimental and/or database as requested) amongst proteins in input set plus proteins that contain phospho-binding domains
This file shows the overlap of your identified proteins with the known network from STRING. The known connections with proteins containing phospho-binding domains are also provided.
This network can be used to:
a) place your network in the context of known information
b) observe the propagation of your phosphorylation signals within this known network.
If you combine it with the full SELPHI-extrapolated network or a subset of it (e.g. the tyrosine kinase network), you can observe potential additional functional modules in your system and the suggested propagation through the network.
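The filter recommended for correlations.tsv before importing into Cytoscape (e.g. p-value less than 0.01 and correlation greater than 0.85) could be applied with a sketch like the one below. The column positions are hypothetical, since the exact layout of the file is not described here, so adjust `corr_col` and `p_col` to your file. The function name is also an assumption.

```python
import csv


def filter_correlations(lines, corr_col=3, p_col=4,
                        min_corr=0.85, max_p=0.01):
    """Yield rows whose correlation and p-value pass the chosen filter.

    `lines` is any iterable of tab-delimited text lines, such as an open
    file. Rows whose values are not numeric (e.g. a header line) are
    skipped.
    """
    for row in csv.reader(lines, delimiter="\t"):
        try:
            corr = float(row[corr_col])
            p_value = float(row[p_col])
        except (IndexError, ValueError):
            continue
        if corr > min_corr and p_value < max_p:
            yield row
```

To use it on a results file, open correlations.tsv with `open(path, newline="")`, pass the handle to `filter_correlations`, and write the surviving rows to a new tab-delimited file for import.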
Kinase/Phosphatase relationships with phospho-peptides
An example can be found here.
1. allkinsubtable.tsv: Correlations Table of all kinases vs associated phospho-peptides.
2. allphossubtable.tsv: Correlations Table of all phosphatases vs associated phospho-peptides.
3. tyrvsstkinsubtable.tsv: Correlations Table of Tyrosine kinases vs associated Serine-Threonine kinase phospho-peptides.
4. tyrkinsubtable.tsv: Correlations Table of Tyrosine kinases vs associated phospho-peptides.
5. tyrphossubtable.tsv: Correlations Table of Tyrosine phosphatases vs associated phospho-peptides.
6. allkinvssubcluster.pdf: Plots of clustered all kinases vs associated phospho-peptides.
7. allphosvssubcluster.pdf: Plots of clustered all phosphatases vs associated phospho-peptides.
8. tyrvsstkinsubtable.pdf: Plots of clustered Tyrosine Kinases vs associated Serine-Threonine Kinase phospho-peptides.
9. tyrkinvssubcluster.pdf: Plots of clustered Tyrosine Kinases vs associated phospho-peptides.
10. tyrphosvssubcluster.pdf: Plots of clustered Tyrosine phosphatases vs associated phospho-peptides.
These files give in a table format the relationships between the kinase and phosphatases and their associated peptides. Three versions are provided:
a) Tyrosine Kinases/Phosphatases networks
b) Tyrosine kinases vs Serine/Threonine kinases networks
c) All Kinases/Phosphatases networks
From these plots, besides the extrapolated relationships between specific enzymes and potential substrates or cofunctioning proteins, you can see which of the kinase/phosphatase phosphopeptides are most likely responsible for the changes in cell signaling in your specific conditions, by seeing which ones are associated with the most phospho-changes. From the clusters that are formed, you can also see which kinases, phosphatases and other proteins are likely working together in the same functional module (the assumption here is that common phosphorylation patterns represent a similar functional module, or a response to a similar event). Known information on which phosphorylation site is responsible for activating or inhibiting a kinase/phosphatase has also been taken into account in these tables.
Motif analysis
An example can be found here.
1. motfind.pvalues: P-values and odds ratios for over-representation of each residue in each position around a phospho-site.
2. [kinase].motif: Table for each kinase with odds ratios for the over-represented (p-value<0.05) residues in each position around a phospho-site (used to make the motifplot.pdf file).
3. motifplot.pdf: Plot of odds ratios for over-represented (p-value<0.05) residues in each position around a phospho-site.
In these plots you can see if there are any residues that are significantly over-represented in specific locations around a phospho-site for the associated peptides of each kinase. These over-representations could represent a motif that is used in your system to modulate a specific function (e.g. recruitment of a different kinase, or adaptor protein). They represent a good place to start exploring the functional relevance of these sites.
4. [kinase].peps: List of peptides that the kinase is associated with (for use as experimental peptides with IceLogo, or to make a sequence logo).
5. controlpeps: Background peptides for use as reference with IceLogo.
If you are interested in a specific kinase, or want a prettier representation and/or an alternative analysis of the motifs, SELPHI recommends the IceLogo server and provides the necessary input files. Select the Create tab, put the [kinase].peps list in the experimental set and the controlpeps file in the Reference set, and try different parameters to get the visualization you prefer, either as a sequence logo or as an IceLogo (recommended).
Peptide clustering analysis
An example can be found here.
Please note that the default in SELPHI is to not run the clustering analysis, as it takes a little longer, so if you require it please select the appropriate field in the submission page. You can either choose to identify the number of clusters using principal component analysis and then do k-means clustering, or use mclust.
1. clusteredpeptides.pdf: All subclusters of the peptides.
2. gocldata.pdf: Plot of GO terms enrichment in each sub cluster (top 10 only).
3. clusters: The assignment of the peptides to clusters.
4. [number].cluster: Cluster groups identified (one file for each group).
5. [number].gotermstable: FuncAssociate output for GO terms enrichment for each cluster group.
6. gomtxcloddsratio.tsv: Table of odds ratio for GO terms enrichment in each cluster (top 10 only).
7. gomtxclpvalues.tsv: Table of pvalues for GO terms enrichment in each cluster (top 10 only).
8. gotermsclust.pvalues.tsv: All significant p-values (less than 0.05) identified.
9. gotermsclust.pvaluesweb.tsv: Top 10 per cluster p-values identified.
In this section you can observe how the different groups of your peptides are changing and see if any sub-groups have an over-represented function.
The .net files listed below represent different subnetworks extracted from the big SELPHI network for easier visualization using Cytoscape. If you import the .net file of interest into Cytoscape, you will include the edge attributes of the correlation and whether there exists support for the connection or not (from the data integration analysis). You can then import the type and phosphoinfo attributes as node attributes, and you will have loaded all the information you need for creating a network visualization of the extrapolated SELPHI-networks. If you want to see all the data available and the entire SELPHI-network, import the allinfo.tsv file generated (above), but keep in mind that depending on your dataset it might be overwhelming.
1. kinkingenes.net: Represents the SELPHI-network between kinases shown as genes (the phosphosite info has been removed).
2. kinkin.net: Represents the SELPHI-network between kinases.
3. kinkinphosgenes.net: Represents the SELPHI-network amongst kinases and phosphatases shown as genes (the phosphosite info has been removed).
4. kinkinphos.net: Represents the SELPHI-network amongst kinases and phosphatases.
These represent the SELPHI-suggested regulatory networks in your system.
5. kinphosTFgenes.net: Represents the SELPHI-network between kinases/phosphatases and transcription factors shown as genes (the phosphosite info has been removed).
6. kinphosTF.net: Represents the SELPHI-network between kinases/phosphatases and transcription factors.
These networks suggest the propagation of your signal from the kinases to the transcription factors.
7. STkinSTkingenes.net: Represents the SELPHI-network between Serine/Threonine kinases shown as genes (the phosphosite info has been removed).
8. STkinSTkin.net: Represents the SELPHI-network between Serine/Threonine kinases.
These networks include downstream effects of signalling. As the phosphopeptides are not as clearly correlated as in the tyrosine networks, these networks are a bit noisy, but they still give a good overview of the cell responses.
9. TYRkinSTkingenes.net: Represents the SELPHI-network between Tyrosine kinases and Serine/Threonine kinases shown as genes (the phosphosite info has been removed).
10. TYRkinSTkin.net: Represents the SELPHI-network between Tyrosine kinases and Serine/Threonine Kinases.
These networks show potential ways that the signal is transmitted from tyrosine kinases (receptor and non-receptor) to downstream serine/threonine kinases that then generate the cell's responses.
11. TYRkinTYRkingenes.net: Represents the SELPHI-network between Tyrosine kinases shown as genes (the phosphosite info has been removed).
12. TYRkinTYRkin.net: Represents the SELPHI-network between Tyrosine kinases.
13. Ykinsubstrgenes.net: Represents the SELPHI-network between Tyrosine kinases and their associated peptides shown as genes (the phosphosite info has been removed).
14. Ykinsubstr.net: Represents the SELPHI-network between Tyrosine kinases and their associated peptides.
These are the tyrosine kinase networks. Tyrosine kinases will usually be at the top of your cell's response and will define the downstream responses. These networks can also be used to find potential substrates for the tyrosine kinases.
15. stringoverlap.net: The network overlapping with STRING database plus other proteins known to contain a phospho-binding domain.
16. type.attr: The attributes file that specifies whether a node is a kinase, phosphatase or transcription factor, and of what kind.
17. phosphoinfo.attr: The attributes file for the fold change of phosphorylation (log transformed).
18. phosphoinfogenes.attr: The attributes file for the fold change of phosphorylation (log transformed) of your proteins. The phosphopeptide info has been removed: the log(fold change) has been averaged, or the value with the largest absolute magnitude has been kept, depending on the Merge Method you selected in the input (default: average).
Frequently Asked Questions
1. Who do I contact if there are problems or if I want to give feedback?
If your run takes longer than 2-3 hours, please refresh the page. If there are still no results, if there is any problem with your run, or if you would just like to provide feedback and suggestions, please do not hesitate to contact Evangelia Petsalaki. Please provide the link to your results/submission page so that we can track and solve the issue faster.
2. I uploaded my ID/Phospho site maps but it keeps giving me an error. What can I do?
If it was a text file, are you sure that it was tab delimited and that the first column corresponds to your proteins in the input files? If it was an Excel file, please make sure that the file ending is correct, i.e. xls if you have Excel 95-2003 or xlsx if you have Excel 2007 or newer format. If you are still having trouble, copy and paste the contents into a text editor and try with that. If it is still not working, please contact Evangelia Petsalaki. Please provide the link to your results/submission page so that we can track and solve the issue faster.
3. My phosphoproteomics data is from a species that is not supported by SELPHI. Is there something I can do to be able to use your tool?
SELPHI does not support datasets from all species. However, you might be able to work around this in one of two ways: a) provide an id map that maps YOUR protein ids (those in the Protein column of your data file) to Uniprot or gene ids of the supported species that is closest to yours, AND a phospho-site map that corresponds to these sites; or b) provide as a database the fasta sequences of all your species' proteins (make sure the ids in the fasta headers are the same as those in the Protein column of your input data) and choose the closest species from the species list, and SELPHI will map them to the orthologous proteins. In case (a), if your phospho-site map doesn't correspond to the supported proteins, be aware that SELPHI might have trouble mapping your sites to known phosphosites and kinase-substrate relationships, but the GO term, pathway, clustering and correlation analyses should proceed normally. In case (b), it is advisable to provide a phospho-site map as well; otherwise SELPHI may not be able to map your phosphopeptides to the supported sequence, and they will be tagged _nf (not found) and treated as identical if they come from the same protein. If you work with your species regularly and need SELPHI to support it, please contact Evangelia Petsalaki and she will add your species of choice to the supported list.
4. I added a kinase inhibitor to my samples and even though I can see peptides going down in response, these are not correlated with the kinase. Is something wrong? What should I do?
SELPHI assumes that the phosphorylation of the kinase corresponds to the modulation of its activity (e.g. autophosphorylation or activation/inhibition through phosphorylation by a different kinase) and thus correlates its phosphorylation changes with those of the rest of the peptides in the dataset. If you see peptides that should be correlated with your kinase but they do not appear in the correlation table, there are several possible reasons: a) there is no peptide from your kinase in your dataset; b) the inhibitor or condition you used affects the kinase activity but not its phosphorylation status, so the latter won't correlate with its effect.
If you are sure of the behavior of your kinase in your condition (and keep in mind that your inhibitor or condition might also affect other kinases that may likewise show no correlation), the solution is to introduce into your dataset an artificial peptide for your kinase of interest (include it also in the phospho map file etc.) that behaves the way you think the kinase does. You can obtain such a profile by averaging, for example, the behaviour of its known substrates. Use this 'hack' with caution!
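One way to build the profile for such an artificial peptide is to average the profiles of the known substrates, condition by condition. The sketch below assumes each profile is a list with one value per condition and `None` where a peptide was not observed. The function name is hypothetical, and how you add the resulting row to your input file is up to you.

```python
def average_profile(profiles):
    """Average several peptide profiles condition by condition.

    Conditions in which none of the peptides was observed stay None.
    """
    if not profiles:
        raise ValueError("at least one profile is required")
    n_conditions = len(profiles[0])
    if any(len(p) != n_conditions for p in profiles):
        raise ValueError("all profiles must cover the same conditions")
    averaged = []
    for values in zip(*profiles):
        present = [v for v in values if v is not None]
        averaged.append(sum(present) / len(present) if present else None)
    return averaged
```

For instance, averaging `[1.0, None, 3.0]` and `[3.0, None, 5.0]` gives `[2.0, None, 4.0]`, which could then be entered as the data row of the artificial kinase peptide.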
5. How does SELPHI deal with missing peptides?
SELPHI bases most of the analysis on the provided datasets, in order to allow an unbiased view of the data. If, however, you are very familiar with the system under study and you know that specific peptides are missing, you can do one of two things: a) Add artificial peptides, as described in the answer to FAQ 4. b) Check the peptide clustering to find where peptides that behave like your missing peptide lie, and assume that your missing peptide will behave similarly, both in terms of correlations and in terms of phospho-data fluctuation.
6. How many data points/conditions/samples do I need to have? How are replicates treated?
SELPHI does not process jobs with fewer than two data points. If you have fewer than 6 data points, SELPHI will use the Pearson correlation coefficient for the correlation analysis. Generally, the more data points you have, the more accurate the association networks will be. As with any other tool, the results are only as good as the input data. If possible, we recommend at least 2 replicates (ideally 3) of your experiments and as many conditions or time points as you can afford (at least 3 conditions, recommended 6). You can upload a file to SELPHI indicating which conditions are replicates and choose whether you want to merge them using the average or the maximum change. You can also request that SELPHI take into account only the peptides that appear in a minimum number of conditions. All of these will improve the quality of your analysis and your results. Despite these recommendations, if you only have 3 data points, SELPHI will still process your data and provide you with results enriched in known kinase/phosphatase-substrate associations to help you design your follow-up experiments in an informed way, but be aware that the analysis is performed at reduced statistical power.
7. Does SELPHI depend on a specific version of Uniprot?
If you provide a UniprotID map, the IDs that don't exist in the Uniprot version used by SELPHI will be ignored for the pathway and GO term analysis, but will still be used for the correlation, clustering and motif analysis. If you provide a GeneID map or the fasta file of the database searched, SELPHI will map them to the appropriate version of the database.
8. I can't figure out how to format my files to submit to SELPHI. What should I do?
We have created a conversion tool: SELPH-Convert. If you are not happy with its output:
a) If your report came from a MaxQuant analysis: to make your input data files, repeat the following procedure for every experiment.
- If your report is a text file, copy it into an Excel file.
- There should be a column that is labeled 'Protein' or 'Leading Proteins'. This will be your 'Proteins' column for the SELPHI input file. Copy this column into a new Excel file that will be your data file.
- There should be a column labeled 'Modified Sequence'. This will be your 'Peptide' column for the SELPHI input file. Copy it into your data Excel file.
- There should be columns labeled e.g. 'Ratio M/L Normalized', 'Ratio H/L Normalized' etc. Copy these into your data Excel file. We recommend changing each label to whatever you want to appear as the sample tag for that data point in the different plots, e.g. change 'Ratio M/L Normalized' to '5minvs0min_stimulation'.
- Optionally, you can copy the 'Intensity' and/or 'Score' columns to your Excel file too. SELPHI will then use these values to weight your peptides when averaging ratios from identifications of the same peptide.
- You can either upload the Excel file to SELPHI directly or, if you are having trouble because of your version of Office etc., copy the contents into a text file and upload that instead.
b) If your report came from a Mascot analysis: Under construction
c) If your report came from a Proteome Discoverer analysis: Under construction
If your data came from a different tool and/or you need help with formatting please contact Evangelia Petsalaki with an example of a data file that you are trying to format and she will help you out.
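If you prefer a script to copying columns by hand, the MaxQuant procedure in (a) can be sketched as follows. This is a minimal illustration and not SELPH-Convert: the function names are hypothetical, the report is assumed to be tab-delimited text with the column labels mentioned above, and the sample labels are your own choice.

```python
import csv


def maxquant_to_selphi(rows, ratio_labels,
                       protein_col="Leading Proteins",
                       peptide_col="Modified Sequence"):
    """Build SELPHI-style rows from parsed MaxQuant report rows.

    rows: iterable of dicts, e.g. from csv.DictReader(handle, delimiter="\t")
    ratio_labels: dict mapping a MaxQuant ratio column to the sample tag
                  that should appear in the plots
    Returns (header, data_rows).
    """
    header = ["Protein", "Peptide"] + list(ratio_labels.values())
    data_rows = [
        [row[protein_col], row[peptide_col]]
        + [row[col] for col in ratio_labels]
        for row in rows
    ]
    return header, data_rows


def write_tsv(path, header, data_rows):
    """Write the header and rows as a tab-delimited text file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(data_rows)
```

A call such as `maxquant_to_selphi(rows, {"Ratio M/L Normalized": "5minvs0min_stimulation"})` renames the ratio column to the sample tag, and `write_tsv` saves the result under a file name without spaces.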
9. If kinase A phosphorylates kinase B, and kinase B phosphorylates substrate C: If only kinase A and substrate C are identified: will the correlation analysis assign kinase A as regulator of substrate C?
SELPHI provides associations between kinases/phosphatases and other phosphopeptides. While these associations are significantly enriched in direct regulatory relationships, it could be that the relationship is not direct, but in the same line of signaling response. So in the above example kinase A would be associated with substrate C, and this would mean that either A is directly regulating C or it is upstream in the same line of signaling. In short, the association shows that they are co-changing.
10. SELPH-Convert is having trouble finding all my columns. Why is that and what can I do?
If you converted your initial Excel file to .csv or .tsv etc. and one or more of your headers contained a new line (e.g. Protein Name (enter) Human), SELPHI recognizes this new line as the end of the header line and stops parsing the header there. To fix this, go back to your Excel file, remove the 'new line' (enter) from your headers, remake your .csv or .tsv files and try again. If the problem persists, please contact Evangelia Petsalaki with an example of a data file that you are trying to format and she will help you out.
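If going back to the spreadsheet is inconvenient, the stray line breaks can also be removed from an exported file with a short script. The sketch below assumes the export quoted the affected header cell, which is how a CSV parser can tell that the line break belongs to the cell. The function name is hypothetical and the output is tab-delimited text.

```python
import csv
import io


def clean_header_newlines(text, delimiter=","):
    """Return tab-delimited text with line breaks removed from header cells.

    The csv module reads a quoted cell that spans several lines as a single
    field, so the header comes back as one row even if a cell contains a
    line break.
    """
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return ""
    # Collapse any run of whitespace (including line breaks) to one space.
    rows[0] = [" ".join(cell.split()) for cell in rows[0]]
    out = io.StringIO(newline="")
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Read the exported file with `open(path, newline="")`, pass its contents through `clean_header_newlines`, and save the result as a new text file before uploading.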