Summary
Here, we developed a web serve called Tumor online Prognostic analyses Platform (ToPP), which collected multiomics (mutation, CAN, gene fusion, mRNA, miRNA, lncRNA, protein and methylation) and clinical data from 56 types of tumors datasets from The Cancer Genome Atlas (TCGA) as The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC)project.
ToPP provide a userfriendly web interface to reduce user barriers, as well as the customizable high resolution charts. In addition, except to conventional univariate or multivariate analysis, it also provides subgroup survival analysis, prognostic modeling, pancaner survival analysis, and allows users to upload their own data for prognostic analysis.
ToPP aims to provide more convenient and reliable prognostic analysis services for tumor researchers no login requirement.
Browser compatibility
OS  Version  Chrome  Firefox  Microsoft Edge  Safari 
Linux  CentOS 7  87.0.4280.88  60.3  n/a  n/a 
MacOS  Mojave 10.14.6  87.0.4280.67  72.0.2  n/a  14.0.1 
Windows  10  86.0.4240.198  82.0.3  44.18362.449.0  n/a 
Contact us
If any question about the ToPP, please contact us: ouyangjian12@163.com
Update news:
 Version 1.2 : Time 2020930,Collected multiomics data and clinical data of 41 tumors in ICGC.
 Version 1.1 : Time 202051,Collected the MS proteomics data in Breast invasive carcinoma (BRCA), Colon adenocarcinoma (COAD), Ovarian serous cystadenocarcinoma (OV) and Rectum adenocarcinoma (READ) from CPTAC.
 Version 1.0 : Time 20191231,construct the basic analysis tools of the webserver and add the multiomics data and clinical data of 33 tumors in TCGA.
Univariate analysis
Univariate analysis was performed by logrank test.KaplanMeier(KM) survival curves [1] will draw by the divided groups, the hazard ratio and the 95% confidence interval information will also be included in the survival plot.
Since the researchers may also be interested in the differential expression between tumor tissue and the paracancerous tissue, ToPP also provides boxplot for the two groups and wilcoxon test was performed test if there is significant difference between the two groups.
Parameters
Data selection
 Select Dataset: Select the dataset, there are a total of 33 datasets here. See the Supplemental table for more information.
 Gene: Input a Gene symbol,Gene ID or ensembl_gene_id with fuzzy query.
 Data tpye: Select a certain omics data. Each dataset selected above may contain 8 types of omics data (Gene expression,lncRNA expression,miRNA expression,Somatic mutation,Protein expression,Methylation,Copy number variation,Gene fusion).
 Survival tpye: There are a total of five survival types to choose from: OS: overall survival, PFI: progressionfree interval, DSS:diseasespecific survival,DFI:diseasefree interval and RFS:Relapse Free Survival.
 Cutoff: For continuous variables such gene or protein expression, variable will be divided into two groups by the median, quantile or the ‘bestcut’ value. Here, we calculated all the logrank p value by the cut of each sample and keep at least 10% of samples for each group and the cut with the lowest p value was represented as the ‘bestcut’. While for categorical variable such mutation, CNA or gene fusion, variable will be divided into groups by the classification.
 Conditions: Subgroup screening based on omics data,such as Somatic mutation,Methylation,Copy number variation.
 cliConditions: Subgroup screening based on clinical data,such as gender,race,tumor stage,or diseasespecific clinical features such as HER2+ in breast cancer.
 Differential expression: Whether to perform differential expression analysis between cancer and adjacent cancer.
Draw designs
 Risk.table: Draw risktable under the KM curve.
 Unit Time: Choose a unit of time for survival such as years,months and days.
 Median.line: Plot the survival rate for the median survival time
 95% Conf.int: Show the 95% confidence interval of the KM curve.
 Hazards Ratio: Show the Hazards Ratio between the two groups.
 Group color: Select the different color for the two groups.
Result
Typical univariate analysis, for subgroup analysis please choose the Conditions or cliConditions selector.
KM curves: includeing pvalues for the lograng test (p<0.05 was considered to be significantly different between the two groups),Hazards Ratio,risk table, description of the results, etc.
Boxplot :The wilcox test was used to detect whether the two groups were significantly different in terms of expression, *** for p<0.001,** for p<0.01,* for p<0.05,NA for no significant difference between the two groups Difference. All resulting images are downloadable including .pdf, .png, .tiff formats with 300 DPI.
Multivariate analyses
Multivariate analysis was performed by cox regression analyses (or Cox proportional hazards model) [2].When ToPP performs multivariate analysis, it will not only give the p value of HR and logrank test for each gene in multivariate analysis, but also calculate the p value of HR and logrank test for each gene as a separate prognostic factor. So that, user can determine whether this gene is an independent prognostic factor.
Parameters
 Genes: Input a gene list with Gene symbol only.
 Select Dataset: Select the dataset, there are a total of 33 datasets here. See the attached table for more information.
 Data tpye: Select a certain omics data. Each dataset selected above may contain 8 types of omics data (Gene expression,lncRNA expression,miRNA expression,Somatic mutation,Protein expression,Methylation,Copy number variation,Gene fusion).
 Survival tpye: There are a total of five survival types to choose from: OS: overall survival, PFI: progressionfree interval, DSS:diseasespecific survival,DFI:diseasefree interval and RFS:Relapse Free Survival.
 Cutoff: For continuous variables such gene or protein expression, variable will be divided into two groups by the median, quantile or the ‘bestcut’ value. Here, we calculated all the logrank p value by the cut of each sample and keep at least 10% of samples for each group and the cut with the lowest p value was represented as the ‘bestcut’. While for categorical variable such mutation, CNA or gene fusion, variable will be divided into groups by the classification.
 Conditions: Subgroup screening based on omics data,such as Somatic mutation,Methylation,Copy number variation.
 cliConditions: Subgroup screening based on clinical data,such as gender,race,tumor stage,or diseasespecific clinical features such as HER2+ in breast cancer.
 Differential expression: Whether to perform differential expression analysis between cancer and adjacent cancer.
 Draw designs: View a description of the parameters in the Univariate analysis.
Result
Typical multivariate analyses, for subgroup analysis please choose the Conditions or cliConditions selector.
KM curves: includeing pvalues for the lograng test (p<0.05 was considered to be significantly different between the two groups),Hazards Ratio,risk table, description of the results, etc also,it will demonstrate the formula for calculating risk score.All resulting images are downloadable including .pdf, .png, .tiff formats with 300 DPI.
Result table:contains HR values for each gene in univariate analysis and multifactor analyses, pvalues of logrank test and coefficients in multifactor analysis
Prognostic model
When in constructing a prognostic model, firstly, we fit a naive Cox model include all covariates. While in the real world data we should removing redundant or irrelevant variables from the model which describes the data by reducing variance on the expense of bias to make model more robust. Interaction between covariates and timedependent covariates should also be considered in the modeling process [3]. Here we select stepwise selection for variables selection, stepwise selection is a mix between forward and backward selection. We can either start with an empty model or a full model and add/remove predictors according some criteria. We will use the Akaike information criterion(AIC), which is defined as follows: AIC = 2k − 2max(loglikelihood) where k is the number of parameters in the model. Than we made Model diagnostics which mainly includes the following three aspects: 1) testing the proportional hazards(PH) assumption, 2) examining influential observations (or outliers), 3) detecting nonlinearity in relationship between the log hazard and the covariates. In order to check these model assumptions, Residuals method are used. The common residuals for the Cox model include:1) Schoenfeld residuals to check the proportional hazards assumption, 2) Martingale residual to assess nonlinearity, 3) Deviance residual (symmetric transformation of the Martinguale residuals), to examine influential observations [4]. Covariates conversion such as nonlinear transformations can be done if it is necessary. Finally, we use Concordance index(Cindex) [5] to evaluate the effect of the model.
Parameters
 Basic parameters: Same as described in Multivariate Analyses.
 Direction: The mode of stepwise search, can be one of "none", "both", "backward", or "forward", with a default of "none".
 coxdiagnostics type: The type of residuals to present on Y axis of a diagnostic plot. The same as in residuals.coxph: character string indicating the type of residual desired. Possible values are "martingale", "deviance", "score", "schoenfeld", "dfbeta", "dfbetas" and "scaledsch".
 Validation Dataset: Selecting a Validation Set for Model Testing.
 formula: The variables contained in the model and their corresponding conversion formulas.
 variable transformation: Nonlinear transformation of variables if necessary.
Result
The three and five year calibration curve for prognostic model, the closer the curve is to the diagonal, the more accurate the model
Nomogram for prognostic model,including the weight of each feature and threeyear and fiveyear survival rates
PH assumption test(right),and displays diagnostics graphs presenting goodness of Cox Proportional Hazards Model fit(left)
Displays graphs of continuous explanatory variable against martingale residuals of null cox proportional hazards model, for each term in of the right side of formula. This might help to properly choose the functional form of continuous variable in cox model (coxph). Fitted lines with lowess function should be linear to satisfy cox proportional hazards model assumptions.
Pancancer
Pancancer Analysis was designed to investigate the prognostic effects of factors in a variety of tumors. All data were from the PanCancer Atlas [6] with unified normalization and standardization. Users can select all type of cancer or just choose a subset of pancancer such as urologic (bladder urothelial carcinoma [BLCA], prostate adenocarcinoma [PRAD], testicular germ cell tumors [TGCT], kidney renal clear cell carcinoma [KIRC], kidney chromophobe [KICH], and kidney renal papillary cell carcinoma [KIRP]) which was designed by PanCancer Atlas project. After that users can filtering subsets by conductions described in univariate module, besides ToPP set a somatic alterations (copynumber alterations, mutations, fusions or epigenetic silencing) in ten canonical pathways: cell cycle, Hippo, Myc, Notch, Nrf2, PI3Kinase/Akt, RTKRAS, TGFb signaling, p53 and bcatenin/Wnt[7] as the pathway condition screening.
Parameters
 Select Dataset: Select the dataset, there are a total of 33 datasets here. See the attached table for more information.Users can select multiple types of datasets.
 Gene: Input a Gene symbol,Gene ID or ensembl_gene_id with fuzzy query.
 Data tpye: Select a certain omics data. Each dataset selected above may contain 8 types of omics data (Gene expression,lncRNA expression,miRNA expression,Somatic mutation,Protein expression,Methylation,Copy number variation,Gene fusion).
 Survival tpye: There are a total of five survival types to choose from: OS: overall survival, PFI: progressionfree interval, DSS:diseasespecific survival,DFI:diseasefree interval and RFS:Relapse Free Survival.
 Cutoff: For continuous variables such gene or protein expression, variable will be divided into two groups by the median, quantile or the ‘bestcut’ value. Here, we calculated all the logrank p value by the cut of each sample and keep at least 10% of samples for each group and the cut with the lowest p value was represented as the ‘bestcut’. While for categorical variable such mutation, CNA or gene fusion, variable will be divided into groups by the classification.
 geneConditions: Subgroup screening based on omics data,such as Somatic mutation,Methylation,Copy number variation and 10 canonical signaling pathway.
 cliConditions: Subgroup screening based on clinical data,such as gender,race,tumor stage,or diseasespecific clinical features such as HER2+ in breast cancer.
 Draw designs: View a description of the parameters in the Univariate analysis.
Result
Typical pancancer analysis, for subgroup analysis please choose the Conditions or cliConditions selector.
KM curves: includeing pvalues for the lograng test (p<0.05 was considered to be significantly different between the two groups),Hazards Ratio,risk table, description of the results, etc.
Forest plot:Metaanalysis of multiple tumors and pancancer,includeing lograng p value and HR(95%CI)
Combination Analysis
Combination analysis is to analyze the effect of the synergy of two factors on the prognosis. These two factors can be the same level of data such as the expression of two genes, or they can be different levels of data. For example, one is to check the effect of gene expression on prognosis, the other is to check the effect of gene mutation on prognosis. Then all the patients will be divided into 4 groups (high expression + mutation group, high expression + wild type group, low expression + mutation group and low expression + wild type) according to the threshold of two factors and logrank test was performed to test whether there was significant difference between the two groups, separately. In this way, researchers can assess the synthetic lethality effects of two genes.
Parameters
Data selection
 Dataset: Select the dataset, there are a total of 33 datasets here. See the attached table for more information.
 Survival tpye: There are a total of five survival types to choose from: OS: overall survival, PFI: progressionfree interval, DSS:diseasespecific survival,DFI:diseasefree interval and RFS:Relapse Free Survival.
 Gene1: Input a Gene symbol,Gene ID or ensembl_gene_id with fuzzy query.
 Data tpye1: Select a certain omics data. Each dataset selected above may contain 8 types of omics data (Gene expression,lncRNA expression,miRNA expression,Somatic mutation,Protein expression,Methylation,Copy number variation,Gene fusion).
 Gene2: Input another Gene symbol,Gene ID or ensembl_gene_id with fuzzy query.
 Data tpye2: Select another certain omics data. Each dataset selected above may contain 8 types of omics data (Gene expression,lncRNA expression,miRNA expression,Somatic mutation,Protein expression,Methylation,Copy number variation,Gene fusion).
 Cutoff: For continuous variables such gene or protein expression, variable will be divided into two groups by the median, quantile or the ‘bestcut’ value. Here, we calculated all the logrank p value by the cut of each sample and keep at least 10% of samples for each group and the cut with the lowest p value was represented as the ‘bestcut’. While for categorical variable such mutation, CNA or gene fusion, variable will be divided into groups by the classification.
 Draw designs: View a description of the parameters in the Univariate analysis.
Result
Combination Analysis for multiple genes with multiple omics.
KM curves: includeing pvalues for the lograng test (p<0.05 was considered to be significantly different between the two groups),Hazards Ratio,risk table, description of the results, etc.
Upload Your Data
Data upload module allow users to upload their own data with survival time and status to do all kind of analysis in ToPP. Also user should set the permission for their own data and keep an email for connect if necessary.
Parameters

upload seq data:
Data format requirements: Example data can be downloaded here
Accept only data in TXT format and columns separate by \t;
Column names should be the survival state,survival time, gene symbol or clinical feature, rownames should be sample ID;
Column names of Survival state should be be in the "OS,PFI,DSS,DFI,RFS",Corresponding column names of survival time should be in the 'OS.time,PFI.time,DSS.time,DFI.time,RFS.time'.
The Survival state should be in 1 or 0 ,1:event occurrence,such as Death, recurrence, metastasis,censoring, etc. 0: event not occurrence such alive, no metastasis, etc;
The survival time should be in days.
Column names should contain "sample","group",while in group the tumor sample must be the same as file name,such as if your file is LIHC.txt,than the tumor patient in group column must be 'LIHC'
sample group cancer.type gender OS OS.time DSS DSS.time TP53 EGFR ERBB2 EZH2 TCGA.2V.A95S.01 TCGALIHC LIHC MALE 0 NA 0 NA 0 1.2 9.08 6.2 TCGA.2Y.A9GS.01 TCGALIHC LIHC MALE 1 724 1 724 0 2.57 8.03 2.5 TCGA.2Y.A9GT.01 TCGALIHC LIHC MALE 1 1624 1 1624 0 0 8.57 1.53 TCGA.2Y.A9GU.01 TCGALIHC LIHC FEMALE 0 1939 0 1939 2.77 1.42 8.82 0.56 TCGA.2Y.A9GV.01 TCGALIHC LIHC FEMALE 1 2532 1 2532 5.99 0.88 7.6 9.31 TCGA.2Y.A9GW.01 TCGALIHC LIHC MALE 1 1271 1 1271 0 0.82 8.33 8.95 TCGA.2Y.A9GY.01 TCGALIHC LIHC FEMALE 1 757 1 757 0 0 8.4 8.59 TCGA.2Y.A9GZ.01 TCGALIHC LIHC FEMALE 1 848 1 848 7.41 6.1 8.74 9.22 TCGA.2Y.A9H0.11 normal LIHC MALE 0 3675 0 3675 0 0 7.65 8.86  Select data type: Select a certain omics data. Each dataset selected above may contain 8 types of omics data (Gene expression,lncRNA expression,miRNA expression,Somatic mutation,Protein expression,Methylation,Copy number variation,Gene fusion).
 cancer type: Select cancer type for your study .
 Permission: Private means that the data can only be accessed through the IP address of the computer currently submitted. Public means that everyone can access the data. Of course, if someone uses the data you have public, we hope that he can join it to the reference.
 Reference: Input the reference documents corresponding to the dataset for reference.
 Study design: Give a brief description of the study, especially information about sample collection and study design.
 Email: Input email address which we can contact with your.
Result
Supplemental table
Supplemental table for all the cancer type and the data type.
ID  Genome  Transcriptome  Proteome  Epigenome  Clinical data  

ID  Project  中文名称  Mutation  CNA  Fusion  mRNA  miRNA  LncRNA  RPPA  MS  Methylation  Phenotype 
TCGAACC  Adrenocortical carcinoma  肾上腺皮质癌  √  √  √  √  √  NA  √  NA  √  √ 
TCGABLCA  Bladder Urothelial Carcinoma  膀胱尿路上皮癌  √  √  √  √  √  √  √  NA  √  √ 
TCGABRCA  Breast invasive carcinoma  乳腺浸润癌  √  √  √  √  √  √  √  √  √  √ 
TCGACESC  Cervical squamous cell carcinoma and endocervical adenocarcinoma  宫颈鳞癌和腺癌  √  √  √  √  √  √  √  NA  √  √ 
TCGACHOL  Cholangiocarcinoma  胆管癌  √  √  √  √  √  NA  √  NA  √  √ 
TCGACOAD  Colon adenocarcinoma  结肠癌  √  √  √  √  √  √  √  √  √  √ 
TCGADLBC  Lymphoid Neoplasm Diffuse Large Bcell Lymphoma  弥漫性大B细胞淋巴瘤  √  √  √  √  √  NA  √  NA  √  √ 
TCGAESCA  Esophageal carcinoma  食管癌  √  √  √  √  √  NA  √  NA  √  √ 
TCGAGBM  Glioblastoma multiforme  多形成性胶质细胞瘤  √  √  √  √  √  √  √  NA  √  √ 
TCGAHNSC  Head and Neck squamous cell carcinoma  头颈鳞状细胞癌  √  √  √  √  √  √  √  NA  √  √ 
TCGAKICH  Kidney Chromophobe  肾嫌色细胞癌  √  √  √  √  √  √  √  NA  √  √ 
TCGAKIRC  Kidney renal clear cell carcinoma  肾透明细胞癌  √  √  √  √  √  √  √  NA  √  √ 
TCGAKIRP  Kidney renal papillary cell carcinoma  肾乳头状细胞癌  √  √  √  √  √  √  √  NA  √  √ 
TCGALAML  Acute Myeloid Leukemia  急性髓细胞样白血病  √  √  √  √  √  NA  NA  NA  √  √ 
TCGALGG  Brain Lower Grade Glioma  脑低级别胶质瘤  √  √  √  √  √  √  √  NA  √  √ 
TCGALIHC  Liver hepatocellular carcinoma  肝细胞肝癌  √  √  √  √  √  √  √  NA  √  √ 
TCGALUAD  Lung adenocarcinoma  肺腺癌  √  √  √  √  √  √  √  NA  √  √ 
TCGALUSC  Lung squamous cell carcinoma  肺鳞癌  √  √  √  √  √  √  √  NA  √  √ 
TCGAMESO  Mesothelioma  间皮瘤  √  √  √  √  √  NA  √  NA  √  √ 
TCGAOV  Ovarian serous cystadenocarcinoma  卵巢浆液性囊腺癌  √  √  √  √  √  √  √  √  √  √ 
TCGAPAAD  Pancreatic adenocarcinoma  胰腺癌  √  √  √  √  √  NA  √  NA  √  √ 
TCGAPCPG  Pheochromocytoma and Paraganglioma  嗜铬细胞瘤和副神经节瘤  √  √  √  √  √  NA  √  NA  √  √ 
TCGAPRAD  Prostate adenocarcinoma  前列腺癌  √  √  √  √  √  √  √  NA  √  √ 
TCGAREAD  Rectum adenocarcinoma  直肠腺癌  √  √  √  √  √  √  √  √  √  √ 
TCGASARC  Sarcoma  肉瘤  √  √  √  √  √  NA  √  NA  √  √ 
TCGASKCM  Skin Cutaneous Melanoma  皮肤黑色素瘤  √  √  √  √  √  √  √  NA  √  √ 
TCGASTAD  Stomach adenocarcinoma  胃癌  √  √  √  √  √  √  √  NA  √  √ 
TCGATGCT  Testicular Germ Cell Tumors  睾丸癌  √  √  √  √  √  NA  √  NA  √  √ 
TCGATHCA  Thyroid carcinoma  甲状腺癌  √  √  √  √  √  √  √  NA  √  √ 
TCGATHYM  Thymoma  胸腺癌  √  √  √  √  √  NA  √  NA  √  √ 
TCGAUCEC  Uterine Corpus Endometrial Carcinoma  子宫内膜癌  √  √  √  √  √  √  √  NA  √  √ 
TCGAUCS  Uterine Carcinosarcoma  子宫肉瘤  √  √  √  √  √  NA  √  NA  √  √ 
TCGAUVM  Uveal Melanoma  葡萄膜黑色素瘤  √  √  √  √  √  NA  √  NA  √  √ 
Reference
[1]. Kleinbaum D G, Klein M. KaplanMeier survival curves and the logrank test[M]//Survival analysis. Springer, New York, NY, 2012: 5596.
[2]. Christensen E. Multivariate survival analysis using Cox's regression model[J]. Hepatology, 1987, 7(6): 13461358.
[3]. The Cox model in R Gardar Sveinbjornsson, Jongkil Kim, Yongsheng Wang April 18, 2011
[4]. Xue Y, Schifano E D. Diagnostics for the Cox model[J]. Communications for Statistical Applications and Methods, 2017, 24(6): 583604.
[5]. Harrell Jr F E, Lee K L, Mark D B. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors[J]. Statistics in medicine, 1996, 15(4): 361387.
[6]. Hoadley K A, Yau C, Hinoue T, et al. Celloforigin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer[J]. Cell, 2018, 173(2): 291304. e6.
[7]. SanchezVega F, Mina M, Armenia J, et al. Oncogenic signaling pathways in the cancer genome atlas[J]. Cell, 2018, 173(2): 321337. e10.