Integration and comparison of different genomic data for outcome prediction in cancer

Hugo Gómez-Rueda, Emmanuel Martínez-Ledesma, Antonio Martínez-Torteya, Rebeca Palacios-Corona, Victor Trevino

Research output: Contribution to journal: Article

6 Citations (Scopus)

Abstract

© 2015 Gómez-Rueda et al. Background: In cancer, large-scale technologies such as next-generation sequencing and microarrays have produced a large number of genomic features, such as DNA copy number alterations (CNA), mRNA expression (EXPR), microRNA expression (MIRNA), and DNA somatic mutations (MUT), among others. Analyses of individual types of these genomic data have generated many prognostic biomarkers in cancer. However, it is uncertain which of these data types is most powerful and whether the best data type depends on the cancer type. Therefore, our purpose is to characterize the prognostic power of models obtained from different genomic data types, cancer types, and algorithms. To this end, we compared the prognostic power, using the concordance index and the prognostic index, of models obtained from EXPR, MIRNA, CNA, and MUT data and their integration for ovarian serous cystadenocarcinoma (OV), glioblastoma multiforme (GBM), lung adenocarcinoma (LUAD), and breast cancer (BRCA) datasets from The Cancer Genome Atlas repository. We used three different algorithms for prognostic model selection, based on constrained particle swarm optimization (CPSO), network feature selection (NFS), and the least absolute shrinkage and selection operator (LASSO).
Results: The integration of the four genomic data types produced models with slightly higher performance than any single data type. Among the individual data types, we observed the best prediction using EXPR, closely followed by MIRNA and CNA, depending on the cancer type and method. We observed the highest concordance index in BRCA, followed by LUAD, OV, and GBM. We observed very similar results between LASSO and CPSO but smaller values in NFS. Importantly, we observed that model predictions highly concur between algorithms but are highly discordant between data types, which seems to depend on the censoring rate of the dataset.
Conclusions: Gene expression (mRNA) yielded the highest performance, which improved marginally when other types of genomic data were considered. The level of concordance in prognosis generated from different genomic data types seems to depend on the censoring rate.
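
The concordance index mentioned in the abstract can be illustrated with a minimal Python sketch of Harrell's C statistic. This is not the authors' implementation: the function name `concordance_index`, the argument names, and the toy data are assumptions made for illustration, and tied survival times are simply skipped. The sketch also hints at why the censoring rate matters, since a pair of patients is only comparable when the shorter follow-up time ends in an observed event.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times       -- follow-up time per patient
    events      -- 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped in this simplified sketch
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs (all early times censored?)")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(times, events, risk))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. With heavier censoring fewer pairs are comparable, so the estimate rests on less information.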
Original language: English
Journal: BioData Mining
DOI: 10.1186/s13040-015-0065-1
Publication status: Published - 29 Oct 2015
Externally published: Yes

Fingerprint

Genomics
Cancer
MicroRNAs
Prediction
Constrained optimization
Neoplasms
Glioblastoma
Particle swarm optimization (PSO)
Mathematical operators
Feature extraction
Lung Neoplasms
MicroRNA
DNA
Concordance
Serous Cystadenocarcinoma
Breast Neoplasms
Messenger RNA
Mutation
Atlases
Biomarkers

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Molecular Biology
  • Genetics
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Gómez-Rueda, Hugo ; Martínez-Ledesma, Emmanuel ; Martínez-Torteya, Antonio ; Palacios-Corona, Rebeca ; Trevino, Victor. / Integration and comparison of different genomic data for outcome prediction in cancer. In: BioData Mining. 2015.
@article{6d51d182dd9b4d9c8eda543b6760c709,
title = "Integration and comparison of different genomic data for outcome prediction in cancer",
abstract = "{\textcopyright} 2015 G{\'o}mez-Rueda et al. Background: In cancer, large-scale technologies such as next-generation sequencing and microarrays have produced a large number of genomic features, such as DNA copy number alterations (CNA), mRNA expression (EXPR), microRNA expression (MIRNA), and DNA somatic mutations (MUT), among others. Analyses of individual types of these genomic data have generated many prognostic biomarkers in cancer. However, it is uncertain which of these data types is most powerful and whether the best data type depends on the cancer type. Therefore, our purpose is to characterize the prognostic power of models obtained from different genomic data types, cancer types, and algorithms. To this end, we compared the prognostic power, using the concordance index and the prognostic index, of models obtained from EXPR, MIRNA, CNA, and MUT data and their integration for ovarian serous cystadenocarcinoma (OV), glioblastoma multiforme (GBM), lung adenocarcinoma (LUAD), and breast cancer (BRCA) datasets from The Cancer Genome Atlas repository. We used three different algorithms for prognostic model selection, based on constrained particle swarm optimization (CPSO), network feature selection (NFS), and the least absolute shrinkage and selection operator (LASSO). Results: The integration of the four genomic data types produced models with slightly higher performance than any single data type. Among the individual data types, we observed the best prediction using EXPR, closely followed by MIRNA and CNA, depending on the cancer type and method. We observed the highest concordance index in BRCA, followed by LUAD, OV, and GBM. We observed very similar results between LASSO and CPSO but smaller values in NFS. Importantly, we observed that model predictions highly concur between algorithms but are highly discordant between data types, which seems to depend on the censoring rate of the dataset.
Conclusions: Gene expression (mRNA) yielded the highest performance, which improved marginally when other types of genomic data were considered. The level of concordance in prognosis generated from different genomic data types seems to depend on the censoring rate.",
author = "Hugo G{\'o}mez-Rueda and Emmanuel Mart{\'i}nez-Ledesma and Antonio Mart{\'i}nez-Torteya and Rebeca Palacios-Corona and Victor Trevino",
year = "2015",
month = "10",
day = "29",
doi = "10.1186/s13040-015-0065-1",
language = "English",
journal = "BioData Mining",
issn = "1756-0381",
publisher = "BioMed Central",

}

TY  - JOUR
T1  - Integration and comparison of different genomic data for outcome prediction in cancer
AU  - Gómez-Rueda, Hugo
AU  - Martínez-Ledesma, Emmanuel
AU  - Martínez-Torteya, Antonio
AU  - Palacios-Corona, Rebeca
AU  - Trevino, Victor
PY  - 2015/10/29
Y1  - 2015/10/29
AB  - © 2015 Gómez-Rueda et al. Background: In cancer, large-scale technologies such as next-generation sequencing and microarrays have produced a large number of genomic features, such as DNA copy number alterations (CNA), mRNA expression (EXPR), microRNA expression (MIRNA), and DNA somatic mutations (MUT), among others. Analyses of individual types of these genomic data have generated many prognostic biomarkers in cancer. However, it is uncertain which of these data types is most powerful and whether the best data type depends on the cancer type. Therefore, our purpose is to characterize the prognostic power of models obtained from different genomic data types, cancer types, and algorithms. To this end, we compared the prognostic power, using the concordance index and the prognostic index, of models obtained from EXPR, MIRNA, CNA, and MUT data and their integration for ovarian serous cystadenocarcinoma (OV), glioblastoma multiforme (GBM), lung adenocarcinoma (LUAD), and breast cancer (BRCA) datasets from The Cancer Genome Atlas repository. We used three different algorithms for prognostic model selection, based on constrained particle swarm optimization (CPSO), network feature selection (NFS), and the least absolute shrinkage and selection operator (LASSO). Results: The integration of the four genomic data types produced models with slightly higher performance than any single data type. Among the individual data types, we observed the best prediction using EXPR, closely followed by MIRNA and CNA, depending on the cancer type and method. We observed the highest concordance index in BRCA, followed by LUAD, OV, and GBM. We observed very similar results between LASSO and CPSO but smaller values in NFS. Importantly, we observed that model predictions highly concur between algorithms but are highly discordant between data types, which seems to depend on the censoring rate of the dataset. Conclusions: Gene expression (mRNA) yielded the highest performance, which improved marginally when other types of genomic data were considered. The level of concordance in prognosis generated from different genomic data types seems to depend on the censoring rate.
U2  - 10.1186/s13040-015-0065-1
DO  - 10.1186/s13040-015-0065-1
M3  - Article
JO  - BioData Mining
JF  - BioData Mining
SN  - 1756-0381
ER  -