Quantification of the heterogeneity of prognostic cellular biomarkers in Ewing sarcoma using automated image and random survival forest analysis

Claudia Bühnemann, Simon Li, Haiyue Yu, Harriet Branford White, Karl L. Schäfer, Antonio Llombart-Bosch, Isidro Machado, Piero Picci, Pancras C W Hogendoorn, Nicholas A. Athanasou, J. Alison Noble, A. Bassim Hassan

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

Driven by genomic somatic variation, tumour tissues are typically heterogeneous, yet unbiased quantitative methods are rarely used to analyse heterogeneity at the protein level. Motivated by this problem, we developed automated segmentation of images of multiple biomarkers in Ewing sarcoma to generate distributions of biomarkers between and within tumour cells. We further integrated the high-dimensional data with patient clinical outcomes using random survival forest (RSF) machine learning. Using material from cohorts of genetically diagnosed Ewing sarcoma with EWSR1 chromosomal translocations, confocal images of tissue microarrays were segmented with level-set and watershed algorithms. The nucleus and cytoplasm of each cell were identified in relation to DAPI and CD99, respectively, and protein biomarkers (e.g. Ki67, pS6, Foxo3a, EGR1, MAPK) were localised relative to the nuclear and cytoplasmic regions of each cell in order to generate image feature distributions. The image distribution features were analysed with RSF in relation to known overall patient survival from three separate cohorts (185 informative cases). Variation in pre-analytical processing resulted in the elimination of a high number of non-informative images that had poor DAPI localisation or biomarker preservation (67 cases, 36%). The distributions of image features for biomarkers in the remaining high-quality material (118 cases, 104 features per case) were analysed by RSF with feature selection, and performance was assessed using internal cross-validation rather than a separate validation cohort. A prognostic classifier for Ewing sarcoma with a low cross-validation error rate (0.36) comprised multiple features, including the Ki67 proliferative marker and a sub-population of cells with a low cytoplasmic/nuclear ratio of CD99.
Through elimination of bias, the evaluation of high-dimensionality biomarker distribution within the cell populations of a tumour using random forest analysis in quality-controlled tumour material could be achieved. Such an automated and integrated methodology has potential application in the identification of prognostic classifiers based on tumour cell heterogeneity.
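
As an illustration of what a per-case "image feature distribution" might look like, the sketch below summarises the cytoplasmic/nuclear intensity ratio across the segmented cells of one case. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `cn_ratio_features`, the `low_threshold` cut-off, and the toy intensities are all hypothetical, and segmentation is assumed to have already produced one (cytoplasmic mean, nuclear mean) pair per cell.

```python
from statistics import mean, median, pstdev


def cn_ratio_features(cells, low_threshold=1.0):
    """Summarise the per-cell cytoplasmic/nuclear ratio for one case.

    `cells` is a list of (cytoplasmic_mean, nuclear_mean) intensity pairs,
    one per segmented cell. `low_threshold` is a hypothetical cut-off for
    calling a cell "low ratio"; it is not a value taken from the paper.
    """
    # Skip cells without nuclear signal to avoid dividing by zero.
    ratios = [cyto / nuc for cyto, nuc in cells if nuc > 0]
    if not ratios:
        raise ValueError("no cells with measurable nuclear signal")
    low = [r for r in ratios if r < low_threshold]
    return {
        "n_cells": len(ratios),
        "mean_ratio": mean(ratios),
        "median_ratio": median(ratios),
        "sd_ratio": pstdev(ratios),
        # Share of cells in the low-ratio sub-population.
        "fraction_low": len(low) / len(ratios),
    }


# Toy intensities, invented purely for illustration.
example_cells = [(120.0, 60.0), (30.0, 60.0), (45.0, 90.0), (200.0, 50.0)]
features = cn_ratio_features(example_cells)
```

Summaries of this kind (one row of features per case) are the sort of input a survival model could consume; the choice of summary statistics here is an assumption rather than the set of 104 features used in the study.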

Original language: English
Article number: e107105
Journal: PLoS One
Volume: 9
Issue number: 9
DOI: 10.1371/journal.pone.0107105
Publication status: Published - Sep 22 2014

Fingerprint

Ewing's Sarcoma
Sarcoma
Biomarkers
Survival Analysis
Tumors
Cells
Survival
Neoplasms
Classifiers
Tissue
Clonal variation
Genetic Translocation
Artificial intelligence
Cell Nucleus
Microarrays
Watersheds

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Bühnemann, C, Li, S, Yu, H, White, HB, Schäfer, KL, Llombart-Bosch, A, Machado, I, Picci, P, Hogendoorn, PCW, Athanasou, NA, Noble, JA & Hassan, AB 2014, 'Quantification of the heterogeneity of prognostic cellular biomarkers in Ewing sarcoma using automated image and random survival forest analysis', PLoS One, vol. 9, no. 9, e107105. https://doi.org/10.1371/journal.pone.0107105
@article{be5f586592684209ab203d96032adfb6,
title = "Quantification of the heterogeneity of prognostic cellular biomarkers in {E}wing sarcoma using automated image and random survival forest analysis",
author = "Claudia B{\"u}hnemann and Simon Li and Haiyue Yu and White, {Harriet Branford} and Sch{\"a}fer, {Karl L.} and Antonio Llombart-Bosch and Isidro Machado and Piero Picci and Hogendoorn, {Pancras C W} and Athanasou, {Nicholas A.} and Noble, {J. Alison} and Hassan, {A. Bassim}",
year = "2014",
month = "9",
day = "22",
doi = "10.1371/journal.pone.0107105",
language = "English",
volume = "9",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "9",

}
