Article Abstract

A recursively partitioned mixture model for clustering time-course gene expression data

Authors: Devin C. Koestler, Carmen J. Marsit, Brock C. Christensen, Karl T. Kelsey, E. Andres Houseman

Abstract

Background: Longitudinally collected gene expression data provides an opportunity to investigate the dynamic behavior of gene expression and is crucial for establishing causal links between changes on a molecular level and disease development and progression. In terms of the analysis of such data, clustering of subjects based on time-course expression data may improve our understanding of temporal expression patterns that result in disease phenotypes. Although there are numerous existing methods for clustering subjects using gene expression data, most are not suitable when expression measurements are repeatedly collected over a time-course.
Methods: We present a modified version of the recursively partitioned mixture model (RPMM) for clustering subjects based on longitudinally collected gene expression data. In the proposed time-course RPMM (TC-RPMM), subjects are clustered on the basis of their temporal profiles of gene expression using a mixture of mixed effects models framework. This framework captures changes in gene expression over time and models the autocorrelation between repeated gene expression measurements for the same subject. We assessed the performance of TC-RPMM using extensive simulation studies and a dataset from a multi-center research study of inflammation and response to injury (www.gluegrant.org), which consisted of time-course gene expression data for 140 subjects.
Results: Our simulation studies encompassed several different scenarios and were aimed at assessing the ability of TC-RPMM to correctly recover true class memberships when the expression trajectories that characterized those classes differed. Overall, our simulation studies revealed favorable performance of TC-RPMM compared to competing approaches, however clustering performance was observed to be highly dependent on the proportion of class discriminating genes used in clustering analysis. When applied to real epidemiologic data with repeated-measures, longitudinal gene expression measurements, TC-RPMM identified clusters that had strong biological and clinical significance.
Conclusions: Methods for clustering subjects based on temporal gene expression profiles is a high priority for molecular biology and bioinformatics research. Along these lines, the proposed TC-RPMM represents a promising new approach for analyzing time-course gene expression data.

Refbacks

  • There are currently no refbacks.