Single cell analysis of cell fate
An important question in cell biology is how cells break symmetry during mitotic divisions. During mammalian preimplantation development, the embryo has to set apart its first two cell populations, yet when and how this first cell fate decision is made remains an open question. Cell fate-associated differences in transcript and protein concentrations between blastomeres have been reported from as early as the 8- to 16-cell stage. However, it is not clear whether these are the earliest differences. Using deep single-cell RNA-seq of matched sister blastomeres, we found highly reproducible differences among the single cells within early-stage (2- and 4-cell) mouse embryos [Genome Res,
Single-cell data, especially time-course single-cell transcriptomic data, demand new statistical methods. We developed a time-variant clustering method to meet this need [PNAS,
2014]. Time-variant clustering is modeled as a Hidden Branching Process. At each
time point, this model reduces to a finite mixture model.
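To make the single-time-point case concrete, the following is a minimal sketch of fitting a finite mixture model by expectation-maximization. It does not implement the Hidden Branching Process or the published method; the one-dimensional Gaussian components, the function names, and the toy values are assumptions chosen only for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_mixture_1d(values, init_means, n_iter=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture (illustrative; Gaussian components are an assumption)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to each component.
        resp = []
        for x in values:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj <= 0.0:
                continue
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, min_var)
    # Hard cluster labels: the component with the largest posterior.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels

# Toy expression values for one gene across eight cells at one time point.
toy_values = [0.1, -0.2, 0.0, 0.3, 5.1, 4.8, 5.0, 5.3]
weights, means, variances, labels = fit_mixture_1d(toy_values, init_means=[0.0, 5.0])
```

In a time-course setting, a model of this kind would be fitted jointly across time points with clusters allowed to split; that coupling is what the branching process adds and is not shown here.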
Evolution of mammalian gene regulatory networks
Gene regulation involves coordinated interactions of
many proteins and DNA segments, which together form gene regulatory networks
(GRNs). We study the structure, dynamics, and evolution of GRNs and
how GRNs influence cellular behaviors, including stem cell
differentiation and cancer formation.
We applied ideas from evolutionary biology to elucidate
GRN structures and functions in mammals. We developed methods to
describe the evolutionary changes of different components of mammalian
GRNs, including transcription factor binding sites (TFBS) and TFBS
Res, 18:1325-1335], co-expression modules of genes [PLoS
Comp Biol, 6(3): e1000707][Nucl Acids
Res, 35: W105-W114], protein-protein interactions [Genome
Res, 20: 804-815], transcription factor (TF)-DNA interactions
Res, 20: 804-815], and epigenomes [Cell,
149: 1381-1391]. These studies contributed empirical data and derived
initial rules underlying GRN functions.
On the theoretical end, we developed an evolutionary
model that simultaneously describes the evolutionary changes of multiple
components of a GRN [PLoS Comp Biol,
7(6): e1002064]. This model enables the use of multi-species DNA and gene
expression data for simultaneous identification of GRNs in every species
Personal variation and evolutionary change are linked.
We developed a computational tool, perEdit [Bioinformatics,
27: 3427-3429], to assemble both alleles of a personal genome.
Personal ChIP-seq and RNA-seq data can then be mapped to the individual
genome, thereby identifying individual variation and allele
Genetic and epigenetic re-wiring of transcription
We reported that close to forty percent of the genes shared by humans, mice and cows have different expression patterns in the early stages of embryonic development. We traced these differences to a set of specific evolutionary changes of the genomes,
including insertion of transcription factor binding sites by transposons.
This work suggested that more than one GRN can guide mammalian
preimplantation development. See the cover article in Genome Research and the research highlight in Nature.
Only a small fraction of the interspecies differences in gene
expression could be traced to genomic differences. This prompted
us to compare epigenomes. Comparing epigenomes in human, mouse,
and pig pluripotent stem cells, we found 5-mC, H3K27ac, and H3K36me3
to be conserved in both negatively and positively selected
genomic sequences. We reported the conservation of
co-localization of epigenomic marks as an indicator of
cis-regulatory sequences. Combined with cell differentiation
experiments, we identified a different class of "poised
promoters" marked by H2A.Z (repressive) and H3K4me3 (active) [Cell,
149: 1381-1391]. We developed a Comparative Epigenome
Browser to allow interactive visualization and analysis of
the multi-species epigenomes [Bioinformatics, 29: 1223-1225].
Thermodynamic modeling of interactions among
transcription factors, DNA, and epigenome
Transcription factor (TF)-DNA interaction is at the
core of transcriptional regulation. We developed methods to
efficiently calculate TF-DNA binding affinities for a long stretch
(200-500 bp) of genomic sequence, taking into account interactions
between strong and weak, homotypic and heterotypic TF binding sites
ONE, 4(12): e8155][BMC
Genomics, 9:S18]. These methods enabled the use of
high-throughput sequencing data to reconstruct a
transcription network [BMC
Genomics, 9:S19] and the discovery of a second DNA
recognition motif of Nanog [PLoS
ONE, 4(12): e8155], which was verified by subsequent studies
from our group [Genome
Res, 20: 804-815] and others [Nucl
Acids Res, 2012].
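As a rough illustration of why weak sites matter when scoring a long stretch of sequence, the sketch below sums Boltzmann-weighted occupancies over every window of a sequence instead of reporting only the best match. It is not the published method: it treats sites as independent (no homotypic or heterotypic interactions), scans one strand only, and the matrix `EXAMPLE_PWM`, the function names, and the `concentration` parameter are hypothetical.

```python
import math

# Hypothetical 4-bp log-odds matrix favoring "ACGT"; illustrative values only.
EXAMPLE_PWM = [
    {"A": 2.0, "C": -2.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": 2.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": -2.0, "G": 2.0, "T": -2.0},
    {"A": -2.0, "C": -2.0, "G": -2.0, "T": 2.0},
]

def site_scores(seq, pwm):
    """Score every window of the sequence against the matrix (forward strand only)."""
    width = len(pwm)
    return [
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    ]

def expected_occupancy(seq, pwm, concentration=1.0):
    """Expected number of bound TF molecules, assuming sites bind independently.

    Each window contributes w / (1 + w), where w = concentration * exp(score),
    so weak sites add small but nonzero amounts to the total.
    """
    total = 0.0
    for score in site_scores(seq, pwm):
        w = concentration * math.exp(score)
        total += w / (1.0 + w)
    return total

strong_only = expected_occupancy("ACGT", EXAMPLE_PWM)
strong_plus_weak = expected_occupancy("ACGTACGA", EXAMPLE_PWM)
```

Here `strong_plus_weak` exceeds `strong_only` because the additional windows, including the one-mismatch site "ACGA", each add a positive term to the sum.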
We introduced a model to calculate the TF-DNA binding
energy in the presence of epigenomic modifications. This model shows
theoretically that epigenomic modifications can boost the cooperativity of
nearby binding sites and, more importantly, that personal variations of TF
binding can be explained by the personal epigenome and personal genome
Comp Biol, 9(12): e1003367].
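The notion of cooperativity between nearby binding sites can be illustrated with a textbook two-site partition function. This sketch is not the model from the cited paper, whose parameterization is not given here; representing an epigenomic modification as a larger cooperativity factor `omega` is an assumption made purely for illustration, as are the function name and the numeric values.

```python
def two_site_occupancy(q1, q2, omega=1.0):
    """Binding probabilities for two nearby sites.

    q1, q2: Boltzmann weights of the individual sites
            (TF concentration times exp(-binding energy / kT)).
    omega:  cooperativity factor; 1.0 means the sites bind independently.
    Returns (probability both sites are bound, probability at least one is bound).
    """
    z = 1.0 + q1 + q2 + omega * q1 * q2  # partition function over the four states
    both = omega * q1 * q2 / z
    any_bound = (q1 + q2 + omega * q1 * q2) / z
    return both, any_bound

# Assumption: the modification is modeled as raising omega from 1 to 10.
both_unmodified, _ = two_site_occupancy(1.0, 1.0, omega=1.0)
both_modified, _ = two_site_occupancy(1.0, 1.0, omega=10.0)
```

With equal site weights of 1, joint occupancy rises from 1/4 to 10/13 when `omega` increases from 1 to 10, which is the qualitative effect a cooperativity boost would have.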
Temporal epigenomic changes and dynamics of gene
We developed a probabilistic model to annotate the
genome using temporal epigenomic data. This model clusters genomic sequences based on the similarity of temporal changes of multiple epigenomic marks during a cellular differentiation process
Res, 23:352-384]. Also see cover description. With this model, we found that temporal changes of H3K4me2, unmethylated CpG, and H2A.Z were predictive of 5-hmC changes, suggesting unmethylated CpG and H3K4me2 as potential upstream signals guiding TETs to specific sequences. Several rules on combinatorial epigenomic changes and their effects on mRNA expression and ncRNA expression were derived, including a simple rule governing the relationship between 5-hmC and gene expression levels.
We developed statistical methods to model temporal
gene expression data, allowing us to identify distinct temporal
expression patterns [Bioinformatics 26: 2944-2951] and to dissect subpopulations of cell types in a
heterogeneous cell population [PLoS
Comp Biol, 5: e1000607]. The latter method led to the
discovery of a regulatory function of the chromatin remodeling protein
SMARCAD1 in embryonic stem cells.
We developed a method to minimize lab-to-lab
variation in identifying differentially expressed genes and
identified colorectal cancer-specific genes [Nature
Biotech, 24(12): 6-7].
Research support from
NIH, NSF, March of Dimes Foundation