


We study gene regulation and cellular behavior by developing statistical and experimental methods. Our primary goal is to develop new technologies to map molecular networks, including the RNA-RNA, RNA-chromatin, and protein-protein interactomes. Our secondary goal is to model the variation of these networks along three axes: developmental time, personal difference, and evolutionary change. Our major tools include epigenomic and single-cell assays, statistical modeling, and large-scale computation.

We discovered transposon-mediated re-wiring of the transcription networks that govern pre-implantation embryonic development [Genome Res, 2010, cover; Research Highlight in Nature, 2010]. We contributed to initiating "comparative epigenomics", a research field that studies genomic functions by cross-species epigenomic comparison [Cell, 2012]. We contributed to the derivation of rules of dynamic gene regulation and temporal epigenomic changes [Genome Res, 2013, cover]. We pioneered modeling of the impact of epigenome-genome interaction on transcription factor binding and on personal variation [PLoS Comp Biol, 2013].


Single cell analysis of cell fate

An important question in cell biology is how cells break symmetry during mitotic divisions. During mammalian preimplantation embryonic development, the embryo has to set apart the first two cell populations. It remains an open question when and how this first cell fate decision is made. Cell-fate-associated inter-blastomere differences in transcript and protein concentrations have been reported from as early as the 8-16 cell stage. However, it is not clear whether these are the earliest differences. Using deep single-cell RNA-seq of matched sister blastomeres, we found highly reproducible differences among the single cells within early stage (2- and 4-cell) mouse embryos [Genome Res, 2014, cover].

Single-cell data, especially time-course single-cell transcriptomic data, demand new statistical methods. We developed a time-variant clustering method for this need [PNAS, 2014]. Time-variant clustering is a Hidden Branching Process. At each time point, this model reduces to a finite mixture model.
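
As a rough illustration of the finite mixture model that the method reduces to at a single time point, the sketch below fits a one-dimensional Gaussian mixture by expectation-maximization. It is not the time-variant clustering method itself and is not taken from the cited paper; the function name fit_gaussian_mixture, the Gaussian form of the components, and the initialization are all assumptions made for illustration.

```python
import math


def fit_gaussian_mixture(values, k=2, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances).

    Hypothetical helper for illustration only, not the published method.
    """
    values = list(values)
    n = len(values)
    lo, hi = min(values), max(values)
    # Assumed initialization: spread the means evenly over the data range.
    means = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    mean_all = sum(values) / n
    var_all = sum((v - mean_all) ** 2 for v in values) / n
    variances = [max(var_all, 1e-6)] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in values:
            dens = [
                w * math.exp(-((v - m) ** 2) / (2 * s)) / math.sqrt(2 * math.pi * s)
                for w, m, s in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid collapse

    return weights, means, variances
```

In a time-course setting, one such mixture would describe the cell populations at each time point, and a branching process would link the components across time points; that linking step is not sketched here.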

Evolution of mammalian gene regulatory networks

Gene regulation involves coordinated interactions of many proteins and DNA segments, namely gene regulatory networks (GRNs). We study the structure, dynamics, and evolution of GRNs and how GRNs influence cellular behaviors, including stem cell differentiation and cancer formation.

We brought evolutionary biology ideas into elucidating GRN structures and functions in mammals. We developed methods to describe the evolutionary changes of different components of mammalian GRNs, including transcription factor binding sites (TFBS) and TFBS modules [Genome Res, 18:1325-1335], co-expression modules of genes [PLoS Comp Biol, 6(3): e1000707][Nucl Acids Res, 35: W105-W114], protein-protein interactions [Genome Res, 20: 804-815], transcription factor (TF)-DNA interactions [Genome Res, 20: 804-815], and epigenomes [Cell, 149: 1381-1391]. These studies contributed empirical data and derived initial rules underlying GRN functions.

On the theoretical end, we developed an evolutionary model that simultaneously describes the evolutionary changes of multiple components of a GRN [PLoS Comp Biol, 7(6): e1002064]. This model enables the use of multi-species DNA and gene expression data for simultaneous identification of GRNs in every species under consideration.

Personal variation and evolutionary change are linked. We developed a computational tool, perEdit [Bioinformatics, 27: 3427-3429], to assemble both alleles of a personal genome. Personal ChIP-seq and RNA-seq data can then be mapped to the individual genome, thereby identifying individual variation and allelic differences.

Genetic and epigenetic re-wiring of transcription networks

We reported that close to forty percent of the genes shared by humans, mice and cows have different expression patterns in the early stages of embryonic development. We traced these differences to a set of specific evolutionary changes of the genomes, including insertion of transcription factor binding sites by transposons. This work suggested that more than one GRN can guide mammalian preimplantation development. See cover article in Genome Research, and research highlight in Nature.

Only a small fraction of the interspecies differences in gene expression could be traced to genomic differences. This prompted us to compare epigenomes. Comparing epigenomes in human, mouse, and pig pluripotent stem cells, we found 5-mC, H3K27ac, and H3K36me3 to be conserved in both negatively and positively selected genomic sequences. We reported the conservation of co-localization of epigenomic marks as an indicator of cis-regulatory sequences. Combined with cell differentiation experiments, we identified a different class of "poised promoters" marked by H2A.Z (repressive) and H3K4me3 (active) [Cell, 149: 1381-1391]. We developed a Comparative Epigenome Browser to allow interactive visualization and analysis of the multi-species epigenomes [Bioinformatics, 29: 1223-1225].

Thermodynamic modeling of interactions among transcription factors, DNA, and epigenome

Transcription factor (TF)-DNA interaction is at the core of transcriptional regulation. We developed methods to efficiently calculate TF-DNA binding affinities for a long stretch (200-500bp) of genomic sequence, taking into account interactions between strong and weak, homotypic and heterotypic TF binding sites [PLoS ONE, 4(12): e8155][BMC Genomics, 9:S18]. These methods enabled the use of high-throughput sequencing data for reconstruction of a transcription network [BMC Genomics, 9:S19] and led to the discovery of a second DNA recognition motif of Nanog [PLoS ONE, 4(12): e8155], which was verified by subsequent studies from our group [Genome Res, 20: 804-815] and others [Nucl Acids Res, 2012].
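
To give a feel for how an affinity can be computed over a whole stretch of sequence rather than at a single best site, here is a minimal sketch of one common simplification: sum the Boltzmann weights of every candidate window, so that weak sites contribute alongside strong ones. It is not the method of the cited papers. It scans the forward strand only and ignores interactions between sites, and the energy matrix, its kT units, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def site_energy(site, pwm_energy):
    """Sum per-position binding energies (in assumed kT units) for one site.

    pwm_energy[i][j] is the energy penalty of base BASES[j] at position i;
    0.0 marks the preferred base. The matrix is a hypothetical input.
    """
    return sum(pwm_energy[i][BASES.index(base)] for i, base in enumerate(site))


def total_affinity(sequence, pwm_energy):
    """Sum Boltzmann weights exp(-E) over every forward-strand window."""
    width = len(pwm_energy)
    total = 0.0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows with ambiguous bases such as N
        total += math.exp(-site_energy(window, pwm_energy))
    return total


# Toy matrix for the consensus "ACG": 0 kT for the preferred base, 2 kT otherwise.
toy_energy = [
    [0.0, 2.0, 2.0, 2.0],  # position 0 prefers A
    [2.0, 0.0, 2.0, 2.0],  # position 1 prefers C
    [2.0, 2.0, 0.0, 2.0],  # position 2 prefers G
]
strong = total_affinity("ACGACG", toy_energy)  # two consensus sites
weak = total_affinity("TTTTTT", toy_energy)    # only mismatched windows
```

Summing over all windows is what lets many weak sites add up to a measurable affinity. A fuller model would also score the reverse strand and add terms for cooperative or competitive binding between neighboring sites.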

We introduced a model to calculate the TF-DNA binding energy in the presence of epigenomic modifications. This model shows theoretically that epigenomic modifications can boost the cooperativity of nearby binding sites and, more importantly, that personal variations of TF binding can be explained by personal epigenome and personal genome data [PLoS Comp Biol, 9(12): e1003367].

Temporal epigenomic changes and dynamics of gene expression

We developed a probabilistic model to annotate the genome using temporal epigenomic data. This model clusters genomic sequences based on the similarity of temporal changes of multiple epigenomic marks during a cellular differentiation process [Genome Res, 23:352-384]. Also see cover description. With this model, we found that temporal changes of H3K4me2, unmethylated CpG, and H2A.Z were predictive of 5-hmC changes, suggesting unmethylated CpG and H3K4me2 as potential upstream signals guiding TETs to specific sequences. Several rules on combinatorial epigenomic changes and their effects on mRNA expression and ncRNA expression were derived, including a simple rule governing the relationship between 5-hmC and gene expression levels.

We developed statistical methods to model temporal gene expression data, allowing for identifying different temporal expression patterns [Bioinformatics 26: 2944-2951] and dissecting subpopulations of cell types in a heterogeneous cell population [PLoS Comp Biol, 5: e1000607]. The latter method led to the discovery of a regulatory function of the chromatin remodeling protein SMARCAD1 in embryonic stem cells.

We developed a method to minimize lab-to-lab variations in identifying differentially expressed genes and used it to identify colorectal cancer-specific genes [Nature Biotech, 24(12): 6-7].

Research support from

NIH, NSF, March of Dimes Foundation

Copyright 2012 Zhong Lab. All rights reserved. Last updated: Jan 12, 2012