We study gene regulation and cellular behavior by developing statistical and experimental methods. Our primary goal is to develop new technologies to map molecular networks, including the RNA-RNA interactome [Nat Comm, 2016], the RNA-chromatin interactome [Curr Biol, 2017], and the protein-protein interactome. Our secondary quest is to model the variation of these networks along three axes: developmental time, personal difference, and evolutionary change. Our major tools include epigenomic and single-cell assays, single-molecule imaging, statistical modeling, and large-scale computation.


RNA-RNA interaction

Mapping RNA-RNA interactions in vivo.

RNA-DNA interaction

Finding any RNA attached to any place on the genome.

Single molecule detection

Single-molecule RNA FISH.

Single cell analysis

Single cell analysis of cell fate.

Big data search engine

Online search for epigenomic and transcriptomic big data.


Using deep single-cell RNA-seq of matched sister blastomeres, we found highly reproducible differences among the single cells within early stage (2- and 4-cell) pre-implantation mouse embryos [Genome Res, 2014, cover]. We developed a time-variant clustering model for analysis of time-course single-cell gene expression data [PNAS, 2014].
We discovered transposon-mediated re-wiring of transcription networks that govern pre-implantation embryonic development [Genome Res, 2010, cover; Research Highlight in Nature, 2010].
We contributed to initiating "comparative epigenomics", a research field that studies genomic functions by cross-species epigenomic comparison [Cell, 2012].
We pioneered modeling of the impact of epigenome-genome interactions on transcription factor binding and on personal variation [PLoS Comp Biol, 2013].
We contributed to the derivation of the rules of dynamic gene regulation and temporal epigenomic changes [Genome Res, 2013, cover].

Previous work

We brought evolutionary biology ideas into elucidating structures and functions of mammalian Gene Regulatory Networks (GRN). We developed methods to describe the evolutionary changes of different components of mammalian GRNs, including transcription factor binding sites (TFBS) and TFBS modules [Genome Res, 18:1325-1335], co-expression modules of genes [PLoS Comp Biol, 6(3): e1000707] [Nucl Acids Res, 35: W105-W114], protein-protein interactions [Genome Res, 20: 804-815], transcription factor (TF)-DNA interactions [Genome Res, 20: 804-815], and epigenomes [Cell, 149: 1381-1391].
On the theoretical end, we developed an evolutionary model that simultaneously describes the evolutionary changes of multiple components of a GRN [PLoS Comp Biol, 7(6): e1002064]. This model enables the use of multi-species DNA and gene expression data to identify GRNs in every species under consideration at the same time.
We reported that close to forty percent of the genes shared by humans, mice and cows have different expression patterns in the early stages of embryonic development. We traced these differences to a set of specific evolutionary changes of the genomes, including insertion of transcription factor binding sites by transposons. See cover article in Genome Research, and research highlight in Nature.
Transcription factor (TF)-DNA interaction is at the core of transcriptional regulation. We developed methods to efficiently calculate TF-DNA binding affinities for a long stretch (200-500 bp) of genomic sequence, taking into account interactions between strong and weak, homotypic and heterotypic TF binding sites [PLoS ONE, 4(12): e8155][BMC Genomics, 9:S18]. These methods enabled the use of high-throughput sequencing data to reconstruct a transcription network [BMC Genomics, 9:S19] and led to the discovery of a second DNA recognition motif of Nanog [PLoS ONE, 4(12): e8155], which was verified by subsequent studies from our group [Genome Res, 20: 804-815] and others [Nucl Acids Res, 2012].
Extending this work, we developed a thermodynamic model that calculates TF-DNA binding affinity while taking the epigenome into account [PLoS Comp Biol, 2013].
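As a rough illustration of the idea behind affinity calculations over a stretch of sequence, the sketch below scores every window of a region with a position weight matrix, converts the scores to Boltzmann-style weights, and scales each site by an accessibility factor standing in for the epigenome. This is a minimal sketch under stated assumptions, not the published models: the function names, the accessibility weighting, the uniform background, and the single-bound-site occupancy formula are all hypothetical simplifications, and interactions between neighboring sites are not modeled.

```python
import math

def site_scores(seq, pwm, background=0.25):
    """Log-odds score of every window of len(pwm) along seq (forward strand only).

    pwm is a list of dicts mapping each base to a strictly positive probability.
    """
    width = len(pwm)
    scores = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(math.log(pwm[pos][base] / background)
                    for pos, base in enumerate(window))
        scores.append(score)
    return scores

def region_affinity(seq, pwm, accessibility=None):
    """Sum of exp(score) over all sites, each scaled by a per-site accessibility.

    accessibility is a hypothetical stand-in for epigenomic information
    (1.0 = fully open, 0.0 = fully closed); it defaults to fully open.
    """
    scores = site_scores(seq, pwm)
    if accessibility is None:
        accessibility = [1.0] * len(scores)
    return sum(a * math.exp(s) for a, s in zip(accessibility, scores))

def binding_probability(affinity, tf_concentration=1.0):
    """Occupancy of the region, assuming at most one bound TF molecule at a time."""
    weight = tf_concentration * affinity
    return weight / (1.0 + weight)

# Toy two-column motif that prefers "AG"; the values are made up for illustration.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
open_affinity = region_affinity("AGTT", toy_pwm)
closed_affinity = region_affinity("AGTT", toy_pwm, accessibility=[0.0, 1.0, 1.0])
```

Summing over all windows, rather than keeping only the best-scoring site, is what lets many weak sites contribute to the affinity of the whole region.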
We developed a probabilistic model (mixture of HMMs) to annotate the genome using temporal epigenomic data. This model clusters genomic sequences based on the similarity of temporal changes of multiple epigenomic marks during a cellular differentiation process [Genome Res, 23:352-384]. Also see cover description. With this model, we found that temporal changes of H3K4me2, unmethylated CpG, and H2A.Z were predictive of 5-hmC changes, and we identified a simple rule governing the relationship between 5-hmC and gene expression levels.
We developed a computational tool, perEdit [Bioinformatics, 27: 3427-3429], to assemble both alleles of a personal genome. Personal ChIP-seq and RNA-seq data can then be mapped to the individual genome to identify individual variation and allelic differences.
We developed statistical methods to model temporal gene expression data, allowing for identifying different temporal expression patterns [Bioinformatics 26: 2944-2951] and dissecting subpopulations of cell types in a heterogeneous cell population [PLoS Comp Biol, 5: e1000607]. We developed a Hidden Branching Process model to cluster time-course data [PNAS, 2014].
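To make the mixture-of-HMMs idea concrete, the sketch below shows the cluster-assignment step in a generic form: each cluster is an HMM, the forward algorithm gives the likelihood of one region's time series under each HMM, and Bayes' rule turns those likelihoods into cluster membership probabilities. It is a minimal sketch, not the published implementation: it assumes discrete observation symbols (for example, a mark going down, staying flat, or going up between time points), strictly positive emission probabilities, and hypothetical function names.

```python
import math

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under one HMM.

    Uses the scaled forward algorithm. start[s] is the initial probability of
    state s, trans[r][s] the transition probability r -> s, and emit[s][o] the
    probability that state s emits symbol o (assumed strictly positive).
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)          # probability of obs[t] given the earlier symbols
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def responsibilities(obs, components, weights):
    """Posterior probability that obs was generated by each component HMM.

    components is a list of (start, trans, emit) tuples, weights the mixture
    proportions. This is the E-step of EM for a mixture of HMMs.
    """
    log_post = [math.log(w) + forward_loglik(obs, *hmm)
                for hmm, w in zip(components, weights)]
    top = max(log_post)             # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two toy single-state components with made-up emission probabilities.
mostly_zero = ([1.0], [[1.0]], [[0.9, 0.1]])
mostly_one = ([1.0], [[1.0]], [[0.1, 0.9]])
posterior = responsibilities([0, 0, 0], [mostly_zero, mostly_one], [0.5, 0.5])
```

In a full fit, these membership probabilities would be used to re-estimate each component's parameters, and the two steps would alternate until convergence.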


Complete list of publications on Google Scholar, NCBI

20 selected papers

  • Systematic mapping of RNA-chromatin interactions in vivo. Bharat Sridhar, Marcelo Rivas-Astroza, Tri C. Nguyen, Weizhong Chen, Zhangming Yan, Xiaoyi Cao, Lucie Hebert, Sheng Zhong.
    Current Biology, 2017, 27(4): 602–609. Text, Data
  • Mapping RNA-RNA interactome and RNA structure in vivo by MARIO. Tri C. Nguyen, Xiaoyi Cao, Pengfei Yu, Shu Xiao, Jia Lu, Fernando H. Biase, Bharat Sridhar, Norman Huang, Kang Zhang, Sheng Zhong.
    Nature Communications, 2016, 7:12023. Text, Software, Data
  • SMARCAD1 contributes to regulation of naïve pluripotency by interacting with histone citrullination. Shu Xiao, Jia Lu, Bharat Sridhar, Xiaoyi Cao, Pengfei Yu, Chieh-Chun Chen, Darina McDee, Laura Sloofman, Yang Wang, Marcelo Rivas-Astroza, Bhanu Prakash V.L. Telugu, Dana Levasseur, Kang Zhang, Han Liang, Jing Crystal Zhao, Tetsuya S. Tanaka, Gary Stormo, Sheng Zhong
    Cell Reports, 2017, 18:3117-3128. Text, Raw images, Artwork
  • Spatiotemporal clustering of epigenome reveals rules of dynamic gene regulation. Pengfei Yu, Shu Xiao, Xiaoyun Xing, Chun-Xiao Song, Wei Huang, Darina McDee, Tetsuya Tanaka, Ting Wang, Chuan He, Sheng Zhong.
    Genome Research, 2013, 23:352-384. Cover article, Abstract, Software, Data; Review
  • Understanding variation in transcription factor binding by modeling transcription factor genome-epigenome interactions. Chieh-Chun Chen, Shu Xiao, Dan Xie, Xiaoyi Cao, Chun-Xiao Song, Ting Wang, Chuan He, Sheng Zhong.
    PLoS Computational Biology, 2013, 9(12): e1003367. Text, Software, Supplementary Figures
  • Comparative epigenomic annotation of regulatory DNA. Shu Xiao, Dan Xie, Xiaoyi Cao, Pengfei Yu, Xiaoyun Xing, Chieh-Chun Chen, Meagan Musselman, Mingchao Xie, Franklin D. West, Harris A. Lewin, Ting Wang, Sheng Zhong.
    Cell, 2012, 149: 1381-1391. Abstract, Data, Comparative Epigenome Browser.
    Reviewed by: J Stem Cell Res Ther, 2012, S10:007. SCIENCE CHINA Life Sciences, 2013, 56(3): 213-219. WIREs Systems Biol Med, 2012, 4(6): 525-545.
  • Towards an evolutionary model of transcription networks. Dan Xie, Chieh-Chun Chen, Xin He, Xiaoyi Cao, Sheng Zhong.
    PLoS Computational Biology, 2011, 7(6): e1002064. Text. Website.
  • Modeling co-expression across species for complex traits: insights to the difference of human and mouse embryonic stem cells. Jun Cai, Dan Xie, Zhewen Fan, John Marden, Wing H. Wong, Sheng Zhong.
    PLoS Computational Biology, 2010, 6(3): e1000707. Text, Data, Software
  • Cross-species de novo identification of cis-regulatory modules with GibbsModule: application to gene regulation in embryonic stem cells. Dan Xie, Jun Cai, Na-Yu Chia, Huck H. Ng and Sheng Zhong.
    Genome Research, 2008, 18:1325-1335. Text. Software
  • Cross-species microarray analysis with the OSCAR system suggests an INSR-Pax6-NQO1 neuro-protective pathway in ageing and Alzheimer's disease. Yue Lu, Xin He and Sheng Zhong.
    Nucleic Acids Research, 2007, 35: W105-W114. Text.
  • Time-variant clustering model for understanding cell fate decisions. Wei Huang, Xiaoyi Cao, Fernando H. Biase, Pengfei Yu, Sheng Zhong.
    Proc Nat Acad Sci, 2014, 111(44):E4797-E4806. Abstract
  • Network based comparison of temporal gene expression patterns. Wei Huang, Xiaoyi Cao, Sheng Zhong.
    Bioinformatics, 2010, 26(23): 2944-2951. Abstract, Software
  • A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data. Xin He, Chieh-Chun Chen, Feng Hong, Fang Fang, Saurabh Sinha, Huck-Hui Ng, Sheng Zhong.
    PLoS ONE, 2009, 4(12): e8155. Text. This paper was presented at RECOMB Regulatory Genomics 09. Software
  • Dissecting early differentially expressed genes in a mixture of differentiating embryonic stem cells. Feng Hong, Fang Fang, Xuming He, Xiaoyi Cao, Hiram Chipperfield, Dan Xie, Wing H. Wong, Huck H. Ng, Sheng Zhong.
    PLoS Computational Biology, 2009, 5(12): e1000607. Text, Data
  • Gene Ontology analysis in multiple gene clusters under multiple hypothesis testing framework. Sheng Zhong and Dan Xie.
    Artificial Intelligence in Medicine, 2007, 41:105-115. Abstract, PDF. This paper was reviewed in an editorial in Artificial Intelligence in Medicine, 41:83-86.
  • Reproducibility Probability Score - incorporating measurement variability across laboratories for gene selection. Guixian Lin, Xuming He, Hanlee Ji, Leming Shi, Ronald Davis, Sheng Zhong.
    Nature Biotechnology, 2007, 24(12): 6-7. Text, Software, Supplementary Material. The article has been reviewed by: Pharmacogenomics, 2007, 8(8): 1037-1049. European Journal of Cancer, 2007, 5(5): 97-104. Current Opinion in Biotechnology Systems Biomedicine: Concepts and Perspectives, Edison Liu, Douglas Lauffenburger (editors), Elsevier, 2009, p.172. WIREs Systems Biol Med, 2012, 4(1): 39-49. WIREs Systems Biol Med, 2012, 4(6): 525-545.
  • From Genomes to Societies: A Holistic View of Determinants of Human Health. Yuyan Shi, Sheng Zhong.
    Current Opinion in Biotechnology, 2014, 28:134-142. Abstract.


Internet search for genomic big data.


Analyze RNA interaction data.

4DN Portal

Entry to NIH 4D Nucleome network.


Comparative Epigenome Browser.


Sequence mapping on personal genome.


Genome annotation using temporal epigenomic data.


We welcome applications for postdoc, lab manager, software engineer, and graduate student positions.

Get in Touch

Powell-Focht Bioengineering Hall 371, University of California San Diego, 9500 Gilman Drive, MC 0412, La Jolla, CA 92093-0412

Lab Phone: (858) 822-5649