We study gene regulation and cellular behavior by developing statistical and experimental methods. Our primary goal is to develop new technologies to map molecular networks, including the RNA-RNA interactome [Nat Comm, 2016], the RNA-chromatin interactome [Curr Biol, 2017], and the protein-protein interactome. Our secondary goal is to model how these networks vary along three axes: developmental time, differences between individuals, and evolutionary change. Our major tools include epigenomic and single-cell assays, single-molecule imaging, statistical modeling, and large-scale computation.


RNA-RNA interaction

Mapping RNA-RNA interactions in vivo.

RNA-DNA interaction

Finding any RNA attached to any place on the genome.

Single molecule detection

Single-molecule RNA FISH.

Single cell analysis

Single cell analysis of cell fate.

Big data search engine

Online search for epigenomic and transcriptomic big data.


Using deep single-cell RNA-seq of matched sister blastomeres, we found highly reproducible differences among the single cells within early stage (2- and 4-cell) pre-implantation mouse embryos [Genome Res, 2014, cover]. We developed a time-variant clustering model for analysis of time-course single-cell gene expression data [PNAS, 2014].
We discovered transposon-mediated re-wiring of transcription networks that govern pre-implantation embryonic development [Genome Res, 2010, cover; Research Highlight in Nature, 2010].
We contributed to initiating "comparative epigenomics", a research field that studies genomic functions by cross-species epigenomic comparison [Cell, 2012].
We pioneered modeling of the impact of epigenome-genome interactions on transcription factor binding and on personal variation [PLoS Comp Biol, 2013].
We contributed to the derivation of the rules of dynamic gene regulation and temporal epigenomic changes [Genome Res, 2013, cover].

Previous work

We brought evolutionary biology ideas into elucidating structures and functions of mammalian Gene Regulatory Networks (GRN). We developed methods to describe the evolutionary changes of different components of mammalian GRNs, including transcription factor binding sites (TFBS) and TFBS modules [Genome Res, 18:1325-1335], co-expression modules of genes [PLoS Comp Biol, 6(3): e1000707] [Nucl Acids Res, 35: W105-W114], protein-protein interactions [Genome Res, 20: 804-815], transcription factor (TF)-DNA interactions [Genome Res, 20: 804-815], and epigenomes [Cell, 149: 1381-1391].
On the theoretical end, we developed an evolutionary model that simultaneously describes the evolutionary changes of multiple components of a GRN [PLoS Comp Biol, 7(6): e1002064]. This model enables the use of multi-species DNA and gene expression data to identify GRNs simultaneously in every species under consideration.
We reported that close to forty percent of the genes shared by humans, mice and cows have different expression patterns in the early stages of embryonic development. We traced these differences to a set of specific evolutionary changes in the genomes, including insertion of transcription factor binding sites by transposons. See the cover article in Genome Research and the research highlight in Nature.
Transcription factor (TF)-DNA interaction is at the core of transcriptional regulation. We developed methods to efficiently calculate TF-DNA binding affinities for a long stretch (200-500 bp) of genomic sequence, taking into account interactions between strong and weak, homotypic and heterotypic TF binding sites [PLoS ONE, 4(12): e8155][BMC Genomics, 9:S18]. These methods enabled the use of high-throughput sequencing data to reconstruct a transcription network [BMC Genomics, 9:S19] and led to the discovery of a second DNA recognition motif of Nanog [PLoS ONE, 4(12): e8155], which was verified by subsequent studies from our group [Genome Res, 20: 804-815] and others [Nucl Acids Res, 2012].
Extending this work, we developed a thermodynamic model that calculates TF-DNA binding affinity while taking the epigenome into account [PLoS Comp Biol, 2013].
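As a rough illustration of what a region-level binding affinity means, the sketch below sums Boltzmann-style weights over every candidate site in a sequence, so that weak sites contribute alongside strong ones. This is a generic textbook simplification, not the published method: the matrix values and the names PWM, site_score and region_affinity are hypothetical, only one strand is scanned, and neither interactions between sites nor the epigenome are modeled.

```python
import math

# Hypothetical 4-bp log-odds position weight matrix (illustrative values only,
# not taken from any published motif).
PWM = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -0.9, "C": 1.0, "G": -0.7, "T": -0.4},
    {"A": -1.1, "C": -0.6, "G": 1.3, "T": -0.9},
    {"A": -0.7, "C": -1.0, "G": -0.8, "T": 1.1},
]


def site_score(site):
    """Log-odds score of one candidate site of the same width as the matrix."""
    return sum(column[base] for column, base in zip(PWM, site))


def region_affinity(seq):
    """Sum exp(score) over all forward-strand windows in seq.

    Every window contributes, so many weak sites can add up to a
    noticeable affinity. Returns 0 if seq is shorter than the motif.
    """
    width = len(PWM)
    return sum(
        math.exp(site_score(seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    )


if __name__ == "__main__":
    print(region_affinity("TTACGTACGTAA"))
```

Summing over all windows, rather than taking only the best-scoring site, is the design choice that lets weak sites matter in this simplified picture.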
We developed a probabilistic model (mixture of HMMs) to annotate the genome using temporal epigenomic data. This model clusters genomic sequences based on the similarity of temporal changes of multiple epigenomic marks during a cellular differentiation process [Genome Res, 23:352-384]. Also see cover description. With this model, we found that temporal changes of H3K4me2, unmethylated CpG, and H2A.Z were predictive of 5-hmC changes, and we identified a simple rule governing the relationship between 5-hmC and gene expression levels.
We developed a computational tool, perEdit [Bioinformatics, 27: 3427-3429], to assemble both alleles of a personal genome. Personal ChIP-seq and RNA-seq data can then be mapped to the individual genome to identify individual variation and allelic differences.
We developed statistical methods to model temporal gene expression data, allowing us to identify different temporal expression patterns [Bioinformatics 26: 2944-2951] and to dissect subpopulations of cell types in a heterogeneous cell population [PLoS Comp Biol, 5: e1000607]. We developed a Hidden Branching Process model to cluster time-course data [PNAS, 2014].


Complete list of publications on Google Scholar, NCBI

20 selected papers

  • Systematic mapping of RNA-chromatin interactions in vivo. Bharat Sridhar, Marcelo Rivas-Astroza, Tri C. Nguyen, Weizhong Chen, Zhangming Yan, Xiaoyi Cao, Lucie Hebert, Sheng Zhong.
    Current Biology, 2017, 27(4): 602–609. Text, Data, Protocols, Bioinformatic pipeline, Access the recommendation on F1000Prime
  • Mapping RNA-RNA interactome and RNA structure in vivo by MARIO. Tri C. Nguyen, Xiaoyi Cao, Pengfei Yu, Shu Xiao, Jia Lu, Fernando H. Biase, Bharat Sridhar, Norman Huang, Kang Zhang, Sheng Zhong.
    Nature Communications, 2016, 7:12023. Text, Software, Data
  • The 4D nucleome project. Job Dekker, Andrew S. Belmont, Mitchell Guttman, Victor O. Leshyk, John T. Lis, Stavros Lomvardas, Leonid A. Mirny, Clodagh C. O’Shea, Peter J. Park, Bing Ren, Joan C. Ritland Politz, Jay Shendure, Sheng Zhong & the 4D Nucleome Network.
    Nature, 2017, 549:219–226. Text, Artwork
  • SMARCAD1 contributes to regulation of naïve pluripotency by interacting with histone citrullination. Shu Xiao, Jia Lu, Bharat Sridhar, Xiaoyi Cao, Pengfei Yu, Chieh-Chun Chen, Darina McDee, Laura Sloofman, Yang Wang, Marcelo Rivas-Astroza, Bhanu Prakash V.L. Telugu, Dana Levasseur, Kang Zhang, Han Liang, Jing Crystal Zhao, Tetsuya S. Tanaka, Gary Stormo, Sheng Zhong.
    Cell Reports, 2017, 18:3117-3128. Text, Raw images, Artwork
  • Spatiotemporal clustering of epigenome reveals rules of dynamic gene regulation. Pengfei Yu, Shu Xiao, Xiaoyun Xin, Chun-Xiao Song, Wei Huang, Darina McDee, Tetsuya Tanaka, Ting Wang, Chuan He, Sheng Zhong.
    Genome Research, 2013, 23:352-384. Cover article, Abstract, Software, Data; Review
  • Understanding variation in transcription factor binding by modeling transcription factor genome-epigenome interactions. Chieh-Chun Chen, Shu Xiao, Dan Xie, Xiaoyi Cao, Chun-Xiao Song, Ting Wang, Chuan He, Sheng Zhong.
    PLoS Computational Biology, 2013, 9(12): e1003367. Text, Software, Supplementary Figures
  • Comparative epigenomic annotation of regulatory DNA. Shu Xiao, Dan Xie, Xiaoyi Cao, Pengfei Yu, Xiaoyun Xing, Chieh-Chun Chen, Meagan Musselman, Mingchao Xie, Franklin D. West, Harris A. Lewin, Ting Wang, Sheng Zhong.
    Cell, 2012, 149: 1381-1391. Abstract, Data, Comparative Epigenome Browser.
    Reviewed by: J Stem Cell Res Ther, 2012, S10:007. SCIENCE CHINA Life Sciences, 2013, 56(3): 213-219. WIREs Systems Biol Med, 2012, 4(6): 525-545.
  • Towards an evolutionary model of transcription networks. Dan Xie, Chieh-Chun Chen, Xin He, Xiaoyi Cao, Sheng Zhong.
    PLoS Computational Biology, 2011, 7(6): e1002064. Text. Website.
  • Modeling co-expression across species for complex traits: insights to the difference of human and mouse embryonic stem cells. Jun Cai, Dan Xie, Zhewen Fan, John Marden, Wing H. Wong, Sheng Zhong.
    PLoS Computational Biology, 2010, 6(3): e1000707. Text, Data, Software
  • Cross-species de novo identification of cis-regulatory modules with GibbsModule: application to gene regulation in embryonic stem cells. Dan Xie, Jun Cai, Na-Yu Chia, Huck H. Ng and Sheng Zhong.
    Genome Research, 2008, 18:1325-1335. Text. Software
  • Cross-species microarray analysis with the OSCAR system suggests an INSR-Pax6-NQO1 neuro-protective pathway in ageing and Alzheimer's disease. Yue Lu, Xin He and Sheng Zhong.
    Nucleic Acids Research, 2007, 35: W105-W114. TEXT.
  • Time-variant clustering model for understanding cell fate decisions. Wei Huang, Xiaoyi Cao, Fernando H. Biase, Pengfei Yu, Sheng Zhong.
    Proc Nat Acad Sci, 2014, 111(44):E4797-E4806. Abstract
  • Network based comparison of temporal gene expression patterns. Wei Huang, Xiaoyi Cao, Sheng Zhong.
    Bioinformatics, 2010, 26(23): 2944-2951. Abstract, Software
  • Dissecting early differentially expressed genes in a mixture of differentiating embryonic stem cells. Feng Hong, Fang Fang, Xuming He, Xiaoyi Cao, Hiram Chipperfield, Dan Xie, Wing H. Wong, Huck H. Ng, Sheng Zhong.
    PLoS Computational Biology, 2009, 5(12): e1000607. Text, Data
  • Reproducibility Probability Score - incorporating measurement variability across laboratories for gene selection. Guixian Lin, Xuming He, Hanlee Ji, Leming Shi, Ronald Davis, Sheng Zhong.
    Nature Biotechnology, 2007, 41:105-115. 24(12): 6-7. Text, Software, Supplementary Material. The article has been reviewed by: Pharmacogenomics, 2007, 8(8): 1037-1049. European Journal of Cancer, 2007, 5(5): 97-104. Current Opinion in Biotechnology Systems Biomedicine: Concepts and Perspectives, Edison Liu, Douglas Lauffenburger (editors), Elsevier, 2009, p.172. WIREs Systems Biol Med, 2012, 4(1): 39-49. WIREs Systems Biol Med, 2012, 4(6): 525-545.
  • Building a genome browser with GIVE. Xiaoyi Cao, Zhangming Yan, Qiuyang Wu, Alvin Zheng, Sheng Zhong.
    bioRxiv, 2017, doi: Text, Software, News & Comments: Nature 549:117.
  • GeNemo: a search engine for web-based functional genomic data. Yongqing Zhang, Xiaoyi Cao, Sheng Zhong.
    Nucleic Acids Research, 2016, 44: W122-W127. Text, Software
    News coverage: HIT Consultant, Science Daily, MediaPost, HealthDataManagement
  • Mapping personal functional data to personal genomes. Marcelo Rivas-Astroza, Dan Xie, Xiaoyi Cao, Sheng Zhong.
    Bioinformatics, 2011, 27(24):3427-3429. Text. Software


Build your own genome browser website.


Internet search for genomic big data.


Analyze RNA interaction data.


Comparative Epigenome Browser.


Sequence mapping on personal genome.


Genome annotation using temporal epigenomic data.

Consortium web portal

4DN Portal

Entry to NIH 4D Nucleome network.


We welcome applications for postdoc, lab manager, software engineer, and graduate student positions.

Get in Touch

Powell-Focht Bioengineering Hall 371, University of California San Diego, 9500 Gilman Drive, MC 0412, La Jolla, CA 92093-0412

Lab Phone: (858) 822-5649