While CpG dinucleotides are significantly depleted relative to other dinucleotides in mammalian genomes, they can cluster to form CpG islands, which localize around the 5ʹ regions of genes, where they function as promoters. CpG-island promoters are generally unmethylated and are often found in housekeeping genes. However, neither their nucleotide sequences nor their existence per se is conserved between humans and mice, which may be due to evolutionary gain and loss of the regulatory regions. In this study, the human and rhesus monkey genomes, whose sequences are moderately conserved, were compared at base resolution. Using transcription start site data, we first validated our method's ability to identify orthologous promoters and identified a limitation of using the 5ʹ ends of curated gene models, such as NCBI RefSeq, as transcription start sites. We found that, in addition to deamination mutations, insertions and deletions of bases, repeats, and long fragments contributed to the mutations of CpG dinucleotides. We also observed that G + C content tended to change in CpG-poor environments, while CpG content was altered in G + C-rich environments. While loss of CpG islands can be caused by gradual decreases in CpG sites, gain of these islands appears to require two distinct nucleotide-altering steps. Taken together, our findings provide novel insights into the process of acquisition and diversification of CpG-island promoters in vertebrates.
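The two quantities contrasted above, G + C content and CpG content, can be illustrated with a minimal Python sketch. It assumes the commonly used observed/expected CpG ratio (number of CpG × sequence length / (number of C × number of G)); the abstract does not state which definition or thresholds the study applied, and the function names `gc_content` and `cpg_obs_exp` are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    Assumption: this is the widely used ratio, not necessarily the
    measure applied in the study. Returns 0.0 when the sequence has
    no C or no G, since the expected count is then zero.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)
```

Under this sketch, a sequence can be G + C-rich yet CpG-poor (high `gc_content`, low `cpg_obs_exp`), which is the kind of distinction the abstract draws between the two environments.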