To establish criteria for novel gene identification and to define its limitations, to identify novel genes of potential relevance to Down syndrome, and to investigate features of genome organization, 6.5 Mb of DNA sequence, dispersed throughout the long arm of human chromosome 21, has been annotated computationally and experimentally. Exon prediction with four programs, protein and EST database searches, two-sequence BLAST searches and CpG island characterization identified 41 genes with known or new protein homologies. Features of these genes suggested criteria for predicting novel genes (those lacking any protein homology), which fall into four classes: (1) exon+EST genes: genes with excellent patterns of predicted exons and one or more matches in dbEST; (2) exon-EST genes: genes with good patterns of predicted exons and no matches in dbEST; (3) EST-exon genes: genes without any reliable pattern of predicted exons but with matches in dbEST; and (4) isolated CpG island genes: genes consisting of strong CpG islands that are apparently unique sequences and lie in regions of >50 kb lacking any consistent exon predictions. In total, 41 novel gene models were predicted. For a subset of these, RT-PCR experiments helped to verify and refine the models and were used to assess expression in early development and in adult brain regions of potential relevance to Down syndrome. The results suggest generally low and/or restricted patterns of expression, and also reveal examples of complex alternative processing, especially in brain, that may have important implications for the regulation of protein function. Analysis of the complete gene structures of the known genes identified a number of very large introns, a number of very short intergenic distances, and at least one potentially bi-directional promoter. At least three-quarters of known genes and half of predicted genes are associated with CpG islands. Among the novel genes, three cases of overlapping genes are predicted.
Results of these analyses illustrate some of the complexities inherent in mammalian genome organization and some of the limitations of current sequence analysis technologies. They also doubled the number of potential genes within the region. © 2000 Elsevier Science B.V. All rights reserved.
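The four classes of novel gene models described above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the authors' implementation: the abstract states the criteria only qualitatively ("excellent", "good", "strong"), so the `ExonPattern` grades, the `Candidate` fields and the `classify_novel_gene` function are all hypothetical names introduced for illustration, and how each grade would be derived from the four exon prediction programs is left unspecified. Combinations of evidence that the abstract does not name return `None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExonPattern(Enum):
    """Hypothetical qualitative grade for the pattern of predicted exons."""
    NONE = 0       # no reliable pattern of predicted exons
    GOOD = 1
    EXCELLENT = 2


@dataclass
class Candidate:
    """Hypothetical summary of the evidence for one candidate gene model."""
    exon_pattern: ExonPattern
    est_matches: int                        # number of matches in dbEST
    strong_unique_cpg_island: bool = False  # strong CpG island, apparently unique sequence
    exon_free_region_kb: float = 0.0        # size of region lacking consistent exon predictions
    has_protein_homology: bool = False


def classify_novel_gene(c: Candidate) -> Optional[str]:
    """Return one of the four class names from the abstract, or None."""
    if c.has_protein_homology:
        # Genes with protein homology are not "novel" in the abstract's sense.
        return None
    if c.exon_pattern is ExonPattern.EXCELLENT and c.est_matches >= 1:
        return "exon+EST"
    if c.exon_pattern is ExonPattern.GOOD and c.est_matches == 0:
        return "exon-EST"
    if c.exon_pattern is ExonPattern.NONE and c.est_matches >= 1:
        return "EST-exon"
    if (
        c.exon_pattern is ExonPattern.NONE
        and c.strong_unique_cpg_island
        and c.exon_free_region_kb > 50
    ):
        return "isolated CpG island"
    # Combination not covered by the four stated classes.
    return None


if __name__ == "__main__":
    example = Candidate(ExonPattern.EXCELLENT, est_matches=2)
    print(classify_novel_gene(example))
```

Candidates are tested in the order the classes are listed, and the four conditions are mutually exclusive as written, so the order affects readability only, not the result.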