Working over a half-dozen years, over 400 scientists in 32 different laboratories examined the entire human genome (3.3 x 10^9 bp) for evidence of segments of DNA that perform some function in a cell.
Prior to their work, it was clear that the human genome contains only some 21,000 protein-encoding genes, and that these take up only about 3% of the DNA in a human cell. What, if anything, was the rest of our DNA doing? While there was accumulating evidence of segments of DNA outside of genes that regulate the expression of those genes (such as enhancers, promoters, and insulators), most of the genome was thought to have no function and was dubbed "junk" DNA.
However, in a set of 30 papers published in September 2012, they provided evidence that as much as 80% of the human genome performs some function in some cells at some time.
Among their findings:
1. They found 20,687 protein-encoding genes occupying 2.94% of the genome.
However, while only a small fraction (~3%) of the genome gets transcribed into pre-mRNA whose exons (encoded by 1.2% of our DNA) are translated into proteins, as much as 75% of our DNA is transcribed into a variety of other RNAs both
2. Using a variety of assay methods on over 100 different types of cells, they uncovered evidence of ~400,000 potential enhancers and ~70,000 promoters. A few thousand of these were found in all the types of cells examined, but most were found in only a subset of cells, and many were unique to one kind of cell.
3. Approximately 50% of the human genome consists of various classes of repetitive DNA, that is, closely related sequences occurring over and over. While functions had been identified for some of these, most had been considered "junk" DNA with no function. However, the ENCODE project found that some 75% of our repetitive DNA occurs within, or overlaps with, sequences (such as enhancers) that regulate gene expression.
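To get a feel for what these percentages mean in base pairs, the short Python sketch below applies them to the 3.3 x 10^9 bp genome size quoted above. The percentages come from the text; the dictionary labels and the helper name percent_to_bp are chosen here purely for illustration.

```python
# Genome size quoted in the text (base pairs).
GENOME_BP = 3.3e9

# Percentages quoted in the text; the labels are illustrative names only.
fractions = {
    "protein-encoding genes": 2.94,
    "exons translated into protein": 1.2,
    "transcribed into some RNA": 75.0,
    "some function (ENCODE estimate)": 80.0,
}


def percent_to_bp(percent, genome_bp=GENOME_BP):
    """Convert a percentage of the genome into a number of base pairs."""
    return genome_bp * percent / 100.0


# Base-pair equivalent of each quoted percentage.
sizes_bp = {label: percent_to_bp(p) for label, p in fractions.items()}

for label, bp in sizes_bp.items():
    # Report in megabases (millions of base pairs), rounded to whole Mb.
    print(f"{label}: about {bp / 1e6:,.0f} Mb")
```

On these figures, protein-encoding genes span roughly 97 million bp and their exons roughly 40 million bp, while the 80% estimate corresponds to about 2.6 billion bp.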
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) was one of the most productive methods used to find regions of the genome occupied by proteins such as transcription factors, modified histones, and the protein designated CTCF ("CCCTC binding factor"; named for a nucleotide sequence found in all insulators).
Antibodies directed against epitopes on the protein of interest are used to precipitate DNA fragments to which that protein is bound. The fragments are then sequenced.
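Once the precipitated fragments have been sequenced, regions where many fragments pile up are candidates for sites occupied by the protein. The Python sketch below is a toy illustration of that idea only, not the ENCODE analysis pipeline: it assumes the fragments have already been aligned to a single short stretch of DNA, and the fragment coordinates, the function names, and the depth threshold are all hypothetical. Real analyses rely on dedicated peak-calling software and control samples.

```python
def coverage(fragments, length):
    """Count how many fragments overlap each position of a toy chromosome.

    Each fragment is a half-open interval (start, end) in base pairs.
    """
    depth = [0] * length
    for start, end in fragments:
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    return depth


def enriched_regions(depth, min_depth):
    """Return half-open (start, end) runs where depth >= min_depth."""
    regions = []
    run_start = None
    for pos, d in enumerate(depth):
        if d >= min_depth and run_start is None:
            run_start = pos                      # a run begins here
        elif d < min_depth and run_start is not None:
            regions.append((run_start, pos))     # the run ended just before pos
            run_start = None
    if run_start is not None:                    # run extends to the end
        regions.append((run_start, len(depth)))
    return regions


# Hypothetical aligned fragments: four overlap near positions 10-38,
# one lies alone at 60-80 (background).
fragments = [(10, 30), (12, 32), (15, 35), (60, 80), (18, 38)]

depth = coverage(fragments, length=100)
peaks = enriched_regions(depth, min_depth=3)
print(peaks)
```

With this made-up data, the single isolated fragment is ignored and only the stretch covered by at least three overlapping fragments is reported as a candidate binding region.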