Shortly after their press conferences, the two groups that had been striving for several years to map the human genome published their findings:
- the International Human Genome Sequencing Consortium (IHGSC) in the 15 February 2001 issue of Nature;
- Celera Genomics, a company in Rockville, Maryland, in the 16 February issue of Science.
These achievements were monumental, but before we examine them, let us be clear as to what they were not.
- Neither group had determined the complete sequence of the human genome.
Each of our chromosomes is a single molecule of DNA. Some day the sequence of base pairs in each will be known from one end to the other. But in 2001, thousands of gaps remained to be filled.
What they had done was present a series of draft sequences that represented about 90% (probably the most interesting 90%) of the genome.
- Even taken together, the results did not provide an accurate count of the number of protein-encoding genes in our genome (in contrast to the genomes of some other organisms).
One reason: the large number and large size of the introns that split these genes make it difficult to recognize the open reading frames (ORFs) that encode proteins.
The two groups came up with slightly different estimates of the number of protein-encoding genes, but both in the range of 30 to 38 thousand:
- barely twice the number found in the genomes of Drosophila and C. elegans;
- and representing only 1–2% of the total DNA in the cell;
- and a third of the 100,000 genes that many had predicted would be found.
- (By 2011, the number had been reduced to some 21,000.)
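To see why introns get in the way of recognizing ORFs, here is a minimal Python sketch. The sequences are toy data invented for illustration (not real human DNA), and `longest_orf` is a hypothetical helper that scans only the three forward reading frames; real gene-finding programs rely on far more elaborate models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(dna):
    """Return the longest ATG...stop stretch in the three forward frames.

    Stretches that start with ATG but never reach a stop codon are ignored.
    """
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]         # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None                   # look for the next ATG
    return best

# Toy sequences (assumptions, invented for this example).
exon1 = "ATGGCTGCAGGA"
intron = "GTAAGTTAGCTTTCCAG"   # begins with GT, ends with AG, contains a stop
exon2 = "CCTGAAGTTTGA"

mrna = exon1 + exon2              # spliced message: one uninterrupted ORF
genomic = exon1 + intron + exon2  # what the genome sequence actually shows

print(len(longest_orf(mrna)))     # 24
print(len(longest_orf(genomic)))  # 21
```

In the spliced message the whole coding sequence is recovered, but in the genomic version the scan runs into a stop codon inside the intron and never reaches the second exon. With many long introns per gene, a simple ORF scan of this kind misses most of the coding sequence.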
Are the tiny roundworm and fruit fly almost as complex as we are?
Probably not, although we share many homologous genes (called "orthologs") with both these animals.
|Follow this link to a discussion of the role of changes in gene regulatory regions in the evolution of animal form.|
Although there are some giants such as
- dystrophin with its 79 exons spread over 2.4 million base pairs of DNA;
- titin whose 363 exons can encode a single protein with as many as ~38,000 amino acids,
the average human gene contains 4 exons totaling 1,350 base pairs and thus encodes an average protein of 450 amino acids.
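The figures for the average gene and for the two giants follow from the fact that three base pairs encode one amino acid. The short Python sketch below works through that arithmetic; the variable names are illustrative, the stop codon is ignored, and the derived values (exon size, minimum coding length, span per exon) are rough calculations rather than figures from the published reports.

```python
BASES_PER_CODON = 3

def codons_in(coding_bp):
    """Number of whole codons in a coding sequence of the given length."""
    return coding_bp // BASES_PER_CODON

# Average human gene, using the figures quoted in the text.
avg_exons = 4
avg_coding_bp = 1_350
avg_protein_len = codons_in(avg_coding_bp)   # 450 amino acids
avg_exon_bp = avg_coding_bp / avg_exons      # 337.5 base pairs per exon

# Titin: the longest protein needs at least this much coding sequence.
titin_max_residues = 38_000
titin_min_coding_bp = titin_max_residues * BASES_PER_CODON   # 114,000

# Dystrophin: gene span divided by exon count (most of the span is intron).
dystrophin_exons = 79
dystrophin_gene_bp = 2_400_000
dystrophin_bp_per_exon = dystrophin_gene_bp / dystrophin_exons

print(avg_protein_len, avg_exon_bp, titin_min_coding_bp)
print(round(dystrophin_bp_per_exon))
```

The last figure, roughly 30,000 base pairs of gene per exon, shows how thinly the coding sequence of a gene like dystrophin is spread.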
The density of genes on the different chromosomes varies from
- 23 genes per million base pairs on chromosome 19 (for a total of 1,400 genes) to
- only 5 genes per million base pairs on chromosome 13.
Humans, and presumably most vertebrates, have genes not found in invertebrate animals like Drosophila and C. elegans.
These include genes encoding
- antibodies and T cell receptors for antigen (TCRs);
- the transplantation antigens of the major histocompatibility complex (MHC) (HLA, the MHC of humans);
- cell-signaling molecules including the many types of cytokines;
- the molecules that participate in blood clotting;
- mediators of apoptosis. Although these proteins occur in Drosophila and C. elegans, we have a much richer assortment of them.
Both groups added to the list of human genes that have arisen by repeated duplication (e.g., by unequal crossing over) from a single precursor gene.
Both groups verified the presence of large amounts of repetitive DNA. In fact, this DNA, with similar sequences occurring over and over, is one of the main obstacles to assembling the DNA sequences in proper order.
All told, repetitive DNA probably accounts for over 50% of our total genome.
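The gene-density figures quoted earlier for chromosome 19 also imply an approximate size for that chromosome. The Python sketch below does the division; `implied_length_mb` is a hypothetical helper, and the resulting length is only what the two quoted numbers imply, not a measured value from the source.

```python
def implied_length_mb(gene_count, genes_per_mb):
    """Chromosome length (in millions of base pairs) implied by a gene
    count and an average gene density."""
    return gene_count / genes_per_mb

# Figures quoted in the text for chromosome 19.
chr19_length_mb = implied_length_mb(1_400, 23)

# How much more gene-dense chromosome 19 is than chromosome 13.
density_ratio = 23 / 5

print(round(chr19_length_mb, 1))  # 60.9
print(density_ratio)              # 4.6
```

So the quoted numbers correspond to a chromosome 19 of roughly 61 million base pairs, with genes packed more than four times as densely as on chromosome 13.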
What remains to be done:
- Keep looking for genes.
As of March 2010, 19,956 protein-encoding genes had been positively identified, but there probably are a thousand or more still to be found.
- Determine the human proteome; that is, the total complement of proteins we synthesize.
- Analyze how clusters of genes are coordinately expressed
  - in various types of cells
  - at different times in the life of a cell.
Such analysis will benefit greatly from the availability of gene chip technology and will also help us to understand how such a modest increase in gene number from Drosophila to humans could produce such a different outcome!
- Determine the genomes of other vertebrates.
This will not only help us recognize more human genes but will give us insight into what makes us unique.
Already we know that large sections of our genome have closely related homologs in the mouse.
- The collection of genes, and even their order, on human chromosome 17 closely matches that of mouse chromosome 11. The same is true of human chromosome 20 and mouse chromosome 2.
- Humans and mice (also rats) share several hundred absolutely identical stretches of DNA extending for 200–800 base pairs.
  - Some are present in the exons of genes, especially genes involved in RNA processing.
  - Some are found in or near the introns of genes, especially genes encoding proteins involved in DNA transcription.
  - Some are found between genes (especially those, like Pax6, essential to embryonic development) and may serve as enhancers.
To have avoided any mutations for 60 million years since humans and rodents went their separate evolutionary ways suggests that these regions perform functions absolutely essential to mammalian life.
As for the chimpanzee, a comparison of its genome with that of humans is discussed elsewhere.
How to Sequence a Genome. Illustrated descriptions of sequencing strategies. (Requires Flash)
An update: as of November 2008, the complete genomes of four men have been determined. With the rapid improvements in the speed (and cost) of sequencing, we can expect more to come.
19 April 2014