All the genomes listed on my page Genome Sizes describe the complete genome of a single species. For bacteria and archaeons, this means that the organism was grown in pure culture to provide the DNA for sequencing.
But it is now clear that the microbial world contains vast numbers of both groups that have never been grown in the laboratory and thus have escaped study. Soil, water, and the contents of our large intestine are examples of habitats that teem with unknown microorganisms.
Thanks to the recent development of sequencing machines capable of rapidly (and inexpensively) sequencing huge amounts of DNA, it is now practical to sequence the DNA extracted from complex microbial ecosystems like that found in a soil sample.
Several different approaches are used, but all depend on a first step of extracting the microbial DNA from the sample (and separating it from the far more complex DNA of any eukaryotes that may be present).
The DNA encoding the RNA (16S rRNA) of the small ribosomal subunit of both bacteria and archaeons contains some highly conserved regions; that is, regions of identical or almost identical sequence across species. Using primers that target these regions, one can then produce enough material by the polymerase chain reaction (PCR) to sequence the entire 16S rRNA gene.
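The role of the conserved regions can be illustrated with a small "in silico PCR" sketch in Python. It is only a toy under stated assumptions: the template and both primers are made-up sequences (not real 16S primers), matching is exact, and the function names are hypothetical. Real primers often contain degenerate bases and tolerate a few mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str):
    """Return the amplicon bounded by the two primers, or None if a site is missing.

    The forward primer matches the template strand as written. The reverse
    primer anneals to the opposite strand, so we search the template for its
    reverse complement downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy template: invented "conserved" flanks around a variable middle region.
toy_template = "TTGACCAGTCGGATTACGCTAGGCATCCGTAAGCTTGG"
toy_forward = "ACCAGTCG"   # hypothetical primer for the upstream conserved region
toy_reverse = "AAGCTTAC"   # hypothetical primer; anneals to the opposite strand
amplicon = in_silico_pcr(toy_template, toy_forward, toy_reverse)
print(amplicon)
```

Because the primers sit in regions shared by many organisms, the same pair would pull out the variable stretch between them from every template that carries both sites, which is what makes the 16S gene useful for surveying a mixed sample.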
Comparing the resulting sequences to a database of sequences from known organisms, one can estimate how many different types of microbes are present. There is substantial genetic diversity even between "strains" of a single species (e.g., E. coli K-12 and E. coli O157:H7), so we cannot be sure whether two closely related (> 97% identity) 16S rDNA sequences belong to separate species or to two strains of the same species. Such sequences are therefore assigned to a single "phylotype". In either case, the collection of 16S rDNA sequences can be arranged to form a phylogenetic tree to show the patterns of relatedness.
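The grouping of sequences into phylotypes at the 97% threshold can be sketched as follows. This is a minimal illustration, not the method of any particular tool. It assumes the sequences are already aligned to equal length, uses a simple greedy scheme (compare each sequence to the first member of each existing group), and the function names and toy sequences are invented for the example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def cluster_phylotypes(sequences, threshold=97.0):
    """Greedy clustering into phylotypes.

    Each sequence joins the first cluster whose representative (its first
    member) it matches at more than `threshold` percent identity; otherwise
    it founds a new cluster.
    """
    clusters = []  # list of lists; clusters[i][0] is the representative
    for seq in sequences:
        for cluster in clusters:
            if percent_identity(seq, cluster[0]) > threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Toy 100-base "16S" sequences (invented for illustration).
base = "ACGT" * 25
near = "TT" + base[2:]        # 2 mismatches: 98% identical to base
far = "T" * 10 + base[10:]    # 8 mismatches: 92% identical to base
phylotypes = cluster_phylotypes([base, near, far])
print(len(phylotypes))
```

Here `base` and `near` fall into one phylotype while `far` founds a second. A greedy scheme like this depends on the order in which sequences are seen, which is one reason counts of phylotypes are estimates.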
Analyzing the 16S rDNA genes in a sample tells us who is there, but a 16S gene is, of course, not a complete genome, and it tells us nothing about the other genes present in the various members of the population.
This information can be gained by "shotgun" sequencing of the environmental DNA sample.
The sheer diversity of organisms in most microbial ecosystems makes it virtually impossible to assemble enough contigs into a complete genome for any one organism like those listed in Genome Sizes. What you get instead is a window into the many kinds of genes present in one inhabitant or another of that ecosystem. For example, you may discover genes that encode proteins able to degrade environmental pollutants or to synthesize a new antibiotic.
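Why a mixed sample yields scattered contigs rather than one genome can be seen in a toy assembly sketch. It assumes error-free reads all from the same strand, uses a simple greedy longest-overlap merge, and the reads and function names are invented for illustration. Real assemblers use overlap or de Bruijn graphs and must cope with sequencing errors, repeats, and both strands.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=4):
    """Merge the best-overlapping pair repeatedly until no pair overlaps enough."""
    contigs = list(reads)
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy reads from two hypothetical organisms mixed in one sample.
reads = [
    "ATGGCGTAC", "CGTACGTTAGC",     # overlap by CGTAC (organism A)
    "TTTTCCCCGG", "CCCGGGGAAAA",    # overlap by CCCGG (organism B)
]
contigs = greedy_assemble(reads)
print(sorted(contigs))
```

The four reads collapse into two contigs, one per organism, and nothing joins them to each other. With thousands of organisms and uneven coverage, most contigs stay short fragments of this kind.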
Another way of exploiting metagenomics is to clone fragments of the environmental DNA into a host (e.g., E. coli) and look for new functions in that host, which appear if it can express the new gene with which it was transformed. For example, screening the library of E. coli clones for the ability to resist an antibiotic can reveal genes involved in antibiotic resistance, a worrisome development in recent years.
Simpler still was the ecosystem found in water 2.8 km (1.7 miles) down in a gold mine. Only one organism turned up: an autotrophic bacterium capable of extracting energy from inorganic substances in its environment and synthesizing all the molecules needed for its life from them.