Monday, May 20, 2013

Parsing the DNA Crazy Quilt

A measure of how little we know about the real-world workings of evolution is that science still can't explain why some organisms have huge imbalances in the chemical composition of their DNA. If you look at the genome of Clostridium botulinum (the botulism germ), 72% of the bases in its DNA are either 'A' or 'T': adenine or thymine. (The four possibilities are, of course, adenine, thymine, guanine, and cytosine.) Conversely, you can find many examples of organisms in which the DNA is mostly 'G' or 'C.' The question is why A, T, G, and C don't occur in roughly equal proportions, which is what you'd expect after millions of years of genetic averaging (some sort of regression to the mean).

Just to give you an idea of what GC/AT imbalance really looks like, here's the gene for the enzyme adenine deaminase from Clostridium botulinum (the A and T bases were highlighted in red in the original post):

ATGTATAAAAATATACAAAGAGAAATCTATAAAAATACAAAAGGAGACGGGGATATGTTTAATAAATTTGATACAAAGCCTCTTTGGGAGGTAAGTAAA ACTTTATCAAGTGTAGCACAGGGGCTTGAACCGGCTGATATGGTTATTATAAATTCAAGGCTTATAAATGTCTGTACAAGAGAAGTCATAGAAAACACA GATGTAGCAATTAGCTGTGGAAGAATTGCTTTAGTAGGTGATGCAAAACATTGCATAGGGGAAAACACAGAGGTAATTGATGCAAAAGGACAATATATT GCACCAGGTTTTTTAGATGGTCATATTCATGTTGAATCATCAATGTTAAGTGTAAGCGAATATGCTCGTTCAGTAGTTCCACATGGTACTGTCGGAATA TATATGGATCCACATGAAATTTGTAATGTACTCGGATTAAATGGTGTACGTTATATGATTGAAGATGGCAAGGGTACTCCACTTAAAAATATGGTGACC ACACCATCCTGTGTACCAGCAGTTCCAGGTTTTGAAGATACAGGAGCGGCTGTAGGACCAGAAGATGTTAGAGAAACAATGAAGTGGGATGAAATAGTT GGATTAGGAGAAATGATGAACTTCCCAGGTATACTTTATTCTACAGATCATGCTCATGGAGTAGTAGGAGAAACTTTAAAAGCTAGTAAAACAGTAACA GGACATTATTCTTTACCTGAAACAGGAAAAGGATTAAATGGATATATTGCATCAGGTGTAAGATGTTGTCATGAATCCACAAGAGCGGAAGATGCTCTT GCTAAAATGCGCCTTGGAATGTATGCAATGTTTAGAGAAGGATCTGCATGGCATGACTTAAAGGAAGTAAGTAAAGCCATTACAGAAAATAAGGTAGAT AGTAGATTTGCTGTTTTAATATCTGATGATACTCACCCACACACATTGCTTAAGGATGGACATTTAGATCATATTATAAAACGTGCTATAGAAGAAGGG ATAGAGCCATTAACTGCAATTCAAATGGTAACAATAAATTGTGCACAATGTTTCCAAATGGATCATGAATTAGGTTCTATAACTCCAGGAAAATGTGCA GATATTGTATTTATAGAAGATTTAAAAGATGTAAAAATAACAAAGGTTATTATAGATGGAAATTTAGTTGCAAAGGGTGGACTATTAACTACTTCAATA GCTAAATATGATTATCCTGAAGATGCTATGAATTCAATGCATATTAAGAATAAAATAACACCAGATTCCTTTAATATTATGGCTCCTAATAAAGAAAAA ATAACTGCAAGGGTTATTGAAATTATACCTGAAAGAGTTGGTACATATGAGAGACATGTTGAACTTAATGTTAAAGATGATAAAGTTCAATGTGATCCA AGTAAAGATGTTTTAAAAGCAGTTGTATTTGAAAGACACCATGAAACAGGAACAGCAGGATATGGTTTTGTTAAAGGTTTTGGTATTAAGAGAGGAGCT ATGGCTGCAACAGTTGCCCATGATGCTCACAACTTATTAGTTATAGGAACAAATGATGAAGATATGGCATTAGCTGCTAATACATTAATAGAATGTGGT GGAGGAATGGTAGCCGTACAAGATGGTAAAGTATTAGGCTTAGTTCCATTACCAATAGCAGGACTTATGAGTAATAAGCCTTTAGAAGAAATGGCAGAA ATGGTAGAAAAACTAGATAGTGCATGGAAAGAAATAGGATGTGATATAGTTTCACCATTTATGACAATGGCACTTATTCCACTTGCCTGCCTACCAGAA TTAAGACTAACTAATAGAGGGTTAGTTGATTGTAATAAGTTTGAATTTGTATCATTATTTGTAGAAGAATAA

View gene at FastaView.


The organism Actinomyces oris (which occurs in the film that builds up on teeth) has an adenine deaminase gene that looks like this:

ATGGCCGATCAACCGTCCGCAGACCTGCTTATCAAGGACGCGCGCATCGTCCCTTTCCGGTCCCGTACCGAACTGGGTGCGCTGCGCCGAGGTGACCCT CACCCCGGCGCCTTGGCCGCGCCGCCGCCCCCGGGTGAGCCCGTGGATGTGCGTATCAAGGCGGGCCGGGTCGTCGAGGTGGGACAGGGGCTGAGTGCT CCCGGGACACGGGTCCTTGAGGCCGAGGGCTCCTTCCTCATTCCCGGCCTGTGGGACGCTCACGCCCACCTGGACATGGAGGCGGCGCGCTCGGCACGC ATCGACACGCTGGCCACCCGCAGCGCGGAGGAGGCCCTGGAGCTGGTGGCACGGGCGCTGCGGGATCATCCGGCCGGTTCGCCTCCGGCCACGATCCAG GGCTTCGGGCACCGCCTGTCCAACTGGCCCCGGGTGCCCACGGTGGCCGAGCTCGACGCCGTCACCGGGGAGGTTCCCACGCTGCTCATCTCCGGGGAC GTGCACTCCGGGTGGCTGAACTCGGCGGCGCTGCGTGTCTTCGGCCTGCCGGGGGCCAGCGCCCAGGACCCGGGAGCACCGATGAAGGAGGACCCGTGG TTCGCCCTACTCGACCGCCTCGATGAGGTCCCGGGGACACGCGAGCTGCGGGAGTCCGGCTACCGACAGGTCCTGGCCGACATGCTGTCCCGGGGCGTC ACCGGCGTGGTGGACATGAGCTGGTCGGAGGATCCCGATGACTGGCCGCGGCGCCTGCGGGCCATGGCGGACGAGGGCGTACTCCCCCAGGTGCTGCCC CGCATCCGCATCGGGGTCTACCGCGACAAGCTGGAACGGTGGATCGCCCGGGGCCTGCGCACCGGGACCGCGCTGGCAGGCTCACCCCGCCTGCCCGAC GGTTCCCCGGTGCTGGTGCAGGGGCCGCTCAAGGTGATCGCAGACGGCTCGATGGGCTCGGGCAGCGCACACATGTGCGAGCCCTATCCCGCCGAGCTG GGCCTGGAGCACGCCTGCGGCGTGGTCAACATCGACCGGGCCGAGCTCACCGACCTCATGGCCCACGCCTCCCGGCAGGGTTATGAGATGGCCATCCAC GCCATCGGGGACGCGGCGGTCGACGACGTCGCCGCGGCCTTCGCGCACTCGGGTGCCGCCGGGCG

For whatever reason (and that's the point: we have no idea why), Actinomyces has chosen an AT-poor dialect for its DNA, even though it has to make many of the same types of genes as Clostridium.
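If you'd rather measure the imbalance than eyeball it, a few lines of Python will do. This is only a minimal sketch: the function name gc_fraction is my own, and the two sample strings are just the first 60 bases of each gene shown above, so the numbers they give are not representative of the whole gene, let alone the whole genome. Paste in the full sequences to get a better estimate.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of A/C/G/T bases in seq that are G or C.

    Spaces, newlines, and any other characters are ignored, so a
    sequence pasted with line breaks can be passed in directly.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T")
    gc_count = sum(1 for b in bases if b in "GC")
    return gc_count / len(bases)


# First 60 bases of each adenine deaminase gene from the post
# (short samples for illustration only).
clostridium_start = "ATGTATAAAAATATACAAAGAGAAATCTATAAAAATACAAAAGGAGACGGGGATATGTTT"
actinomyces_start = "ATGGCCGATCAACCGTCCGCAGACCTGCTTATCAAGGACGCGCGCATCGTCCCTTTCCGG"

for name, seq in [("Clostridium", clostridium_start),
                  ("Actinomyces", actinomyces_start)]:
    gc = gc_fraction(seq)
    print(f"{name}: G+C = {gc:.0%}, A+T = {1 - gc:.0%}")
```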

Some people don't see this as a major puzzle: One organism evolved its DNA to a super-AT-rich state, another one didn't. So what? It's all random drift.

I disagree. It's not drift. We know of two strong forces that should keep organisms like Actinomyces from developing high G+C content. First is "AT pressure." It's known that mutations naturally tend to go in the GC-->AT direction. (One study found that in Salmonella typhimurium, GC-->AT mutations outnumbered AT-->GC mutations 50 to 1.) In the absence of corrective measures, natural mutations would very quickly lead all organisms in the direction of DNA with a very low G+C content.
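To get a feel for how far unopposed AT pressure could push a genome, here is a rough sketch of the simplest possible model: each G/C site flips to A/T at rate u, each A/T site flips to G/C at rate v, and nothing else happens (no selection, no repair, no gene transfer). In that model the G+C fraction settles at v / (u + v). Treating the 50-to-1 Salmonella figure as a per-site rate ratio is my assumption, not something the study necessarily measured, and the per-generation rates in the simulation are made-up illustrative values.

```python
def equilibrium_gc(gc_to_at_rate: float, at_to_gc_rate: float) -> float:
    """Equilibrium G+C fraction in a two-state mutation-only model."""
    return at_to_gc_rate / (gc_to_at_rate + at_to_gc_rate)


def simulate_gc(gc_start: float, u: float, v: float, generations: int) -> float:
    """Iterate the model: lose u*gc to A/T, gain v*(1 - gc) from A/T."""
    gc = gc_start
    for _ in range(generations):
        gc = gc - u * gc + v * (1.0 - gc)
    return gc


# Assumption: the 50:1 ratio is read as u:v, a per-site rate ratio.
print(f"equilibrium G+C: {equilibrium_gc(50, 1):.3f}")

# Hypothetical per-generation rates with the same 50:1 ratio,
# starting from a 70% G+C genome.
print(f"after 10,000 generations: {simulate_gc(0.70, 5e-3, 1e-4, 10_000):.3f}")
```

Under these assumptions the G+C fraction heads toward 1/51, or about 2%, which is why a genome sitting at 70% G+C looks like something is actively holding it there.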

A second important force is that of lateral gene transfer, which we know is common in microorganisms; common enough, certainly, to "even out" GC/AT ratios over evolutionary timescales. Random uptake of foreign genes by cells should tend to make A, G, C, and T levels equal, over time. For organisms like Clostridium and Actinomyces (and many others), this clearly hasn't happened.

In an earlier post I mentioned one possible reason organisms drift away from the 50-50 GC/AT centerline. DNA replication is more efficient when the template is biased toward one extreme (GC) or the other (AT), assuming endogenous nucleotide levels can be regulated in a similarly biased fashion (which they presumably are, in these organisms).

One might speculate that GC/AT extremism also simplifies DNA maintenance and repair. Imagine that your DNA is 70% G+C. A super-simple DNA repair tactic for deaminated purines would be to just replace every defective purine with a guanine. Seven out of ten times, blind replacement of defective purines with guanine would be the correct repair, if you're Actinomyces. And one out of three times, mistakes wouldn't matter anyway, because high-GC codons tend to be fourfold degenerate. (In a fourfold degenerate codon, you can replace the third base with any of A, G, C, or T without changing the codon's meaning.) Putting the two together (0.7 + 0.3 × 1/3), blind guanine substitution would have a success rate of about 80% in a high-GC organism that needed to replace defective purines.
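The back-of-envelope arithmetic behind that estimate can be written out as a tiny function. This is a sketch under two assumptions of mine: that the fraction of purines that are guanine equals the genome's G+C fraction (true for double-stranded DNA, where A pairs with T and G with C), and that one wrong repair in three lands on a position where the substitution is silent. The function name is hypothetical.

```python
def blind_guanine_success(gc_content: float, harmless_fraction: float) -> float:
    """Chance that replacing a damaged purine with G does no harm.

    gc_content        -- genome G+C fraction, used as the probability that
                         the damaged purine was a G to begin with
    harmless_fraction -- share of wrong repairs that are silent anyway
                         (e.g. third position of a fourfold degenerate codon)
    """
    p_correct = gc_content
    p_wrong_but_harmless = (1.0 - p_correct) * harmless_fraction
    return p_correct + p_wrong_but_harmless


# The post's numbers: 70% G+C, one mistake in three is harmless.
print(f"high-GC organism: {blind_guanine_success(0.70, 1 / 3):.0%}")

# Same tactic in a 28% G+C genome such as Clostridium botulinum
# (28% follows from the 72% A+T figure in the post).
print(f"low-GC organism:  {blind_guanine_success(0.28, 1 / 3):.0%}")
```

The same tactic does far worse in an AT-rich genome, where the mirror-image shortcut (always insert adenine) would be the sensible one.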

It turns out there are other reasons to live "away from centerline," if you're a bacterium. I'll talk about those in another post.