Saturday, May 9, 2009

Evolution and Ockham's Razor

Or, The Perils of Parsimony.

Ockham's Razor "is a principle apocryphally attributed to 14th-century English logician and Franciscan friar, William of Ockham." According to Wiki, it's often stated in Latin as "entia non sunt multiplicanda praeter necessitatem," roughly translated as "entities must not be multiplied beyond necessity." The key question here is "what is an entity?"

For purposes of the scientific exploration of life, that is, the process and consequences of evolution, we probably shouldn't treat a mutation as an "entity". This is because the number of mutations occurring in any but the smallest population is too large to justify assuming that only the mutations necessary to a particular evolutionary progress (e.g. from phenotype "A" to phenotype "B") occurred.

Ockham's Razor often shows up as "parsimony", especially in cladistic analysis. "Cladistics, from the ancient Greek κλάδος, klados, "branch", is the hierarchical classification of species based on phylogeny or evolutionary ancestry." Or rather, it is the attempt to build a "hierarchical classification of species based on phylogeny or evolutionary ancestry", using (presumably) scientific methodologies. Parsimony plays a very (too) important part in deciding which of many possible ancestral trees is "right" (whatever that is), based on minimizing the number of observed "changes" in morphology or DNA structure.

The obvious problem with this is that our understanding of past evolution is historical in nature: we are attempting to determine which of a semi-infinite number of possible evolutionary paths was actually followed in any evolutionary progress. Our estimation of relative probabilities is retrospective, and says little about which path was actually followed.

Consider the following scenario: Species "A" (disregarding problems with the definition of "species") is related to species "B" and "C", but we don't know which it's more closely related to. A cladistic analysis can list a whole bunch of features, some observable in the phenotype, others involving DNA differences such as the deletion or addition of a chunk of DNA, or a point mutation of one base into another, building a table of which species possess which values for which feature. By using several (presumably) less related species as "outgroups" to help determine the ancestral state of each feature, it's possible to build a series of "trees" based on different scenarios for changes to each feature during the evolution leading from the presumed common ancestor to the three modern species.
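As a rough illustration of the table-and-trees setup just described, here is a minimal Python sketch that counts the minimum number of changes each candidate tree needs, using the standard Fitch small-parsimony count. Everything specific in it is an assumption made up for illustration: the four features, their values, the single outgroup "Out", and the function names. It is not taken from any real data set or cladistics package.

```python
# Hypothetical feature table: one dict per feature, mapping taxon -> observed value.
# "Out" is a single assumed outgroup; real analyses would use several.
TABLE = [
    {"Out": 0, "A": 1, "B": 1, "C": 0},          # derived state shared by A and B
    {"Out": 0, "A": 1, "B": 0, "C": 1},          # derived state shared by A and C
    {"Out": 0, "A": 1, "B": 1, "C": 1},          # derived state shared by all three
    {"Out": "G", "A": "T", "B": "T", "C": "G"},  # point mutation shared by A and B
]

# The three possible rooted trees for A, B, C, each rooted with the outgroup.
# A tree is either a taxon name (leaf) or a pair (left_subtree, right_subtree).
TREES = {
    "((A,B),C)": ("Out", (("A", "B"), "C")),
    "((A,C),B)": ("Out", (("A", "C"), "B")),
    "((B,C),A)": ("Out", (("B", "C"), "A")),
}


def fitch_changes(tree, states):
    """Return (possible_states, min_changes) for one feature on one tree."""
    if isinstance(tree, str):                    # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    common = left_set & right_set
    if common:                                   # children agree: no extra change here
        return common, left_cost + right_cost
    # children disagree: at least one change is needed at this node
    return left_set | right_set, left_cost + right_cost + 1


def tree_score(tree, table):
    """Total minimum number of changes over all features in the table."""
    return sum(fitch_changes(tree, feature)[1] for feature in table)


scores = {name: tree_score(tree, TABLE) for name, tree in TREES.items()}
for name, score in scores.items():
    print(name, score)
```

With this made-up table the ((A,B),C) tree needs 5 changes, against 6 and 7 for the other two. Note that the count says nothing about how probable each individual change was.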

Parsimony enters in here, as the tree with the fewest changes is considered "best", as in most likely. But is it? Suppose the lowest number of changes needed to evolve our three modern species ("A", "B", and "C") is 15. However, there is only one tree involving 15 changes, but there are three involving 16, five involving 17, and twelve involving 18 changes. What is the most likely number of changes?

To answer that question, we need to know the relative probability of a change occurring. If such changes are very unlikely, then the tree with 15 changes may be most likely. If changes are fairly likely, then while the tree with 15 changes may be more probable than any single tree with 16, 17, or 18, the probability that the actual (historical) evolution had some other number of changes (probably 18, considering the number of possible paths) becomes higher than that for 15 changes.
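The arithmetic behind this can be sketched in a few lines of Python. The tree counts (1, 3, 5, 12) come from the example above; everything else is an assumption for illustration only: that no trees with more than 18 changes need to be considered, and that each extra change multiplies a single tree's relative likelihood by the same factor r (a hypothetical parameter, with 0 < r < 1, so any individual 15-change tree is always more likely than any individual 18-change tree).

```python
# Number of distinct trees requiring each total number of changes (from the example).
TREE_COUNTS = {15: 1, 16: 3, 17: 5, 18: 12}


def change_count_distribution(tree_counts, r):
    """Probability that the true history had k changes, for each k.

    Assumption (illustrative only): a single tree with k changes has relative
    weight r ** (k - k_min), where r is the penalty per extra change.
    """
    k_min = min(tree_counts)
    weights = {k: n * r ** (k - k_min) for k, n in tree_counts.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def most_likely_change_count(tree_counts, r):
    """The number of changes with the highest total probability."""
    dist = change_count_distribution(tree_counts, r)
    return max(dist, key=dist.get)


for r in (0.05, 0.8):  # changes very unlikely vs. fairly likely
    dist = change_count_distribution(TREE_COUNTS, r)
    print(r, most_likely_change_count(TREE_COUNTS, r),
          {k: round(p, 3) for k, p in dist.items()})
```

Under these assumptions, r = 0.05 leaves 15 changes as the most likely total, while r = 0.8 makes 18 the most likely total, simply because there are twelve ways to get there.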

This is not a trivial question when we go beyond static cladistics to ask how each species came to evolve. In addition to the mutations that cause the changes, we also need to consider the selective pressure that went into fixing those changes into the lineage that evolved into one of our modern species. It may be that the likelihood that the most parsimonious path was followed is low compared to one in which an ancestor of two modern species followed some sort of zig-zag path, adding changes that were later reversed, adding the same change more than once (especially for morphological features that were directly driven by selection), etc.

We know that in real life, evolution involves more than a simple "tree" of species. Trees can describe the relationship of genetically isolated lineages even when they're capable of interbreeding, but a change in conditions can not only re-unite lineages separated long enough to accumulate changes but still interfertile, it can do so in an atmosphere of radically changed selective pressure. An example would be a climate change that removed a desert separating "islands" of habitable regions (say, grasslands), while also converting the entire area to a much wetter climate (say, forests separated by rich savanna). Separate populations that had accumulated separate changes would now be merged into a single lineage, polymorphous with respect to all those innovations. This population might well, in turn, become ancestral to several daughter lineages that continued polymorphous with respect to at least some of these changes. The ultimate descent into modern species "A", "B", and "C" might have followed a completely different path than any parsimonious analysis based on those changes would have predicted.

A more important, and potentially more controversial, sort of thing that we shouldn't consider an "entity" (for purposes of Ockham's Razor) is developmental mechanisms. Specifically, an assumption that a gene's expression will only be controlled by those Transcription Factor Binding Sites it has been proven to be affected by would be silly. (For more discussion of gene activation, see my How Smart is the Cell? Part II: The Gene Activation network as an Analog Computer.) Given that it's known that the majority (probably the vast majority) of regulatory controls remain to be elucidated, the appropriate assumption would be that any gene's activation logic probably has parts still to be discovered.

Similarly, when it comes to more recently discovered mechanisms, such as microRNA, small interfering RNA, and all the other types of non-coding RNA that have regulatory effects, even more caution is indicated. (See my Manifesto for a Kuhnian Revolution for more discussion of these mechanisms, or The Fascinating World of RNA Interference by Afsar Raza Naqvi, Md. Nazrul Islam, Nirupam Roy Choudhury, and Qazi Mohd. Rizwanul Haq for a very recent (peer-reviewed) review of the subject.)

Caution is required not only regarding whether more microRNAs or siRNAs are involved in a particular gene's regulation, but also regarding the assumed absence of other mechanisms that haven't yet been discovered.

Obviously, we can't simply start jumping off after gravity waves flowing along microtubules as an explanation for human awareness. But also, it would be (IMO) short-sighted to rule out plausible mechanisms just because they haven't been discovered. A mechanism can be reasonably ruled out only if the research done so far should have discovered it, and if we can be reasonably sure that the discovery would have been published (rather than back-burnered because it didn't fit the current paradigm).

Of course, the estimate of plausibility must also be made. But there shouldn't be any interference between that and the question of whether such a mechanism could be expected to have been found (and published) by current research efforts.

Does all this sound very theoretical and hypothetical? After all, what new mechanisms do I have to propose? Well, an answer to that may be found in my series on How Smart is the Cell. The basic thesis there was that the cell is (or at least could be) a lot smarter than we would assume if all its enzymatic and genetic circuits were digital in nature. Digital circuits are ultimately made up of analog elements (see the series), but they achieve precise accuracy at the expense of a lot of computing power when it comes to analog calculations in the real world.

Another proposal of mine is that the large number of "Alu repeats" in the human (and many mammalian) genome(s) constitutes the raw material for an extensive "analog memory" used by brain cells during the development process. Certainly, no such mechanism has been discovered. But, AFAIK nobody has looked for it, and I doubt anybody will find it with the current research methods unless they do look for it.

It's not just my own speculations I'm trying to justify (or whatever) here, it's plausible speculation in general. We all need to keep in mind that despite the usual scientific focus on parsimony, science has a long track record of discovering new mechanisms and effects that had previously been "ruled out" due to parsimony. For a discussion of that, see Wiki:
[I]n many occasions Occam's razor has stifled or delayed scientific progress.[ref] For example, appeals to simplicity were used to deny the phenomena of meteorites, ball lightning, continental drift, and reverse transcriptase. It originally rejected DNA as the carrier of genetic information in favor of proteins, since proteins provided the simpler explanation. Theories that reach far beyond the available data are rare, but General Relativity provides one example.

Cladistic parsimony is used to support the hypothesis(es) that require the fewest evolutionary changes. For some types of tree, it will consistently produce the wrong results regardless of how much data is collected (this is called long branch attraction). For a full treatment of cladistic parsimony see Elliott Sober's Reconstructing the Past: Parsimony, Evolution, and Inference (1988). For a discussion of both uses of Occam's razor in Biology see Elliott Sober's article Let's Razor Ockham's Razor (1990).

I haven't found (yet) an on-line link to the Sober article, but if/when I do, I'll post it here.


  1. The evolutionary tree of life could potentially be much larger than we can view from our own earth's historical perspective. The most widely accepted interpretation of quantum mechanics, by physicists who know what they are talking about, (many worlds) basically means there are "many earths". Decoherence affects macroscopic objects and can affect evolutionary mutation processes. So basically as decoherence causes worlds to branch, you could have differing mutations unique to each new "world". So perhaps the right mutations have all aligned perfectly to create a human-level intelligence merely due to the anthropic principle. If there are googols of earths, we may happen to be on the one where life evolved to the intelligence that it did. Maybe on a majority of earths, selfish replicators never came on the scene at all. Perhaps on most of the decohered earths where selfish replicators did evolve you still only have single-celled organisms. The right chain of events may never have occurred for something more complex to come into existence.

    See this many world FAQ link below. The FAQ isn't all correct, but it gives a good overview.

    Also see this for more explanation.

  2. Thanks for your comment, Mike. Although my primary thrust was toward the issue of assumed parsimony of mutations, the many-worlds hypothesis has always interested me.

    For one thing, there is certainly as much joining as splitting, at least at a quantum level. I don't see why that shouldn't extend to a macro level as well: if you dig up the skeleton of a 6000-year-old person, and can't tell (via DNA or anything else) whether he/she was blond or not, couldn't our world have "many pasts" with the person having different hair color in different past worlds? If an electron can go through two slits and the "split" world rejoins when it takes its place in the diffraction pattern, why not hair color genes?

    The same principle could well apply to mutations as well: we could be the descendants of many different sets of x-grandparents with different genomes, just as our descendants could have many different genomes.


    As far as I can tell, all the thought experiments that involve the wave function breaking down are actually simplistic: when the wave function breaks down it usually only does so locally, entangling itself with the wave function(s) of whatever it encounters. Only with specially designed experiments does the breakdown propagate all the way.

    This is important in considering the behavior of e.g. single molecules of enzymes in a vesicle (or cell): hasn't its wave function spread out throughout the volume available, even though it entangles itself (temporarily) with each molecule of substrate(s) it encounters?

    P.S. I won't be back online till Monday, so I'll be slow to answer any more posts.