Monday, June 15, 2009

Modularity Part I: Metabolic Pathways

This is a difficult subject and I've been putting it off, even starting a new blog, but it's time to try to explain the subject of modularity in biological networks. Wiki has little on the subject, but offers the following extract:
Bolker, however, attempts to construct a definitional list of characteristics that is more abstract, and thus more suited to multiple levels of study in biology. She argues that:
  1. A module is a biological entity (a structure, a process, or a pathway) characterized by more internal than external integration
  2. Modules are biological individuals [refs] that can be delineated from their surroundings or context, and whose behavior or function reflects the integration of their parts, not simply the arithmetical sum. That is, as a whole, the module can perform tasks that its constituent parts could not perform if dissociated.
  3. In addition to their internal integration, modules have external connectivity, yet they can also be delineated from the other entities with which they interact in some way. [21]
This isn't really very explanatory, although it pretty much sums up what modularity is.

Since my professional background is in computer programming and systems architecture, I'm going to approach this from an info systems perspective. ... Let's consider a computer program that has a number of variables, and a number of procedural subroutines that work on those variables. We can define a module as a collection of subroutines and variables that are isolated: a few variables are used as inputs and outputs to the subroutines involved in the module, while most of them are internal to the module and can only be seen by the subroutines that are part of it.

This concept has been improved in object-oriented programming, in which an object is defined both internally and externally: from the outside it is a black box with a few visible variables and "methods" (subroutines) accessible to outside modules, while internally it has many more variables and likely a number of internal methods all of which are used only by other elements of the object. Such an object is a well isolated module.
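To make the "black box" idea concrete, here is a minimal Python sketch. The class and all of its names are hypothetical, made up purely for illustration: outside code sees one variable and two methods, while the stored readings and the smoothing helper are internal (marked with a leading underscore, which in Python is a convention rather than an enforced rule).

```python
class Thermostat:
    """A well-isolated module: small public surface, private internals."""

    def __init__(self, target):
        self.target = target      # externally visible variable
        self._readings = []       # internal variable, private by convention

    def record(self, temperature):
        """Public method: the only way outside code feeds data in."""
        self._readings.append(temperature)

    def heating_needed(self):
        """Public method: the only output outside code relies on."""
        if not self._readings:
            return False          # no data yet, so make no demand
        return self._smoothed() < self.target

    def _smoothed(self):
        """Internal method: average of the last three readings."""
        recent = self._readings[-3:]
        return sum(recent) / len(recent)
```

The smoothing rule inside `_smoothed` could be swapped for something else entirely, and callers that only use `record` and `heating_needed` would never need to be touched.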

Now, we could take these definitions, and draw a rough analogy with the metabolic system of the cell: we'll let variables correspond to substrates and products: chemicals within the cell that participate in reactions catalyzed by enzymes. We'll let the subroutines correspond to the reaction, or perhaps the enzyme that catalyzes it.

Now, with either system, we can create a network graph, a system of nodes and edges.

In most analyses, the variables/substrates are made into nodes, and the subroutines/reactions are made into edges connecting them. However, in a paper [13] still in pre-publication, the opposite has been done: enzymes are the nodes, and they are connected by the substrates/products they process. The paper is "The conservation and evolutionary modularity of metabolism".
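The two mappings can be sketched side by side in Python. This is only a toy under my own assumptions: the reaction table, the enzyme names (`E1` to `E3`) and the metabolite names (`A` to `D`) are invented, and the rule "two enzymes are linked if they share any substrate or product" is my reading of the enzyme-as-node approach, not necessarily the paper's exact construction.

```python
from itertools import combinations

# Hypothetical toy reactions: enzyme -> (substrates, products).
REACTIONS = {
    "E1": ({"A"}, {"B"}),
    "E2": ({"B"}, {"C"}),
    "E3": ({"B"}, {"D"}),
}

def metabolite_graph(reactions):
    """Metabolites are nodes; each enzyme labels a substrate -> product edge."""
    edges = set()
    for enzyme, (substrates, products) in reactions.items():
        for s in substrates:
            for p in products:
                edges.add((s, p, enzyme))
    return edges

def enzyme_graph(reactions):
    """Enzymes are nodes; an edge carries the metabolites two enzymes share."""
    edges = set()
    for e1, e2 in combinations(sorted(reactions), 2):
        s1, p1 = reactions[e1]
        s2, p2 = reactions[e2]
        shared = (s1 | p1) & (s2 | p2)
        if shared:
            edges.add((e1, e2, frozenset(shared)))
    return edges
```

In the first graph, deleting a node means deleting a chemical; in the second, deleting a node means deleting an enzyme.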

The purpose of this paper is to analyze the conservation of enzymes throughout evolution, correlating this conservation with identified modules in the cellular metabolism.

Peregrin-Alvarez, J., Sanford, C., & Parkinson, J. (2009). The conservation and evolutionary modularity of metabolism. Genome Biology, 10(6). DOI: 10.1186/gb-2009-10-6-r63

What Is Modularity in a Network?

The hard part is pinning down modularity in a biological system. We know that, analogous to the programming example above, we should expect a module to consist of a combination of enzymes and substrates/products that are more closely connected to one another than to outside entities.

Figure 1: A large complex network with several modules identified. (From Ref 20.)

In Figure 1, you can see that several modules are pretty clear, in a high-level sense. There are groups of nodes and edges that clearly make a dense cluster of internal associations, while having few connections outside. But a precise definition is harder to pin down. Basically, modules are generally identified by complex mathematical processes that can determine clusters, in much the same way we can by eye, except they can "see" clustering our eyes can't.
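One widely used way to put a number on "dense inside, sparse outside" is Newman's modularity score Q, which compares the fraction of edges falling inside each proposed module with the fraction you would expect if edges were placed at random with the same node degrees. I'm offering it here only as an illustration; I'm not claiming it is the method used in the papers linked below. The sketch assumes an undirected graph with at least one edge and no self-loops, and the function and variable names are my own.

```python
def modularity(edges, community_of):
    """Newman's Q for an undirected edge list and a node -> module mapping."""
    m = len(edges)
    internal = {}     # module -> number of edges with both ends inside it
    degree_sum = {}   # module -> total degree of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over modules of (observed internal fraction - expected fraction).
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Two triangles joined by a single bridging edge (c-d).
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
TWO_MODULES = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
ONE_MODULE = {node: 0 for node in "abcdef"}
```

Splitting the toy graph into its two triangles gives Q = 5/14 (about 0.36), while lumping everything into one module gives Q = 0. Module-finding algorithms, roughly speaking, search for the partition that scores best on a measure of this kind.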

The important thing to recognize about modules in metabolic networks is that what they share outside the module is less than what's shared within it. In programming, this isolation makes it easier to change things, because when you need to change the internals of a module, you only need to verify that it works the same with the same values in the externally visible variables. You can change how internal variables are used without having to worry about anything beyond the internal subroutines. Similar advantages exist for natural selection working with biological networks, including metabolic networks. In fact, it's been found that there's a good correlation between the variability of the environment of bacteria and the level of modularity in their metabolic networks. [19]

In considering metabolic modularity, we need to distinguish between the two types of network mapping. Using substrates/products as nodes (and enzymes as edges) is better in some respects, but the various network analyses aren't really relevant to evolution, because much of the analysis depends on what happens when a particular node is removed, and the removal of a chemical is unlikely in an evolving system. Rather, what might be removed, or changed, is an enzyme.

It might happen that a mutation creates a new version of an enzyme that no longer creates the particular chemical (substrate/product), but it's far more likely that the enzyme will simply be removed. And, even so, if the enzyme is so changed, an analysis can simply treat it as the removal of an enzyme and its replacement with another that performs different reactions.

How This Research Uses Modularity

Thus, this is the first paper I've seen that performs a network analysis that is evolutionarily relevant. One of the things they found is that:
enzymes involved in multiple superclasses were most highly conserved and those involved in glycan metabolism were least highly conserved [ref]. [13]
If we think of "superclasses" as modules, then the "enzymes involved in multiple superclasses" would be (very roughly) analogous to the variables used by multiple subroutines in our programming analogy: they are sort of the "inputs and outputs" of modules, the externally visible interfaces of the "black box" of the module. Enzymes that are only involved in one such module can be modified, removed, or added, without affecting anything outside that black box. But a change to the external interface will affect all the modules that use it. No wonder they are more conserved: any change has to be coordinated with more compensatory changes elsewhere.
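In code, picking out the "interface" enzymes is just a matter of counting how many modules each enzyme belongs to. The table below is entirely made up for illustration: the enzyme names are hypothetical, and the superclass labels are borrowed loosely from the categories the paper mentions, not from its data.

```python
# Hypothetical membership table: enzyme -> superclasses it takes part in.
ENZYME_SUPERCLASSES = {
    "E1": {"carbohydrate"},
    "E2": {"carbohydrate", "amino acid"},
    "E3": {"amino acid"},
    "E4": {"carbohydrate", "amino acid", "energy"},
    "E5": {"glycan"},
}

def split_by_role(enzyme_superclasses):
    """Separate enzymes shared between modules from those inside one module."""
    interface = sorted(
        e for e, classes in enzyme_superclasses.items() if len(classes) > 1
    )
    internal = sorted(
        e for e, classes in enzyme_superclasses.items() if len(classes) == 1
    )
    return interface, internal

interface, internal = split_by_role(ENZYME_SUPERCLASSES)
```

Here `interface` holds `E2` and `E4`, the enzymes that the analogy predicts should be the most conserved, while `internal` holds the ones that can change without anything outside their own module noticing.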

As mentioned above, the whole subject of modularity is a difficult one. This paper represents an apparent departure (unless I've missed some in my searches) in how it applies network theory to metabolic networks, which makes it important:
The use of a global metabolic network map (Fig. 4) allowed the identification of a highly interconnected core of conserved enzymes many of which are involved in multiple pathways. Such enzymes support the notion that “enzyme recruitment” plays a large role in metabolic evolution where novel pathways can emerge through the recruitment of enzymes (and hence their metabolites) from existing pathways [refs. ...] Pathways involving carbohydrate, amino acid and energy metabolism form a distinct core network with many shared enzyme activities.
Thus, this "highly interconnected core of conserved enzymes many of which are involved in multiple pathways" is probably mostly made up of "module interfaces", allowing us a much better understanding of how modularity interacts with evolution, as well as a good analogy with existing analysis of modularity from the field of programming.

This is hardly everything about modularity; I've barely touched on the subject. In future posts I'll try to expand on it, as well as provide links to other discussions as I find them.

Links: (Not all of these are called out in the text. Use the back key if you came via clicking a link.)

1. Statistical mechanics of complex networks

2. The structure and function of complex networks

3. Collective dynamics of ‘small-world’ networks

4. Hierarchical Organization of Modularity in Metabolic Networks (Requires free registration)

5. Systems Biology: A Brief Overview (Requires free registration)

6. Reverse Engineering of Biological Complexity (Requires free registration. Requires subscription, otherwise go here and click the [Abstract/Free Full Text] link)

7. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms

8. Network Motifs: Simple Building Blocks of Complex Networks (Requires free registration)

9. Network motifs in the transcriptional regulation network of Escherichia coli

10. The metabolic world of Escherichia coli is not small

11. MANET: tracing evolution of protein architecture in metabolic networks

12. Functional cartography of complex metabolic networks

13. The conservation and evolutionary modularity of metabolism

14. Extraction of phylogenetic network modules from the metabolic network

15. The Phylogenetic Extent of Metabolic Enzymes and Pathways

16. Modular organization of cellular networks

17. Modular co-evolution of metabolic networks

18. The evolution of modularity in bacterial metabolic networks

19. Environmental variability and modularity of bacterial metabolic networks

20. Iterative Vector Diffusion for the Detection of Modularity in Large Networks

(I've included only the link(s) referenced here.)

21. Modularity in Development and Why It Matters to Evo-Devo
