This post is a bit of background on the projects of two of our summer of code students: JJ and Helen.
The quintessence of modularity
For WikiPathways, people often ask for a way to compose pathways into a single large network. Currently, WikiPathways is a collection of small pathways with clear boundaries. The advantage of one large network would be a complete overview of the cell without arbitrary limits. On the other hand, small pathways can still be understood, edited and curated manually by a single person. Manual interaction is something that we value very much at WikiPathways, and we constantly battle the tendency to invent automatic tools for everything. Another advantage of small pathways is that we can do enrichment analysis (over-representation analysis, or related methods such as GSEA) to rank pathways according to experimental data. Of course, no such ranking is possible if you have only a single large network.
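To make the ranking idea concrete, here is a minimal sketch of over-representation analysis using a one-sided hypergeometric test. It is an illustration only, not how any WikiPathways tool is implemented: the function names, the pathway names "A" and "B", and the gene labels g1 to g20 are all made up for the example.

```python
from math import comb

def hypergeom_tail(k, N, K, n):
    """P(X >= k): chance of seeing at least k pathway genes among n hits,
    when K of the N background genes belong to the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def rank_pathways(pathways, hits, background):
    """Return (name, overlap, p_value) tuples, most over-represented first."""
    background = set(background)
    hits = set(hits) & background
    results = []
    for name, genes in pathways.items():
        genes = set(genes) & background
        overlap = len(genes & hits)
        p = hypergeom_tail(overlap, len(background), len(genes), len(hits))
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])

# Toy data (hypothetical): 20 background genes, two small pathways.
background = [f"g{i}" for i in range(1, 21)]
pathways = {
    "A": ["g1", "g2", "g3", "g4", "g5"],
    "B": ["g6", "g7", "g8", "g9", "g10"],
}
hits = ["g1", "g2", "g3", "g6"]  # e.g. genes that changed in an experiment

ranked = rank_pathways(pathways, hits, background)
for name, overlap, p in ranked:
    print(f"{name}: {overlap} hits, p = {p:.4f}")
```

With one giant network there would be only a single "pathway" containing every gene, so there would be nothing to rank.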
Jianjiong Gao (JJ) is a summer student for the second year running. Last year he tackled this problem with the NetworkMerge plugin. The result was great, but it is not yet very good at merging networks from different sources, which are usually annotated with different types of identifiers. To solve that, JJ is working this year on improving ID mapping. If you want to know more, check out JJ’s blog.
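The identifier problem can be illustrated with a small sketch. This is not the NetworkMerge code, just a toy version under simple assumptions: networks are sets of undirected edges, and a lookup table (here called `id_map`, with invented identifiers) translates one naming scheme into the other.

```python
def merge_networks(networks, id_map):
    """Union of undirected edges after translating nodes to a common ID.

    Nodes missing from id_map are kept as they are.
    """
    merged = set()
    for edges in networks:
        for a, b in edges:
            a = id_map.get(a, a)
            b = id_map.get(b, b)
            merged.add(frozenset((a, b)))  # frozenset: edge direction is ignored
    return merged

# Two toy networks describing overlapping biology with different identifiers.
net1 = [("geneA", "geneB"), ("geneB", "geneC")]
net2 = [("X:002", "X:001"), ("X:003", "X:004")]

# Hypothetical mapping from the second scheme to the first.
id_map = {"X:001": "geneA", "X:002": "geneB", "X:003": "geneC", "X:004": "geneD"}

naive = merge_networks([net1, net2], {})       # no mapping: nothing lines up
mapped = merge_networks([net1, net2], id_map)  # with mapping: shared edge collapses

print(len(naive), len(mapped))
```

Without the mapping, the merged network treats "geneA" and "X:001" as different nodes and the two sources never connect. With it, the duplicate edge is recognized and the networks join at shared nodes.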
Xuemin (Helen) Liu will work on the SuperPathways project. This is similar to NetworkMerge, but focused on showing the relations between pathways. It will create a new view of a set of pathways that shows the interaction and crosstalk between them. This will give users of WikiPathways an overview of the whole pathway collection while still keeping the pathways themselves manageably small.
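One simple way to picture such a view is to treat each pathway as a node and draw a link between two pathways whenever they share genes. The sketch below assumes exactly that definition of a relation, which may differ from what SuperPathways ends up using. The pathway and gene names are made up.

```python
from itertools import combinations

def pathway_links(pathways):
    """Map each pair of pathways to the set of genes they have in common.

    Pairs without shared genes are left out.
    """
    links = {}
    for p, q in combinations(sorted(pathways), 2):
        shared = set(pathways[p]) & set(pathways[q])
        if shared:
            links[(p, q)] = shared
    return links

# Hypothetical pathway collection.
pathways = {
    "apoptosis": ["g1", "g2", "g3"],
    "cell cycle": ["g3", "g4", "g5"],
    "glycolysis": ["g6", "g7"],
}

links = pathway_links(pathways)
for (p, q), shared in links.items():
    print(f"{p} -- {q}: {sorted(shared)}")
```

The individual pathways stay small and editable, while the links give the overview of the whole collection.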
NetworkMerge and SuperPathways will no doubt be useful. But the question remains: do pathways really exist, or are they a human invention to make biology manageable? I think there is a case to be made that pathways do exist as biological entities, even though they have fuzzy boundaries. It helps to think of pathways as modules. There are two schools of thought on this issue.
Figure: Map of the phi-X174 genome
The molecular biology school thinks of pathways as separate modules that can be seen relatively independently. Although there is clearly overlap and cross-talk at times, you can still study a pathway, measure it, and talk about it as though it’s an independent entity.
On the other hand, the systems biology school considers a cell as a gigantic network of molecules all interacting with each other. There are so many influences that you can’t make any predictions about the state of a cell unless you measure all its components and know all its interactions.
The module school can be accused of simplifying the complexity of cells too much, drawing artificial boundaries that are not there in reality, just because cells are otherwise too hard to comprehend. Evolution does not favor a clean solution. It lands on a good-enough solution, and we can’t expect cellular networks to be easy to understand.
On the other hand, there is clear evidence that evolution does encourage modularity. The reasoning is that biomolecules are organized in modules because modules can evolve independently, and thus make the organism as a whole more evolvable, i.e. better able to adapt over the course of generations.
Consider the phage phi-X174. (A phage is a virus that infects bacteria.) It has 11 genes packed into just 5386 nucleotides. Its genome is so condensed that several genes overlap; in one section, even three genes overlap. That means that if one nucleotide in that region is changed, three genes will be affected. In the case of phi-X174, evolution selected a genome as short as possible, so that it can replicate extremely quickly. The downside is that phi-X174 can’t evolve anymore. It is an evolutionary dead end. To a phage, the genes are its modules, and overlapping genes destroy modularity.
There is a clear parallel in software development. Computer programmers usually strive for a modular design. Code is “smelly” when there are too many interactions across the boundaries of modules. You know you have a problem when you fix a bug and two new bugs appear in totally unrelated places. When this happens it’s time to start paying off your technical debt, or face the risk of entering a spiral of doom.
In the bazaar of open source, there are always a dozen projects in any category competing for developer resources and public attention. Thus modular development is favored in the long run. A non-modular project can probably be developed more quickly at first, but this comes at the cost of the flexibility to add new features. So there is a simple natural selection process going on, where projects have to become modular or face being out-competed. Modularity means evolvability, and ensures long-term survival.