Archive for May, 2009

Here come the Bloggers

Monday, May 25th, 2009

Google Summer of Code is well on its way. The GenMAPP organization selected 10 students, the community bonding period is over, and now it’s time to start coding like crazy. The official start was on May 23 (why on a Saturday, I wonder?).

We encourage students to blog about their projects: it’s a good way to keep communication going and to engage the community with the students. Five out of 10 students have already picked up the gauntlet. So that you can see what’s going on at a glance, I’ve set up an aggregate blog here: http://www.bigcat.unimaas.nl/~martijn/planet-genmapp/

I thought that worked nicely, but Alex found it too hard to browse and preferred a condensed view, with each post shown only as a summary and a link. He set up this competing site using feedcluster here: http://genmapp-gsoc.feedcluster.com/

So which format do you like better?

A Hard Drive made of solid rock

Sunday, May 17th, 2009

After the relay switch and the cathode ray tube, rotating magnetic storage is the final part to be replaced by silicon. And thus humanity frees the machine from its shaky, creaky, cranky mechanical shackles and creates the ultimate device: a computer made of solid rock.

DIY LEDs from solid rock

A few weeks ago I bought an Intel X25-M solid state drive for my laptop, and I’m pretty happy with it. Combined with the boot improvements in Jaunty, I can go from pushing the power button to seeing my home page in Firefox in just 45 seconds, and that’s including the three seconds it takes to type my password. Plus, the battery lasts significantly longer. Installation is as easy as with any regular laptop drive. People say, and I would agree, that this is currently the most effective upgrade you can give to your laptop or desktop.

But don’t take my word for it. Check out this lengthy review, as well as these endorsements by Joel and Linus.

And for those who say boot time doesn’t matter: you’re an energy hog. Please turn off your computer at night, and also drive your Hummer into a wall.

About Modules and Superpathways

Sunday, May 10th, 2009

This post is a bit of background on the projects of two of our summer of code students: JJ and Helen.

The quintessence of modularity

For WikiPathways, people often ask for a way to compose pathways into a single large network. Currently, WikiPathways is a collection of small pathways with clear boundaries. If it were one large network, the advantage would be a complete overview of the cell without arbitrary limits. On the other hand, small pathways can still be understood, edited and curated manually by a single person. Manual interaction is something that we value very much at WikiPathways, and we constantly battle the tendency to invent automatic tools for everything. Another advantage of small pathways is that we can do enrichment analysis (such as over-representation analysis or GSEA) to rank pathways according to experimental data. Of course, no such ranking is possible if you have only a single large network.
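
To make the ranking idea concrete, here is a minimal sketch of an over-representation test in Python. It is not the method used by any particular tool mentioned here: it assumes a plain hypergeometric test, and the gene names, pathway names and the helper functions `tail_probability` and `rank_pathways` are all made up for illustration.

```python
from math import comb


def tail_probability(hits, background_size, pathway_size, changed_size):
    """Hypergeometric P(X >= hits): the chance of seeing at least `hits`
    pathway genes among the changed genes if genes were picked at random."""
    upper = min(pathway_size, changed_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, changed_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background_size, changed_size)


def rank_pathways(pathways, changed_genes, background):
    """Return (name, hits, p_value) tuples, most over-represented first."""
    background = set(background)
    changed = set(changed_genes) & background
    rows = []
    for name, genes in pathways.items():
        members = set(genes) & background
        hits = len(members & changed)
        p = tail_probability(hits, len(background), len(members), len(changed))
        rows.append((name, hits, p))
    return sorted(rows, key=lambda row: row[2])


# Toy data, invented for illustration only.
background = [f"g{i}" for i in range(1, 21)]  # 20 measured genes
pathways = {
    "Pathway A": ["g1", "g2", "g3", "g4", "g5"],
    "Pathway B": ["g6", "g7", "g8", "g9", "g10"],
}
changed_genes = ["g1", "g2", "g3", "g4", "g11"]

ranking = rank_pathways(pathways, changed_genes, background)
for name, hits, p in ranking:
    print(f"{name}: {hits} changed genes, p = {p:.4f}")
```

With one giant network there would be only a single "pathway" containing every gene, so there would be nothing to compare and nothing to rank.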

Jianjiong Gao (JJ) is a summer student for the second year running. Last year he tackled this problem with the NetworkMerge plugin. It was a great result, but it is not yet very good at merging networks from different sources, which are usually annotated with different types of identifiers. To solve that, JJ is working this year on improving ID-mapping. If you want to know more, check out JJ’s blog.
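
Why identifiers matter for merging can be shown with a small sketch. This is not how the NetworkMerge plugin works internally; it only assumes that each network is a list of undirected edges and that a lookup table to a common namespace is available. All identifiers, the `TO_COMMON` table and the functions `normalize` and `merge_networks` are hypothetical.

```python
# Hypothetical mapping from two source-specific ID systems to a common name.
TO_COMMON = {
    "A:1": "geneP", "B:x": "geneP",
    "A:2": "geneQ", "B:y": "geneQ",
    "B:z": "geneR",
}


def normalize(edges, mapping):
    """Translate both ends of every edge; IDs without a mapping are kept as-is."""
    return {
        frozenset((mapping.get(a, a), mapping.get(b, b)))  # undirected edge
        for a, b in edges
    }


def merge_networks(networks, mapping):
    """Union of all edges after translating them to the common namespace."""
    merged = set()
    for edges in networks:
        merged |= normalize(edges, mapping)
    return merged


network_a = [("A:1", "A:2")]                # annotated with ID system A
network_b = [("B:x", "B:y"), ("B:y", "B:z")]  # annotated with ID system B

naive = merge_networks([network_a, network_b], mapping={})
mapped = merge_networks([network_a, network_b], mapping=TO_COMMON)
print(len(naive), "edges without ID mapping")
print(len(mapped), "edges with ID mapping")
```

Without the mapping, the same interaction shows up twice under different names; with it, the duplicate collapses into a single edge and the two networks actually connect.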

Xuemin (Helen) Liu will work on the SuperPathways project. This is similar to NetworkMerge, but focused on showing the relations between pathways. It will create a new view of a set of pathways that shows the interaction and crosstalk between them. This will give the users of WikiPathways an overview of the whole pathway collection while still keeping the pathways themselves manageably small.

NetworkMerge and SuperPathways will no doubt be useful. But the question remains: do pathways really exist, or is that a human invention to make biology manageable? I think there is a case to be made that pathways do exist as biological entities, even though they have fuzzy boundaries. It helps to think of pathways as modules. There are two schools of thought on this issue.

Map of the phi-X174 genome

The molecular biology school thinks of pathways as separate modules that can be seen relatively independently. Although there is clearly overlap and cross-talk at times, you can still study a pathway, measure it, and talk about it as though it’s an independent entity.

On the other hand, the systems biology school considers a cell as a gigantic network of molecules all interacting with each other. There are so many influences that you can’t make any predictions about the state of a cell unless you measure all its components and know all its interactions.

The module school can be accused of oversimplifying cells, creating artificial delineations that are not there in reality, just because the real networks are hard to comprehend. Evolution does not favor a clean solution; it lands on a good-enough solution, and we can’t expect cellular networks to be easy to understand.

On the other hand, there is clear evidence that evolution does encourage modularity. The reasoning is that biomolecules are organized in modules because modules can evolve independently, and thus make the organism as a whole more evolvable, i.e. more flexible to adapt over the course of generations.

Consider the phage phi-X174. (A phage is a virus that targets bacteria.) It has 11 genes spread over no more than 5386 nucleotides. Its genome is so condensed that several genes overlap; in one section, even three genes overlap. That means that if one nucleotide in that region is changed, three genes are affected. In the case of phi-X174, evolution selected a genome that is as short as possible, so that it can replicate extremely quickly. The downside is that phi-X174 can’t evolve anymore. It is an evolutionary dead end. To a phage, the genes are its modules, and overlapping genes destroy modularity.
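
The effect of overlapping genes can be sketched with a toy example. The coordinates below are invented and are not the real phi-X174 annotation; the genome is also treated as linear for simplicity, whereas the real phage genome is circular. The names `TOY_GENES`, `genes_at` and `overlap_profile` are hypothetical.

```python
# Hypothetical gene coordinates (1-based, inclusive) on a toy 40-nucleotide genome.
# These are NOT the real phi-X174 gene positions.
TOY_GENES = {
    "gene1": (1, 20),
    "gene2": (12, 18),
    "gene3": (15, 30),
    "gene4": (31, 40),
}
GENOME_LENGTH = 40


def genes_at(position, genes):
    """Names of all genes that a point mutation at `position` would touch."""
    return sorted(
        name for name, (start, end) in genes.items() if start <= position <= end
    )


def overlap_profile(genes, genome_length):
    """For every position, the number of genes covering it."""
    return [len(genes_at(pos, genes)) for pos in range(1, genome_length + 1)]


profile = overlap_profile(TOY_GENES, GENOME_LENGTH)
shared = sum(1 for count in profile if count > 1)

print("mutation at 16 touches:", genes_at(16, TOY_GENES))
print("mutation at 35 touches:", genes_at(35, TOY_GENES))
print("positions shared by more than one gene:", shared, "of", GENOME_LENGTH)
```

In a modular genome every position would be covered by at most one gene, so a mutation could be judged on its effect on that gene alone. Wherever the count is higher, a change has to be acceptable to several genes at once.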

There is a clear parallel in software development. Computer programmers usually strive for a modular design. Code is “smelly” when there are too many interactions across the boundaries of modules. You know you have a problem when you fix a bug and two new bugs appear in totally unrelated places. When this happens it’s time to start paying off your technical debt, or face the risk of entering a spiral of doom.

In the bazaar of open source, there are always a dozen projects in any category competing for developer resources and public attention. Thus modular development is favored in the long run. A non-modular project can probably be developed more quickly at first, but this comes at the cost of the flexibility to add new features. So there is a simple natural selection process going on, where projects have to become modular or face being out-competed. Modularity means evolvability, and ensures long-term survival.

GSOC 2009

Sunday, May 3rd, 2009

For the third year in a row, I will participate in the Google Summer of Code with the GenMAPP organization. The first time I was a student, and since then I have been a mentor twice. In the coming weeks I’ll write about the different student projects that are going on this year.

About the GenMAPP organization itself: GenMAPP acts as an umbrella organization for GenMAPP, Cytoscape, PathVisio and WikiPathways, four bioinformatics tools that help with the analysis of large biological datasets. Although these four tools have diverse origins, they all work in the same problem space. Doing this together is a good idea for two reasons: first, we can pool our mentoring resources, and second, it encourages cooperation and integration between the tools, creating cohesion between all the parts. Integration is the most important problem in bioinformatics nowadays, and to solve it we need to work together.

Integration is also what Adem’s project is all about this year. Adem Bilican is a French student who is planning to write a BioPAX plugin for PathVisio. BioPAX stands for “Biological Pathway Exchange”, and is intended as a standard for exchanging pathway data between databases. Since PathVisio is about creating and visualizing pathways, it’s natural that we support this format. A BioPAX plugin would enable PathVisio to communicate directly with large pathway databases such as Reactome.

This is not going to be easy. PathVisio has been designed primarily with the visual aspects of pathways in mind, whereas BioPAX is all about biological semantics. So a successful BioPAX plugin would have to bridge the gap between layout and content.

Adem will be keeping a blog about his progress here: Adem’s blog