Archive for the ‘Uncategorized’ Category

Here come the Bloggers

Monday, May 25th, 2009

Google Summer of Code is well on its way. The GenMAPP organization selected 10 students, the community bonding period is over, and now it’s time to start coding like crazy. The official start was on May 23 (why on a Saturday, I wonder?).

We encourage students to blog about their projects: it’s a good way to keep communication going and to engage the community with the students. Five of the ten students have already taken up the gauntlet. So that you can see what’s going on at a glance, I’ve set up an aggregate blog here: http://www.bigcat.unimaas.nl/~martijn/planet-genmapp/

I thought that worked nicely, but Alex found it too hard to browse and preferred a condensed view, with each post shown only as a summary and a link. He set up this competing site using feedcluster here: http://genmapp-gsoc.feedcluster.com/

So which format do you like better?

A Harddrive made of solid rock

Sunday, May 17th, 2009

After the relay switch and the cathode ray tube, rotating magnetic storage is the final part to be replaced by silicon. And thus humanity frees the machine from its shaky, creaky, cranky mechanical shackles, and has created the ultimate device: a computer made of solid rock.

DIY LEDs from solid rock

A few weeks ago I bought an Intel X25-M Solid State drive for my laptop, and I’m pretty happy with it. Combined with the boot improvements in Jaunty I can go from pushing the power button to seeing my home page in Firefox in just 45 seconds, and that’s including the three seconds it takes to type my password. Plus, the battery lasts significantly longer. Installation is as easy as any regular laptop drive. People say, and I would agree, that this is currently the most effective upgrade you can give to your laptop or desktop.

But don’t take my word for it. Check out this lengthy review, as well as these endorsements by Joel and Linus.

And for those who say boot time doesn’t matter: you’re an energy hog. Please turn off your computer at night, and also drive your Hummer into a wall.

About Modules and Superpathways

Sunday, May 10th, 2009

This post is a bit of background on the projects of two of our summer of code students: JJ and Helen.

The quintessence of modularity

For WikiPathways, people often ask for a way to compose pathways into a single large network. Currently, WikiPathways is a collection of small pathways with clear boundaries. If it were a large network, the advantage would be that you get a complete overview of the cell without arbitrary limits. On the other hand, small pathways can still be understood, edited and curated manually by a single person. Manual interaction is something that we value very much at WikiPathways, and we constantly battle the tendency to invent automatic tools for everything. Another advantage of small pathways is that we can do over-representation analysis (such as GSEA) to rank pathways depending on experimental data – of course it is not possible to make a ranking if you have only a single large network.
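To make the ranking idea concrete, here is a minimal sketch of over-representation analysis using a plain hypergeometric test. It is only an illustration: the pathway names and gene counts are made up, and GSEA proper uses a different, rank-based statistic.

```python
from math import comb


def enrichment_p_value(total_genes, significant_genes, pathway_size, pathway_hits):
    """Chance of seeing at least `pathway_hits` significant genes in a pathway
    of `pathway_size` genes if the significant genes were picked at random."""
    upper = min(significant_genes, pathway_size)
    favourable = sum(
        comb(pathway_size, i) * comb(total_genes - pathway_size, significant_genes - i)
        for i in range(pathway_hits, upper + 1)
    )
    return favourable / comb(total_genes, significant_genes)


# Made-up example: 1000 measured genes, 50 of them significantly changed.
TOTAL, SIGNIFICANT = 1000, 50

# Hypothetical pathways: name -> (genes in pathway, significant genes in pathway)
pathways = {
    "pathway A": (20, 10),
    "pathway B": (20, 1),
}

# Smallest p-value first: the most over-represented pathway ranks on top.
ranking = sorted(
    (enrichment_p_value(TOTAL, SIGNIFICANT, size, hits), name)
    for name, (size, hits) in pathways.items()
)

for p_value, name in ranking:
    print(f"{name}: p = {p_value:.3g}")
```

With a single merged network there would be only one entry in `pathways`, and nothing left to rank.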

Jianjiong Gao (JJ) is a summer student for the second year running. Last year he tackled this problem with the NetworkMerge plugin. It was a great result, but it is not yet very good at merging networks from different sources, which are usually annotated with different types of identifiers. To solve that, JJ is working this year on improving ID mapping. If you want to know more, check out JJ’s blog.
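Why identifiers matter for merging can be shown with a toy sketch. This is not how NetworkMerge works internally; the mapping table, the identifiers and the function names are all invented for the illustration.

```python
# Hypothetical mapping table: identifiers from two made-up sources that
# refer to the same gene are mapped to one common name.
ID_MAP = {
    "srcA:001": "geneA",
    "srcB:x17": "geneA",
    "srcA:002": "geneB",
    "srcB:x42": "geneB",
}


def normalize(node_id):
    """Translate a source-specific identifier; unknown ones are kept as-is."""
    return ID_MAP.get(node_id, node_id)


def merge_networks(*networks):
    """Merge undirected edge lists after mapping every node to a common ID."""
    merged = set()
    for edges in networks:
        for a, b in edges:
            merged.add(frozenset((normalize(a), normalize(b))))
    return merged


network_a = [("srcA:001", "srcA:002")]
network_b = [("srcB:x17", "srcB:x42"), ("srcB:x17", "srcB:x99")]

merged = merge_networks(network_a, network_b)
print(len(merged), "edges after merging")
```

Without the mapping step the two networks would share no nodes at all, and the "merged" network would just be two disconnected pieces. With it, the edge between geneA and geneB is recognized as the same edge in both sources.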

Xuemin (Helen) Liu will work on the SuperPathways project. This is similar to NetworkMerge, but focused on showing the relations between pathways. Thus it will create a new view of a set of pathways that shows the interaction and crosstalk between them. This will let the users of WikiPathways get an overview of the whole pathway collection while still keeping the pathways themselves manageably small.

NetworkMerge and SuperPathways will no doubt be useful. But the question remains: do pathways really exist, or is that a human invention to make biology manageable? I think there is a case to be made that pathways do exist as biological entities, even though they have fuzzy boundaries. It helps to think of pathways as modules. There are two schools of thought on this issue.

Map of the phi-X174 genome

The molecular biology school thinks of pathways as separate modules that can be seen relatively independently. Although there is clearly overlap and cross-talk at times, you can still study a pathway, measure it, and talk about it as though it’s an independent entity.

On the other hand, the systems biology school considers a cell as a gigantic network of molecules all interacting with each other. There are so many influences that you can’t make any predictions about the state of a cell unless you measure all its components and know all its interactions.

The module school can be accused of oversimplifying the complexity of cells, creating artificial delineations that are not there in reality, just because cells are hard to comprehend. Evolution does not favor a clean solution; it lands on a good-enough solution, and we can’t expect cellular networks to be easy to understand.

On the other hand, there is clear evidence that evolution does encourage modularity. The reasoning is that biomolecules are organized in modules because modules can evolve independently, and thus make the organism as a whole more evolvable, i.e. more flexible to adapt over the course of generations.

Consider the phage phi-X174. (A phage is a virus that targets bacteria.) It has 11 genes spread over no more than 5386 nucleotides. Its genome is so condensed that several genes overlap; in one section, even three genes overlap. That means that if one nucleotide is changed in that region, three genes will be affected. In the case of phi-X174, evolution selected a genome as short as possible, so that it can replicate extremely quickly. The downside is that phi-X174 can’t evolve anymore. It is an evolutionary dead end. To a phage, the genes are its modules, and overlapping genes destroy modularity.
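The effect of overlap is easy to sketch in a few lines. The gene names and coordinates below are invented for the illustration and are not the real phi-X174 annotation.

```python
def genes_hit(position, genes):
    """Return the names of all genes whose range covers the given nucleotide."""
    return sorted(
        name for name, (start, end) in genes.items() if start <= position <= end
    )


# Hypothetical modular genome: every gene has its own stretch of nucleotides.
modular = {"g1": (1, 100), "g2": (101, 200), "g3": (201, 300)}

# Hypothetical condensed genome: the same three genes, but overlapping.
condensed = {"g1": (1, 150), "g2": (100, 220), "g3": (140, 300)}

mutation = 145
print("modular:  ", genes_hit(mutation, modular))
print("condensed:", genes_hit(mutation, condensed))
```

In the modular layout a mutation at position 145 touches a single gene. In the condensed layout the same mutation touches all three, so a change that improves one gene has to be harmless to the other two as well.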

There is a clear parallel in software development. Computer programmers usually strive for a modular design. Code is “smelly” when there are too many interactions across the boundaries of modules. You know you have a problem when you fix a bug and two new bugs appear in totally unrelated places. When this happens it’s time to start paying off your technical debt, or face the risk of entering a spiral of doom.

In the bazaar of open source, there are always a dozen projects in any category competing for developer resources and public attention. Thus modular development is favored in the long run. A non-modular project can probably be developed quicker initially, but this comes at the cost of flexibility to add new features. Therefore there is a simple natural selection process going on, where projects have to become modular or face being out-competed. Modularity means evolvability, and ensures long-term survival.

GSOC 2009

Sunday, May 3rd, 2009

For the third year in a row, I will participate in the Google Summer of Code with the GenMAPP organization. The first time I was a student, and since then I have been a mentor twice. In the coming weeks I’ll write some things about all the different student projects that are going on this year.

About the GenMAPP organization itself: GenMAPP acts as an umbrella organization for GenMAPP, Cytoscape, PathVisio and WikiPathways. All four are bioinformatics tools that help with the analysis of large biological datasets. Although these four tools have diverse origins, they all work in the same problem space. It is really good to do this together for two reasons: first, we can pool our mentoring resources, and second, it encourages cooperation and integration between the tools. Integration is the most important problem in bioinformatics nowadays, and to solve it, it’s important that we work together.

Integration is also what Adem’s project is all about this year. Adem Bilican is a French student who is planning to write a BioPAX plugin for PathVisio. BioPAX stands for “Biological Pathway Exchange”, and is intended as a standard for exchanging pathway data between databases. Since PathVisio is about creating and visualizing pathways, it’s natural that we support this format. A BioPAX plugin would enable PathVisio to communicate directly with large pathway databases such as Reactome.

This is not going to be easy. PathVisio has been designed primarily with the visual aspects of pathways in mind, whereas BioPAX is all about biological semantics. So a successful BioPAX plugin would have to bridge the gap between layout and content.

Adem will be keeping a blog about his progress here: Adem’s blog

Free movement of goods and persons

Sunday, April 26th, 2009

I have visited the US several times and I always enjoy being there. What I enjoy less is entering the United States.

A hasty last sip of water before tossing the bottle, take off your shoes, your jacket, your belt, laptop out of bag, keys out of pockets, go through the security gate, collect everything, don’t forget anything, did somebody steal my cell phone? All while the poor sods of the TSA are continuously harassing you and yelling at you (oh poor TSA, are we mean to you?).

Moreover, while going through immigration they take your picture and fingerprints, and ask stupid questions like “Why do you need to be on this business trip, can’t you do that by telephone or email?”

In Brazil they also fingerprint visitors, but with a twist: they only fingerprint US citizens. The good old eye-for-an-eye approach, how satisfying. A US citizen, going through customs in Brazil and noticing the separate line for US citizens, made a big stink about how discriminatory this is. Yes, you’re right, they don’t discriminate in the US. They just treat everybody equally badly.

Discrimination irks people. In our society discrimination is really one of the strongest taboos. But unfortunately anti-discrimination laws only apply within countries, not between countries.

That’s why it was so upsetting when Last.fm announced that their online streaming radio will be for-pay in all countries except the US, the UK and Germany. I think Last.fm is a really valuable service. I discovered lots of new music through it, and it’s probably worth paying for. But why should I pay for something that our German neighbors, less than 50 km away, get for free?

You know what, I don’t need Last.fm. I can get my digital music from many places. Let’s see, where else can I get digital music? Oh I know, let’s buy music from the amazon.co.uk mp3 store, great idea!

Uh oh…

amazon.co.uk mp3 store alert

DVDs are region locked, hulu.com and audible.com are only available in the US. Where does it end?

The root of all this discrimination is of course the opaque and complicated licensing deals that are required by the media industry. If you do not live in a large single market area (apparently everywhere but the US, the UK or Germany), licensing deals become so complex that many companies don’t even bother. It’s all that is wrong with living in a permission culture.

Of course it has always been like that. But in a connected world where borders are disappearing the contrast is particularly stark.

One of the founding principles of the European Union is freedom of movement of goods and persons. That is why the EU was investigating Apple for separating the iTunes music store between the UK and the rest of the continent. Last.fm and Amazon mp3 are clearly violating this principle as well.

I wonder where I can file a complaint?

OpenStreetMap revisited

Saturday, February 28th, 2009

I blogged about OpenStreetMap before. It’s a wiki intended to collect GPS datapoints and create a free map of the whole world. So how is this project holding up? Very well, as can be seen in this really cool video: http://vimeo.com/2598878 . It’s an animation of all the edits that happened in the year 2008. A very nice way of visualizing a year of progress.

Stable Identifiers for WikiPathways

Friday, November 14th, 2008

Today we did another milestone release of WikiPathways, already the 8th one in our release cycle process. The milestones have been coming steadily, roughly every four weeks. This one was slightly behind schedule, though this was to be expected as we saw some pretty heavy modifications.

So what’s new? The main new feature is “Stable Pathway Identifiers”. That means that from now on, a pathway may be identified by e.g. “WP254” instead of “Apoptosis”. While this may sound as exciting as a new cover sheet for TPS reports, it is in fact important groundwork for some interesting features in the future.

Stable identifiers?

As more and more people start linking to WikiPathways (e.g. MSigDB, as we discovered recently), it’s important to keep those links stable and reliable. The disadvantage of identifying pathways just by their names, as we did before, is that there is too high a risk that you will want to rename them.

You saw the same change happening with WormBase a few years ago. WormBase used biological names like “clk-2” or “rad-5”. The C. elegans people have a neat convention of 3-letter gene names plus a number. Clk stands for clock, a class of genes dealing with developmental timing, and rad stands for radiation sensitive. But there are always problems with these kinds of names, like genes being named based on assumptions that turn out to be wrong, or a gene being named independently by two research groups (clk-2/rad-5 is actually an example of that). This creates all kinds of problems when doing bioinformatics. So they introduced identifiers like WBGene00000537.

The problem with descriptive names is that they can be “wrong”. A non-descriptive name is arbitrary, so it’s never wrong, and there is never pressure to change it. We want to use that idea for pathways too.

By the way, old links to WikiPathways are still valid through MediaWiki’s redirect mechanism. But the recommended way of referring to pathways is by making use of the new identifiers.
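The idea behind stable identifiers plus redirects can be sketched like this. It is a toy model only, not the actual WikiPathways or MediaWiki implementation; the class, its methods and the new title "Apoptosis signaling" are made up for the example.

```python
class PathwayRegistry:
    """Toy registry: the identifier never changes, the title is free to change."""

    def __init__(self):
        self._titles = {}     # stable identifier -> current title
        self._redirects = {}  # every title ever used -> stable identifier

    def add(self, pathway_id, title):
        self._titles[pathway_id] = title
        self._redirects[title] = pathway_id

    def rename(self, pathway_id, new_title):
        # The old title stays in the redirect table, so old links keep working.
        self._titles[pathway_id] = new_title
        self._redirects[new_title] = pathway_id

    def resolve(self, key):
        """Look up a pathway by identifier or by any current or former title."""
        pathway_id = key if key in self._titles else self._redirects[key]
        return pathway_id, self._titles[pathway_id]


registry = PathwayRegistry()
registry.add("WP254", "Apoptosis")
registry.rename("WP254", "Apoptosis signaling")  # hypothetical new title

print(registry.resolve("WP254"))
print(registry.resolve("Apoptosis"))
```

Both lookups end up at the same pathway. A link by identifier never needs a redirect at all, which is why it is the recommended form.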

Free Pathway Culture

Friday, September 19th, 2008

According to this book by Lawrence Lessig, Free Culture, we are currently living in a permission culture. In our society you are required to ask permission before you copy, publish or derive from a work that was created by somebody else. If you don’t, you risk an expensive lawsuit.

But artists and innovators influence each other. They always have. Especially in science, progress comes from sharing results and building on top of the results of others. For scientists, permission culture is a nightmare.

The book is mostly about music and movies, but in the world of biological pathway databases the issues are the same. There are hundreds of collections of pathways available, but many of them are locked down. You can use a commercial pathway database to analyze and interpret your high-throughput experimental data, but you are not free to copy those pathways, extend them, derive from them, and publish the results on WikiPathways. This is understandable, after all, the makers of these commercial packages probably spent a lot of time and money on the creation of their high-quality, curated databases.

But imagine what we could do if all this information were free. Then suddenly everybody could contribute. You would never have to worry about legal details, about what you can and can’t do. Instead of many small databases, none of which is perfect, we could compile all free pathway information in the world into a single magnificent resource.

I think we have an opportunity here to set a standard for future generations, to set this content free once and for all. We decided long ago that WikiPathways content will be available under a Creative Commons license. Our goal is to maximize the long-term usefulness of the information collected in WikiPathways. The rules of the Creative Commons license are plain and simple. The only obligation you have is to attribute the pathway author(s); other than that, you are free.

WikiPathways: Convergence

Friday, September 12th, 2008

Our recent article about WikiPathways in PLOS Biology has received an overwhelming response (mostly positive). Alex has already listed most of the responses.

Most responses are positive, but some (understandably) express concern that WikiPathways is now one of the many new scientific wikis springing into existence. People are worried that all these different wikis will be terribly confusing and none will achieve critical mass without some sort of convergence. But I’m optimistic. Clearly, WikiPathways is an experiment, as we explain in Big Data: WikiOmics. All these wikis are experiments. No doubt some will fail, but others will be successful and make a real contribution to the march of progress.

The title of that article reminded me of something. There really used to be a site called WikiOmics, a wiki devoted to bioinformatics. It was a very interesting project, but I haven’t heard from them in a while. It turns out that the content of this wiki has now been folded into the hugely successful and popular OpenWetWare.

As you can see the convergence is already happening.

Summertime!

Tuesday, February 26th, 2008

Google was actually a little dodgy on the subject of GSOC 2008. Take e.g. the following exchange on the mailing list:

> What are the chances SOC will happen in 2008?

Should we choose to run Summer of Code again in 2008, we’ll notify this list.

Leslie Hawthorn even admitted that she thought the whole notion of “Summer” of code was actually very hemispherist. Luckily many Australians and Latin Americans participated in spite of it not being summer, but it can’t be denied there is a certain disadvantage for them. This made me wonder if Google was thinking about changing the format of the program.

But yesterday this message appeared:

Google Summer of Code 2008 is on! You can find full details at http://code.google.com/soc/2008/

Great news! GenMAPP wants to participate as a mentoring organization again, and I’m dead set on joining too, this time as a mentor!