Archive for March, 2010

BridgeDb: now also with metabolite identifier mapping

Thursday, March 25th, 2010

Ha, fooled ya! BridgeDb has been able to deal with metabolite identifiers since the beginning. But mapping genes is such a common problem that metabolites aren’t getting any attention. Nearly all the code examples we have so far use genes.

Somebody on the mailing list asked for an example with metabolites. Well, here you go; you’ll see it’s really easy. This example takes the ChEBI identifier for methionine and looks up the corresponding PubChem identifier.

import org.bridgedb.BridgeDb;
import org.bridgedb.IDMapper;
import org.bridgedb.Xref;
import org.bridgedb.bio.BioDataSource;

// The class name is arbitrary; pick whatever suits your project.
public class MetaboliteExample
{
    public static void main(String[] args) throws Exception
    {
        // Use the BridgeRest webservice as mapping service;
        // it does compound mapping fairly well.
        // We select the human species, but it doesn't really
        // matter which species we pick.
        Class.forName("org.bridgedb.webservice.bridgerest.BridgeRest");
        IDMapper mapper = BridgeDb.connect(
            "idmapper-bridgerest:http://webservice.bridgedb.org/Human");

        // Start by defining the ChEBI identifier for
        // methionine, id 16811
        Xref src = new Xref("16811", BioDataSource.CHEBI);

        // mapID returns a set, but here there is only one result
        for (Xref dest : mapper.mapID(src, BioDataSource.PUBCHEM))
        {
            // this should print 6137,
            // the PubChem identifier for methionine.
            System.out.println(dest.getId());
        }
    }
}

Compile this example with org.bridgedb.jar, org.bridgedb.bio.jar and org.bridgedb.webservice.bridgerest.jar on the classpath. The jars can be downloaded from http://bridgedb.org/data/releases/

Google Summer of Code 2010

Thursday, March 18th, 2010

Yay! It’s official, we’re going to be in the Google Summer of Code again this year. Our application as a mentoring organization was just accepted. Cytoscape, PathVisio, WikiPathways and even BridgeDb are all joined under the GenMAPP umbrella organisation. Unfortunately I don’t have time to mentor again, so I’ll be watching from the sidelines this year. But I do want to encourage students to apply.

Students from all nations, we want to hear from you! If you’re interested in developing open source bioinformatics software, please send us a proposal. Check our ideas page, write a proposal and send it to our mailing list. You have a chance to gain valuable development experience and earn a little money at the same time. The earlier you contact us, the better your chances.

The Downside of Modularity

Saturday, March 13th, 2010

I’m a big fan of modularity. I’ve even got a modular system in my living room. It consists of the following modules:

  • One module that converts a digital signal to a two-dimensional picture.
  • One module that reads a rotating plastic disk with a laser and produces a digital signal.
  • One module that gets a digital signal from a socket in the wall, stores it temporarily on magnetic disk, and sends it out again upon request.
  • One module that generates a digital signal based on a simulation of a virtual world, with which I can interact in real time using motion and pressure sensitive input devices.

In case you hadn’t guessed already, I was talking about my TV, DVD player, hard-disk recorder, and game console.

Imagine if all of this came in one device. A TV+DVD+HDR+Console-in-one. Imagine what it would cost. If just one part broke, I would have to replace everything. I would never be able to move it abroad, because the HDR is tied to my cable provider. I would never be able to get the games that do not involve Italian plumbers.

But to be fair, there are also disadvantages to modular systems. Just take a look at the remote control that comes with it:

How to develop Modular Software

Saturday, March 6th, 2010

It’s always good to make software modular. Modular software is strong and healthy; monolithic software is sickly and bedridden. I’ve touched before on how modularity increases adaptability. But modularity also helps to keep software small, nimble and unbloated. I’ll illustrate how we’re applying modular design in BridgeDb.

Modularity is the only known antidote against bloatware. The more features a piece of software has, the larger it has to be. When you don’t use 90% of those features, it’s perceived as a problem. Bloated software takes a long time to start, fills up your hard drive, clogs your tubes. We want bioinformatics developers to use BridgeDb as much as possible, and we don’t want them to complain that BridgeDb is bloated.

For example, BridgeDb supports identifier mapping through several different web services. Some of those web services are based on SOAP, others on XML-RPC or on REST. For each type of web service, you need additional libraries. If BridgeDb were one monolithic chunk, you’d always need several megabytes of library dependencies, even for the services you never use.

You may say: “A few megabytes, so what?” When I was at MediaMarkt the other day, I couldn’t even find memory sticks smaller than 2 GB anymore. But size still matters when you expect fast download times. For example, WikiPathways uses BridgeDb on each pathway page. Bigger libraries mean longer load times, which means annoyed users.

We want many features, but we don’t want bloat. The solution is to cut BridgeDb up into many small pieces, where you can choose the ones you need, and ignore the rest. You also don’t need the dependencies of the parts you ignore.
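To make the idea concrete, here is a minimal sketch of one way to get this effect, in the style of JDBC drivers: a small core keeps a registry keyed by protocol prefix, and each optional module registers itself when its class is loaded. The core never references the optional modules, so it never needs their dependencies. All names here (MapperRegistry, Mapper, DemoRestDriver, the "demo-rest" prefix) are made up for illustration and are not BridgeDb's actual classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MapperRegistry {
    /** Minimal mapper interface; hypothetical, not the BridgeDb API. */
    public interface Mapper {
        String map(String id);
    }

    // protocol prefix -> factory that builds a mapper from the rest of the connection string
    private static final Map<String, Function<String, Mapper>> DRIVERS = new HashMap<>();

    /** Called by optional modules to announce themselves to the core. */
    public static void register(String protocol, Function<String, Mapper> factory) {
        DRIVERS.put(protocol, factory);
    }

    /** Splits "protocol:location" and hands the location to the matching driver. */
    public static Mapper connect(String connectionString) {
        int colon = connectionString.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected protocol:location");
        }
        String protocol = connectionString.substring(0, colon);
        Function<String, Mapper> factory = DRIVERS.get(protocol);
        if (factory == null) {
            throw new IllegalStateException("No driver registered for " + protocol);
        }
        return factory.apply(connectionString.substring(colon + 1));
    }

    /** Stands in for an optional module that would live in its own jar. */
    public static class DemoRestDriver {
        static {
            // Self-registration: runs once, when the class is first initialized.
            // This toy mapper just appends the id to the location.
            register("demo-rest", location -> id -> location + "/" + id);
        }
    }

    public static void main(String[] args) throws Exception {
        // Loading the class by name triggers its static initializer.
        Class.forName(DemoRestDriver.class.getName());
        Mapper mapper = connect("demo-rest:http://example.org");
        System.out.println(mapper.map("16811")); // prints http://example.org/16811
    }
}
```

The point of the design is that only code which actually asks for "demo-rest" has to have that driver, and whatever libraries it drags along, on the classpath.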

So how do you decide which pieces of BridgeDb you need? I’ve compiled this handy graph. On the right side, you see all the different “features” (i.e. identifier mapping services) that you can choose. Follow the arrows to the left, and note the modules that you encounter. Those are the modules you need for that mapping service.
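Following the arrows is just a walk over a dependency graph, which can be sketched in a few lines of Java. Note that the graph below is a toy: the module names are real, and org.bridgedb.bio depending on org.bridgedb matches my build path, but the edge from the bridgerest module is an assumption made for this example, and the class ModuleDeps is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ModuleDeps {
    // module -> direct dependencies
    public static final Map<String, List<String>> DEPS = new LinkedHashMap<>();
    static {
        DEPS.put("org.bridgedb", Collections.<String>emptyList());
        DEPS.put("org.bridgedb.bio", Arrays.asList("org.bridgedb"));
        // Assumed edge, for illustration only.
        DEPS.put("org.bridgedb.webservice.bridgerest", Arrays.asList("org.bridgedb"));
    }

    /** Returns the start module plus everything reachable by following the arrows. */
    public static Set<String> requiredModules(Map<String, List<String>> deps, String start) {
        Set<String> needed = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(start);
        while (!todo.isEmpty()) {
            String module = todo.pop();
            // add() returns false if we have seen this module before
            if (needed.add(module)) {
                for (String dep : deps.getOrDefault(module, Collections.<String>emptyList())) {
                    todo.push(dep);
                }
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        // prints [org.bridgedb, org.bridgedb.webservice.bridgerest]
        System.out.println(requiredModules(DEPS, "org.bridgedb.webservice.bridgerest"));
    }
}
```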

If you’re getting started with modular software development, I can give you a few tips. You really don’t need any of those terribly complicated frameworks like Maven or OSGi. All you need is a good IDE like Eclipse and a bit of determination.

You have to be careful to manage the boundaries between modules. Eclipse can help you a great deal with this. Put each module in its own directory. In your Eclipse workspace, set up a separate project for each module, and add the projects it depends on to that project’s build path. This way you can never introduce cyclic dependencies or go across module boundaries. If you try to use a class from a module that is not on the build path, Eclipse will simply refuse to find the class and flag it as a compiler error.
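If you want the same guarantee outside of Eclipse, say in a build script, a cyclic dependency check is a small depth-first search. This is only a sketch of the general technique, not something that ships with BridgeDb or Eclipse; the CycleCheck class is hypothetical, and the second graph in main is deliberately broken to show a cycle.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CycleCheck {
    /** Returns true if the graph (module -> direct dependencies) contains a cycle. */
    public static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> visited = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String module : deps.keySet()) {
            if (visit(module, deps, visited, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String module, Map<String, List<String>> deps,
                                 Set<String> visited, Set<String> onPath) {
        if (onPath.contains(module)) {
            return true; // we came back to a module we are still exploring
        }
        if (!visited.add(module)) {
            return false; // explored earlier without finding a cycle
        }
        onPath.add(module);
        for (String dep : deps.getOrDefault(module, Collections.<String>emptyList())) {
            if (visit(dep, deps, visited, onPath)) {
                return true;
            }
        }
        onPath.remove(module);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ok = new HashMap<>();
        ok.put("org.bridgedb", Collections.<String>emptyList());
        ok.put("org.bridgedb.bio", Arrays.asList("org.bridgedb"));
        System.out.println(hasCycle(ok)); // prints false

        // Deliberately wrong: the core module now depends on bio as well.
        Map<String, List<String>> bad = new HashMap<>(ok);
        bad.put("org.bridgedb", Arrays.asList("org.bridgedb.bio"));
        System.out.println(hasCycle(bad)); // prints true
    }
}
```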

For example, here is how I’ve set up BridgeDb in Eclipse. In the package explorer you see that I’ve defined a separate project for each module in BridgeDb.

And to complete the example, here is how I configured the build path for the org.bridgedb.bio module. As you can see, the org.bridgedb project is listed as its sole dependency.