Posts Tagged ‘wikipathways’

Pathway Visualization to the next level

Friday, February 25th, 2011

The laboratory of bioinformatics of Wageningen University has put together some really cool hardware. In the picture below you see their tiled display, consisting of 12 high-resolution monitors, powered by a single workstation.

PathVisio on a tiled display

This setup gives you a lot of resolution to play with. We managed to display all major metabolic pathways from WikiPathways simultaneously, at full resolution, and map microarray data as well. When you’re standing right next to the screens, it feels like the data is all around you. That really encourages you to explore, and make connections across the pathways. That’s just much harder to do on a single screen.

WikiPathways Curation Jamboree Evaluation

Friday, April 16th, 2010

WikiPathways content is growing nicely, but it’s not growing like one of those nice exponential curves that you see in the first slide of almost every bioinformatics presentation nowadays. We want exponential curves in our presentations too, dammit, so we want to get more people actively involved.

A big challenge for WikiPathways is to get people to take the first step, to get them over that initial hump and actually start participating. Certainly a lot of people are very interested in WikiPathways, but there is some hesitation to just start working on the content. It’s something we have to work on. Besides clearing technical hurdles, we try to gently help people simply get started.

As an experiment, we organized a dedicated curation jamboree, a focused effort to get together and crank through a list of curation tasks. We prepared documentation, contacted several mailing lists and harassed all our colleagues. We also set up a special chat channel where newcomers could get immediate answers to quick questions. The event ran for two days in February.

So, was it a success? Yes, if you look at edit activity. Thomas made this graph of the number of pathways tagged with either “needs reference” (for pathways that don’t have any literature references) or “missing description” (for pathways that don’t have a nice description text). As you can see, the numbers dropped quickly during the two days of the curation event, by at least 25%. (Ignore the initial jump in the blue line; that’s due to a bug in the data collection script.) WikiPathways gained a lot of curated data in a short period of time.

Number of pathways with a curation tag over time

The most active contributors were the usual suspects: Thomas, Alex, Kristina and me, the core WikiPathways team. But you can see in the graph below that other people joined in as well. Even if they did only a few curation tasks, that’s good enough. The most important thing is to get people to take the first step. So the graph below is misleading: participating really is more important than winning.

Number of edits per user

Fixing a Pathway the Groovy Way

Tuesday, August 25th, 2009

It’s no secret that there are many mistakes in WikiPathways. A coworker notified me of a problem with the Focal Adhesion pathway. It’s exactly the sort of problem that takes a lot of repetitive action to fix. So I thought, instead of doing this the boring way, I’ll write a program to do it, and have some fun at the same time. This is the perfect opportunity to get some practice with the WikiPathways webservice.

To make it a little bit more interesting, I decided to do it with Groovy, a scripting language I’m learning. It’s very similar to Java but also has a lot of cool whizzbang features such as dynamic typing, closures and multi-line strings.

For reference, you can find the full script below. But I’ll explain it bit by bit. The first step is to get access to the WikiPathways webservice:

def wpclient;
wpclient = new WikiPathwaysClient();

After this bit of setup, wpclient can be used to interact with WikiPathways.
In this case we want to send the updated pathway back, so we’ll need to log in. Authentication is not required if you only need read access. However, if you want to write data, you have to have a valid account with webservice permissions, which is available upon request.

wpclient.login("username", "********");

I only needed to get one pathway, and I know which one. It has identifier WP306. The code to download it is:

def wspwy = wpclient.getPathway("WP306");
def pwy = wpclient.toPathway(wspwy);

wspwy is a WSPathway object, which is a wrapper for the pathway with some meta-data from WikiPathways (species and revision number). The actual Pathway itself can be obtained with the toPathway() method.

Now is the time to manipulate the pathway, and fix it up any way imaginable. I’ll describe that below, but first I’ll show how I uploaded the pathway again using the webservice:

println "Uploading... ";
wpclient.updatePathway (wspwy.getId(), pwy,
  "autoconversion of Entrez symbols to IDs",
  wspwy.getRevision().toInteger());

So now onto the actual pathway manipulation. I didn’t explain yet what kind of problem needed to be fixed. The problem here was that none of the genes had a proper identifier. As a result, none of the links worked, and the pathway could not be linked to experimental data.

However, there is some good news because each gene is labeled with the gene name. Gene names are nice as labels because they are readable and a bit more meaningful than identifiers, but on the other hand gene names tend to be ambiguous, so we really need the identifiers as well. But gene names are a good start. Using BridgeDb we can perform a free text search for the gene name, and come up with a matching identifier.

First we need a bit of setup code for BridgeDb. In this case we’ll make use of our Human gene database, which can be downloaded from http://www.bridgedb.org/data/gene_databases

Class.forName ("org.bridgedb.rdb.IDMapperRdb");
mapper = BridgeDb.connect (
  "idmapper-pgdb:/path/to/Hs_Derby_20090720.bridge");
BioDataSource.init();

The main logic for translating gene names to gene identifiers is contained in the function labelToXref. It takes the label containing the gene name as its argument, and returns an identifier, or null if nothing was found.

def labelToXref (label) {
    // do a free search for all Xrefs that match our label
    for (ref in mapper.freeAttributeSearch(label, "Symbol", 100)) {
        // check only Xrefs that are in Entrez,
        // and that are an exact match with the label
        // free search will also return partial matches.
        if (ref.getDataSource() == BioDataSource.ENTREZ_GENE &&
            mapper.getAttributes (ref, "Symbol").contains(label)) {
            return ref;
        }
    }
    return null;
}

Now all we have to do is loop over each element in the pathway, get its label and use labelToXref to get the correct identifier.
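
The shape of that loop can be sketched in a few lines. This is a self-contained stand-in, not the code I ran: plain maps play the role of the data nodes, and a made-up in-memory table (symbolTable) replaces the BridgeDb search, so it runs without the PathVisio and BridgeDb libraries. The names symbolTable, lookup and nodes are assumptions for illustration only. The real loop is in the full script further down.

```groovy
// Stand-in for the BridgeDb lookup: gene symbol -> Entrez Gene id.
// Illustrative only; the real script searches the gene database instead.
def symbolTable = [INSR: "3643"]

// Same contract as labelToXref: an identifier, or null if nothing was found.
def lookup = { String label -> symbolTable[label] }

// Plain maps standing in for the data nodes of a pathway.
// Before the fix, the id field holds the gene symbol instead of an identifier.
def nodes = [
    [label: "INSR", id: "INSR"],
    [label: "NotAGene", id: "NotAGene"],
]

def success = 0
for (node in nodes) {
    def result = lookup(node.label)
    if (result != null) {
        node.id = result   // replace the symbol with the identifier
        success++
    }
}
println "${success} out of ${nodes.size()} converted"
```

Nodes that cannot be mapped are simply left alone, so they can still be fixed by hand later.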

And that’s it. It took me about an hour to write this. It would have taken less if I didn’t have to look up some bits about the Groovy syntax. As you can see from the pathway history, almost all genes were fixed.

Here is the entire script for reference purposes:

import org.pathvisio.model.Pathway
import org.bridgedb.bio.BioDataSource
import org.bridgedb.BridgeDb
import org.pathvisio.model.ObjectType
import org.pathvisio.view.Graphics
import org.pathvisio.wikipathways.WikiPathwaysClient

// Pathway WP306 (Focal Adhesion) uses Entrez symbols instead of Entrez IDs
// Thanks to Claus Mayer for reporting this problem.
public class EntrezSymbolToNumber {
    def mapper;
    def wpclient;
    // Look up entrez gene id for a given label
    // e.g. for INSR it will return L:3643
    def labelToXref (label) {
        // do a free search for all Xrefs that match our label
        for (ref in mapper.freeAttributeSearch(label, "Symbol", 100)) {
            // check only Xrefs that are in Entrez,
            // and that are an exact match with the label
            // free search will also return partial matches.
            if (ref.getDataSource() == BioDataSource.ENTREZ_GENE &&
                mapper.getAttributes (ref, "Symbol").contains(label)) {
                return ref;
            }
        }
        return null;
    }
   
    void init() {
        wpclient = new WikiPathwaysClient();
        wpclient.login("username", "********");
        Class.forName ("org.bridgedb.rdb.IDMapperRdb");
        mapper = BridgeDb.connect ("idmapper-pgdb:/path/to/Hs_Derby_20090720.bridge");
        BioDataSource.init();
    }
   
    void run() {
        def success = 0;
        def total = 0;
        // fetch pathway through the webservice
        def wspwy = wpclient.getPathway("WP306");
        def pwy = wpclient.toPathway(wspwy);

        // loop over all data nodes
        for (dn in pwy.getDataObjects()) {
            if (dn.getObjectType() == ObjectType.DATANODE) {
                total++;
                def label = dn.getTextLabel();
                def ref = dn.getXref();
                print (label + " " + ref + " -> ");
                if (ref.getDataSource() == BioDataSource.ENTREZ_GENE
                        && !mapper.xrefExists(ref)) {      
                    def result = labelToXref (label);
                    if (result != null) {
                        println "mapping to " + result;
                        dn.setGeneID (result.getId());
                        success++;
                    }
                    else
                        println "could not map";
                }
                else
                    println "OK";
            }
        }
        println success + " out of " + total + " converted";       
        println "Uploading... ";
        wpclient.updatePathway (wspwy.getId(), pwy, "autoconversion of Entrez symbols to IDs", wspwy.getRevision().toInteger());
    }
   
    static void main (args) {
        def runner = new EntrezSymbolToNumber();
        runner.init();
        runner.run();
    }  
}

By the way, I’m using the codecolorer plugin for the pretty code formatting. (Thanks to rguha for the tip, and to Dmytro for being so responsive to my bug report.)

Mining biological pathways using WikiPathways web services

Thursday, August 6th, 2009

A website lets people interact with computers over the Internet. A web service, on the other hand, lets computers interact with computers over the Internet. We’ve created a web service for WikiPathways so people can write computer scripts to do interesting new things with WikiPathways. This is all described in great detail in an article that was recently published in PLoS One.

Mining biological pathways using WikiPathways web services.
Kelder T, Pico AR, Hanspers K, van Iersel MP, Evelo C, Conklin BR.
PLoS One. 2009 Jul 30;4(7):e6447.

Naturally it’s open access, so you can read it all online. From the article:

The WikiPathways web service provides an interface for programmatic access to community-curated pathway information. […] The web service can be used by software developers to build or extend tools for analysis and integration of pathways, interaction networks and experimental data. The web services are also useful for assisting and monitoring the community-based curation process. By providing this web service, we hope to help researchers and developers build tools for pathway-based research and data analysis.

Automated access, plus the fact that all content is available under a Creative Commons license, should make WikiPathways even more useful as a scientific resource. It will be interesting to see what kind of uses people will come up with.

Stable Identifiers for WikiPathways

Friday, November 14th, 2008

Today we did another milestone release of WikiPathways, already the 8th one in our release cycle process. The milestones have been coming steadily, roughly every four weeks. This one was slightly behind schedule, though this was to be expected as we saw some pretty heavy modifications.

So what’s new? The main new feature is “Stable Pathway Identifiers”. That means that from now on, a pathway may be identified by e.g. “WP254” instead of “Apoptosis”. While this may sound as exciting as a new cover sheet for TPS reports, it is in fact important groundwork for some interesting features in the future.

Stable identifiers?

As more and more people start linking to WikiPathways (e.g. MSigDB, as we discovered recently), it’s important to keep those links stable and reliable. The disadvantage of identifying pathways just by their names, as we did before, is that there is too high a risk that someone will want to rename them.

You saw the same change happening with WormBase a few years ago. WormBase used biological names like “clk-2” or “rad-5”. The C. elegans people have a neat convention of 3-letter gene names plus a number. Clk stands for clock, a class of genes dealing with developmental timing, and rad stands for radiation sensitive. But there are always problems with this kind of name, like genes being named based on assumptions that turn out to be wrong, or a gene being named independently by two research groups (clk-2/rad-5 is an example of that, actually). This creates all kinds of problems when doing bioinformatics. So they introduced identifiers like WBGene00000537.

The problem with descriptive names is that they can be “wrong”. A non-descriptive name is arbitrary, so it’s never wrong, and there is never pressure to change it. We want to use that idea for pathways too.

By the way, old links to WikiPathways are still valid through MediaWiki’s redirect mechanism. But the recommended way of referring to pathways is by making use of the new identifiers.
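
To make the idea concrete, here is a minimal sketch of why an arbitrary identifier survives a rename while a name-based link needs a redirect. It is a toy in-memory model with hypothetical names (titles, redirects, resolve), and it does not describe how MediaWiki or WikiPathways actually store pages. The new title is invented for the example.

```groovy
// Toy registry, for illustration only.
def titles = [WP254: "Apoptosis"]       // stable identifier -> current title
def redirects = [Apoptosis: "WP254"]    // old name-based link -> identifier

// Resolve either an identifier or an old name to the stable identifier.
def resolve = { String key -> titles.containsKey(key) ? key : redirects[key] }

// Renaming touches the title only; the identifier never changes.
// "Programmed cell death" is a made-up new title.
titles["WP254"] = "Programmed cell death"

println resolve("WP254")       // link by identifier, unaffected by the rename
println resolve("Apoptosis")   // old link by name, found through the redirect
```

A link by identifier needs no bookkeeping at all, while every name that was ever used has to be kept around as a redirect.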

Free Pathway Culture

Friday, September 19th, 2008

According to this book by Lawrence Lessig, Free Culture, we are currently living in a permission culture. In our society you are required to ask permission before you copy, publish or derive from a work that was created by somebody else. If you don’t, you risk an expensive lawsuit.

But artists and innovators influence each other. They always have. Especially in science, progress comes from sharing results and building on top of the results of others. For scientists, permission culture is a nightmare.

The book is mostly about music and movies, but in the world of biological pathway databases the issues are the same. There are hundreds of collections of pathways available, but many of them are locked down. You can use a commercial pathway database to analyze and interpret your high-throughput experimental data, but you are not free to copy those pathways, extend them, derive from them, and publish the results on WikiPathways. This is understandable: after all, the makers of these commercial packages probably spent a lot of time and money on the creation of their high-quality, curated databases.

But imagine what we could do if all this information were free? Then suddenly everybody can contribute. You will never have to worry about legal details, what you can do and what you can’t do. Instead of many small databases, none of which are perfect, we can compile all free pathway information in the world into a single magnificent resource.

I think we have an opportunity here to set a standard for future generations, to set this content free once and for all. We decided long ago that WikiPathways content will be available under a Creative Commons license. Our goal is to maximize the long-term usefulness of the information collected in WikiPathways. The rules of the Creative Commons license are plain and simple. The only obligation you have is to attribute the pathway author(s); other than that, you are free.

Visual Pathway Diff: First Screenshots

Wednesday, July 11th, 2007

As the mid-term evaluation is underway, I reached a major milestone: My program can now output the difference set in SVG format, meaning that it is possible to compare Biological Pathways visually.

Here are two examples:

Pathway Comparison A

In this screenshot you see two versions of the Acetylcholine Pathway (very important for transmitting electrical signals between neurons). As you can see, I’ve straightened out a few arrows in the new version (on the right). The yellow color indicates things that have changed; the text balloons in the middle explain in more detail which attributes of those things have changed.

Pathway Comparison B

This pathway represents the Alcohol Dehydrogenase reaction (activated when you have a hangover 😀 ). In the image you can see I’ve added a few missing reaction compounds (in green) as well as set the label for the other compounds (in yellow).

SVG output is only one of the possible output formats of GpmlDiff. Another option is to write to an XML-type format that I dubbed DGPML. DGPML is designed in such a way that it should be possible to write a patch utility that takes a GPML Pathway and then applies a DGPML difference set to it.
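
The diff-then-patch idea can be illustrated with a toy example. This is only a sketch under loud assumptions: a pathway element is reduced to a flat map of attributes, the example values are made up, and neither the closures nor the before/after layout reflect the actual GpmlDiff algorithm or the DGPML format.

```groovy
// Hypothetical: one pathway element as a flat map of attributes.
def oldElem = [label: "", color: "black"]
def newElem = [label: "Ethanol", color: "black"]

// diff: for every attribute whose value differs, record both values.
def diff = { Map a, Map b ->
    def changed = (a.keySet() + b.keySet()).findAll { a[it] != b[it] }
    changed.collectEntries { key -> [(key): [before: a[key], after: b[key]]] }
}

// patch: apply the recorded new values to a copy of the old element.
def patch = { Map a, Map delta ->
    def out = new LinkedHashMap(a)
    delta.each { key, change -> out[key] = change.after }
    out
}

def delta = diff(oldElem, newElem)
println delta
println patch(oldElem, delta) == newElem
```

The property that matters is the round trip: patching the old version with the difference set gives back the new version. This toy does not handle removed attributes or added and deleted elements, which a real patch utility would need to.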

In the context of WikiPathways, I think there are a couple of interesting use cases for a patch utility. I didn’t think about this when I wrote my GSoC proposal, but I think it would be very useful. So useful, in fact, that I’m going to ask my supervisor Alex Pico if he thinks I should put it in my plan for the second half of the summer.

Pathways and Networks

Friday, May 11th, 2007

After the last post, a few people have asked me for examples of Biological Pathways. Well, if you just head over to WikiPathways you can find hundreds of them. (There are other websites with pathways too, such as KEGG.)

A Biological Pathway is usually drawn fairly simply, with only a few dozen interacting components. They’re kinda like flow-charts, without all the visual complexity that you see in gene networks. I think this is the way it should be. “Pathways” and “networks” each provide complementary views of something that happens inside a cell: one view is structured but complex, the other is more free-form but easier to understand.