BatchMapper v0.1

Sunday, July 5th, 2009

I just released the first working version of a new tool called batchmapper. This tool lets you take a list of gene, protein or metabolite identifiers from one database and translate them to the corresponding identifiers in a different database.

Why is this useful? I’ll explain for metabolites, although the story is really the same for genes and proteins. Metabolites are the chemical compounds that you find naturally in the human body. Of course, a lot of research is being done on metabolites, and the collected wisdom is available in a number of online databases, such as KEGG in Japan, PubChem in the USA, ChEBI in the UK and HMDB in Canada.

The glut of online databases has led to a Tower of Babel of metabolite identifiers. Glucose, one of the most important compounds in our body, may be known as HMDB00122 in Canada, C00031 in Japan, 5793 in the USA or 17634 in the UK.
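
To make the translation problem concrete, here is a minimal Python sketch of a lookup over a tab-separated mapping table. It is only an illustration of the idea, not how batchmapper itself is implemented: the column names, the `load_table` and `translate` functions and the table layout are all made up for this example. The only real data in it are the four glucose identifiers quoted above.

```python
import csv
import io

# Hypothetical mapping table: one row per compound, one column per database.
# The column names are invented for this sketch; the glucose identifiers
# are the ones mentioned in the post.
MAPPING_TSV = "hmdb\tkegg\tpubchem\tchebi\nHMDB00122\tC00031\t5793\t17634\n"


def load_table(text):
    """Parse a tab-separated mapping table into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def translate(ids, table, source, target):
    """Map each identifier from the source column to the target column.

    Identifiers that are not in the table map to None instead of raising.
    """
    lookup = {row[source]: row[target] for row in table}
    return [lookup.get(identifier) for identifier in ids]


table = load_table(MAPPING_TSV)
# HMDB99999 is a deliberately unknown identifier to show the fallback.
print(translate(["HMDB00122", "HMDB99999"], table, "hmdb", "kegg"))
# prints ['C00031', None]
```

A real mapping table would of course hold many thousands of rows, and an identifier in one database can correspond to several in another, which this one-to-one sketch ignores.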

batchmapper is a spin-off from recent work done by JJ and me. It’s a command-line tool, so it’s not very user-friendly, but it is fast, flexible and completely automatic. The translation tables can be provided in the form of text files, relational databases or web services, or even a combination thereof. This early release is completely functional. Check out the tutorial, and leave some comments here on this blog.

It would be nice if all the online metabolite databases worked together and merged into a single resource, but I don’t see that happening in the near future. At least batchmapper helps to make the problem a little more manageable.