Posts Tagged ‘code example’

Installing the R kernel for Jupyter on Linux

Monday, March 7th, 2016

To paraphrase a saying:

Give a scientist a script and they will analyse data for a day. Teach a scientist to script and they won’t have any more time to do analysis for the rest of their lifetime.

Scripts are a great way to make reproducible workflows, but they are too technical for many situations where you have to report to scientists. Jupyter notebooks are a great way to do an analysis, and report the results at the same time. A Jupyter notebook can contain the analysis, the results, and the documentation that explains the results together in a single file, making it at once understandable and reproducible.

Jupyter started its life as IPython, or “interactive Python”. Once the project added support for languages other than Python, it had to be renamed. In principle you can install support for a wide range of scripting languages, but in practice it can be a little difficult to set up. Jupyter runs code through ‘kernels’: to get support for a different language you have to install that language, and then install the Jupyter kernel for it. It took me a while to get that working for the R scripting language. What follows are some notes I took during that process, in the hope that they are useful to anybody else trying to do the same thing.

So here is the ‘howto’.

First you need to have R and Jupyter installed, but I’m assuming you already got that far. Anyway, that is the easy bit.

The R kernel for Jupyter is IRkernel, and the installation instructions are on its project page. I’ve copied them here for convenience:

install.packages(c('rzmq','repr','IRkernel','IRdisplay'),
                 repos = c('http://irkernel.github.io/', getOption('repos')))
IRkernel::installspec()

When R is installing one of the dependencies, rzmq, on Linux, you immediately run into Problem #1. It will complain:

missing header zmq.h

This happens quite often when installing R packages. It happens when the R package is a wrapper for a C library, and needs the development version of that C library to compile the wrapper. In general, the pattern to solve such a problem is simple. When you encounter:

Missing header xxx.h

the solution is to install the development version of the library with

sudo apt-get install libxxx-dev
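
The naming pattern can be written down as a tiny helper. This is only a sketch of the heuristic, written in Java like the other examples on this blog. The class and method names are made up for illustration, and the result is a guess at the package name, not a lookup in the package database.

```java
// Hypothetical helper: guesses the Debian/Ubuntu development package
// for a missing C header, following the libxxx-dev naming pattern.
class DevPackageGuess {
    static String guess(String header) {
        // "zmq.h" -> "zmq"
        String name = header.endsWith(".h")
                ? header.substring(0, header.length() - 2)
                : header;
        // avoid "liblibfoo-dev" for headers that already start with "lib"
        String prefix = name.startsWith("lib") ? "" : "lib";
        return prefix + name + "-dev";
    }

    public static void main(String[] args) {
        // prints the install command for the first candidate package
        System.out.println("sudo apt-get install " + guess("zmq.h"));
    }
}
```

For zmq.h the guess is libzmq-dev. Treat it as a first candidate only, because a distribution is free to name or version its packages differently.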

But then you get to Problem #2:

sudo apt-get install libzmq-dev
... try installing the rzmq package again
interface.cpp:123:49: error: call to zmq::context_t::context_t(int, int) uses the default argument for parameter 2, which is not yet defined
zmq::context_t* context = new zmq::context_t(1);

Annoying! It turns out that there are multiple versions of zmq, and you need to install the right one:

sudo apt-get install libzmq3-dev

We’re not there yet. When trying to install rzmq again, we run into Problem #3:

g++ -I/usr/share/R/include -DNDEBUG -I../inst/cppzmq -fpic -O3 -pipe -g -c interface.cpp -o interface.o
interface.cpp:23:14: error: expected constructor, destructor, or type conversion before '(' token
static_assert(ZMQ_VERSION_MAJOR >= 3,"The minimum required version of libzmq is 3.0.0.");

Apparently there is a check for the version of libzmq, but it isn’t working. The installation doesn’t fail because we have the wrong version of libzmq, and libzmq isn’t misreporting its own version either. The installation fails because the compiler chokes on the check itself. It looks like a syntax error, which is a weird internal error that shouldn’t occur in a published library. The cause is that static_assert is a C++11 keyword (C++11 being the 2011 version of C++), and the g++ command above is not invoked in C++11 mode, which is not the default here. We’ll have to fix rzmq. First get the source code:

git clone https://github.com/armstrtw/rzmq.git
cd rzmq

We have to edit one line of src/Makevars, as you can see from the following ‘patch’:

diff --git a/src/Makevars b/src/Makevars
index 7d6771c..c6fdce6 100644
--- a/src/Makevars
+++ b/src/Makevars
@@ -1,5 +1,5 @@
 ## -*- mode: makefile; -*-
 
 CXX_STD = CXX11
-PKG_CPPFLAGS = -I../inst
+PKG_CPPFLAGS = -I../inst -std=c++11
 PKG_LIBS = -lzmq

Now let’s install our own hacked package:

R CMD build .
R CMD INSTALL rzmq_0.8.0.tar.gz

(By the way, why is build lowercase and INSTALL uppercase? This is exactly the kind of thing that keeps R from being my favourite scripting language.)

We’re still not there.

Problem #4: we installed the very latest rzmq fresh from source code, which now requires R version 3.1.0 or later. I’m using a Long-Term-Support (LTS) version of Linux Mint and I don’t want to switch R versions, as that could create a hassle elsewhere. Does rzmq really require 3.1.0, or does it merely say it does because it’s pretending to be cutting edge? Let’s hack it up some more.

git diff
diff --git a/DESCRIPTION b/DESCRIPTION
index ba50840..228e3e0 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -5,7 +5,7 @@ Maintainer: Whit Armstrong <armstrong.whit@gmail.com>
 Author: Whit Armstrong <armstrong.whit@gmail.com>
 Description: Interface to the ZeroMQ lightweight messaging kernel (see <http://www.zeromq.org/> for more information).
 License: GPL-3
-Depends: R (>= 3.1.0)
+Depends: R (>= 3.0.0)
 SystemRequirements: ZeroMQ >= 3.0.0 libraries and headers (see <http://www.zeromq.org/>; Debian packages libzmq3, libzmq3-dev, Fedora packages zeromq3, zeromq3-devel)
 URL: http://github.com/armstrtw/rzmq
 BugReports: http://github.com/armstrtw/rzmq/issues

And now the R kernel installs without further problems for me. I haven’t noticed any incompatibilities with earlier R versions yet; R 3.0.0 seems to be just fine.

Proxy configuration for Cytoscape

Tuesday, June 11th, 2013

In large companies, you often find that direct web access is blocked: you have to ask a proxy server to request web pages on your behalf. (The proxy also does things like scanning for viruses and malware.) As a consequence, all the software on your computer needs to be configured to be proxy-aware. This is usually done for you, but bioinformaticians tend to use “non-standard” software that you’ll have to configure yourself.

If you are using Cytoscape 2.X or 3.0 behind a proxy, and you know your proxy settings, you may find the following useful.

Cytoscape has a “proxy server settings” dialog, as described in the manual. The problem is that it doesn’t work – it stores the proxy settings in a special way that only some bits of Cytoscape are aware of. It does not work for plug-ins (sorry, “apps”) that make use of off-the-shelf Java libraries.

Instead, go to your Cytoscape installation directory, and look for a file named Cytoscape.vmoptions. Enter the following lines at the top. Replace the dummy host (192.168.5.130) and port (8080) values with the appropriate values for your proxy.

-DproxySet=true
-Dhttp.proxyHost=192.168.5.130
-Dhttp.proxyPort=8080
-Dhttps.proxyHost=192.168.5.130
-Dhttps.proxyPort=8080

This method works for Cytoscape internally as well as for plug-ins and libraries, so you can just ignore the internal proxy configuration dialog. I’ve tested this with Cytoscape 2.8.2 and 2.8.3, and it’s also relevant for Cytoscape 3.0. People on the Cytoscape mailing list inform me that this will be changed in the upcoming Cytoscape 3.1.

I recommend putting the options at the top, because Cytoscape.vmoptions has a maximum of 9 options. Any more are quietly ignored.

In case you want to delete some to make space, I’ll explain the meaning of the default Cytoscape.vmoptions. The first three options set the initial heap size, the maximum heap size and the thread stack size. Together they control the memory available to Cytoscape, and are worth keeping if you deal with large networks:

-Xms10m
-Xmx768m
-Xss10m

The next two deal with anti-aliasing for font rendering. That’s ancient stuff; I can’t remember the last time I saw a Java application without anti-aliased fonts. I think you can remove them safely, and in the worst case you’ll just get some ugly text.

-Dswing.aatext=true
-Dawt.useSystemAAFontSettings=lcd
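
To see why the order matters, here is a rough sketch of the 9-option limit. It assumes that each line of Cytoscape.vmoptions counts as one option and that only the first 9 are honoured, as described above. The class and method names are hypothetical, and this models the behaviour rather than reading Cytoscape’s actual launcher code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: models the "first 9 options win" behaviour described in this post.
class VmOptionsSketch {
    // Limit reported in the post; treated here as an assumption.
    static final int MAX_OPTIONS = 9;

    // Puts the proxy options first, followed by the existing options
    // (skipping any that are already present).
    static List<String> prepend(List<String> proxyOptions, List<String> existing) {
        List<String> merged = new ArrayList<>(proxyOptions);
        for (String opt : existing) {
            if (!merged.contains(opt)) {
                merged.add(opt);
            }
        }
        return merged;
    }

    // Options past the limit, i.e. the ones that would be quietly ignored.
    static List<String> ignored(List<String> options) {
        if (options.size() <= MAX_OPTIONS) {
            return new ArrayList<>();
        }
        return new ArrayList<>(options.subList(MAX_OPTIONS, options.size()));
    }

    public static void main(String[] args) {
        List<String> proxy = Arrays.asList(
            "-DproxySet=true",
            "-Dhttp.proxyHost=192.168.5.130",
            "-Dhttp.proxyPort=8080",
            "-Dhttps.proxyHost=192.168.5.130",
            "-Dhttps.proxyPort=8080");
        List<String> defaults = Arrays.asList(
            "-Xms10m", "-Xmx768m", "-Xss10m",
            "-Dswing.aatext=true",
            "-Dawt.useSystemAAFontSettings=lcd");
        System.out.println("ignored: " + ignored(prepend(proxy, defaults)));
    }
}
```

With the five proxy options on top of the five defaults listed in this post, the file holds ten options, so the last font setting is the one that falls off the end. Put the proxy options at the bottom instead, and it is a proxy option that gets dropped.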

Finally, a note for Java developers: if you are trying to debug proxy issues, use the following snippet of code just before you make a web request. Sometimes the values of system properties are not what you think they are – with this you can confirm them.

// print out proxy settings for debugging purposes
for (String key : new String[] { "proxySet", "http.proxyHost",
        "http.proxyPort", "https.proxyHost", "https.proxyPort" })
{
    System.out.printf ("%30s: %40s\n", key, System.getProperty(key));
}
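
A complementary check is to ask the JVM which proxy it would pick for a given URL. The sketch below uses the standard java.net.ProxySelector API. The class name and the example URL are placeholders, and it only tells you something if the code making the request relies on the JVM’s default proxy handling. Libraries that manage proxies on their own may behave differently.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

// Sketch: shows which proxies the default selector would use for a URL.
class ProxyCheckSketch {
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // placeholder URL; pass the address you are actually requesting
        String url = args.length > 0 ? args[0] : "http://example.org/";
        for (Proxy p : proxiesFor(url)) {
            // address() is null for a direct (no proxy) connection
            System.out.println(p.type() + " " + p.address());
        }
    }
}
```

If the system properties are picked up, an http URL should list a proxy of type HTTP with your host and port. A single DIRECT entry means no proxy is configured for that URL.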

BridgeDb: now also with metabolite identifier mapping

Thursday, March 25th, 2010

Ha, fooled ya! BridgeDb has been able to deal with metabolite identifiers since the beginning. But mapping genes is such a common problem that metabolites aren’t getting any attention. Nearly all the code examples that we have thus far are with genes.

Somebody on the mailing list asked for an example with metabolites. Well, here you go; you’ll see it’s really easy. This example takes the ChEBI identifier for methionine, and looks up the corresponding PubChem identifier.

// Imports, at the top of the source file
import org.bridgedb.BridgeDb;
import org.bridgedb.IDMapper;
import org.bridgedb.Xref;
import org.bridgedb.bio.BioDataSource;

// Use the BridgeRest webservice as the mapping service;
// it does compound mapping fairly well.
// We select the human species, but it doesn't really
// matter which species we pick.
Class.forName ("org.bridgedb.webservice.bridgerest.BridgeRest");
IDMapper mapper = BridgeDb.connect(
    "idmapper-bridgerest:http://webservice.bridgedb.org/Human");

// Start by defining the ChEBI identifier for
// methionine, id 16811
Xref src = new Xref("16811", BioDataSource.CHEBI);

// the method returns a set, but actually there is only one result
for (Xref dest : mapper.mapID(src, BioDataSource.PUBCHEM))
{
    // this should print 6137,
    // the PubChem identifier for methionine.
    System.out.println ("" + dest.getId());
}

Compile this example with org.bridgedb.jar, org.bridgedb.bio.jar and org.bridgedb.webservice.bridgerest.jar in the classpath, which can be downloaded from http://bridgedb.org/data/releases/