Ports, tunnels, request types and virtual hosts

The internet is surely the most incredible machine on earth. For one thing, I use it to share code with other developers, using a program called subversion. But the other day, subversion was being blocked by a firewall. Fixing that problem was a great opportunity to get my hands dirty with the nuts and bolts of the internet, and I learned a lot too, which I'd like to share here.

First let me explain about ports, because it will be important later. An internet connection always involves two programs: one is the client, running on the local machine, and the other is the server, running on the remote machine. For example, the client could be Firefox on the wife's laptop, and the server could be Apache serving images of kittens.

Now imagine that the remote machine had both a web server and an email server installed. To distinguish the traffic for each program they are assigned a port number. The web server is listening on port 80, which is the conventional port for web traffic. The email server is listening on port 25, and both happily co-operate on the same machine [1].

The client and server must speak the same language, or protocol, to communicate. There is a whole alphabet soup of protocols such as HTTP, FTP, SMTP... Not surprisingly most of them end with the letter P. The most common one is HTTP, being the protocol used for web browsing. This protocol dictates that the browser should start by sending a request. This can be one of several request types, e.g. GET to request the latest kitty pictures, and POST to upload new ones.

Firewalls are designed to let through the ordinary, and block the unusual. Since HTTP is so common, firewalls normally let it go through unharmed. Subversion also uses HTTP, but still it was being blocked [3]. This is because subversion uses rather weird HTTP request types, such as PROPFIND [4]. This is legal according to the protocol, but it's unusual. Firewalls find that suspicious. It's not because subversion is trying to be funny. Honestly, I think that blocking PROPFIND is just the default setting on popular firewall software, and the sysadmins don't bother to change the defaults. After all, Subversion is only used by developers, who make up just a fraction of the population, and they are geeks anyway, so nothing to worry about.

So what to do? Well luckily, I had an account on this particular server for a program called SSH, and with that I set up a tunnel to bypass the firewall. Here is how I did that:

First, I instructed subversion to send its requests to localhost, instead of the subversion server, and to use port 7654 instead of 80 [2]. So instead of doing a subversion checkout from http://svn.bigcat.unimaas.nl/bridgedb/trunk, I was doing it from http://localhost:7654/bridgedb/trunk.

What is localhost? Localhost corresponds to IP address 127.0.0.1, which is a special address that sends messages right back to where they came from. Every computer, no matter how simple, can act as a server, as long as it has suitable software listening on a port. What would be the use of that? The messages are already at localhost, so there what is the point in sending them there? As mentioned above, internet communication is always between two programs. They communicate even if they are written in very different programming languages, as long as they follow the right protocol. Connecting over localhost is sometimes the easiest way to get two very different pieces of software to talk to each other.

So I instructed SSH to set up a tunnel. What this means is that SSH is listening to port 7654, where it was receiving all messages from subversion. SSH does not interpret these messages, it just encrypts them, and forwards them over the internet. The unusual PROPFIND requests are now obscured by encryption. The messages arrive at the remote server on port 22, where another copy of SSH decrypts the messages and passes them on again. They continue the journey to localhost (from the servers point of view), on port 80, where the subversion messages were expected to arrive in the first place. The beauty of this is that in spite of all the redirection, both the subversion client and server are oblivious to what is going on, they just send and receive messages as usual.

To make this trick work on windows, you can configure Putty, the windows variant of SSH:

On linux, it's a simple matter of typing

ssh -L 7654:localhost:80 username@svn.example.com

Except that in my case... it still wasn't working.

The problem is that this particular server is actually hosting two websites: http://bridgedb.org and http://svn.bigcat.unimaas.nl. This server was configured with a technique called virtual hosting, which is useful when you want to host several small websites. Putting each on a separate computer would be very inefficient. With virtual hosting, you can bundle multiple sites on a single server.

The web server listening on port 80 looks at the incoming requests to decide which of the virtual websites is going to handle the request. Normally, a subversion request for the page /bridgedb/trunk on the server svn.bigcat.unimaas.nl looks like this:

PROPFIND http://svn.bigcat.unimaas.nl/bridgedb/trunk

But because of the way I set things up earlier, subversion thinks that it is talking to localhost. Even though the messages are forwarded to the server correctly due to SSH, when they arrive, the requests still look something like:

PROPFIND http://localhost:7654/bridgedb/trunk

Which doesn't help the web server to decide if this request should be served by bridgedb.org or svn.bigcat.unimaas.nl

So what to do? Next, I tricked my local computer into thinking that svn.bigcat.unimaas.nl and localhost are the same, by adding the following line to the hosts file, which is in C:\Windows\System32\drivers\etc on windows, (you need to open notepad with sysadmin rights in order to be able to edit the file) or in /etc/hosts on linux.

127.0.0.1 svn.bigcat.unimaas.nl

This tells the operating system, that when you make a request for svn.bigcat.unimaas.nl, it should really be sent to 127.0.0.1. Which coincidentally is the IP address for localhost. This means that I can configure subversion to send to svn.bigcat.unimaas.nl, even though svn.bigcat.unimaas.nl is really localhost due to the hosts file, except that localhost really is svn.bigcat.unimaas.nl due to the SSH tunnel.

And finally it works!

Footnotes

[1] These port numbers are just conventions, and we could configure each piece of software to use a different port if we wanted. WikiPedia has a long list of conventional port numbers
[2] Why port 7654? For no reason other than that it was free on my machine. (In fact I could have used port 80, which is normally free, unless you're running a web server on your computer, which I do, but that is a different story)
[3] The blocking could also be done by a proxy instead of a firewall, but that doesn't matter for this discussion
[4] I have had problems with PROPFIND before, see also my question on stackoverflow to diagnose the problem.