Building a Virtual Private Network with Linux
Arthur Donkers
Connecting different locations used to be a question of dedicated lines, expensive routers, and a lot of proprietary protocols. Apart from being cumbersome, it cost a lot of manpower and money to keep a link like that up and running. And if the different locations were situated in different countries (or even continents), it could run into a very expensive adventure.
Now (actually for more than 20 years), there is the magical network called the Internet that spans the globe and offers world-wide connectivity for local rates. So, what would be better than connecting the aforementioned locations via the Internet? Apart from a few minor problems like security, there is nothing to keep us from doing so. To solve the security problem, and some others, you can use a so-called Virtual Private Network or VPN. With this technique, you can extend your local corporate network over the Internet without compromising your corporate security or fundamentally rearranging your network topology.
This article will discuss how to build a virtual private network using Linux, a UNIX-compatible operating system developed on the Internet. Linux is available in source and can easily be adapted to your own hardware. It is currently available for PCs based on the Intel architecture, Sun SPARC machines, and Digital AXP (Alpha) machines (among others).
Introduction Building a VPN can be a daunting task. A lot of details must be taken care of, and the task may involve a few programming skills to get the job done. I hope to get you started in the right direction. Each network environment has its own specifics, and therefore needs one or more specific solutions. Do not despair; with these techniques shown, you will be able to get a large chunk of the work done.
Public Domain Versus Commercial The first question concerns the application of public domain software in security-related applications versus the application of commercially available software. The following is my personal opinion, based on the security experience I've gained in the field and a few general security principles.
I prefer the application of public domain software over the commercially available ones. The public domain software is often of a comparable high- (or higher) quality level as the commercial solution. This is because the people who write the public domain software have no marketing-induced deadline to meet and are not forced to bring a product to the market before a competitor does. This situation results in a public domain package that is technically more mature than a (relatively new) commercial product. Sometimes the commercial product is even a repackaging of a public domain program, including the much needed support. So, in my opinion, a public domain package can stand the test with a commercial product with respect to the technical merits.
One drawback to the application of public domain software, however, is the lack of a clear and well-established support organization. Who do you consult when a bug is discovered in the software, or worse, when a security hole is discovered? You may try the author of the package, but that is no guarantee. More companies are now offering commercial support for public domain packages, so maybe in the near future these companies will support the security-related software as well. A related issue concerns a feature called vendor-lockin. Most commercial packages are very specific (i.e., they only work on a predefined set of platforms) and are only interoperable with itself. So if you need to connect a number of different systems in a number of different ways, the better choice is a public domain package.
Another, more general, argument is that the company using the software is solely responsible for its own security. Buying a commercial product will not relieve you of that burden. Imagine that someone has succeeded in breaking through your firewall, will you then be able to claim your damages to the supplier of your firewall software? Most probably not, because they only sell you the prepackaged product and you must configure it to your own security needs. The question of public domain versus commercial software then boils down to a question of trust.
Do you trust your supplier to have done his job right, or do you want to know what is going on in terms of security? If you want to know what is going on, you need access at least to the source of the applicable software, and not surprisingly, most public domain software is available in source form. Both solutions have their merits. If you have chosen one and it suits your needs, please do not change it (this is the principle of "never change a winning team"). For the reasons explained above, I will show in this article how you can use Linux and SSH (Secure Shell) to build a virtual private network.
Some Basic Principles A virtual private network (VPN) is based on two principles, which can be deducted from its name. The term "virtual" implies that it has no physical representation (like virtual reality, which has no real physical representation but exists only in a computer, or your mind). This needs a bit more explanation, since there is a real network involved. The virtuality lies in the fact that there is no dedicated network cable between the locations that you interconnect. You do not have a 2000 mile UTP 5 cable running from your east coast to your west coast office. Instead, you use an existing network, which is probably used by a lot of other users and companies. There is physical cabling involved, but it is not dedicated to your network connection alone. And, it may be made up of a great number of different local networks, each with their own specific cabling. The real virtuality lies in the fact that you create a network over this very diverse environment, as if it were made just for you.
The "private" part stems from the virtual part. Building a virtual network over a publicly accessible infrastructure does carry security implications. You do not want your sensitive corporate information to be read by other users, or worse, by your competitors. So you must establish a means of keeping that information private. The only way to do this is by applying a form of encryption, so only trusted parties can access that information. This carries some legal implications, however (see sidebar on "Encryption on Public Networks"). The second aspect is authentication (or identification if you will). Is the peer on the other side of your VPN really the peer they are claiming to be? You need a reliable way to establish their identity. For this, you can also use the encryption software.
Network Integration The first thing to do is integrate the different local networks of the remote offices into one big network. This integration must create a situation in which access to the different parts of that integrated network is completely transparent. To achieve this integration, you will need to solve a number of problems related to assigning IP addresses and routing traffic from A to B.
As you may know, each IP address on the Internet must be unique. It is the only way to identify a host, and using the same address for different machines leads to unpredictable results. (It is a nice test to try this on a local network once). This principle of uniqueness only applies to the hosts that can be reached directly via the Internet.
In most cases, an address is assigned by your ISP (Internet Service Provider). This ISP is the company that connects you to the Internet and takes care of routing all the traffic destined for your IP address (or addresses) to your router. The address can be fixed or dynamic. If it is fixed, you will always use the same address, which is convenient for configuration. Fixed addresses are normally used for dedicated Internet connections that are active 24 hours a day. If an address is dynamic, it changes each time you make contact with your ISP. The ISP normally takes the first free address from a pool and gives that to you for the time that you are connected. Dynamic addressing is normally used only for dial-up connections.
If you have allocated an IP address range (e.g., a complete B or C class) for your company, the situation is a bit different. The range you have been given is unique on the Internet and you can safely use it for building your connection. However under normal circumstances you cannot use one IP address for connecting one office and another address from the same range for connecting another office. This restriction is imposed by the way IP packets are routed across the Internet. This routing is normally not based on the actual IP address, but on the range the IP address is part of. So, all packets for that range are sent to one router, and IP packets for another range are sent to another router.
There are restrictions to using IP addresses on the Internet, and therefore most companies have adapted the strategy of freely choosing a range of IP addresses for internal use, which is perfectly legal as long as you ensure that these internal addresses never reach the Internet. This internal range is chosen so that it can easily be divided into a number of separate groups. This division is set up so that it is very easy to route traffic from one group to another. Each office is allocated one of these groups, which makes it easy to route traffic between these different offices, and equally easy to shield different groups from each other.
If your offices are connected via dedicated links, these IP addresses can be used without any problem. The traffic will never reach the Internet, so you can choose any IP address you like. However, if your connection is established via the Internet, you run into a problem. The addresses you choose cannot be used on the Internet; they might be in used by another system (which has been allocated that address for use on the Internet), and the routing on the Internet most certainly will send your traffic to the big bit bucket instead of the real destination.
To overcome those problems, you use a technique called tunneling. This means that the complete IP packet you want to send to your peers is packed into another IP packet that has a "legal" IP address. Tunneling is not restricted to the TCP/IP protocol. You can tunnel a lot of different protocols in an IP packet (e.g., Netware can be tunneled in an IP packet, so you can establish a Netware link from a client to a server over the Internet). However for this article I will discuss tunneling IP packets in IP packets, or IPIP tunneling. (See sidebar on "Other Considerations.")
Figure 1 shows the principle of IPIP tunneling. A message from host A needs to be sent to host B. However, the A and B addresses are for internal use only, and our Internet connection uses address X, while our peer uses address Y. In the tunnel, the first IP packet is stored in the data or payload area of the message that will travel via the Internet. Once the message is received on the other side, the real IP packet is retrieved from the payload area and sent across the local network.
You could say that a tunnel has been established between the two separate networks through which the network traffic can be sent. This tunnel is not restricted to messages from A to B, but can be used for all traffic that is sent between the one local network and the other. On the Internet, this looks like traffic between the hosts X and Y.
There are a number of ways to implement a Linux machine as a tunneling/detunneling box. I will show you two possible solutions and expand on one of these.
Transparent Proxies The newer Linux kernels (at least the 2.0 series) have extensive packet filtering capabilities. I have already written an article on that subject (Sys Admin 5(6); 12-35), so I will not go into too much detail about that. However, the newer kernels now support a feature called "transparent proxying," which can help in building a tunneling system.
A proxy is a dedicated network program that will relay data from one side to another and back. This relaying is done in the proxy program and thus happens in user space. This means that the proxy will receive the data in one of its own buffers, (possibly) copy it to another buffer while doing something intelligent, and then send that buffer across another network interface to its destination. Because this outgoing message is sent "manually" from the outside interface, the network packet will have the IP address of the external interface as its source address. A packet that has been forwarded by the kernel normally has the original IP address as its source address. In this way, the internal IP address has been conveniently transformed into the external IP address. Furthermore, all kinds of intelligence can be added to our proxy, such as user authentication and compression.
The one problem when using a proxy is getting your data from your internal network to the proxy. There is (was) no really portable way to do so. When you wanted to telnet through a proxy, you first had to telnet to the proxy port, possibly authenticate yourself, and then connect to your final destination. A Web browser could be configured to automatically use a proxy (originally intended for caching purposes), but most other applications that communicate using the TCP/IP protocol could not be so easily adapted.
Luckily, the new Linux kernel can be configured to drop all traffic for a specific port and/or IP address at a network port on the same machine. You can configure this by using the ipfwadm program (see Listing 1).
You use ipfwadm to define the packet filtering rules for the firewall code in the Linux kernel. If you combine an INPUT rule with the -r option (transparent proxying), the kernel will deliver all network traffic that matches that rule to a network port on the machine. And if you have a proxy listening on that port, it will thus "automagically" relay that network traffic. For example, the following rule will tell the kernel to deliver all outgoing telnet traffic to a transparent proxy that runs on the firewall and listens on the telnet port.
ipfwadm -I -a accept -P tcp -r -S localnet/24 -D 0.0.0.0/0 telnet
When modifying the destination address (specified after the -D option), you can redirect only specific connections through your proxy.
To enable transparent proxying (in a 2.0 kernel at least), you must specify a number of specific options when building the kernel. You will need to enable the IP firewalling code and the transparent proxy option to enable this feature. Although this solution, seems very well suited for our purposes, there are some drawbacks. First of all, you will need to write these transparent proxies yourself. The implementation is very Linux-specific, and each network protocol must be proxied in a different way. Also, when using a proxy you will lose the original IP address from the message. This means that any security measures on the other side are useless; they only see the IP address of the proxy and not the IP address of the original sender.
Using a Real Tunnel Linux (at least from the 2.0 kernel onwards) also offers real tunneling facility. You can enable it in the kernel when configuring it. You will have to enable a number of other options as well. These are listed below:
Network firewalls (CONFIG_FIREWALL) [N/y/?] y
IP: forwarding/gatewaying (CONFIG_IP_FORWARD) [N/y/?] y
IP: firewalling (CONFIG_IP_FIREWALL) [N/y/?] (NEW) y
IP: tunneling (CONFIG_NET_IPIP) [N/y/m/?] (NEW) m
When you build the tunneling facility as a module (indicated by the m in the last option), you can insert and remove the module at will on a running system. This makes it easier to configure and test.
Once you have built the kernel and its modules, you will have one module for tunneling, new_tunnel.o. This module needs to "talk" to another tunneling module on a Linux machine on the other side. The sending module will take care of the multiplexing, and the receiving module will handle the demultiplexing of the messages. We will use this feature for building our VPN.
Configuring the Tunnel After building the kernel and its associated modules, you need to reboot your system to activate the new kernel. After doing that, you can use the insmod, rmmod, and lsmod commands to insert, remove, or list the kernel modules.
So after inserting the new_tunnel.o module with the insmod command, you can check its proper loading by doing an lsmod. You should see output like this:
Module: #pages: Used by:
new_tunnel 1 0
And when you do a cat /proc/net/dev, you will see that you have a new networking device called tunl0:
Inter-| Receive | Transmit
face | packets errs drop fifo frame|packets errs drop fifo colls carrier
lo: 18 0 0 0 0 18 0 0 0 0 0
eth0: 3347 0 0 0 0 4009 0 0 0 19 0
tunl0: 0 0 0 0 0 0 0 0 0 0 0
The tunl0 device can be used as a "normal" networking device, and the ifconfig program can be used to configure and activate this device (or interface).
Consider, for test purposes, two networks that are both connected to the Internet through their own local ISP. These two networks should be connected transparently (see Figure 2). The network on the left has address 193.78.174.32 with subnetmask 255.255.255.224, and the network on the right has network address 193.78.174.64, also with subnetmask 255.255.255.224. Each network has a firewall (T1 and T2) that is used for connecting to the network, and the address of each outside interface is shown. The T1 firewall has internal address 193.78.174.33, and the T2 firewall has internal address 193.78.174.65. I assume that all "normal" network configuration has been done and will only show you the things you have to do to get the tunneling up and running.
On T1, you will need to configure the tunl0 interface (after inserting the module of course) like this:
ifconfig tunl0 193.78.174.33 netmask 255.255.255.224 up
The tunnel interface has the same IP address as the "real" internal interface. Once the interface is configured, you can add a static routing entry that will forward all traffic for the 193.78.174.64 network through the tunnel:
route add -net 193.78.174.64 netmask 255.255.255.224 gw 208.34.1.12 tunl0
This command adds a static route for network 193.78.174.64. All traffic for that address is sent to gateway 208.34.1.12 (the outside IP address of T2), through device tunl0.
On T2, the mirror image of this configuration is created:
ifconfig tunl0 193.78.174.65 netmask 255.255.255.224 up
route add -net 193.78.174.32 netmask 255.255.255.224 gw 194.21.134.23 tunl0
After configuring both sides of the tunnel like this, all traffic from one network to the other will be automatically multiplexed and demultiplexed by the tunneling module. By adding these static routes, all of this is done transparently without intervention of the user.
We have now built the tunnel and can communicate between the two networks. However, all data are sent in the clear, and can be read by anyone who has access to the packets. The next paragraph will describe a way to solve that problem as well.
Privacy As stated in the previous paragraph, the network traffic now flows between the two sites, but the data are not encrypted yet. For this purpose, we can use another public domain package, SSH. It stands for Secure Shell and is capable of both authenticating the communicating parties and encrypting the data that flows between them. You can find SSH on the Internet at:
http://www.cs.hut.fi/ssh/
Note, however, that SSH uses the RSA algorithm, which is patented in the United States. You will need to check your local legislation to see whether you can use it unmodified or whether you need to use the RSAref library instead of the SSH supplied one.
SSH was originally written by Tatu Ylönen of Finland and is now commercially maintained (check out the Web page for details). The features of SSH that are important to our tunneling are listed below:
Strong authentication. authentication methods: .rhosts together with RSA-based host authentication and pure RSA authentication.
All communications are automatically and transparently encrypted. Encryption is also used to protect against spoofed packets and hijacked connections.
Arbitrary TCP/IP ports can be redirected over the encrypted channel in both directions.
Client RSA-authenticates the server machine at the beginning of every connection to prevent trojan horses and man-in-the-middle attacks. The server authenticates the client machine using RSA before accepting .rhosts or /etc/hosts.equiv authentication.
You can even redirect X11 sessions through SSH to prevent anyone from taking over an X session. Most importantly, the session is encrypted between the client and server, and there is no need for the tunneling server to know the encryption. SSH will encrypt the payload area in the original IP packet, and the receiving end will decrypt the payload area and hand it over to the application.
SSH builds on most UNIX implementations, and it works across multiple platforms. You can have a Linux machine on one end and a SPARC or HP machine on the other end. Once you have downloaded SSH, you can follow the instructions included in the package to build it.
After building SSH you end up with a number of programs, the two most important of which are ssh and sshd. You can compare these two with the rsh and rshd programs originally found in the BSD pedigree of UNIX.
Configuring SSH
sshd is a server program that runs in the background and listens for incoming connections (on port 22). When a client connects, it is authenticated and if it is found to be reliable, sshd normally starts the program that is requested by the user. The input and output of that program are encrypted using the key that has been exchanged during the negotiating phase of the connection build up.
The ssh program is the client program that is used to execute a program on a remote host. ssh normally connects to sshd on the server machine, exchanges authentication information and an encryption key, and then has sshd execute the specified command on the remote machine. While executing, all data exchanged between ssh and sshd is encrypted using one of a number of algorithms.
For our purposes, ssh offers a few extra options that can be used to encrypt traffic through the tunnel. These options are specified in the manual page of ssh (see ftp site: ftp.mfi.com in /pub/sysadmin) and are the -L and -R options. Either of these options can be used to redirect traffic for a specific port to a port on a remote machine. This redirected traffic is encrypted before it is sent across the wire and decrypted on the other end before it is sent to the destination host. The -L option is used if you want to redirect access from a local port to a port on a remote system. Figure 3 shows a possible setup. The client machine wants to communicate securely with the server machine. To be able to do this, a machine on the same local network as the client (this might even be the same machine as the client) runs ssh with the -L option. The complete command line for starting this ssh looks like:
ssh -L portx:server:porty <name of machine running sshd>
The machine running ssh will start a subshell that listens on port x, and whenever a connection is made to that port, the ssh program will encrypt and redirect the data to the machine running the sshd. This machine will then decrypt the data and relay it to port y on the server.
It is possible to enable this redirection remotely by using the -R option. With this option, you tell the sshd on the remote machine to startup a ssh with the appropriate -L option. The remote side will then redirect all traffic to that port through the secure connection between ssh and sshd.
It would be possible to build a tunnel with just SSH. However, this is only possible when all systems involved can be accessed directly from the Internet. If that is not possible, you can resort to the tunneling facility of Linux. When installing SSH, it will automatically generate the host identification pairs for your machine, so you do not have to worry about that.
Combining SSH and Tunneling When combining SSH and tunneling, we run into a bit of a snag. When using SSH, you lose some transparency. You probably do not want to run ssh and/or sshd on each machine of your local network; this will be too time consuming with respect to maintenance, and SSH does not run on each and every platform. The best place then to install ssh and sshd is the tunneling machine. Once you have done that, you can use the ssh -R command to redirect ports on machines on the remote network to ports on machines on the local network. You do this on each tunneling machine on the network. You might even start up these ssh's when you start up your tunneling machine.
When specifying the ssh -R command you must specify the IP addresses (or host names if your local name server has a copy of the name mapping on the other network) as used on the other network. (Referring to Figure 2, you would use IP addresses in the 193.78.174.0 range. You would not use the IP addresses of the outside interfaces of the firewalls.) By using these addresses, you will force the encrypted connections through the tunneling module.
When you want to use an encrypted connection from a client on one network to a server on another network, you must connect through the redirected port on the tunneling machine as specified in the ssh -R command issued by the remote tunneling machine (Figure 3). So, you cannot transparently connect directly to the remote server to build an encrypted connection, but instead use a port on the local tunneling machine.
This is a bit inconvenient, but is a small price to pay for a secure network connection that will prevent unwanted snooping of the data.
There is one other security risk involved with this setup. You need to be aware of it so you can decide for yourself whether this risk is acceptable or not. SSH will only encrypt the payload area of the original IP packet and not the header information. (It could not even if it wanted to, since that header information contains the IP address and port fields among other things. If it encrypted these fields, it would render the IP packet undeliverable.) The tunneling module will encapsulate the original IP packet in the payload area of the message that will travel across the Internet. The original header is included in the encapsulated message, and this original header contains the IP addresses as they are being used on the inside of your firewall. Revealing that information might make you vulnerable to an IP spoofing attack, since an intruder could see what kind of IP addresses are trusted by your firewall. This is another reason why the tunneling should be done by a separate machine. The firewall now only needs to trust that one tunneling machine and can reject all connections from other local machines, thereby reducing the risk of an IP spoofing attack.
By the way, both of these problems can be solved by building the authentication and encryption in the tunneling module itself. The tunneling module should then encrypt the payload area of the IP packet that will be sent over the Internet. By encrypting this payload area, the module will automatically encrypt not only the original data, but the original IP headers as well, thereby hiding the internal IP addresses. Le Reseau has developed a firewall product, called WriteProtect, that uses an encrypting tunneling module that solves both of the problems mentioned in this paragraph and allows for a truly transparent encrypting tunnel.
Conclusions Using the software currently available on the Internet, you can easily build your own virtual private network. By using the tunnel, you can hide the internal IP addresses from the Internet. One advantage is that you are able to choose your own IP range for internal use and can even give each office its own IP address range. All of these features are built into Linux.
By using another package, SSH, you can even encrypt the data that are exchanged between the different sites (barring a few minor drawbacks). By using the proper encrypting algorithms, you can both secure your data and authenticate the identity of the communicating parties. By having a dedicated tunneling machine, you can tighten the security on your firewall and make sure the different networks can properly interoperate.
About the Author
Arthur Donkers graduated from the Delft University of Technology with a degree in Electrical Engineering and a major in Computer Architecture. Since then, he has worked for a number of software houses in the Netherlands and participated in several major projects. His primary field of interest in these projects has been datacommunications, especially the security aspects involved. He now is the founder and owner of Le Reseau, a company specializing in security-related issues for Unix, OpenVMS and Windows NT, and the application of Linux in corporate environments.
|