Cover V13, i05
may2004.tar

The OpenLDAP Proxy Server

Reinhard E. Voglmaier

Most people think of proxy servers only as servers that access resources on behalf of their users, but they can do much more. In this article, I will discuss LDAP proxy servers in particular and describe the functionality of this type of proxy. They can add access control, serve resources to their users, verify that users are who they claim to be, restrict access to resources, and rewrite requests using regular expressions. LDAP proxy servers also provide attribute mapping; this means they can map one attribute to another or hide an attribute altogether. These servers are frequently used for load balancing and fault tolerance, and can also have a cache to store results of frequently requested queries.

Another strong point of LDAP proxy servers is that the proxy can hide the complexity of the directory architecture from the end user. This allows administrators to offer different views of the same directory to different users. All these services, obviously, come at a cost, so it's no wonder that software vendors offer their LDAP proxy implementations for a high price.

Fortunately, there is an open source choice -- OpenLDAP. The OpenLDAP implementation is based on the former "University of Michigan LDAP Server" and can be downloaded along with complete documentation from the official OpenLDAP Web site [1]:

http://www.openldap.org
The OpenLDAP group, headed by Kurt Zeilenga, is the main player in the development of the LDAP standards and provides a reference implementation of a LDAP server along with OpenLDAP. The Web site offers a searchable archive of how-tos, links to related software projects, and a number of LDAP-related discussion groups.

In this article, I will give an overview of what LDAP proxy servers do, then jump directly to the architecture of the OpenLDAP implementation and, in particular, how the proxy fits into this design. I will also explain how to build and configure the OpenLDAP proxy. I will conclude with a few real-world scenarios. I think that the OpenLDAP framework is applicable not just to little projects with a small budget; it also scales very well, and I've done performance tests with several hundred thousand entries.

You can find out more about the OpenLDAP proxy server in the documentation shipped with OpenLDAP; detailed instruction can also be found in The ABC's of LDAP (Voglmaier, Auerbach Publications [2]). This article assumes you are already familiar with the LDAP protocol. All the configuration files can be downloaded as usual from the Sys Admin Web site (http://www.sysadminmag.com).

The OpenLDAP Proxy Server

As stated previously, an LDAP proxy server accesses services (in our case, LDAP services) on behalf of a client's request. This architecture is used frequently if the user is behind a firewall and wishes to access resources outside, normally on the Internet. More generally, the LDAP proxy provides a way of giving controlled access via the LDAP protocol to resources outside the actual domain; therefore, you may use it to join different domains in your intranet (e.g., different LANs located in different countries of your enterprise intranet).

Every LDAP server consists logically of two parts: a frontend and a backend. The frontend speaks the LDAP protocol [3] with the LDAP clients; meanwhile, the backend accesses the repository actually holding the data. Figure 1 shows the OpenLDAP architecture. The frontend speaks the LDAP protocol and contacts the backend upon the client's requests. The backend actually provides the data.

This architecture offers enormous flexibility. You can access different data stores just using different backends. If you need to access a different data store, not yet contained in the OpenLDAP framework, you can roll your own backend (and, I hope, make it available to the open source community).

OpenLDAP itself is shipped with a number of backends. The most frequently used ones seem to be the database backends BDB and LDBM. There is not much documentation available about which of these performs better under which conditions, so I won't address that here. OpenLDAP offers a backend that can store data in a RDBMS. You can also store the data in flat files using a shell backend or Perl functions. More about the available backends can be found in the documentation shipped with the OpenLDAP server.

Finally, there are the two backends this article is all about: the "ldap" module and the "meta" module. Both modules provide proxy services -- the "ldap" module is the basic module, and the "meta" module works on top of the "ldap" module and offers more sophisticated proxy services. The idea is to use the backend not to access a repository directly but to contact another LDAP server that holds the data. Figure 2 shows this architecture.

The OpenLDAP Proxy Cache

Directory services are designed for fast data retrieval; the speed of update processes, however, is not an important issue, since a directory is not meant to be updated frequently. To speed up searches, you can use the query caching extension of the OpenLDAP proxy server. This extension holds the result of a query in a cache in memory and looks at each query first to see whether it could recycle previous query results. Figure 3 shows this architecture. The "Cache Manager" makes the decision as to whether the result of a previous query can totally or partially be reused to answer the current query. This solution is new with the OpenLDAP 2.2.x version.

Unlike static Web content cached by Web proxy servers, the entries cached in an LDAP proxy server are obtained from queries. Different queries can produce similar result sets. The result set of a query B can also be a subset of a result set from previous query A. Query B is said to be contained in query A. The "cache manager", therefore, is responsible for understanding this "query containment". You can find more about query containment in the resources by Apurva Kumar [4-6].

The job of the cache manager is therefore to handle the following:

1. Query Containment: Determines whether the current query is contained in a previous one.

2. Cache Replacement: As new query results are added to the cache, it continues to grow. Once the cache size grows over a high-threshold watermark, the cache manager begins to cancel cache entries. It does this following a simple "least recent used" (LRU) policy. The oldest query and entries belonging to this query are cancelled. The cache manager does this until the cache reaches a low-threshold watermark.

3. Consistency Control: The cache content should not be held forever; the cache manager uses a policy to decide when to clean up results from the cache. OpenLDAP uses a parameter "TTL" (Time to Live) that allows you to configure how long items should be held in the cache.

Please note that the cache feature is still considered a work in progress, at the time of this writing not every feature works as you may expect from the documentation. Also, the documentation for the cache features is not yet complete -- some changes may still be made.

Installing the OpenLDAP Proxy Server

Now that we have seen a lot of theory, let's begin with practical work. As I mentioned previously, after you unpack the OpenLDAP distribution, you must compile your server. Many operating systems already contain a ready-to-use version of the OpenLDAP distribution. This comes in handy, but will not give you the latest release and will not necessarily result in a system tailored for your needs. Therefore, I suggest you compile it yourself. I will limit my description to the Unix platform, but note that there is a group porting OpenLDAP in the MS world.

As with most open source programs, the first step is to personalize the distribution via the "configure" script followed by a number of parameters. You do this to adapt the installation to your particular environment. Then, you type make depend and, finally, make. There is also a make test available, which runs a number of tests against your compiled binaries. The make test, however, works well only if you compile the complete distribution. In this example, it won't work, because we won't compile in things we don't need.

The parameters determine the configuration of your LDAP server; the configure -help command gives a listing with all available options. I will not cover the compilation of the LDAP server in general; you can find more information in the OpenLDAP documentation. Listing 1 shows the parameters used to customize the compilation of an OpenLDAP proxy server with the functionality required for this article. The meaning of each of these parameters is as follows:

--prefix -- Prefix where to install the software

--enable-slurpd -- Enable build of replication daemon

--enable-bdb -- Enable Berkeley DB backend

--enable-ldbm -- Enable DBM database backend

--enable-ldap -- Enable LDAP backend

--enable-meta -- Enable metadirectory backend

--enable-rewrite -- Enable DN rewriting in back-ldap and back-meta

--with-proxycache -- Enable caching

Note that the configure command shown pulls in a number of modules that may not be required for a simple proxy server. In fact, the simplest configuration would be just to use the --enable-ldap=yes switch, which would provide a simple proxy without any rewriting or cache functionality.

I recommend saving your configure command in a file (e.g., ConfigureProxy.sh), so you can reuse it later and keep track of how you compiled the LDAP server. Next, launch the configure script, the make depend, and the make commands. If everything proceeds successfully, install the server with the make install command.

Using the OpenLDAP Proxy Server

Now that we have successfully installed the OpenLDAP proxy server, let's look at a simple example. We will configure the proxy server simply to forward all requests to the LDAP server holding the data. Listing 2 shows the slapd.conf file configuring this behavior. As you can see, the configuration is very easy.

Next, we need to declare that we are using the LDAP backend. In the second line, we define the baseDN that the proxy server should handle. And at the end, we tell the LDAP proxy the uri that it must contact for requests to this baseDN. (The syntax of the uri follows the LDAP standard uri as defined in LDAP URL Format [4].) The proxy sends every LDAP request that has the baseDN "dc=LdapAbc,dc=com" to the LDAP server installed on host LdapServer.LdapAbc.com and listening on port 349. Meanwhile, requests for the baseDN "dc=LdapAbc,dc=org" are sent to the LDAP server installed on LdapServer.LdapAbc.com but listening on port 749. All requests (i.e., search, modify, add, and delete) are proxied.

As you may notice, you can define more than one backend, which allows load balancing. You can install different subtrees of the directory on different servers and glue them together with the OpenLDAP proxy server. You could have, for example, an OpenLDAP server, a Sun ONE server, and a Novell LDAP server, and speak to these three different servers via the OpenLDAP proxy server. The client does not even notice the complexity behind.

Mapping Attributes and objectclasses

Until now, the proxy passed every request straight through from the LDAP client to the LDAP server, and every response straight back again from the LDAP server to the LDAP client. We can, however, map attributes and objectclasses in such a way as to give the client a different view of the data hosted on the LDAP server. See Listing 3. Here we want to provide only the name, the surname, and the telephone number; the rest of the attributes should be hidden from the client. The attributes "givenName", "sn", and "telephoneNumber" are mapped to themselves in the first three lines, "cn" is mapped to "sn" (line 4), and every other attribute is removed with the last instruction:

map attribute *
We can also map entire objectclasses. The following instruction:

map objectclass groupOfNames group
maps the groupOfNames objectclass used by the LDAP client to the objectclass "group", which is used, for example, by the Active Directory of Microsoft.

The Rewrite Engine

The rewrite engine is a very powerful tool. Consider the example configuration shown in Listing 4. The suffix "dc=LdapAbc, dc=org" now points to the same LDAP server that "dc=LdapAbc, dc=com" points to. To begin, we need to switch on the rewrite engine. Then, we must provide the rewrite rule; in this case, we change the "dc=org" domain to the "dc=com" domain that the directory server really hosts. The syntax is very similar to that used by the well-known Unix tool sed; indeed, the rewrite engine uses regular expressions. You will notice also the "rewrite context" clause; let's see what this means.

Every LDAP server executes a number of operations (bind, search, compare, add, etc.). Each of these operations can be executed in a particular context. The operations, and thus the context are divided into client->server and server->client operations. You can define a rewrite rule individually for every such context. The abovementioned operations are all client->server operations (i.e., the data flow is from the client to the server). A server->client operation is shown in the last lines of our example configuration file -- searchResult. The default context is a general context for server->client operations. If we didn't configure the searchResult context, the results would return the "com" suffix and not the "org" suffix we requested. Therefore, we rewrite the results, and the client does not see that it has been served from the "com" domain.

The rewrite engine can also do much, much more. Explaining all of its functionality, however, is far beyond the scope of this article. Please consult the OpenLDAP documentation for more details.

The Proxy Cache Engine

The proxy cache extension is available from version 2.2.x. The proxy cache engine works with every backend; however, it's intended to be used with the "ldap" or "meta" backend. The proxy server holds its cache in a special database on disk. Thus, you must configure which database you want to use (LDBM, BDB, or HDB). Listing 5 shows an example configuration file.

Let's look at the details. There are four new instructions that configure the behavior of the cache. The first instruction is: "overlay proxycache"; it's just the beginning of the configuration of the proxy cache.

The second instruction is the proxyCache instruction. The syntax is:

proxyCache <DB> <maxentries> <nattrsets> <entrylimit> <period>
where DB is the database type you will use (BDB, LDBM, or the recent newcomer, HDB), maxentries is the maximum number of entries the cache may hold, and nattrsets is the maximum number of attribute sets that will be defined (you define them with the proxyAttrset parameter shown later). Entrylimit is the maximum number of entries of a cacheable query, and, finally, period is the time in seconds that each consistency check is executed (i.e., the cache manager controls for every entry whether it's time to live is over).

The third instruction needed is the proxyAttrset instruction. It has the syntax:

proxyAttrset <index> <attrs . . .>
where index is the index that identifies the set and attrs is a number of attributes. As the name suggests, it defines an attribute set. You will define as many different sets as you have declared in the proxyCache statement. The proxyAttrset is used to define cacheable templates. Let's look at cacheable templates together with the next instruction, the proxyTemplate.

Remember when I mentioned query containment? The more specific query is contained in the more general one; for example, the query ("sn=Voglmaier") is contained in the query ("sn=Vogl*"). Or, a search in the subtree "ou=people,dc=LdapAbc,dc=org" is contained in the search in the subtree "dc=LdapAbc,dc=org". To simplify things a little, the OpenLDAP group uses the concept of templates to determine query containment. This is done with the proxyTemplate instruction. A template is just a search filter that does not have an assertion value. Examples are:

Filter: (sn=Voglmaier), Template: (sn=),
Filter: (&(sn=Vogl*)(givenName=Rein*)), Template: (&(sn=)(givenName=))
The proxyTemplate has the syntax:

proxyTemplate <prototype_string> <attrset_index> <TTL>
where the prototype string is the template mentioned above, the TTL is the time to live (i.e., the time an entry may be kept in the cache without being used), and the attrset index, indicating the attribute set, is the same as defined before. Therefore, the attribute set and the proxyTemplate together define which entry is to be stored for how long in the cache.

Enough theory -- let's look at the example in Listing 5. Here we use an LDAP backend; the suffix is our usual LdapAbc group, and we will contact the LDAP uri we used in the example previously. Thus, the first three lines are the ones we just have seen. The overlay proxycache does nothing other than begin the definition of the whole thing. The proxycache instruction defines a BDB cache that holds up to 100000 entries. Furthermore, the cache holds only one attribute set and caches only results that return less than 1000 entries.

Every 100 seconds, the cache manager checks whether any of the entries in the cache are older than their TTL permits; if so, those entries are cancelled. The proxyAttrset defines the only attribute set (with index 0) our cache will hold. We have two templates combined with this attribute set -- one searching for the sn, the other combining sn and givenName with an "&" relation. From the cachesize instruction on, there are instructions to configure the BDB database that are therefore BDB specific.

Chaining

I will close this article mentioning a further important concept used in LDAP. When an LDAP server is asked about data that it does not hold, but it knows who does hold the data, the server will return this information to the client in the form of a referral. This technique is used, for example, in partitioning the LDAP directory tree. One reason for partitioning may be to improve performance. Instead of sending a referral to the client, the directory server could also proxy this request to the server holding the data. This technique is called "chaining". To do this, you must configure the proxy backend to all subtrees you will proxy, and the subtrees must be configured before their parents. Listing 6 shows an example configuration file.

Conclusion

In this article, I presented the OpenLDAP proxy server and its configuration. I showed that it is not only useful to let queries go out of domain boundaries, but that it can do much more. I described how the proxy can be used for load balancing as well as for rewriting LDAP queries and LDAP answers. I also briefly discussed chaining, showing that it can be combined with other backends as well.

OpenLDAP can be a powerful tool used in very different situations to address problems that can be difficult to solve another way. And, because the OpenLDAP suite is open source, it is free, including the source code. So, if you run into problems, you can consult the source code yourself and are free to extend the software to better fit your reality. If you have any doubts or questions, as always, I will be glad to help you.

Resources

1. OpenLDAP project -- http://www.openldap.org

2. Reinhard E. Voglmaier, The ABC's of LDAP, Auerbach Publications, November 2003

3. Lightweight Directory Access Protocol (v3), RFC 2251 -- http://www.ietf.org

4. The LDAP URL Format, RFC225 -- http://www.ietf.org

5. Apurva Kumar, IBM India Research Lab: The OpenLDAP Proxy Cache -- http://www.openldap.org/pub/kapurva/proxycaching.pdf

6. Draft-apurva-ldap-query-containment-01, Apurva Kumar: Schema to Support Query Containment in LDAP Directories

Reinhard Voglmaier studied physics at the University of Munich in Germany and graduated from Max Planck Institute for Astrophysics and Extraterrestrial Physics in Munich. After working in the IT department at the German University of the Army in the field of computer architecture, he was employed as a Specialist for Automation in Honeywell and then as a Unix Systems Specialist for performance questions in database/network installations in Siemens Nixdorf. Currently, he is responsible of LDAP Services at GlaxoSmithKline, Italy. He's also the author of The ABC's of LDAP (Auerbach Publications, November 2003). He can be reached at: rv33100@gsk.com.