This report is a comparison of two message passing systems available for use in the Linux kernel, kORBit and SunRPC. Both systems are compared under various headings and their usage demonstrated with an identical example - the Remote Device.

Corba in the Kernel?

Mark McLoughlin


Acronyms Used

API - Application Programming Interface.
CORBA - Common Object Request Broker Architecture.
CDR - Common Data Representation.
CRC - Cyclic Redundancy Code.
CVS - Concurrent Versions System.
DSM - Distributed Shared Memory.
EPV - Entry Point Vector.
IDL - Interface Definition Language.
NFS - Network File System.
NOW - Networks Of Workstations.
OID - Object Identifier.
ORB - Object Request Broker.
POA - Portable Object Adaptor.
PXFS - Proxy File System.
RPC - Remote Procedure Call.
SMP - Symmetric Multi-Processor.
TCP/IP - Transmission Control Protocol/Internet Protocol.
XDR - External Data Representation.


Project Overview

Computer industry trends suggest that, in the future, large server systems will tend to be built from a collection of shared-memory multiprocessors connected together using a high-speed interconnect such as Myrinet[1]. Cluster computer systems such as these will provide large amounts of computing power and high availability. Given that these systems will be built using off-the-shelf commodity hardware, they will also be very cost effective.

The design of cluster computer systems is still a subject of research, and the optimal organization of such a system is by no means decided. One promising possibility is to extend an existing operating system so that each individual node in the system can co-operate to provide the user with the illusion that the cluster is a single computer running the existing operating system[2]. At the core of such a tightly coupled system would be the subsystem which enables nodes to communicate with one another, namely, the Message Passing Subsystem.

This report focuses on the design and implementation of an effective message passing scheme for the Linux operating system as a first step towards the provision of a scalable and highly available clustering solution for Linux. Linux was chosen as the development platform because of its modularity, its robustness, its open-source license and, not least, because of the number and variety of cluster projects for Linux.


This project aims to investigate the existing kernel-level message passing schemes available for Linux: kORBit and SunRPC.

The practical work for this project involves implementing an example application of each of these schemes. The application should be one that demonstrates how these schemes may be used to implement some of the features required for a distributed operating system. The example developed for the project is a device driver which allows the user to interact with a physical device attached to another computer in the cluster. The device driver presents to the user the same interface as that of a local device.

The two message passing schemes, kORBit and SunRPC, are compared and contrasted under various headings to determine how suitable they are for use as the main message passing system for a Linux based cluster operating system. Benchmarking results showing the latency of the two systems are considered.

Finally, a conclusion is drawn on which, if any, of these systems are robust enough to be used in a distributed operating system.

Cluster Operating Systems


Computer industry advances today are dominated by a tremendous growth in small desktop systems because it is only these systems that provide the economies of scale to justify the massive on-going investment in architectural research. So that large scale systems may benefit from these innovations, desktop systems should be exploited as a building block for the implementation of high performance computing solutions[3].

Research into how a system built from a collection of desktop computers might be organised is ongoing. However, work in this field can generally be divided into two extremes - loosely coupled systems and tightly coupled systems.

A loosely coupled distributed system is one where each node (computer) in the cluster works largely independently of the other nodes in the system. When communication between nodes is required, standard operating system protocols are used. There is, however, little operating system support for clustered applications or for administering the system.

At the other extreme, a tightly coupled distributed system is one that uses the same abstractions as conventional operating systems. Here, the nodes in the system co-operate to give all processes in the system a coherent view of memory across the nodes[4] - this is known as distributed shared memory (DSM).

Linux Clusters

Clustering is very popular within the Linux community. In the recent past many Linux-based cluster products and cluster-related projects have been formed.

Solaris MC - the Inspiration

Solaris MC is a prototype distributed operating system developed at Sun Microsystems Laboratories. The system extends the Solaris UNIX operating system using object-oriented techniques[8].

Solaris MC is designed to provide to the user a single-system image[2]. This is where a cluster of computers appears, to all intents and purposes, to be a single computer running the Solaris operating system. The fact that the system is built out of many nodes should be completely invisible to the user. By maintaining the same binary interface as Solaris, existing applications can be executed unmodified.

Solaris MC is also designed to be highly available. Since each node runs a separate kernel, failures can be contained to that node. The failure of one node should not cause the entire system to fail. The system should detect the failure of a node and reconfigure itself to use the remaining nodes.

To achieve these design goals each kernel subsystem was modified to be aware of the same subsystems on other nodes and to provide its services in a location-transparent manner. For example, a global file system called the proxy file system, or PXFS, a distributed pseudo /proc file-system and distributed process management[9] were all implemented.

These global subsystems were implemented as a set of components built on top of the base kernel using the CORBA object model. By using CORBA the subsystem interfaces could easily be provided to the entire distributed system. Also, by abstracting components of the kernel using CORBA's interface definition language it was possible to use an IDL compiler to create the glue needed to perform arbitrary service requests on these components.

Solaris MC was the inspiration for this project. The project began as an investigation into how a system like Solaris MC might be implemented on Linux.

Message Passing


A message passing system is a subsystem of a cluster operating system that provides a set of message-based protocols which allow nodes in a cluster to communicate. The choice of a message passing system is fundamental to the design of a cluster operating system, as it is upon this subsystem that the rest of the operating system must be built.

A message passing system shields the details of communication from the programmer, i.e. the programmer should need to be aware of as little as possible about the underlying network protocols.

Low-level message passing systems, such as the BSD sockets API, deal with the transfer of raw data from process to process. These general schemes are used as an interface for the programmer to the transport or network layer communication protocols, such as TCP/IP.

These low-level schemes often prove to be unwieldy for the programmer and, in these cases, a higher level communication mechanism must be used. This often takes the form of a Remote Procedure Call (RPC) mechanism, which is based on a model very familiar to programmers.

The rest of this chapter will examine in detail these types of high level message passing schemes.

Desirable Features of A Good System

A message passing system may be evaluated under several headings:

Kernel Space Corba - kORBit


The Common Object Request Broker Architecture, or CORBA, is a specification[10] written and maintained by the Object Management Group (OMG). The specification supplies a set of abstractions and services needed to implement practical solutions for the problems associated with the development of applications in a distributed heterogeneous environment[11]. The specification is language, platform and network architecture independent so as to aid industry-wide adoption of the standard.

CORBA is an architecture with mechanisms for objects to make requests and receive responses in a distributed heterogeneous environment, somewhat similar to RPCs. At its core is the Object Request Broker, or ORB, which is solely responsible for all of the mechanisms required to locate an object implementation for a request, communicate the request and its arguments to the object implementation and retrieve the results of the request. There are quite a number of ORBs available, for example ORBit, MICO, JacORB and ORBix, and, provided that they adhere strictly to the CORBA specification, these ORBs can all inter-operate without problems.

The idea of having a kernel-space ORB to facilitate message passing in a cluster operating system was central to the design of Solaris MC. This idea was by no means a unique one and, indeed, was also put forward by Michi Henning, co-author of `Advanced CORBA Programming with C++'.

``Suppose for a moment that CORBA is built into the OS kernel, and that OS objects, such as segment descriptors and inodes, are CORBA objects that are addressed via object references. Suddenly, a whole new world opens up. For example, there would be no need for distributed file systems such as NFS, RFS, or DFS(which are just more application-level protocols for specific purpose, namely file sharing). Instead, all file systems would be automatically distributed, almost by accident. You could share files at any level of granularity, down to a single file, simply because they are individually addressable by object references (instead of having to export sub-trees of a file system).''

Until very recently no implementation of a kernel-space ORB for Linux existed. `kORBit', released in December 2000, is a port of ORBit, the ORB used by the GNOME desktop, to kernel-space. kORBit can best be described as a kludge, and was developed purely as a proof-of-concept project. The port to kernel-space was achieved by providing an implementation of a subset of the standard C library, GNOME's `glib' library and various other user-space functions in header files. Parts of ORBit's source were modified so that a kernel-space thread is spawned during the ORB's initialisation routine to handle client requests. No advantage is taken of any possible kernel-space optimisations and, because ORBit has no support for multi-threading, kORBit can only handle one process at a time.

Since March 2001, the project's mailing-list, web-page and CVS repository have been static. It is safe to assume the project has been discontinued.

Kernel Space RPC - SunRPC


SunRPC[12] is an RPC variant developed by Sun Microsystems in the mid 1980s for use with its distributed file-system product, NFS[13]. Because NFS became popular, SunRPC became widely used on all sorts of platforms.

SunRPC programs are defined using the RPC language, an extension of Sun's External Data Representation (XDR) standard[14]. Every program is allocated a unique program number and version number. Clients can locate program services by contacting the portmapper service with the program's program and version numbers.

On Linux, NFS is implemented as a kernel-space daemon. In order that such a kernel-space service could be developed, it was also necessary to develop a kernel-space SunRPC library. This library was first developed in 1992 and has become very stable and relatively bug-free in recent years.

The Remote Device


In order to evaluate kORBit and kernel-space SunRPC, it was necessary to put the two message passing systems to use in a real-world distributed kernel-level application. This is where the main body of practical work for this project was carried out.

Solaris MC was the original inspiration for this project, so it was fitting that some of the ideas behind Solaris MC should be used for this application. One of the main motivations behind Solaris MC was to provide the user with a single-system-image, that is, the user should be unaware of the distributed nature of the system.

One application of this idea is that the user should be able to use a physical device attached to any of the nodes in the cluster as if it were physically attached to the node the user is operating from.

Using kernel-space CORBA and RPC, the following were implemented:

  1. A generic device driver interface written in OMG IDL and the RPC language respectively.

  2. A client device driver which can be accessed through the usual character special file interface.

  3. A device server which would control the physical device using existing device-driver code.

The actual device being controlled in this example application is not a physical device, but one that stores and retrieves data to and from the server's memory.

What follows is a description of some of the issues considered while developing the two example device drivers. A more detailed description of the code for the drivers is presented in appendices A, B and C.


The CORBA implementation of the remote device is one that takes advantage of a number of features that are specific to CORBA. This was a conscious effort not to use the ORB as just another RPC implementation.

CORBA is heavily based on object-oriented concepts. In a CORBA system there exists a collection of objects, isolated from their clients by well-defined encapsulating interfaces. Each object provides the service set out by its interface definition.

Thought of in this way, a device provides two important services - `read' and `write'. Every device `object' must provide these services in order to be considered a device. This is reflected in the corbaDevice interface.

interface corbaDevice {
  // ...
  typedef string dataString;

  void read( in long id, out dataString data ) raises( UnknownId );
  void write( in long id, in dataString data ) raises( ArgTooBig );
  // ...
};

The read and write functions are passed two arguments - id and data. id uniquely identifies the dataString to be stored or retrieved by the device.

Another feature taken advantage of in this definition, specific to CORBA and object systems in general, is exceptions. Exceptions provide a more fine-grained approach to error reporting than the techniques found in conventional programming. An exception is not only an indication that an operation request has failed, but can also be accompanied by additional, exception-specific information.

The exceptions defined in the corbaDevice interface, UnknownId and ArgTooBig, demonstrate how exceptions can be used to give a precise indication of the reason for a procedure's failure.

interface corbaDevice {
   exception UnknownId {
     long id;
   };

   exception ArgTooBig {
     long argLen;
     long maxLen;
   };
   // ...
};

In a CORBA object system each object is identified by an object reference, or IOR. This reference provides the client ORB with the information required to communicate with the server ORB. Using an initial object reference, other object references can be obtained through operations on that object. A difficult problem is how the client obtains that initial reference.

It is quite common in CORBA systems for an object reference to be converted into its stringified form, using the object_to_string ORB API method, and then transmitted to the client using out-of-band means (such as e-mail). For the project, the object reference was stored in a file and made available to the client using NFS.

With the remote device, the extra problem of exporting the stringified reference from kernel-space is also presented. The most elegant way of achieving this is through the /proc filesystem, a filesystem of memory-resident files by which kernel-resident data can be made available to user-space. For example, the corbaDevice object exported its reference through /proc/corba/device-server.


SunRPC is purely a remote procedure call standard. It has no notion of objects - it simply allows a client to execute a procedure on a remote server. The CORBA object can be compared to a SunRPC program. Like a CORBA interface, a SunRPC program implements one or more procedures which may be initiated by clients.

A SunRPC procedure is identified by three fields in an RPC call message:

  1. Remote program number - These unique 32-bit numbers are usually assigned by Sun Microsystems, although developers are free to use the range 0x20000000 to 0x3fffffff for testing purposes.

  2. Remote program version number - This number identifies the version of the remote program that the client is using. Version numbers allow a server to implement different protocols at the same time.

  3. Remote procedure number - Each procedure in a SunRPC program specification is identified by a procedure number. This is unlike a CORBA procedure which is identified by its name.

The remote device was given 0x20101010 for its program number. Only one version of the program was defined and the program had two procedures, `read' and `write'.

program DEVICE_PROG {
   version DEVICE_VERSION {
      device_readres  read ( device_readargs  )  = 1;
      device_writeres write( device_writeargs )  = 2;
      } = 1;
   } = 0x20101010;

With SunRPC, only one argument may be passed to a procedure. If a procedure must be passed more than one parameter, these parameters should be encapsulated in a structure. For example, the remote device's write procedure takes two arguments - the data identifier and the data itself. These are encapsulated in a device_writeargs structure.

const DEVICE_STRSZ = 1024;

struct device_writeargs {
 int    id;
 string data<DEVICE_STRSZ>;
};

The same is also true for the results of a procedure. If more than one result is to be passed from the procedure, they too must be encapsulated in a structure and this structure type specified as the return type for the procedure. SunRPC has no notion of `out' parameters.

The problem of how a client locates a server is handled gracefully with SunRPC. If the client knows the address of the server machine, the program number and version number, it may contact the portmapper program running on the remote host. The portmapper holds a list of registered programs, and the TCP or UDP port numbers that these programs are listening on. Once a client has discovered the port number using this protocol, it may contact the server directly and initiate a procedure call.

The device implementation

On the client side the remote device acts exactly like any other device driver. Once the driver has been loaded with insmod it can be accessed through a character special file. These special files are created using mknod. So, for example, the CORBA remote device can be set up as follows:

[root@mark device]# insmod corba-device-client.o
corba-device-client: device registered on major 254
[root@mark device]# lsmod
Module                  Size  Used by
corba-device-client     1904   0  (unused)
korbit                122304   0  [corba-device-client]
[root@mark device]# mknod /dev/corba-device1 c 254 1
[root@mark device]# ls -l /dev/corba-device1
crw-rw-r-- 1 root  root  254,  1 Feb 10 14:06 /dev/corba-device1

The device can now be accessed by reading and writing to /dev/corba-device1, like any other device.

On the server side, the data written to the device is stored in a hash table, uniquely identified by the minor number of the device node. For an application to use the device as a storage area for its data, a special file with a minor number unique to that application would be created. The application would then read and write to the file.

kORBit/SunRPC - a comparison


The relationship between an ORB and remote procedure calls is often confused. Although it may be useful for a programmer to think of CORBA as an object-oriented RPC, the two are in fact semantically distinct. They do, however, have some common features. Let us look at some of the differences and commonalities.


A message passing system should be simple and easy to use. It should be relatively straightforward to write code using the system, and it should not be necessary for the developer to worry excessively about network details.

CORBA is a relatively straightforward RPC mechanism. Although the code for even the smallest possible CORBA application may seem fairly intimidating at first, one must bear in mind that most of the code in both the client and server parts of a CORBA application is boilerplate and seldom changes. Once the various client and server initialisation code is complete, the client mainly consists of calls on the server object, and most of the body of the server's code is made up of the implementation of the server's interface methods.

kORBit is, more or less, as simple to develop for as any other CORBA ORB. Stubs and skeletons are compiled using an unmodified orbit-idl compiler and need not be changed for use in the kernel. The major difference between developing a kernel module with kORBit and developing a user-space program with ORBit is adjusting to the concept that a module is loaded and not linked.

Other factors that affect the simplicity of using kORBit are not by design. kORBit is unfinished and, as a result, it may be necessary to modify kORBit itself in order to, for example, use an ORB API method that has not been exported by the kORBit module.

Developing applications in user-space using SunRPC is also very straightforward. The main body of work in writing a SunRPC server is in writing the implementation functions themselves. All code required for marshalling the arguments and de-marshalling the results of an RPC is generated using the rpcgen program. The SunRPC library handles most network-level details and the registering of the program with the portmapper.

Kernel-space SunRPC is quite a different beast, however. The kernel's SunRPC library was developed alongside the NFS implementation and the two are quite closely linked. The library was also developed in such a way that the user of the library could write code that takes into account every possible kernel optimisation such as a zero-copy RPC using pre-allocated slabs. The SunRPC library in the kernel is, therefore, a much more low-level implementation than the user-space one.

A kernel SunRPC developer must acquaint himself with the exact encoding rules for SunRPC, as the marshalling and demarshalling methods must be written by the developer. The developer must also be aware of details such as the exact amount of memory required for the arguments and return values of each function, in both XDR representation and the system's natural representation.
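As an illustration of those encoding rules, the following user-space sketch hand-marshals the example device_writeargs structure (an int followed by a string) according to XDR: 32-bit big-endian integers, and strings as a length word followed by the bytes, zero-padded to a four-byte boundary. The helper names are invented; a real kernel implementation would write into pre-allocated RPC buffers instead:

```c
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value in XDR (big-endian) order and advance. */
static uint8_t *xdr_put_u32(uint8_t *p, uint32_t v)
{
    p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
    return p + 4;
}

/* Encode { int id; string data<>; } into buf, returning the number
 * of bytes written. */
size_t xdr_encode_writeargs(uint8_t *buf, int32_t id, const char *data)
{
    uint8_t *p = buf;
    uint32_t len = (uint32_t)strlen(data);
    uint32_t pad = (4 - (len & 3)) & 3;   /* pad string to 4 bytes */

    p = xdr_put_u32(p, (uint32_t)id);     /* int id                 */
    p = xdr_put_u32(p, len);              /* string length word     */
    memcpy(p, data, len);                 /* string bytes           */
    memset(p + len, 0, pad);              /* zero padding           */
    return (size_t)(p + len + pad - buf);
}
```

Encoding id=7 with the string "abc" therefore produces 12 bytes: two 4-byte words followed by "abc" and one padding byte.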


A distributed system is prone to catastrophic events such as node crashes or communication medium failures. These events may cause communication to be interrupted and data to be lost. A reliable message passing system must be designed to handle such events gracefully and to recover from them where possible.

Much of the reliability of a message passing system is based upon the reliability of the underlying transport protocol. Both kORBit and SunRPC may use TCP as their underlying transport mechanism. By design, TCP is a reliable, connection-oriented protocol that takes care of such details as acknowledgments, timeouts, retransmissions, and the like[15]. However, SunRPC also offers the option of using UDP, an unreliable datagram protocol, as the transport layer protocol. Thus a kORBit or SunRPC service that uses TCP as its transport protocol will be more reliable than the equivalent UDP-based SunRPC service.

Another important aspect of the reliability of a message passing system is the granularity of the system's error reporting. If an application can determine more precisely the cause of an error condition, it may be easier for the application to recover. CORBA provides an extremely fine-grained error reporting system through its exception mechanism. User-defined exceptions can be used to extend the standard system exceptions and provide an extensive error reporting system. In contrast, the error reporting provided by SunRPC is no better than that provided by local procedure calls. This makes it very difficult for a programmer to build reliability into his application.

The level of reliability built into the design of a message passing system is, of course, important. However, no matter how reliable a design may be, it is often much more important how reliable the implementation of a design is.

The kernel-space SunRPC implementation has been available for quite a number of years, and in that time has seen many revisions and bug-fixes. Until recently some issues remained with the system's TCP code, but these appear to have been fixed. The system can be considered extremely stable and bug-free.

In contrast kORBit is a very new project and its code has seen very little testing. In fact it may be that the only testing the code has received, apart from testing by its original authors, was during the development of this project. kORBit, for now at least, cannot be considered reliable.


Efficiency is a critical issue for a message-passing system to be acceptable. If a message-passing system is inefficient and expensive to use the entire system will suffer.

One of the first points to consider when comparing the efficiency of the two systems is the network overhead associated with each individual message. In the case of SunRPC, each CALL message has a minimum size of 40 bytes. In contrast, a CORBA Request message has an overhead of around 70 bytes. However, both systems' reply messages carry a minimum overhead of 24 bytes.
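The SunRPC figures can be checked by summing the fields that the RPC specification mandates in every message. The breakdown below is a sketch assuming the minimal AUTH_NULL credential and verifier (a flavour word plus a zero body-length word each):

```c
/* Minimum size of a SunRPC CALL message: six fixed 32-bit words plus
 * the smallest possible credential and verifier. */
int sunrpc_call_min_bytes(void)
{
    int xid = 4, msg_type = 4, rpcvers = 4;
    int prog = 4, vers = 4, procedure = 4;
    int cred = 4 + 4;   /* AUTH_NULL flavour + zero-length opaque body */
    int verf = 4 + 4;
    return xid + msg_type + rpcvers + prog + vers + procedure + cred + verf;
}

/* Minimum size of an accepted SunRPC REPLY message. */
int sunrpc_reply_min_bytes(void)
{
    int xid = 4, msg_type = 4, reply_stat = 4;
    int verf = 4 + 4, accept_stat = 4;
    return xid + msg_type + reply_stat + verf + accept_stat;
}
```

These sums give the 40-byte call and 24-byte reply minimums quoted above.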

Both systems use a similar set of rules for the encoding of procedure arguments: CORBA's CDR and SunRPC's XDR. Neither set of encoding rules uses self-identifying data. This means that the encoder and decoder of the data must agree on what data is being exchanged, not just on the rules by which the data is encoded. In CORBA this agreement comes in the form of the IDL interface definition; in SunRPC the program definition is used.

The major difference between the two encoding rules which would affect efficiency is the byte ordering of the data. CDR-encoded data is tagged to indicate the byte ordering of the data. If the receiver uses a different byte ordering, the receiver is responsible for byte-swapping. This model, called `receiver makes it right', compares favourably with XDR's model of requiring big-endian ordering on the wire. With XDR, communication between two little-endian machines would be severely penalised. This is the case for communication between nodes in a cluster based on Intel hardware.
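The `receiver makes it right' rule can be sketched as follows. This is an illustrative fragment, not kORBit code, and the function names are invented: the sender's byte order is carried as a flag, and the receiver swaps only when that order differs from its own.

```c
#include <stdint.h>

/* Detect the host's byte order at run time. */
static int host_is_little_endian(void)
{
    uint32_t one = 1;
    return *(uint8_t *)&one == 1;
}

/* Decode a 32-bit value received in the sender's byte order.
 * When sender and receiver agree, no work is done at all - the
 * advantage over XDR's unconditional big-endian wire format. */
uint32_t cdr_decode_u32(uint32_t wire, int sender_little_endian)
{
    if (sender_little_endian == host_is_little_endian())
        return wire;                        /* byte orders agree */
    return ((wire & 0x000000ffu) << 24) |   /* otherwise swap    */
           ((wire & 0x0000ff00u) << 8)  |
           ((wire & 0x00ff0000u) >> 8)  |
           ((wire & 0xff000000u) >> 24);
}
```

Between two little-endian Intel nodes the first branch is always taken, so no byte-swapping ever occurs; XDR would swap on both ends.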

The efficiency of each system's underlying transport layer also deserves consideration. Both systems may use TCP/IP which, because it is a reliable, connection-oriented protocol, carries the substantial overhead associated with message acknowledgements and a three-way handshake setup phase. SunRPC, on the other hand, also offers the choice of UDP as the transport protocol, a datagram-oriented protocol that carries no setup or acknowledgment overhead.

In order to accurately determine the relative efficiency of kORBit and SunRPC, the well-known lmbench[16] suite of benchmarking tools was adapted to measure the latency of a simple `ping-pong' procedure using both kORBit and SunRPC.

This was achieved by developing, using both systems, kernel modules that introduced new system calls through which the ping-pong procedure could be called. The latency of these system calls was measured using a modified version of lmbench's lat_syscall program. The results are shown below:

[root@mark benchmarking]# ./lat_syscall corba
CORBA ping pong test: 69985.0000 microseconds

[root@mark benchmarking]# ./lat_syscall sunrpc
SunRPC ping pong test: 2052.6667 microseconds
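The measurement technique itself can be sketched in user space. In this hedged illustration getppid() stands in for the project's ping-pong system calls, which exist only with the kORBit or SunRPC test modules loaded, so the numbers it produces are only indicative of the method, not of the results above:

```c
#include <sys/time.h>
#include <unistd.h>

/* lat_syscall-style measurement: time a tight loop of n system calls
 * with gettimeofday() and report the mean cost per call in
 * microseconds. */
double mean_syscall_usecs(int n)
{
    struct timeval t0, t1;

    gettimeofday(&t0, NULL);
    for (int i = 0; i < n; i++)
        (void)getppid();            /* a cheap, unavoidable syscall */
    gettimeofday(&t1, NULL);

    double usecs = (t1.tv_sec - t0.tv_sec) * 1e6 +
                   (t1.tv_usec - t0.tv_usec);
    return usecs / n;
}
```

Averaging over many iterations amortises the timer's own overhead, which at microsecond resolution would otherwise dominate a single measurement.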

It is quite obvious from these results that kernel-space SunRPC is far more efficient than kORBit. However, it is more than likely that the huge disparity between the two is due to the implementation details of kORBit.


The comparison of kORBit and kernel-space SunRPC can be summarised in the table below.

             kORBit                                         SunRPC
Simplicity   Automatically generated stubs.                 Hand-written stubs.
             Same API as user-space.                        Kernel-space API is very different and undocumented.
Reliability  Reliable transport protocol.                   Choice of reliable and unreliable transport protocols.
             Fine-grained error reporting possible.         Same error reporting as a local procedure call.
             Largely untested code.                         Stable and bug-free code.
Efficiency   70-byte request header, 24-byte reply header.  40-byte call header, 24-byte reply header.
             CDR encoding rules.                            XDR encoding rules.
             `Receiver makes it right' byte ordering.       Big-endian byte ordering.
             TCP transport protocol only.                   Both TCP and UDP transport protocols.
             69,985.0000 microsecond ping-pong latency.     2,052.6667 microsecond ping-pong latency.

Plan 9 - an Alternative?

9P - Plan 9's file protocol

Plan9 is a distributed system developed at AT&T Bell Laboratories during the late eighties and early nineties. Plan9 is heavily based on UNIX concepts and ideas, but is a complete re-think of these ideas in an effort to allow UNIX to handle a distributed, heterogeneous environment in a more effective manner. Plan9's catch phrase, from the very beginning, was to `build a UNIX out of a lot of little systems, not a system out of a lot of little UNIXes'[17].

A Plan9 installation consists of a number of computers networked together. Each of these computers offers a different class of service, for example, large SMP servers offer computing cycles, and other large machines with large storage capacities provide a file storage service. These high-end servers are connected by high-speed networks and located together. Workstations are then connected, via low bandwidth networks like Ethernet, to these servers.

The concept at the core of Plan9 is the use of the file-system to co-ordinate the naming and access to resources, even those, such as devices, not traditionally treated as files. This is a very UNIX-like idea, but has been applied more rigorously in the design of Plan9. In Plan9 all resources are abstracted as files and a network-level protocol, 9P, is used to provide the user with remote access to these resources[18].

A Plan9 server exports one or more hierarchical file-systems that may be accessed by Plan9 processes. To access a server's file tree, a client process must attach the file tree to its private name space using bind or mount. After this, all system calls operating on that server's file tree are translated into request messages transmitted to the server[19]. 9P is the message passing system used to carry out these transactions.

9P contains 17 message types. A number of these are very similar to file-related UNIX system calls, e.g. open, create, read, write and stat. Although some messages are familiar, 9P also introduces messages that are quite different from any existing UNIX file operations.

The mount device in the Plan9 kernel manages all currently attached file trees. The client interface to these file trees is through regular rather than remote procedure calls. The mount device translates these calls into 9P RPC messages, which are transmitted to the server over a transport protocol connection, through a pipe to a user process, or via a regular procedure call. The mount device is, in effect, a proxy object similar to a CORBA proxy object or a SunRPC stub.

Coding for Kernel Space - the problems.

One of the major hurdles that had to be overcome during this project was the extra difficulties associated with kernel-space programming. Because the kernel is the body of code that has direct control of the computer's hardware, one must be very careful - a coding error could, potentially, have serious implications for the integrity of the system.

Kernel programming can also require great patience. The time required to re-compile the kernel, reboot the computer and test a change that one has made can be very significant.

Kernel modules make it possible for a kernel developer to avoid this extra development time under certain circumstances. Modules are object files which may be linked, at run time, to the kernel - thus allowing changes to be tested more rapidly.

Because of a lack of available documentation, kernel programming also involves a very steep learning curve. In most cases, the only documentation available for the kernel API is the code itself. None of the standard C libraries are available, so the programmer must quickly adjust to performing even trivial tasks, such as outputting a debugging message, using the kernel's own facilities.

A kernel developer must learn to be very careful when coding. Apart from the danger to the system, finding bugs can be very difficult and time consuming. No debugger is available to kernel programmers, so careful examination of the code, coupled with brute force debugging is often necessary. Also, because a bug may cause a system lock-up requiring a reboot, it may take quite a long time to identify the source of a bug.

Additional Project Work

Using Gtk from Kernel-Space

Most of the coding effort for this project was concentrated on the remote device drivers using CORBA and SunRPC. However, it occurred to me that kernel-space CORBA had another use: it could be used as a mechanism for interfacing kernel-space with user-space.

As a brief example of this, I developed a kernel module that communicated, using CORBA, with a user-space server written in Perl. This server allowed the kernel module to interact with the user through a graphical user interface built with the Gtk widget set.

This kind of use of kernel-space CORBA is clearly inappropriate - a graphical user interface to a running kernel would be neither required nor useful. However, the example does demonstrate what would become possible, however frivolous, if a stable implementation of kernel-space CORBA existed.

Kernel Symbols Document

While developing the code for this project, one of the major difficulties was coming to terms with the system for handling kernel-level symbols. This was made all the more difficult by the fact that documentation on this system is not available. Eventually, after a lot of reading of code, I came to grips with the CONFIG_MODVERSIONS system.

I soon came to realise that I was not alone in my difficulties in understanding the system, and I found myself answering questions on the `kernelnewbies' mailing-list. In an effort to make things easier for others, and to contribute something to the community, I wrote a document10.1 describing how the system works.

The document attracted quite a lot of interest when I posted it to the linux-kernel mailing list. Several people emailed me to thank me for clearing up difficulties they were having. Also, the original designer of the system emailed me to confirm that the document was accurate.

`Kernel Traffic', the very popular weekly summary of discussions on the linux-kernel mailing list, also mentioned the document10.2.


Since I was developing the project on an `Open Source' platform, I felt compelled to contribute as best I could to the furtherance of the platform. I attempted to fix any bugs or feature-faults I encountered and submitted the resulting patches.


Both kORBit and SunRPC have their merits and demerits. For this reason it is the author's opinion that a hybrid system built upon the simplicity of SunRPC, but incorporating some of the useful ideas in CORBA, would be most suitable for an inter-kernel message passing system.

This conclusion may seem ill-considered, but it is justified: because the kernel is so integral to a system's performance, any mechanism it uses must be tailored and optimised specifically for the kernel. This is, of course, a trade-off between speed and a generic implementation.

Kernel-space SunRPC is an efficient and optimised remote procedure call mechanism. kORBit, because it provides so much more functionality, is far too inefficient.

Also, SunRPC's portmapper protocol is a very simple, but effective, bootstrapping mechanism. On a broadcast LAN this method would make it very easy for a node to detect the available services on the LAN. The CORBA equivalent of the portmapper, the CORBA Naming Service, is over-engineered for a tightly coupled system.

kORBit's automatically generated stubs make the programmer's life much easier but they cannot take advantage of possible kernel-space optimisations. If the SunRPC library had a richer API, it might be easier to define the stubs.

Furthermore, CORBA by default uses TCP as its transport protocol. For a LAN based cluster the extra overhead associated with TCP is unnecessary. UDP based SunRPC operates well in such an environment.

However, CORBA's object model would map very neatly to kernel-space objects like files, processes etc. Features like the POA's implicit activation of objects would be especially useful.

In short, a domain-specific object broker implemented as a thin layer above the SunRPC API would be the most likely candidate as the core message-passing system for a Linux based cluster operating system.


N. Boden, D. Cohen, R. Felderman, A. Kulawik, C. Seitz, J. Seizovic, and Wen-King Su. Myrinet: A Gigabit-per-Second Local Area Network.

Yousef A. Khalidi, Jose M. Bernabeu, Vlada Matena, Ken Shirriff, Moti Thadani. Solaris MC: A Multi-Computer OS. Proceedings of the 1996 USENIX Conference, January 1996.

Thomas E. Anderson, David E. Culler, David A. Patterson. A Case for NOW (Networks of Workstations) and the NOW team.

John B. Carter Alan L. Cox, Sandhya Dwarkadas, Elmootazbellah N. Elnozahy, David B. Johnson Pete Keleher, Steven Rodrigues, Weimin Yu, and Willy Zwaenepoel. Network Multicomputing Using Recoverable Distributed Shared Memory.

Amnon Barak, Oren La'adan, Amnon Shiloh, Institute of Computer Science, The Hebrew University of Jerusalem. Scalable Cluster Computing with MOSIX for LINUX.

Wensong Zhang, National Laboratory for Parallel and Distributed Computing. Linux Virtual Server for Scalable Network Services.

Daniel Ridge, Donald Becker, Phillip Merkey, Thomas Sterling. Beowulf: Harnessing the Power of Parallelism in a Pile-of-PCs. Proceedings, IEEE Aerospace, 1997.

Jose M. Bernabeu-Auban, Vlada Matena, Yousef A. Khalidi, Sun Microsystems Laboratories. Extending a Traditional OS Using Object-Oriented Techniques.

Ken Shirriff, Sun Microsystems Laboratories. Building Distributed Process Management on an Object-Oriented Framework.

Object Management Group. The Common Object Request Broker: Architecture and Specification. Revision 2.4, October 2000.

Michi Henning, Steve Vinoski. Advanced CORBA Programming with C++. Addison-Wesley, Reading, Massachusetts, 1999, ISBN 0-201-37927-9.

R. Srinivasan, Sun Microsystems, Inc. RFC 1831: Remote Procedure Call Protocol Specification Version 2.

Sun Microsystems, Inc. RFC 1813: NFS Version 3 Protocol Specification.

R. Srinivasan, Sun Microsystems, Inc. RFC 1832: External Data Representation Standard.

W. Richard Stevens. Unix Network Programming, Volume 1, Networking APIs: Sockets and XTI. Prentice-Hall, 1998, ISBN 0-13-490012-X.

Larry McVoy, Silicon Graphics, Inc.; Carl Staelin, Hewlett-Packard Laboratories. lmbench: Portable Tools for Performance Analysis.

Rob Pike, Dave Presotto, Sean Dorward, Bob Flandrena, Ken Thompson, Howard Trickey, and Phil Winterbottom, Bell Laboratories. Plan 9 from Bell Labs.

Rob Pike, Dave Presotto, Ken Thompson, Howard Trickey, and Phil Winterbottom, Bell Laboratories. The Use of Name Spaces in Plan 9.

Plan 9 manual pages - Section 5 - Plan 9 File Protocol, 9p.

Object Management Group. C Language Mapping Specification.

The Device Driver Code

The pseudo device developed for this project, the Remote Device, behaves in the same way as any character special file under UNIX. It can be opened, closed, read from and written to. Most character special files are an interface to an actual physical device. However, this pseudo device merely stores and retrieves strings to and from memory, each string uniquely identified by the minor number of the character file.

Every device in UNIX is controlled by a body of code called a device driver. With Linux, these device drivers can be compiled as part of the kernel or loaded into the kernel at run time as a kernel module. Outlined below is an overview of the device driver for the Remote Device.

Registering the Device

A device driver module must define an init_module function. This function is called by the kernel when the module is loaded, to allow the module to initialise itself. In the case of device drivers, this usually involves registering with the kernel and initialising the physical device.

When a device driver registers itself with the kernel, it provides a file_operations structure which defines how the functionality of the device is implemented.

static struct file_operations client_device_fops = {
   .owner   = THIS_MODULE,
   .read    = client_device_read,
   .write   = client_device_write,
   .open    = client_device_open,
   .release = client_device_release
};

In the init_module function, this structure, along with the major number and name of the device, is passed to register_chrdev, which registers the device driver as a character device driver.

int init_module( void ) {

 register_chrdev( device_major, "client-device", &client_device_fops );

 return 0;
}

When the driver module is unloaded from the kernel, it must unregister itself. This is done by calling unregister_chrdev from the cleanup_module function.

void cleanup_module( void ) {

 unregister_chrdev( device_major, "client-device" );
}

The Device Implementation

The file_operations structure that the driver registers using the register_chrdev function defines the functionality of the device by specifying the functions that implement the device's open, close, read and write operations.

The functions that implement the open and close operations merely increment and decrement, respectively, the module's usage count. This usage count is used by the kernel to ensure that the module cannot be unloaded while it is in use.

static int
client_device_open( struct inode *inode, struct file *file ) {

 MOD_INC_USE_COUNT;

 return 0;
}

static int
client_device_release( struct inode *inode, struct file *file ) {

 MOD_DEC_USE_COUNT;

 return 0;
}

The functions that implement the read and write operations remotely retrieve and store, respectively, the string passed to the driver using the buff argument. This is carried out using either SunRPC or CORBA and will be shown in detail later.

The buff argument is a pointer to the location of the string in user-space. Because the string is not located in kernel-space, it cannot be accessed directly, but must be accessed using the copy_from_user or copy_to_user functions.

static ssize_t
client_device_read( struct file *file, char *buff, 
                    size_t length, loff_t *offset ) {

 /* CORBA or SunRPC code to retrieve the string */

 if ( copy_to_user( buff, res.data, num_bytes ) != 0 ) {
    return -EFAULT;
 }

 return num_bytes;
}

static ssize_t
client_device_write( struct file *file, const char *buff, 
                     size_t length, loff_t *offset ) {

 if ( copy_from_user( args.data, buff, num_bytes ) != 0 ) {
    return -EFAULT;
 }

 /* CORBA or SunRPC code to store the string */

 return num_bytes;
}

The CORBA code

The Remote Device device driver was developed using both CORBA and SunRPC to remotely store and retrieve the data written to and read from the device. This appendix outlines the code to implement the CORBA version of the device driver.

Defining the Interface

The development of any CORBA application, whether in user-space or in kernel-space, begins with the design of the interface the server will provide. This is specified in CORBA IDL, an Interface Definition Language.

The corbaDevice interface has two functions - read and write. Both functions take an identifier as an argument which identifies the data to be stored or retrieved. Both functions can raise an exception to indicate the failure of the operation. The write function is passed the data string to be stored as a parameter. The read function returns as an `out' parameter the retrieved data string.

interface corbaDevice {

 exception UnkownId {
    long id;
 };

 exception ArgTooBig {
     long argLen;
     long maxLen;
 };

 typedef string dataString;

 void read( in long id, out dataString data ) raises( UnkownId );
 void write( in long id, in dataString data ) raises( ArgTooBig );
};

Generating the Stubs

The stubs for a kernel-space application are generated in exactly the same way as for a user-space application, using the orbit-idl IDL compiler.

[mark@mark device]$ orbit-idl device.idl

This command generates four files:

  1. device.h This file defines all structure definitions from the corbaDevice, along with prototypes of the functions required to allocate these structures. Prototypes for the read and write functions are also defined. This file must be included by the server and client code.

  2. device-stubs.c This file defines the client-side read and write `proxy' functions. These functions handle the marshalling of the arguments to the function, sending the Request message and the demarshalling of the results from the Reply message. This code must be linked to the client code.

  3. device-skels.c This file contains code that handles the server-side of the interface. Functions that handle the marshalling and demarshalling of results and arguments are defined along with functions that are used by the ORB to locate the servant implementation functions. This code must be linked to the server code.

  4. device-common.c This file contains functions for allocating and freeing memory for each of the structures defined for the interface. This code must be linked to both the server and client code.

Implementing the Server

A servant is a language-specific entity that can incarnate a CORBA object. Since C is not an object-orientated language, a servant is composed of a data structure that holds the state of the object, along with a table of method functions that manipulate that state to implement the CORBA object. This table of method functions is called an entry point vector, or EPV[20].

The EPV for the device servant is defined as

static POA_corbaDevice__epv device_epv = {
 ._private = NULL,
 .read     = device_read,
 .write    = device_write
};

device_read and device_write are the two server side functions for storing and retrieving the data.

A number of steps must be followed before the server can accept requests. The same steps must be followed in a user-space program, but in the case of a kernel module they will be executed in the module's init_module function.

  1. Initialise the ORB
    device_orb = CORBA_ORB_init( &argc, argv, "orbit-local-orb", &d_ev );

  2. Initialise the servant
    POA_corbaDevice__init( &device_servant, &d_ev );

  3. Obtain a reference for the root POA, obtain its POA manager and activate the manager
    device_poa = CORBA_ORB_resolve_initial_references( device_orb, 
                                                       "RootPOA", &d_ev );
    d_poa_mgr = PortableServer_POA__get_the_POAManager( device_poa, &d_ev );
    PortableServer_POAManager_activate( d_poa_mgr, &d_ev );

  4. Create an object ID, which will uniquely identify the object in the POA's Active Object Map, and then activate the object[11].
    device_objid = PortableServer_string_to_ObjectId( "CorbaDevice", &d_ev );
    PortableServer_POA_activate_object_with_id( device_poa, device_objid,
                                                &device_servant, &d_ev );

  5. Obtain a reference for the servant and export the servant's Interoperable Object Reference, or IOR, using the /proc interface
    device = PortableServer_POA_servant_to_reference( device_poa,
                                                      &d_ev );
    korbit_register_ior( "corba-device-server", device, device_orb, &d_ev);

  6. Make the ORB start listening for requests
    CORBA_ORB_run( device_orb, &d_ev );

When the server module is being unloaded the servant must be removed from the POA's active object map, the ORB shut down and the /proc entry removed.

 PortableServer_POA_deactivate_object( device_poa, device_objid, &d_ev );
 CORBA_ORB_shutdown( device_orb, 0, &d_ev );
 remove_proc_entry( "corba-device-server", 0 );

Implementing the Client

When the device client module is loaded into the kernel, in addition to registering the device with the kernel, it must also initialise the ORB and instantiate a proxy object from the object's stringified IOR.

device_orb = CORBA_ORB_init( &argc, argv, "orbit-local-orb", &device_ev );

device = CORBA_ORB_string_to_object( device_orb, device_ior, &device_ev );

The object's stringified IOR, device_ior, is passed to the module as a load-time parameter. To allow for this, the variable must be global to the module and indicated to be a module parameter using the MODULE_PARM macro.

static char *device_ior = NULL;

MODULE_PARM( device_ior, "s" );

When the module is being loaded, the parameter is passed to the module on the command line.

[root@mark device]$ insmod corba-device-client.o device_ior=$(cat device.ior)

Once the proxy object has been instantiated, it is a simple matter of initialising the arguments for the read or write function and calling it on the proxy object.

corbaDevice_write( device, id, buffer, &device_ev );

The SunRPC code

Developing the SunRPC version of the Remote Device is very different from user-space SunRPC programming. The API is not the same, and stubs must be written by hand rather than generated using rpcgen. Because the stubs are hand-written, it is not actually necessary to define the interface using the RPC language. An interface definition should be written, though, as it provides a concrete specification of the interface.

Defining the Interface

The RPC_DEVICE_PROG program number is defined to be in the user-defined range, 0x20000000 - 0x3fffffff, as specified by the SunRPC RFC[12]. Only one version of the program is specified. The program has two procedures, rpc_device_read and rpc_device_write.

const DEVICE_STRSZ = 1024;

struct device_writeargs {
 int    id;
 string data<DEVICE_STRSZ>;
};

struct device_readargs {
 int    id;
};

struct device_writeres {
 int    stat;
};

struct device_readres {
 string data<DEVICE_STRSZ>;
 int    stat;
};

program DEVICE_PROG {
  version DEVICE_VERS {
    device_readres  read ( device_readargs  ) = 1;
    device_writeres write( device_writeargs ) = 2;
  } = 1;
} = 0x20101010;

Implementing the Service

The RPC program service is defined using a svc_program structure which includes information such as the program number and the versions of the service which are available.

static struct svc_program rpc_device_svc_program = {
 .pg_prog   = RPC_DEVICE_PROG,           /* program number      */
 .pg_lovers = 1,                         /* lowest version no.  */
 .pg_hivers = RPC_DEVICE_NRVERS - 1,     /* highest version no. */
 .pg_nvers  = RPC_DEVICE_NRVERS,         /* number of versions  */
 .pg_vers   = rpc_device_svc_versions,   /* version array       */
 .pg_name   = "rpc-device",              /* service name        */
 .pg_stats  = &rpc_device_svc_stats      /* rpc statistics      */
};

The pg_vers member of this structure is an array of svc_version structures. One of these structures must be defined in the array for every version that is implemented. These structures contain the version number and the number of procedures available for that version of the program.

static struct svc_version rpc_device_version1 = {
 .vs_vers     = 1,                     /* version number              */
 .vs_nproc    = 3,                     /* number of procedures        */
 .vs_proc     = rpc_device_procedures, /* array of per-procedure info */
 .vs_dispatch = NULL                   /* use the default dispatcher  */
};

static struct svc_version *rpc_device_svc_versions[] = {
 NULL,                                 /* no version 0 */
 &rpc_device_version1                  /* version 1    */
};

The vs_proc member of the svc_version structure is an array of svc_procedure structures, one for each procedure. Each structure details information such as the function that implements the procedure, the XDR encoding and decoding functions and buffer sizes.

struct svc_procedure rpc_device_procedures[] = {
 {                                      /* the read procedure */
  .pc_func      = (svc_procfunc)rpc_device_read,
  .pc_decode    = (kxdrproc_t)xdr_device_decode_readargs, 
  .pc_encode    = (kxdrproc_t)xdr_device_encode_readres,
  .pc_release   = NULL,
  .pc_argsize   = RPC_DEVICE_READ_XDR_ARGSZ,
  .pc_ressize   = RPC_DEVICE_READ_XDR_RESSZ,
  .pc_count     = 0,
  .pc_cachetype = RC_NOCACHE
 },
 /* ... entries for the null and write procedures ... */
};

Each procedure implementation function accepts three arguments - a svc_rqst structure with information about the request in progress and two void pointers locating the function arguments and results.

static int
rpc_device_write( struct svc_rqst *req, 
                  device_writeargs *args,
                  device_writeres *res ) {

 /* store the string passed in args */

 return RPC_SUCCESS;
}

Each XDR encoding and decoding function also accepts three arguments - an svc_rqst structure, a pointer to a buffer of 32-bit words, or QUADs, which contains the data received, or to be transmitted, in its XDR representation, and a void pointer to the results or arguments to be encoded or decoded respectively. Each function must adhere strictly to the XDR data encoding rules[14].

static int
xdr_device_encode_readres( struct svc_rqst *req, u32 *p, 
                           device_readres *res ) {

 *p++ = htonl( DEVICE_STRSZ );
 memcpy( p, res->data, DEVICE_STRSZ );
 p += XDR_QUADLEN( DEVICE_STRSZ );     /* advance past the string data */

 *p++ = htonl( res->stat );

 return device_ressize_check( req, p );
}

In order to make the server begin listening for requests, a number of steps must be followed

  1. Allocate and initialise an svc_serv structure which will contain information about the state of the server.

    struct svc_serv *serv;
    serv = svc_create( &rpc_device_svc_program, 
                       RPC_DEVICE_XDRSIZE );

  2. Set up a socket on which the server will accept requests.

    svc_makesock( serv, IPPROTO_UDP, port );

  3. Spawn off a kernel thread which will poll the socket for requests and dispatch any it receives.

    svc_create_thread( rpc_device_thread_func, serv );

Implementing the Client

On the client side, very similar structures must be created to define the RPC program. The main differences are that the per-procedure info does not specify a function to implement the procedure, and that the XDR functions must encode the arguments and decode the results of the procedure - the reverse of the server side.

static struct rpc_procinfo rpc_device_procedures[] = {
 {                                                      /* the read procedure */
  .p_procname = "read",                                 /* procedure name  */
  .p_encode   = (kxdrproc_t)xdr_device_encode_readargs, /* xdr encode func */
  .p_decode   = (kxdrproc_t)xdr_device_decode_readres,  /* xdr decode func */
  .p_bufsiz   = RPC_DEVICE_XDRSIZE,                     /* xdr buffer size */
  .p_count    = 0                                       /* call count      */
 },
 /* ... entry for the write procedure ... */
};

In order for the client to call a procedure on the server it must first set up a socket, or an xprt, and connect to the server.

struct rpc_xprt *xprt;
struct rpc_clnt *clnt;

xprt = xprt_create_proto( IPPROTO_UDP, addr, NULL );

clnt = rpc_create_client( xprt, hostname, &rpc_device_clnt_program,
                          RPC_DEVICE_VERSION, RPC_AUTH_NULL );

The client can then call the function on the server using rpc_call.

struct device_readargs args;
struct device_readres  res;

rpc_call( clnt, DEVICEPROC_READ, &args, &res, 0 );

Kernel Symbols

While developing the code for this project, a major difficulty was coming to terms with the system for handling kernel-level symbols. This was made all the more difficult by the fact that documentation on this system is not available. This appendix is a copy of the documentation of the system that I wrote.

Exporting Symbols

By default, any global variables or functions defined in a module are exported to the kernel symbol table when the module is loaded. However, there are ways by which you may control which symbols are exported.

If you only require that none of the module's symbols be exported you can use the EXPORT_NO_SYMBOLS macro.

If, however, you require that only some of your global symbols be exported, you will need to use the EXPORT_SYMBOL macro on each of them. If CONFIG_MODVERSIONS is turned on, a further step is required in the build process, but that will be explained later.

So How Does This Work?

A kernel module that explicitly exports symbols will have two special sections in its object file: the symbol table __ksymtab and the string table .kstrtab. When a symbol is exported by a module using EXPORT_SYMBOL, two things happen: the symbol's address is placed in the __ksymtab section, and the symbol's name is placed in the .kstrtab section.

When a module is loaded, this information is added to the kernel's symbol table, and these symbols are then treated like any of the kernel's exported symbols.

To take a peek at your module's symbol table, do

$> objdump --disassemble -j __ksymtab sunrpc.o

or at the string table, do

$> objdump --disassemble -j .kstrtab sunrpc.o


CONFIG_MODVERSIONS is a notion thought up to make people's lives easier. In essence, what it is meant to achieve is that if you have a module you can attempt to load that module into any kernel, safe in the knowledge that it will fail to load if any of the kernel data structures, types or functions that the module uses have changed.

If your kernel is not compiled with CONFIG_MODVERSIONS enabled you will only be able to load modules that were compiled specifically for that kernel version and that were also compiled without MODVERSIONS enabled.

However, if your kernel is compiled with CONFIG_MODVERSIONS enabled you will be able to load a module that was compiled for the same kernel version with MODVERSIONS turned off. But - here's the important part folks - you will also be able to load any modules compiled with MODVERSIONS turned on, as long as the kernel API that the module uses hasn't changed.

So How Does This Work?

When CONFIG_MODVERSIONS is turned on, a special piece of versioning info is appended to every symbol exported using EXPORT_SYMBOL.

This versioning info is calculated using the genksyms command whose man page has this to say about how the info is calculated

``When a symbol table is found in the source, the symbol will be expanded to its full definition, where all struct's, unions, enums and typedefs will be expanded down to their basic part, recursively. This final string will then be used as input to a CRC algorithm that will give an integer that will change as soon as any of the included definitions changes, for this symbol.

The version information in the kernel normally looks like: symbol_R12345678, where 12345678 is the hexadecimal representation of the CRC.''

What this means is that the versioning info is calculated in such a way that it will only change when the definition of that symbol changes.

The versioning string is appended by the use of a #define in modversions.h for every exported symbol. The #define usually winds up looking something like this (simplified)

#define printk printk_R1b7d4074

What this does is to effectively get rid of the function `printk' - alas, poor printk - and replace it with the much more handsome `printk_R1b7d4074'. If you have a look at modversions.h you'll notice that it just includes loads of .ver files. These are generated using a command similar to

gcc -E -D__GENKSYMS__ ${c-file} |\ 
        genksyms -k ${ver} > ${ver-file}

Notice that the C file is first passed through the C preprocessor before being passed to genksyms. This is to collapse all macros and stuff beforehand.

What does this mean for modules?

When modules are being compiled for a kernel with CONFIG_MODVERSIONS turned on, the header file linux/modversions.h must be included at the top of every C file. This would be an awful pain to do by hand, so we just do it with the gcc flag -include.

             -include $(HPATH)/linux/modversions.h

The extra MODVERSIONS flag is used to indicate that this is a module being compiled with CONFIG_MODVERSIONS turned on as opposed to the kernel being compiled with CONFIG_MODVERSIONS enabled.


... schemes1.1
A kernel-level system is one which was developed so it may be used within the lowest level of the operating system - the kernel
... Ethernet2.1
The term channel-bonded Ethernet refers to the striping of network traffic across two or more Ethernets, hence increasing the bandwidth available to the cluster.
... space2.2
A global PID space lets you see all the processes running on the cluster with ps.
... DIPC2.3
DIPC allows you to use sysv shared memory, semaphores and wait-queues across the cluster
... C++'4.1
Michi Henning. `What if the internet were built using CORBA?'. http://www.ntlug.org/~cbbrowne/corba.html
The portmapper service runs on port number 111.
... object6.1
Such an object is known as a `Factory' object.
... developer7.1
Indeed, when writing the code for this project, I found it necessary on several occasions to use a packet-sniffer to obtain the UDP packets and decode these, byte-by-byte, on paper with the XDR and RPC specifications beside me.
... minimum7.2
Using AUTH_NULL authentication.
... around7.3
Because a CORBA request comprises quite a number of variable-length fields, the minimum header length is not an accurate representation of the message's overhead.
... document10.1
See Appendix D.
... document10.2

Mark McLoughlin 2001-05-10