Last Updated: 15 December, 2008

With the release of libhugetlbfs 2.1, a number of helper utilities were added to make the lives of hugepage users easier. In the past, configuring the system and running applications required the use of a number of proc entries and the setting of environment variables; the utilities significantly reduce the complexity and the number of proc entries the user must be aware of. This document gives an overview of the utilities and examples of how they can be used. It is based on libhugetlbfs 2.1.1, which includes some minor bug fixes for 2.1. It also covers which actions still require direct use of proc tunables, and it finishes by showing how the utilities can be used to run the sysbench benchmark with huge pages.

Setting Up
First of all, install libhugetlbfs in your home directory and add the bin directory to your PATH and the manual pages to your MANPATH.

    $ wget \
        http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-2.1.1.tar.gz
    $ tar -zxf libhugetlbfs-2.1.1.tar.gz
    $ cd libhugetlbfs-2.1.1/
    $ make PREFIX=$HOME/opt/libhugetlbfs
    $ make PREFIX=$HOME/opt/libhugetlbfs install
    $ export PATH=$HOME/opt/libhugetlbfs/bin:$PATH
    $ export MANPATH=`manpath`:$HOME/opt/libhugetlbfs/share/man
Next we want to configure mount points for each of the pagesizes supported by the system. Assuming the mount points are not configured already, the easiest method is to run a short bash script, like the one sketched below, which will create one mount per pagesize under /var/lib/hugetlbfs/mounts.
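This is a minimal sketch of such a script. It assumes pagesize -H prints the supported huge page sizes in bytes, one per line, and it must be run as root:

    #!/bin/bash
    # Create one hugetlbfs mount point per supported huge page size
    BASE=/var/lib/hugetlbfs/mounts
    mkdir -p $BASE
    for size in $(pagesize -H); do
        MNT=$BASE/pagesize-$size
        mkdir -p $MNT
        # pagesize= selects which huge page size backs this mount
        mount -t hugetlbfs -o pagesize=$size none $MNT
    done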

Configuring Huge Page Pools and Overcommit Values
In older kernels and distributions, /proc/sys/vm/nr_hugepages was used to set the fixed size of the hugepage pool and /proc/meminfo was used to monitor pool statistics. This was straightforward but somewhat limited as the administrator needed to know exactly how many hugepages were required by the application. To alleviate this, recent kernels introduced a /proc/sys/vm/nr_overcommit_hugepages tunable that specifies the number of hugepages the system will allocate on demand if an application requires them. In effect, this allows an administrator to say to the kernel "I know I need at least X huge pages, but I may need up to Y more, allocate them if you can". In 2.6.27, multiple hugepage sizes were introduced, each of which has one directory under /sys/kernel/mm/hugepages, increasing the number of proc entries that must be manipulated to configure the system. This is where hugeadm and pagesize come into play.
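For reference, the older proc-based interface looks like the following when run as root; the figures are illustrative:

    # Static pool of 64 huge pages, plus up to 32 more allocated on demand
    $ echo 64 > /proc/sys/vm/nr_hugepages
    $ echo 32 > /proc/sys/vm/nr_overcommit_hugepages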

The pool sizes can be viewed using hugeadm --pool-list. For each hugepage size supported by the system, it displays the minimum, current and maximum number of hugepages of that size and whether it is the default hugepage size or not. The minimum is the static size of the hugepage pool. Current is the number of hugepages currently allocated, whether sitting free in the pool or in use by an application, so it can be anywhere between minimum and maximum depending on application usage. Maximum is the total number of hugepages that can be in use. The system resizes the pool dynamically between the "minimum" and "maximum" values depending on application demand.

To configure the pool sizes, use --pool-pages-min to set the minimum value and --pool-pages-max for the maximum value. Both can be set to either a fixed value or adjusted by a delta, as described in the hugeadm manual page.

Shell scripts sometimes need to know what pagesizes are supported. By using hugeadm --page-sizes, the administrator can list what page sizes are currently supported and have a pool configured. hugeadm --page-sizes-all or pagesize -H will list all huge page sizes supported by the system, although the pagesize utility will also include the base page size if asked to list all sizes with -a.
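For example, on a system with 2MB huge pages (the size and counts here are illustrative), the pools could be configured and queried as follows; a count prefixed with + or - is applied as a delta:

    $ hugeadm --pool-pages-min 2MB:128    # static pool of 128 pages
    $ hugeadm --pool-pages-max 2MB:+64    # raise the maximum by 64 pages
    $ hugeadm --page-sizes                # sizes with a pool configured
    $ hugeadm --page-sizes-all            # all supported huge page sizes
    $ pagesize -a                         # all sizes, including the base page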

The utilities are not tied to the kernel version and should behave sensibly on older kernels even when features like multiple hugepages and dynamic hugepage pool resizing are not available.

Running Applications with Huge Pages
The hugectl utility configures the environment for use by libhugetlbfs and then runs the specified command. By default, it uses the library it was installed with; --library-use-path will use the libhugetlbfs found on the system library path and --library-path will use a library at a specific path.
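For instance, assuming --library-path takes the directory containing the library and using the install prefix from earlier (./app is a placeholder binary):

    $ hugectl --library-use-path ./app
    $ hugectl --library-path $HOME/opt/libhugetlbfs/lib ./app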

--text, --data and --bss back their respective sections with hugepages, assuming the application has been relinked to align these sections to a hugepage boundary. Optionally, a specific hugepage size can be given here, allowing text/data to be backed by 64K pages and the heap by 16M pages on POWER5+, for example.

--heap uses the glibc malloc morecore hook to back large malloc() requests with hugepages and can also take a specific hugepage size.

--shm overrides calls to shmget() to back them with hugepages but unlike the other options, it cannot specify a hugepage size.
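Putting the options together, a couple of sketched invocations; ./app is a placeholder, and the --option=<size> form for selecting a hugepage size should be checked against the hugectl manual page:

    $ hugectl --text --data --bss ./app              # back all three sections
    $ hugectl --text=64K --data=64K --heap=16M ./app # mix page sizes (POWER5+)
    $ hugectl --shm ./app                            # hugepage-backed shmget()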

Some options supported by libhugetlbfs are not available via hugectl. For example, hugectl cannot force applications whose sections are not hugepage-aligned to back text/data with hugepages; this is normally enabled with the HUGETLB_FORCE_ELFMAP environment variable. Similarly, heap shrinking cannot be set via the hugectl interface. However, in the event other libhugetlbfs features are needed, hugectl has a --dry-run switch which displays the environment variables it would set along with their values. The output can be cut and pasted into a shell script and modified as necessary.
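As a sketch, the environment can be inspected with --dry-run and then extended by hand; the exact variables printed depend on the options given:

    $ hugectl --dry-run --text --data ./app   # print the environment, do not run
    $ export HUGETLB_FORCE_ELFMAP=yes         # then add what hugectl cannot set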

Setting Application Defaults
Applications can be relinked with their text and data sections aligned to a hugepage boundary as described in the libhugetlbfs HOWTO. By default, these applications continue to use base pages and the HUGETLB_ELFMAP environment variable must be set for hugepages to be used, as described in the libhugetlbfs manual page, or by using the hugectl utility. The hugeedit utility can set whether the text section, the data section or both should be backed by hugepages by default.
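A minimal sketch of marking a relinked binary, assuming hugeedit prints the current markings when given no options (./app is a placeholder):

    $ hugeedit --text --data ./app   # back both sections with hugepages by default
    $ hugeedit ./app                 # display the current markings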

Configuring and Running Sysbench
This section shows how to run the sysbench benchmark using the postgres database. It assumes that sysbench and postgres are already installed and configured, and that your mount points, environment, binary and library paths are set appropriately for running the benchmark with small pages.

First, configure the hugepage pools. This example ran on a POWER6 machine, so the default hugepage size is 16MB. The postgres server was configured to use about 730MB of shared pool memory and other buffers. Let's make the static pool roughly that size (45 x 16MB = 720MB) but allow it to grow to 1024MB (64 x 16MB), as the server may use more due to the administrator's poor understanding of how postgres consumes memory.

  $ hugeadm --pool-pages-min 16MB:45
  $ hugeadm --pool-pages-max 16MB:64
  $ hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
     65536        0        0        0         
  16777216       45       45       64        *
Now we can see that 45 huge pages are committed and up to 64 can be used. Next, we start the server with its shared memory backed by huge pages. Note that as we are using huge pages, we have to raise the locked memory limit. Note also that su -p is specified to preserve the environment. The final important note is that we do not back the heap of the server with huge pages: as postgres makes heavy use of fork(), there could be excessive hugepage faulting for copy-on-write by the children and, with dynamic pool resizing, this could cause performance difficulties. In contrast, mysql would have used --heap due to its threaded nature. On this machine, postgres runs as nobody, but it might be different for you.
  $ ulimit -l $((1024*1024))
  $ su -p -s /bin/bash nobody -c "hugectl --shm pg_ctl
      -D $HOME/opt/postgres-8.3.4/data/
      -l $HOME/opt/postgres-8.3.4/logs/logfile start"
  $ grep Huge /proc/meminfo
  HugePages_Total:    45
  HugePages_Free:     42
  HugePages_Rsvd:     30
  HugePages_Surp:      0
  Hugepagesize:    16384 kB
Regrettably, /proc/meminfo had to be used to confirm hugepage usage by the application, but you can see that 3 hugepages are in use and a further 30 are reserved to be faulted later by postgres. Ideally, hugeadm --pool-list would have been used, but as only the static pool is in use, the Current value would not have changed. This will be addressed in a future release. Now we are ready to run sysbench itself.
  $ su -p -s /bin/bash nobody -c \
       "psql template1 -c 'CREATE DATABASE pgtest;'"
  $ su -p -s /bin/bash nobody -c \
       "psql template1 -c 'CREATE ROLE sbtest with LOGIN;'"
  $ sysbench --max-time=30 --max-requests=2000000 --db-driver=pgsql \
      --pgsql-db=pgtest --test=oltp --oltp-point-selects=256 \
      --oltp-test-mode=complex --oltp-table-size=2000000 prepare
  $ sysbench --max-time=30 --max-requests=2000000 --db-driver=pgsql \
      --pgsql-db=pgtest --test=oltp --oltp-point-selects=256 \
      --oltp-test-mode=complex --oltp-table-size=2000000 \
      --num-threads=16 run
Comparing against base pages, there was about a 3% improvement on this basic run.

Conclusion
To conclude, hugepages, when used in the right circumstances, will improve performance. With kernel 2.6.27 and libhugetlbfs 2.1, using hugepages is a more reliable and easier experience. Suggestions on how to improve the library and utilities further are always welcome on the libhugetlbfs-devel mailing list.