Last Updated: 15 December, 2008

With the release of libhugetlbfs 2.1, a number of helper utilities were added to make the lives of hugepage users easier. In the past, configuring the system and running applications required juggling a number of proc entries and environment variables; the utilities significantly reduce that complexity and the number of proc entries the user must be aware of. This document gives an overview of the utilities and examples of how they can be used. It is based on libhugetlbfs 2.1.1, which includes some minor bug fixes for 2.1. It also covers which actions still require direct use of proc tunables, and finishes by showing how the utilities can be used to run the sysbench benchmark with huge pages.
Setting Up
$ wget \
http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-2.1.1.tar.gz
$ tar -zxf libhugetlbfs-2.1.1.tar.gz
$ cd libhugetlbfs-2.1.1/
$ make PREFIX=$HOME/opt/libhugetlbfs
$ make PREFIX=$HOME/opt/libhugetlbfs install
$ export PATH=$HOME/opt/libhugetlbfs/bin:$PATH
$ export MANPATH=`manpath`:$HOME/opt/libhugetlbfs/share/man
Next, we want to configure mount points for each of the page sizes supported by
the system. Assuming the mount points are not configured already, the easiest
method is to run a short bash script that creates one mount per pagesize
under /var/lib/hugetlbfs/mounts.
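A minimal sketch of such a script follows. It assumes the kernel exports the supported sizes under /sys/kernel/mm/hugepages (true for kernels of this era that support multiple hugepage sizes) and that it is run as root; the mount point layout matches the one described above, but the exact paths are a matter of taste.

```shell
#!/bin/bash
# Sketch: create one hugetlbfs mount per supported huge page size.
# Sizes are read from /sys/kernel/mm/hugepages, which contains one
# hugepages-<size>kB directory per supported size. Requires root.
BASE=/var/lib/hugetlbfs/mounts
for dir in /sys/kernel/mm/hugepages/hugepages-*kB; do
    [ -d "$dir" ] || continue
    kb=${dir##*hugepages-}
    kb=${kb%kB}
    mkdir -p "$BASE/pagesize-${kb}kB"
    # The pagesize= mount option selects which pool backs this mount.
    mount -t hugetlbfs -o pagesize=${kb}K none "$BASE/pagesize-${kb}kB"
done
```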
Configuring Huge Page Pools and Overcommit Values

The pool sizes can be viewed using hugeadm --pool-list. For each hugepage size supported by the system, it displays the minimum, current and maximum number of hugepages of that size, and whether it is the default hugepage size. The minimum number is the static size of the hugepage pool. Current counts the number of pages in use, whether they are sitting in the hugepage pool or used by an application; hence, the "current" value can fall anywhere between minimum and maximum depending on application usage. Maximum is the total number of hugepages that can be in use; the system resizes the pool dynamically between the "minimum" and "maximum" values depending on application demand. To configure the pool sizes, use --pool-pages-min to set the minimum value and --pool-pages-max for the maximum value. Both can be set to either a fixed value or adjusted by a delta.

Shell scripts sometimes need to know what pagesizes are supported. By using hugeadm --page-sizes, the administrator can list which page sizes are currently supported and have a pool configured. hugeadm --page-sizes-all or pagesize -H will list all huge page sizes supported by the system, although the pagesize utility will also include the base page size if asked to list all sizes with -a. The utilities are not tied to the kernel version and should behave sensibly on older kernels, even when features like multiple hugepage sizes and dynamic hugepage pool resizing are not available.
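For instance, on a POWER system with 64K and 16M pools configured, the listing might look like this. The sizes are printed in bytes, one per line; the exact output is illustrative.

```shell
$ hugeadm --page-sizes
65536
16777216
$ pagesize -H
65536
16777216
```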
Running Applications with Huge Pages

hugectl launches an application with its different segments backed by huge pages. --text, --data and --bss back their respective sections with hugepages, assuming the application has been relinked to align these sections to a hugepage boundary. Optionally, a specific hugepage size can be given, allowing text and data to be backed by 64K pages and the heap by 16M pages on POWER5+, for example. --heap uses the glibc malloc morecore hook to back large malloc() requests with hugepages and can also take a hugepage size. --shm overrides calls to shmget() to back them with hugepages but, unlike the other options, it cannot take a hugepage size.

Some options supported by libhugetlbfs are not available via hugectl. For example, applications whose sections are not hugepage-aligned cannot be forced to back text and data with hugepages; this is normally set with the HUGETLB_FORCE_ELFMAP environment variable. Similarly, heap shrinking is not set via the hugectl interface. However, in the event other libhugetlbfs features are needed, hugectl has a --dry-run switch which displays the environment variables it would set. This output can be cut and pasted into a shell script and modified as necessary.
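A few illustrative invocations (the application name is hypothetical; the option syntax is that of hugectl from libhugetlbfs 2.1):

```shell
# Back large malloc() requests with huge pages:
$ hugectl --heap ./my-application

# Back text and data with 64K pages and the heap with 16M pages,
# assuming the binary was relinked with hugepage-aligned sections:
$ hugectl --text=64K --data=64K --heap=16M ./my-application

# Show the environment variables that would be set, without
# actually running the application:
$ hugectl --dry-run --shm ./my-application
```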
Setting Application Defaults
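Rather than wrapping every invocation with hugectl, defaults can be recorded in the binary itself with the hugeedit utility shipped with libhugetlbfs, which marks the ELF segments so they are backed by huge pages whenever the program runs. A sketch (binary name hypothetical):

```shell
# Mark the text and data segments of the binary so libhugetlbfs
# backs them with huge pages by default on every execution:
$ hugeedit --text --data ./my-application
```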
Configuring and Running Sysbench

First, configure the hugepage pools. The postgres server was configured to use about 730MB of shared pool memory and other buffers. Let's size the static pool at roughly 730MB but allow it to grow to 1024MB, as the server may use more due to the administrator's imperfect understanding of how postgres consumes memory. This is a POWER6 machine, so the default hugepage size is 16MB: 730MB is about 45 pages of 16MB, and 1024MB is exactly 64 pages, giving the values used below.
$ hugeadm --pool-pages-min 16MB:45
$ hugeadm --pool-pages-max 16MB:64
$ hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
     65536        0        0        0
  16777216       45       45       64        *
Now we can see that 45 huge pages are committed and up to 64 can be used. Next
we start the server with its shared memory backed by huge pages. Note that,
as we are using huge pages, we have to raise the locked memory limit. Note
also that su -p is specified to preserve the environment. The final
important note is that we do not back the heap of the server with huge
pages: as postgres makes heavy use of fork, there could be excessive
hugepage faulting for copy-on-write by the children and, combined with
dynamic pool resizing, this could cause performance difficulties. In
contrast, mysql would have used --heap due to its threaded nature. On this
machine, postgres runs as nobody, but it might be different for you.
$ ulimit -l $((1024*1024))
$ su -p -s /bin/bash nobody -c "hugectl --shm pg_ctl \
    -D $HOME/opt/postgres-8.3.4/data/ \
    -l $HOME/opt/postgres-8.3.4/logs/logfile start"
$ grep Huge /proc/meminfo
HugePages_Total: 45
HugePages_Free: 42
HugePages_Rsvd: 30
HugePages_Surp: 0
Hugepagesize: 16384 kB
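The in-use count is simply the difference between the Total and Free fields. A small sketch that extracts it (the helper name meminfo_used is ours, not part of libhugetlbfs):

```shell
# Compute huge pages in use as HugePages_Total - HugePages_Free.
meminfo_used() {
    awk '/^HugePages_Total/ {t=$2} /^HugePages_Free/ {f=$2} END {print t - f}'
}

# On a live system: meminfo_used < /proc/meminfo
# Against the sample output above (prints 3):
meminfo_used <<EOF
HugePages_Total:   45
HugePages_Free:    42
EOF
```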
Regrettably, /proc/meminfo had to be used to confirm hugepage usage by the
application, but it shows that 3 hugepages are in use (45 total minus 42
free) and that a further 30 are reserved and may be faulted later by
postgres. Ideally, hugeadm --pool-list would have been used but, as only
the static pool is in use, its Current value would not have changed. This
will be addressed in a future release. Now we are ready to run sysbench
itself.
$ su -p -s /bin/bash nobody -c \
"psql template1 -c 'CREATE DATABASE pgtest;'"
$ su -p -s /bin/bash nobody -c \
"psql template1 -c 'CREATE ROLE sbtest with LOGIN;'"
$ sysbench --max-time=30 --max-requests=2000000 --db-driver=pgsql \
--pgsql-db=pgtest --test=oltp --oltp-point-selects=256 \
--oltp-test-mode=complex --oltp-table-size=2000000 prepare
$ sysbench --max-time=30 --max-requests=2000000 --db-driver=pgsql \
--pgsql-db=pgtest --test=oltp --oltp-point-selects=256 \
--oltp-test-mode=complex --oltp-table-size=2000000 \
--num-threads=16 run
Comparing against base pages, there was about a 3% improvement on this basic
run.
Conclusion