CentOS/RHEL/SL 6: root filesystem on tmpfs

There are several scenarios where conventional hard drives are not really needed. Examples are HPC cluster nodes, virtualization nodes, home theater streaming PCs, silent desktops, internet cafés and embedded systems. Hard drives tend to fail, they are slow, they consume power, they generate heat and noise, and they are quite expensive if you need/want something faster and more reliable than SATA.

This post will show how to run CentOS 6 directly from tmpfs backed by memory, without using the (standard) 512 MB writable overlay. The procedure should be similar for RHEL and Scientific Linux 6.

The resulting boot process will be:

  1. Boot a node off a PXE enabled DHCP server.
  2. Chainload into iPXE.
  3. Download vmlinuz and a rather large initrd containing the entire filesystem over ftp/http(s). Try to avoid tftp when downloading the initrd because of its file size limitation and slow transfer speeds.
  4. Once downloaded, the kernel will start and the initrd will be mounted.
  5. The modified dracut scripts in the initrd will create a tmpfs filesystem in memory, sized to hold the filesystem image included in the initrd.
  6. The entire filesystem image will be copied to the tmpfs and attached to a loop device.
  7. This loop device will be used as the new root device, and the boot process continues as usual.
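
Steps 5 to 7 can be sketched in shell roughly as follows. This is a simplified illustration and not the patch itself: the function names, the /image default and the use of the tmpfs size= mount option are assumptions made for this example. copy_image_to_tmpfs needs root, so it is only defined here and never called.

```shell
#!/bin/sh
# Convert a count of 512-byte sectors (as printed by `blockdev --getsz`)
# into bytes.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# Hypothetical helper: copy the filesystem image on block device $1 into a
# tmpfs mounted at $2 (default /image) and print the loop device that is
# attached to the copy. Requires root; defined only, not invoked.
copy_image_to_tmpfs() {
    src_dev=$1
    mnt=${2:-/image}

    sectors=$(blockdev --getsz "$src_dev") || return 1
    bytes=$(sectors_to_bytes "$sectors")

    # Step 5: a tmpfs just large enough for the image.
    mkdir -p "$mnt" || return 1
    mount -n -t tmpfs -o "size=${bytes}" tmpfs "$mnt" || return 1

    # Step 6: copy the image into memory and attach it to a free loop device.
    dd if="$src_dev" of="$mnt/rootfs.img" bs=1M || return 1
    loop_dev=$(losetup -f) || return 1
    losetup "$loop_dev" "$mnt/rootfs.img" || return 1

    # Step 7: the caller would use this device as the new root.
    echo "$loop_dev"
}
```

A 2048 MB root partition, for instance, is 4194304 sectors, so sectors_to_bytes 4194304 gives the 2 GiB of memory the image will occupy.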

[Screenshot: an ongoing boot process]

Now to the procedure:

First, create a custom kickstart file. Only the parts specific to this setup are shown below:

bootloader --location=mbr --append="toram"
clearpart --all
firstboot --disabled
install
lang en_US.UTF-8
network --bootproto dhcp --device eth0 --onboot yes
part / --fstype=ext4 --size=2048
reboot
zerombr

%packages
patch

%post
cat > /etc/fstab << END
tmpfs      /         tmpfs   defaults         0 0
devpts     /dev/pts  devpts  gid=5,mode=620   0 0
tmpfs      /dev/shm  tmpfs   defaults         0 0
proc       /proc     proc    defaults         0 0
sysfs      /sys      sysfs   defaults         0 0
END

# The patch is base64 encoded to avoid having to escape it manually.
cat > /root/dmsquash-live-root.base64 << EOF_patch
MjFhMjIKPiBnZXRhcmcgdG9yYW0gJiYgdG9yYW09InllcyIKMTM0YzEzNSwxMzgKPCAgICAgZG9f
bGl2ZV9mcm9tX2Jhc2VfbG9vcAotLS0KPiAgICAgIyBDcmVhdGUgb3ZlcmxheSBvbmx5IGlmIHRv
cmFtIGlzIG5vdCBzZXQKPiAgICAgaWYgWyAteiAiJHRvcmFtIiBdIDsgdGhlbgo+ICAgICAgICAg
ZG9fbGl2ZV9mcm9tX2Jhc2VfbG9vcAo+ICAgICBmaQoxNjNjMTY3LDIxMwo8ICAgICBkb19saXZl
X2Zyb21fYmFzZV9sb29wCi0tLQo+ICAgICAjIENyZWF0ZSBvdmVybGF5IG9ubHkgaWYgdG9yYW0g
aXMgbm90IHNldAo+ICAgICBpZiBbIC16ICIkdG9yYW0iIF0gOyB0aGVuCj4gICAgICAgICBkb19s
aXZlX2Zyb21fYmFzZV9sb29wCj4gICAgIGZpCj4gZmkKPiAKPiAjIEkgdGhlIGtlcm5lbCBwYXJh
bWV0ZXIgdG9yYW0gaXMgc2V0LCBjcmVhdGUgYSB0bXBmcyBkZXZpY2UgYW5kIGNvcHkgdGhlIAo+
ICMgZmlsZXN5c3RlbSB0byBpdC4gQ29udGludWUgdGhlIGJvb3QgcHJvY2VzcyB3aXRoIHRoaXMg
dG1wZnMgZGV2aWNlIGFzCj4gIyBhIHdyaXRhYmxlIHJvb3QgZGV2aWNlLgo+IGlmIFsgLW4gIiR0
b3JhbSIgXSA7IHRoZW4KPiAgICAgYmxvY2tzPSQoIGJsb2NrZGV2IC0tZ2V0c3ogJEJBU0VfTE9P
UERFViApCj4gCj4gICAgIGVjaG8gIkNyZWF0ZSB0bXBmcyAoJGJsb2NrcyBibG9ja3MpIGZvciB0
aGUgcm9vdCBmaWxlc3lzdGVtLi4uIgo+ICAgICBta2RpciAtcCAvaW1hZ2UKPiAgICAgbW91bnQg
LW4gLXQgdG1wZnMgLW8gbnJfYmxvY2tzPSRibG9ja3MgdG1wZnMgL2ltYWdlCj4gCj4gICAgIGVj
aG8gIkNvcHkgZmlsZXN5c3RlbSBpbWFnZSB0byB0bXBmcy4uLiAodGhpcyBtYXkgdGFrZSBhIGZl
dyBtaW51dGVzKSIKPiAgICAgZGQgaWY9JEJBU0VfTE9PUERFViBvZj0vaW1hZ2Uvcm9vdGZzLmlt
Zwo+IAo+ICAgICBST09URlNfTE9PUERFVj0kKCBsb3NldHVwIC1mICkKPiAgICAgZWNobyAiQ3Jl
YXRlIGxvb3AgZGV2aWNlIGZvciB0aGUgcm9vdCBmaWxlc3lzdGVtOiAkUk9PVEZTX0xPT1BERVYi
Cj4gICAgIGxvc2V0dXAgJFJPT1RGU19MT09QREVWIC9pbWFnZS9yb290ZnMuaW1nCj4gCj4gICAg
IGVjaG8gIkl0J3MgdGltZSB0byBjbGVhbiB1cC4uICIKPiAKPiAgICAgZWNobyAiID4gVW1vdW50
aW5nIGltYWdlcyIKPiAgICAgdW1vdW50IC1sIC9pbWFnZQo+ICAgICB1bW91bnQgLWwgL2Rldi8u
aW5pdHJhbWZzL2xpdmUKPiAKPiAgICAgZWNobyAiID4gRGV0YWNoICRPU01JTl9MT09QREVWIgo+
ICAgICBsb3NldHVwIC1kICRPU01JTl9MT09QREVWCj4gCj4gICAgIGVjaG8gIiA+IERldGFjaCAk
T1NNSU5fU1FVQVNIRURfTE9PUERFViIKPiAgICAgbG9zZXR1cCAtZCAkT1NNSU5fU1FVQVNIRURf
TE9PUERFVgo+ICAgICAKPiAgICAgZWNobyAiID4gRGV0YWNoICRCQVNFX0xPT1BERVYiCj4gICAg
IGxvc2V0dXAgLWQgJEJBU0VfTE9PUERFVgo+ICAgICAKPiAgICAgZWNobyAiID4gRGV0YWNoICRT
UVVBU0hFRF9MT09QREVWIgo+ICAgICBsb3NldHVwIC1kICRTUVVBU0hFRF9MT09QREVWCj4gCj4g
ICAgIGVjaG8gIlJvb3QgZmlsZXN5c3RlbSBpcyBub3cgb24gJFJPT1RGU19MT09QREVWLiIKPiAg
ICAgZWNobwo+IAo+ICAgICBsbiAtcyAkUk9PVEZTX0xPT1BERVYgL2Rldi9yb290Cj4gICAgIHBy
aW50ZiAnL2Jpbi9tb3VudCAtbyBydyAlcyAlc1xuJyAiJFJPT1RGU19MT09QREVWIiAiJE5FV1JP
T1QiID4gL21vdW50LzAxLSQkLWxpdmUuc2gKPiAgICAgZXhpdCAwCjE2OWMyMTksMjIxCjwgICAg
IGVjaG8gIjAgJCggYmxvY2tkZXYgLS1nZXRzeiAkQkFTRV9MT09QREVWICkgc25hcHNob3QgJEJB
U0VfTE9PUERFViAkT1NNSU5fTE9PUERFViBwIDgiIHwgZG1zZXR1cCBjcmVhdGUgLS1yZWFkb25s
eSBsaXZlLW9zaW1nLW1pbgotLS0KPiAgICAgaWYgWyAteiAiJHRvcmFtIiBdIDsgdGhlbgo+ICAg
ICAgICAgZWNobyAiMCAkKCBibG9ja2RldiAtLWdldHN6ICRCQVNFX0xPT1BERVYgKSBzbmFwc2hv
dCAkQkFTRV9MT09QREVWICRPU01JTl9MT09QREVWIHAgOCIgfCBkbXNldHVwIGNyZWF0ZSAtLXJl
YWRvbmx5IGxpdmUtb3NpbWctbWluCj4gICAgIGZpCg==
EOF_patch

base64 -d /root/dmsquash-live-root.base64 > /root/dmsquash-live-root.patch

patch /usr/share/dracut/modules.d/90dmsquash-live/dmsquash-live-root /root/dmsquash-live-root.patch

ls /lib/modules | while read kernel; do
  echo " > Update initramfs for kernel ${kernel}"
  initrdfile="/boot/initramfs-${kernel}.img"

  /sbin/dracut -f "$initrdfile" "$kernel"
done
%end

%post --nochroot

echo "Copy initramfs outside the chroot:"
ls $INSTALL_ROOT/lib/modules | while read kernel; do
  src="$INSTALL_ROOT/boot/initramfs-${kernel}.img"
  dst="$LIVE_ROOT/isolinux/initrd0.img"
  echo " > $src -> $dst"
  cp -f "$src" "$dst"
done
%end

Explanation: The post script applies a patch to /usr/share/dracut/modules.d/90dmsquash-live/dmsquash-live-root before regenerating the initramfs. This patch adds support for the ‘toram’ boot parameter. The initramfs is then copied to the isolinux directory outside the filesystem image.
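
For reference, a base64 blob like the one embedded in the kickstart file can be produced from an original and a modified copy of a script roughly as follows. This is a sketch: the function name make_patch_b64 and the scratch files are made up for the demonstration, and only the standard diff and base64 tools are assumed.

```shell
#!/bin/sh
# make_patch_b64 ORIGINAL MODIFIED
# Print a base64-encoded, normal-format diff between the two files.
make_patch_b64() {
    # diff exits with status 1 when the files differ, which is the expected
    # case here; any other non-zero status is treated as a real failure.
    { diff "$1" "$2" || [ $? -eq 1 ]; } | base64
}

# Demonstration on two scratch files (hypothetical content).
workdir=$(mktemp -d)
printf 'line one\nline two\n' > "$workdir/orig"
printf 'line one\nline 2\n'   > "$workdir/modified"

make_patch_b64 "$workdir/orig" "$workdir/modified" > "$workdir/patch.base64"

# Decoding gives back the plain diff, ready to be fed to patch(1).
base64 -d "$workdir/patch.base64" > "$workdir/patch.diff"
cat "$workdir/patch.diff"
```

Because the encoded text contains no quotes, dollar signs or backslashes, it can sit inside a here-document in %post without any escaping.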

Second, use livecd-creator and livecd-iso-to-pxeboot from the livecd-tools package to convert the kickstart file into a bootable vmlinuz and initrd:

$ sudo livecd-creator --config=centos6.ks --fslabel=centos6
$ sudo livecd-iso-to-pxeboot centos6.iso

The commands above will create tftpboot/vmlinuz0 and tftpboot/initrd0.img. Put these files on your boot server and create a suitable PXE configuration. livecd-iso-to-pxeboot will create tftpboot/pxelinux.cfg/default which can be used as a template.
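
Since the boot chain goes through iPXE, the generated pxelinux template can be turned into an iPXE script. The sketch below assumes that the template contains an uppercase APPEND line with the kernel arguments and that vmlinuz0 and initrd0.img are served from a single base URL; the function name and the example URL are hypothetical.

```shell
#!/bin/sh
# pxelinux_to_ipxe CFG BASE_URL
# Read the first APPEND line of a pxelinux configuration file and print an
# iPXE script that loads vmlinuz0 and initrd0.img from BASE_URL.
pxelinux_to_ipxe() {
    cfg=$1
    base_url=$2

    # Kernel arguments: everything after the (uppercase) APPEND keyword.
    args=$(sed -n '/^[[:space:]]*APPEND[[:space:]]/{
        s/^[[:space:]]*APPEND[[:space:]]*//p
        q
    }' "$cfg")

    printf '#!ipxe\n'
    printf 'dhcp\n'
    printf 'kernel %s/vmlinuz0 %s\n' "$base_url" "$args"
    printf 'initrd %s/initrd0.img\n' "$base_url"
    printf 'boot\n'
}
```

It could be used as pxelinux_to_ipxe tftpboot/pxelinux.cfg/default http://boot.example.com/centos6 > centos6.ipxe. Check that the resulting kernel line contains toram before using it.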

Now you are ready to boot one or multiple CentOS 6 in-memory instances over the network!

[Screenshot: another view of the boot process]

Feature request upstream.

Posted in sysadm | Leave a comment

KVM with iPXE in RHEL 6

A while ago I discovered the amazing iPXE project. It is a complete PXE implementation with lots of nifty features, based on the gPXE project. Red Hat ships the gPXE firmware for qemu and KVM, but you might want to use iPXE instead, as the iPXE project currently seems to be more active. The major features (copied from ipxe.org):

  • boot from a web server via HTTP
  • boot from an iSCSI SAN
  • boot from a Fibre Channel SAN via FCoE
  • boot from an AoE SAN
  • boot from a wireless network
  • boot from a wide-area network
  • boot from an Infiniband network
  • control the boot process with a script

First, download the source code:

espen@luft:~$ mkdir ~/git
espen@luft:~$ cd ~/git
espen@luft:~/git$ git clone git://git.ipxe.org/ipxe.git
Cloning into ipxe...
remote: Counting objects: 33376, done.
remote: Compressing objects: 100% (9193/9193), done.
remote: Total 33376 (delta 24642), reused 30782 (delta 22666)
Receiving objects: 100% (33376/33376), 8.02 MiB | 1.94 MiB/s, done.
Resolving deltas: 100% (24642/24642), done.
espen@luft:~/git$ cd ipxe/
espen@luft:~/git/ipxe$

Then change the general configuration file (src/config/general.h) to suit your needs. Use #define and #undef to activate and deactivate various features such as VLAN support, DHCP support, etc. Below is a small part of the header file:

[...]
#define IWMGMT_CMD   /* Wireless interface management commands */
#define FCMGMT_CMD   /* Fibre Channel management commands */
#define ROUTE_CMD    /* Routing table management commands */
#define IMAGE_CMD    /* Image management commands */
#define DHCP_CMD     /* DHCP management commands */
#define SANBOOT_CMD  /* SAN boot commands */
#define LOGIN_CMD    /* Login command */
#undef  TIME_CMD     /* Time commands */
#undef  DIGEST_CMD   /* Image crypto digest commands */
#undef  LOTEST_CMD   /* Loopback testing commands */
#undef  VLAN_CMD     /* VLAN commands */
#undef  PXE_CMD      /* PXE commands */
#undef  REBOOT_CMD   /* Reboot command */
[...]
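
Toggling a feature by hand is easy enough, but it can also be scripted. The following is a minimal sketch, assuming the header uses the #define/#undef layout shown above; the function name enable_feature is made up for this example.

```shell
#!/bin/sh
# enable_feature HEADER NAME
# Rewrite "#undef NAME" as "#define NAME" in HEADER, leaving the trailing
# comment and all other lines untouched.
enable_feature() {
    header=$1
    name=$2
    tmp=$(mktemp) || return 1

    awk -v name="$name" '
        $1 == "#undef" && $2 == name { sub(/^#undef/, "#define") }
        { print }
    ' "$header" > "$tmp" && cat "$tmp" > "$header"
    status=$?

    rm -f "$tmp"
    return $status
}
```

From the top of the checkout, enable_feature src/config/general.h VLAN_CMD would switch on the VLAN commands before compiling.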

Now it’s time to compile the firmware.

espen@luft:~/git/ipxe$ cd src/
espen@luft:~/git/ipxe/src$ make bin/virtio-net.rom
  [DEPS] arch/i386/drivers/net/undirom.c
  [DEPS] arch/i386/drivers/net/undipreload.c
  [DEPS] arch/i386/drivers/net/undionly.c
  [DEPS] arch/i386/drivers/net/undinet.c
[...]
  [BIN] bin/virtio-net.rom.bin
  [ZINFO] bin/virtio-net.rom.zinfo
  [ZBIN] bin/virtio-net.rom.zbin
  [FINISH] bin/virtio-net.rom
[...]
espen@luft:~/git/ipxe/src$

The firmware compiled successfully, and it is ready to use. Log onto the RHEL 6 node, and verify that you have installed the package gpxe-roms-qemu (qemu-kvm currently depends on gpxe-roms-qemu). The directory /usr/share/gpxe/ contains the gPXE boot roms from this package.

To use your custom iPXE boot firmware instead, you can build a new rpm package that contains the new rom, or you can simply replace /usr/share/gpxe/virtio-net.rom [gPXE] with your ~/git/ipxe/src/bin/virtio-net.rom [iPXE]. At least you will have iPXE boot firmware until the gpxe-roms-qemu package is updated ;)
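
If you go for the simple replacement, it may be worth keeping the original gPXE rom around. A small sketch of that, where the function name install_rom and the .orig backup suffix are my own choices and not part of any package:

```shell
#!/bin/sh
# install_rom NEW_ROM TARGET
# Keep a one-time backup of TARGET as TARGET.orig, then overwrite TARGET
# with NEW_ROM.
install_rom() {
    new_rom=$1
    target=$2

    [ -f "$new_rom" ] || { echo "missing rom: $new_rom" >&2; return 1; }

    # Never overwrite an existing backup, so the original gPXE rom survives
    # repeated runs.
    if [ -f "$target" ] && [ ! -e "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi

    cp "$new_rom" "$target"
}
```

As root on the RHEL 6 node this would be called as install_rom ~/git/ipxe/src/bin/virtio-net.rom /usr/share/gpxe/virtio-net.rom.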

Make sure that your virtual machines are using the virtio network device driver, and you are all set:

[...]
<interface type='bridge'>
  [...]
  <model type='virtio'/>
</interface>
[...]
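
A quick way to check an existing guest is to search its domain XML for the virtio model element. This is only a plain text match (it does not verify that the element sits inside an interface section), and the function name uses_virtio is hypothetical:

```shell
#!/bin/sh
# uses_virtio: read libvirt domain XML on stdin and succeed if at least one
# <model type='virtio'/> element is present (single or double quotes).
uses_virtio() {
    grep -Eq "<model[[:space:]]+type=['\"]virtio['\"][[:space:]]*/>"
}
```

For a guest named myvm (a placeholder), virsh dumpxml myvm | uses_virtio && echo "virtio model found" would report whether the element is there.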

Your virtual machines will now be booted using the iPXE boot firmware. Have a look at the iPXE scripting documentation for more inspiration!

Workaround of the day: PXE as primary boot dev on a dl360 g7

The problem: How to set PXE as the preferred boot device on a default HP dl360 g7 from Linux?

The key words here are “from Linux”. No iLO tricks are allowed. Of course, the simple answer should be: “use ipmitool!”. However, ipmitool (I’m currently using ipmitool-1.8.11-6.el6.x86_64 in Scientific Linux 6) doesn’t quite seem to do the trick here. The following commands were tried without success:

[root@localhost ~]# ipmitool chassis bootdev pxe
Set Boot Device to pxe
[root@localhost ~]# reboot
[root@localhost ~]# ipmitool chassis bootparam set bootflag force_pxe
Set Boot Device to force_pxe
[root@localhost ~]# reboot
[root@localhost ~]# ipmitool chassis bootdev pxe clear-cmos=yes
Set Boot Device to pxe
[root@localhost ~]# reboot

It is nice to know that the dl360 g7 will PXE boot if it is unable to boot from the local hard drive. This behaviour can be used as a (not-so-pretty-ok-I-admit-it-is-very-very-VERY-ugly-but-it-works-for-me type of) workaround:

[root@localhost ~]# modprobe ipmi_si
[root@localhost ~]# modprobe ipmi_devintf
[root@localhost ~]# alias reinstall="/bin/dd if=/dev/zero \
of=/dev/sda bs=512 count=1 2>/dev/null ; sync ; \
ipmitool chassis bootparam set bootflag force_pxe \
>/dev/null ; reboot"
[root@localhost ~]# reinstall

(The alias is called reinstall because booting from PXE will, in this setup, reinstall the node.)
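
Since the alias destroys the boot sector without asking, a function with a dry-run default may be a little safer to keep around. The sketch below runs the same commands as the alias; the DRY_RUN variable and the optional disk argument are additions of mine, and unlike the alias it does not silence the command output.

```shell
#!/bin/sh
# reinstall [DISK]
# Wipe the first sector of DISK (default /dev/sda), ask for PXE via IPMI and
# reboot. DESTRUCTIVE. With DRY_RUN unset or set to anything but 0 the
# commands are only printed; run with DRY_RUN=0 to execute them.
reinstall() {
    disk=${1:-/dev/sda}

    if [ "${DRY_RUN:-1}" != "0" ]; then
        run() { echo "$*"; }
    else
        run() { "$@"; }
    fi

    # Without a wiped boot sector the node would boot from disk again,
    # so stop here if dd fails.
    run dd if=/dev/zero of="$disk" bs=512 count=1 || return 1
    run sync
    # Best effort, as in the alias: the reboot happens either way.
    run ipmitool chassis bootparam set bootflag force_pxe
    run reboot
}
```

Calling reinstall on its own only prints the four commands; DRY_RUN=0 reinstall actually runs them.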

Good luck!

munincollector-ng

Munincollector-ng is a Perl script that collects graphs from multiple munin installations and displays them on one page. A scenario where this is helpful is when you have (too) many munin clients on (too) many munin masters, and you want to look through some of the graphs, e.g. the Disk usage in percent (aka df) plugin, without wasting too much time browsing through the less important graphs.

It consists of one Perl script and one configuration file, and it is run regularly by cron. At each run, it iterates through the configuration file, downloads the graphs to a local directory and generates an HTML file.

Below is an example configuration that will gather the week and month graphs from the df plugin from four separate munin masters (three without authentication and one with authentication). The graphs will be downloaded to /var/www/munincollector-ng/:

# General configuration
graph.plugin df
graph.type week month
graph.log /var/log/munincollector-ng.log
graph.dir /var/www/munincollector-ng

# Configuration per munin master you want to collect graphs from.
# The format is: <id>.<option> <value>

# Three munin installations with no authentication
uio.url http://munin.ping.uio.no
foo.url http://foo.com/munin/
bar.url http://bar.com/munin/

# One munin master that requires authentication
baz.url http://baz.com/munin/
baz.realm Munin
baz.username user1
baz.password pass1
baz.netloc baz.com:80
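
To double check which masters a configuration file refers to, the <id>.<option> <value> format is easy to pick apart with awk. This is only an illustration of the file format, not how munincollector-ng itself parses it; the function name list_masters is made up.

```shell
#!/bin/sh
# list_masters CONFIG
# Print the unique <id> part of every "<id>.url <value>" line, skipping
# comment lines. The graph.* lines have no .url option in the example
# configuration, so they are not listed.
list_masters() {
    awk '
        /^[ \t]*#/ { next }
        $1 ~ /^[^.]+\.url$/ {
            id = $1
            sub(/\.url$/, "", id)
            if (!(id in seen)) { seen[id] = 1; print id }
        }
    ' "$1"
}
```

Run against the example above (saved as /etc/munincollector-ng/example.conf), it would list uio, foo, bar and baz, one per line.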

An example cron job that will execute the script once per day (make sure the user has write permission in /var/www/munincollector-ng/):

8 8 * * * user /usr/local/bin/munincollector-ng -c /etc/munincollector-ng/example.conf

The script is available from GitHub.

PS: Put the logo.png and style.css from your /etc/munin/templates/ directory into /var/www/munincollector-ng/ to make it look a bit nicer.
