Repairing a broken grub.conf in CentOS

Today I got an error while updating the Hyper-V kernel modules on CentOS 6.3. The upgrade process removes the already loaded kernel modules from the running system. This results in a kernel oops and a broken init ramdisk for the currently running kernel. Sometimes the kernel oops occurs while the new grub configuration is being written. On the next boot grub cannot find any configuration file and falls back to the grub shell. You will get a prompt like this:

grub >

Experts will be able to boot the system by typing the grub commands directly at this prompt and bringing the system up manually. But even once you have entered those lines and the system is up and running again, you still won't find any grub configuration.
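A minimal sketch of such a manual boot, assuming the same layout as in the configuration further below (a separate /boot partition on (hd0,0), the root filesystem on /dev/sda3 and this particular kernel version; adjust these to your system):

grub> root (hd0,0)
grub> kernel /vmlinuz-2.6.32-279.22.1.el6.x86_64 ro root=/dev/sda3
grub> initrd /initramfs-2.6.32-279.22.1.el6.x86_64.img
grub> boot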

On CentOS you cannot generate a grub configuration from scratch with any script or tool like Debian's update-grub. CentOS itself uses grubby to generate the kernel entries, but grubby needs a template from which it can generate them, and this template is read from an existing grub entry. This approach ends up in a worst-case scenario if you have misconfigured your /boot/grub/grub.conf, /etc/grub.conf or /boot/grub/menu.lst (the latter two are only symlinks to /boot/grub/grub.conf) and now try to reset your configuration. It is just as bad if writing the grub.conf failed, for example during a kernel crash. In my situation it was a kernel crash.

None of the tools shipped with CentOS can regenerate a grub.conf from scratch. My solution:

  1. Boot your system from the install disc (rescue mode) or via the grub command-line prompt.
  2. Create a new, empty /boot/grub/grub.conf.
  3. Add the following snippet to your grub.conf:
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.32-279.22.1.el6.x86_64)
  root (hd0,0)
  kernel /vmlinuz-2.6.32-279.22.1.el6.x86_64 ro root=/dev/sda3
  initrd /initramfs-2.6.32-279.22.1.el6.x86_64.img

N O T E:
I have a separate /boot partition on my systems. In the standard configuration delivered by CentOS, /boot and / are on the same partition. In that case the paths to the kernel and the initrd start with /boot/vmlinuz... and /boot/initramfs..., and the root partition will usually be root=/dev/sda1.
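For example, with /boot on the root partition the same entry would look roughly like this (the kernel version and root=/dev/sda1 are assumptions, adjust them to your layout):

title CentOS (2.6.32-279.22.1.el6.x86_64)
  root (hd0,0)
  kernel /boot/vmlinuz-2.6.32-279.22.1.el6.x86_64 ro root=/dev/sda1
  initrd /boot/initramfs-2.6.32-279.22.1.el6.x86_64.img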

Try to boot your system with your manually built grub.conf. If everything works fine you can add new boot entries with CentOS' tool grubby. For example:

root@host:~ $ grubby --add-kernel="/boot/vmlinuz-2.6.32-279.22.1.el6.x86_64" \
  --initrd="/boot/initramfs-2.6.32-279.22.1.el6.x86_64.img" \
  --title="CentOS (2.6.32-279.22.1.el6.x86_64)" --copy-default --make-default

The tool grubby will replace the /dev/sda? device file with the UUID string of the partition.
You can use the following loop to generate an entry for each kernel image in /boot/:

for kernel in /boot/vmlinuz-*; do \
version=`echo $kernel | awk -F'vmlinuz-' '{print $NF}'`; \
grubby --add-kernel="/boot/vmlinuz-${version}" \
--initrd="/boot/initramfs-${version}.img" \
--title="CentOS (${version})" \
--copy-default --make-default; \
done

You should check /etc/grub.conf for duplicate entries and maybe re-sort the boot order. Reboot your system to verify that everything works again.
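For example, to list the boot entries and compare the root device grubby wrote with the actual partition (assuming the root filesystem is on /dev/sda3 as above):

# list all title lines to spot duplicates and check the boot order
grep "^title" /etc/grub.conf
# show the kernel lines with their root= parameter and compare with the real UUID
grep "root=" /etc/grub.conf
blkid /dev/sda3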


Improving performance of Jenkins builds

For building our software we use the continuous integration server Jenkins CI. Last week we ran into problems with the build duration. After analyzing the build process we found some ways to improve the build performance.

Bottlenecks during compiling

The first hint was the time spent compiling the source code. Before we added a new HDD, the overall build time was 11 minutes. After that it increased to 15 minutes! Another build job, which only compiles an artifact without executing any tests, needed 3 minutes more than with the old configuration.
So we analyzed the complete build process step by step and found a bottleneck: our software uses Grails during the build, and while compiling the source code to Java bytecode, Jenkins writes into .grails in its home directory. We moved the complete home directory onto an SSD and got the lost 3 minutes back.
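A rough sketch of how such a move can be done, assuming Jenkins' home is /var/lib/jenkins and the SSD is /dev/sdb1 (both are assumptions, adjust them to your setup):

# stop Jenkins so no files change while copying
service jenkins stop
mkfs.ext4 /dev/sdb1
mount /dev/sdb1 /mnt
# copy the existing home onto the SSD, preserving permissions
rsync -a /var/lib/jenkins/ /mnt/
umount /mnt
# mount the SSD over the old home directory on every boot
echo "/dev/sdb1 /var/lib/jenkins ext4 defaults,noatime 0 0" >> /etc/fstab
mount /var/lib/jenkins
service jenkins start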

Bottlenecks during testing

Another bottleneck is the working directory in Jenkins called "workspace". Each job gets its own workspace. While the artifacts are being generated, the complete source code repository is checked out into this workspace. Every test result, compiled class and the artifact itself is buffered in the workspace directory. After a successful build the artifacts and the test results are copied to the job's directory. If you can mount the workspace of a build job on an SSD, you will get even better performance.
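One way to do this without moving the whole Jenkins home is a bind mount per job; a sketch, assuming the SSD is mounted at /ssd and the job is called myjob (both are assumptions):

service jenkins stop
mkdir -p /ssd/workspace/myjob
# take over the existing workspace content, then bind-mount the SSD directory over it
rsync -a /var/lib/jenkins/workspace/myjob/ /ssd/workspace/myjob/
mount --bind /ssd/workspace/myjob /var/lib/jenkins/workspace/myjob
service jenkins start

Add the bind mount to /etc/fstab if it should survive a reboot.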

The myth of SSD vs. tmpfs

You might think it would be even better to mount the workspace, the .grails directory and any other heavily used directories on a tmpfs. We tried this option and the result is: yes, it does decrease the build time, but not as much as you might think. Mounting the workspace and the .grails directory on a tmpfs decreased the build time by at most 10 seconds compared to an SSD. The real benefit of putting these directories on a tmpfs is the reduced number of write cycles on the SSD, which increases its lifetime.
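For completeness, a sketch of the tmpfs variant we tested (paths and sizes are assumptions; note that the contents are lost on unmount or reboot, which is acceptable for the workspace since Jenkins checks it out again anyway):

service jenkins stop
mount -t tmpfs -o size=4g,uid=$(id -u jenkins),gid=$(id -g jenkins) tmpfs /var/lib/jenkins/workspace
mount -t tmpfs -o size=1g,uid=$(id -u jenkins),gid=$(id -g jenkins) tmpfs /var/lib/jenkins/.grails
service jenkins start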

Summary

It is possible to improve the duration of a Jenkins build. The first step is to find out which directories Jenkins writes to, deletes from and copies files into. In our case we improved the build time by 33% by moving the workspace and the other heavily used directories onto an SSD.
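To find out where a build actually writes, you can watch the Jenkins home during a build; a sketch, assuming inotify-tools is installed and Jenkins' home is /var/lib/jenkins (on a large home directory you may have to raise fs.inotify.max_user_watches):

# log all file events below the Jenkins home while a build runs
inotifywait -m -r -e create,modify,delete,move /var/lib/jenkins > /tmp/jenkins-io.log &
# afterwards: count which directories show up most often
awk '{print $1}' /tmp/jenkins-io.log | sort | uniq -c | sort -rn | head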
Another way to decrease the build time could be a faster CPU. This option would need a build server (maybe a Jenkins slave) with a desktop CPU, because the Intel Xeons are much slower than most high-end desktop CPUs. We haven't tested this, but one of our developer workstations with an Intel i7 CPU needs only 80% of the time to build the same artifact.

Solved problems updating perl-XML-SAX-0.96-7.el6.noarch on CentOS 6

Today yum chased its own tail during a package update on one of our servers.

Updating the packages required the installation of some new packages because of dependencies. The transaction check broke with:

Transaction Check Error:
  file /usr/share/man/man3/XML::SAX::Base.3pm.gz conflicts between attempted installs of perl-XML-SAX-0.96-7.el6.noarch and perl-XML-SAX-Base-1.04-1.el6.rf.noarch
  file /usr/share/man/man3/XML::SAX::Exception.3pm.gz conflicts between attempted installs of perl-XML-SAX-0.96-7.el6.noarch and perl-XML-SAX-Base-1.04-1.el6.rf.noarch

No way out, and Google didn't know anything about it.

The solution to this problem is to remove the conflicting packages and reinstall them manually. In my situation I removed both packages which caused the trouble: perl-XML-SAX-0.96-7.el6.noarch and perl-XML-SAX-Base-1.04-1.el6.rf.noarch. Then I reinstalled perl-XML-SAX-0.96-7.el6.noarch, the transaction check passed and yum installed the updates. I did not reinstall perl-XML-SAX-Base-1.04-1.el6.rf.noarch because it no longer seems to be a dependency of any perl SAX package.
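Roughly the commands I used (check yum's transaction summary before confirming the removal, since removing perl-XML-SAX may also pull out packages that depend on it):

yum remove perl-XML-SAX perl-XML-SAX-Base
yum install perl-XML-SAX
yum update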