Introduction
My old Ubuntu server had serious performance problems with NFS exported home directories and the decision has been taken to use CentOS as the next server distro. The server had one HD with four partitions for the current system, previous system, tmp directories and swap, and a RAID6 array of four 1TB HDs, each with a single partition.
Revision history
2013/04/11 11:24 Created
2013/04/09 17:32 Added section for bind mounted home directories
Getting ready
Download netinstall, DVD1 and DVD2 ISOs:
wget http://www.mirrorservice.org/sites/mirror.centos.org/6.4/isos/i386/CentOS-6.4-i386-netinstall.iso
wget http://www.mirrorservice.org/sites/mirror.centos.org/6.4/isos/i386/CentOS-6.4-i386-bin-DVD1.iso
wget http://www.mirrorservice.org/sites/mirror.centos.org/6.4/isos/i386/CentOS-6.4-i386-bin-DVD2.iso
Burn the netinstall ISO to a CD and make the DVDs available on a local webserver or on a USB drive.
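If you want to serve the DVD ISOs from a local webserver, one quick option (assuming Python 2 is available on the machine holding the ISOs) is a throwaway HTTP server started in the download directory:

# run from the directory containing the ISOs; serves them over HTTP on port 8000
python -m SimpleHTTPServer 8000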
Note down the server’s old network settings if they are not obtained via DHCP.
Shut the system down and disconnect RAID disks (just in case).
shutdown -P now
Installation
Boot from netinstall CD.
Press Tab at the boot menu and add the ‘vnc’ boot option. This may be required because the standard console-based anaconda installer does not always allow custom partitioning or the specification of other installation details; for these, the graphical installer is needed.
Select the language, keyboard type and layout, and configure the network connection.
Once the VNC server is up, connect via Remote Desktop Viewer running on another machine as instructed. On Ubuntu vinagre can be used for this.
Follow graphical installation, select basic server install type and include NFS server.
Once the installation is complete, reboot. At this point the server will be running in console mode without an X11 front-end. For all admin tasks, I prefer to connect to the server via ssh and run the various GUI admin tools from my desktop machine.
On the server terminal log in as root and enable X11 auth needed for ssh forwarding:
yum install xorg-x11-xau*
exit
From a desktop machine connect to the server via ssh:
ssh -X root@server
Update the system:
yum update
If necessary, e.g. when the kernel packages have been updated, shut the system down, reboot, then log in again.
Configure yum to exclude kernel packages from future updates:
nano /etc/yum.conf
exclude = kernel*
Remember to update any excluded packages manually when convenient.
You can also use the system admin GUI for package management:
gpk-application
Set static network address
During installation network settings may have been obtained automatically by DHCP; if a static network configuration is desired, follow the steps below, replacing the various IP addresses with ones corresponding to your network.
system-config-network
; eth0 interface
Inet addr: 10.0.0.253
Mask: 255.0.0.0
Bcast: 10.255.255.255
; Router
Gateway: 10.0.0.254
; DNS
search your.domain
nameserver 10.0.0.254
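As a rough sketch of what this amounts to on disk (system-config-network writes these files for you), a static configuration for eth0 with the addresses above would end up in /etc/sysconfig/network-scripts/ifcfg-eth0 along these lines; keep any HWADDR and UUID lines generated by the installer:

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.0.0.253
NETMASK=255.0.0.0
GATEWAY=10.0.0.254
DNS1=10.0.0.254
DOMAIN=your.domain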
Configure postfix
Modify postfix so that mail will get delivered to ‘proper’ external e-mail addresses.
cp /etc/postfix/main.cf /etc/postfix/main.cf.ori
nano /etc/postfix/main.cf
Edit the following lines; replace relayhost with your ISP’s SMTP server.
myorigin = your.domain
inet_interfaces = all
relayhost = smtp.yourisp.com
service postfix restart
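To confirm the changes have been picked up, postconf can list the non-default settings:

postconf -n | grep -E 'myorigin|inet_interfaces|relayhost'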
Additional useful mail admin commands:
Check mail queue:
mailq
Remove message from queue:
postsuper -d msgID
Send mail from the command line or from scripts:
mail
mailx
sendmail
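For example, a quick test message can be sent with mail; the recipient address below is just a placeholder:

echo "Test message from the new server" | mail -s "postfix test" system.administrator@your.domain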
Install APC UPS daemon
wget http://www.mirrorservice.org/sites/dl.fedoraproject.org/pub/epel/6/i386/apcupsd-3.14.10-1.el6.i686.rpm
yum install apcupsd-3.14.10-1.el6.i686.rpm
Since I am using a SmartUPS with an APC ‘smart’ serial cable, I had to make the following changes:
cp /etc/apcupsd/apcupsd.conf /etc/apcupsd/apcupsd.conf.ori
nano /etc/apcupsd/apcupsd.conf
UPSNAME ServerPS
UPSCABLE smart
UPSTYPE apcsmart
DEVICE /dev/ttyS0
NISIP 127.0.0.1
Change the e-mail address used for notifications:
cd /etc/apcupsd
for id in apccontrol changeme commfailure commok offbattery onbattery ; do
    cp -i $id $id.ori
    sed -e 's/SYSADMIN=root/SYSADMIN=system.administrator@your.domain/' < $id.ori > $id
done
service apcupsd restart
For an admin GUI to apcupsd, take a look at apcupsd-gui and apcupsd-cgi.
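The current UPS status can also be queried from the command line with apcaccess, which is installed as part of apcupsd:

apcaccess status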
Enable SMART monitoring for HDDs
cp /etc/smartd.conf /etc/smartd.conf.ori
nano /etc/smartd.conf
DEVICESCAN -n standby,7,q -m system.administrator@your.domain -s (S/../.././01|L/../15/./02)
service smartd restart
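Individual drives can also be inspected by hand with smartctl, e.g. to dump the SMART data for the system disk and start a short self-test:

smartctl -a /dev/sda
smartctl -t short /dev/sda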
Configure automatic updates
yum install yum-cron
cp -i /etc/sysconfig/yum-cron /etc/sysconfig/yum-cron.ori
nano /etc/sysconfig/yum-cron
DOWNLOAD_ONLY=yes
MAILTO=system.administrator@your.domain
SYSTEMNAME="server"
service yum-cron restart
Add users
I prefer private group IDs, and manually specify UIDs and GIDs starting at 1000 to conform to the Ubuntu convention:
system-config-users
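The equivalent can be done from the command line; a minimal sketch for a hypothetical first user ‘user1’ with UID and private GID of 1000:

groupadd -g 1000 user1
useradd -u 1000 -g 1000 -m user1
passwd user1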
Activate RAID
Configure the RAID array:
nano /etc/mdadm.conf
ARRAY /dev/md0 level=raid6 num-devices=4 UUID=5f8a6291:38b5cf4e:dc0d88ba:780ef6a3 auto=md
devices=/dev/sdc1,/dev/sdd1,/dev/sde1,/dev/sdb1
MAILADDR system.administrator@your.domain
MAILFROM root.server@your.domain
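If the array UUID is not to hand, mdadm can print a ready-made ARRAY line from the disks’ superblocks once they are reconnected in the next step; the output can then be pasted into /etc/mdadm.conf:

mdadm --examine --scan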
Shut the system down, connect the RAID disks and boot the server again:
shutdown -P now
Once the server has finished booting, log in again from your desktop machine via ssh:
ssh -X root@server
The kernel should have picked up the RAID info and assembled the array automatically. The result should look like this:
ls -l /dev/md*
brw-rw----. 1 root disk 9, 0 Apr 11 21:59 /dev/md0
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb1[3] sdc1[0] sdd1[1] sde1[2]
      1953522944 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Fri Sep 2 17:16:11 2011
     Raid Level : raid6
     Array Size : 1953522944 (1863.02 GiB 2000.41 GB)
  Used Dev Size : 976761472 (931.51 GiB 1000.20 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Apr 12 16:08:18 2013
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 5f8a6291:38b5cf4e:dc0d88ba:780ef6a3
         Events : 0.4607673

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       3       8       17        3      active sync   /dev/sdb1
Once the RAID array is up and running, mount it and make it available via NFS.
Configure md monitoring and data scrubbing
It is absolutely essential to implement a monitoring and data scrubbing regime for RAID arrays; otherwise errors can build up silently and eventually result in catastrophic data loss.
The check is run automatically by cron; to modify the time or frequency of the check the cron job has to be edited:
nano /etc/cron.d/raid-check
To set the parameters of the check:
nano /etc/sysconfig/raid-check
Settings used for checking all RAID volumes:
ENABLED=yes
CHECK=check
NICE=low
CHECK_DEVS=""
REPAIR_DEVS=""
SKIP_DEVS=""
The progress of the check can be seen by
cat /proc/mdstat
and the number of errors can be displayed by
cat /sys/block/md0/md/mismatch_cnt
If the count is not zero, the problems will have to be fixed manually.
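A check can also be started by hand, outside the cron schedule, by writing to the array’s sync_action file:

echo check > /sys/block/md0/md/sync_action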
Add more users
For some users on the system, the home directories reside on the RAID volume. One option is to use symbolic links in the /home directory pointing to the user home directories on the RAID volume:
# create new user 'user1' as above
sudo mv /home/user1 /home/user1.local
sudo ln -s /mnt/raid/homes/user1 /home/
Unfortunately this has the potential to break if changes occur to the directory structure on the RAID volume. Some programs remember the de-referenced paths for symbolic links. If the symbolic link is modified to account for changes in the directory structure, these programs will try to use the old paths that may no longer exist or may point to stale versions of the files.
A better option is to use bind mounts:
# create new user 'user1' as above
sudo mv /home/user1 /home/user1.local
sudo mkdir /home/user1
sudo nano /etc/fstab
/mnt/raid/homes/user1 /home/user1 none rw,bind 0 0
mount -a
Configure NFS exports
Configure idmapd, which will be needed by the NFS server; this has to match the idmapd configurations on the NFS clients:
nano /etc/idmapd.conf
Lines to change:
# set your own domain here, if id differs from FQDN minus hostname
# Domain = localdomain
Domain = hlan.your.domain
Create mount points in /mnt and in /exports, add fstab entries for the RAID volume and the bind mounts for NFS:
mkdir /mnt/raid
mkdir -p /exports/{books,homes,music,opt,pictures,SW,video,VMs}
nano /etc/fstab
#
# raid array - sdc1, sdd1, sde1, sdb1
#
UUID=1960fa99-593b-4601-99bf-d5064fdef53e /mnt/raid ext4 relatime 0 0
#
# bind mounts for NFSv4 exports
#
/mnt/raid/homes /exports/homes none rw,bind 0 0
/mnt/raid/music /exports/music none rw,bind 0 0
/mnt/raid/video /exports/video none rw,bind 0 0
/mnt/raid/VMs /exports/VMs none rw,bind 0 0
/mnt/raid/opt /exports/opt none rw,bind 0 0
/mnt/raid/SW /exports/SW none rw,bind 0 0
/mnt/raid/books /exports/books none rw,bind 0 0
/mnt/raid/pictures /exports/pictures none rw,bind 0 0
Mount it all:
mount -a
We are now ready to define our NFS exports:
nano /etc/exports
#
# NFSv4 exports
###############
#
/exports *.hlan.your.domain(ro,no_subtree_check,sync,no_root_squash,fsid=0)
#
# pc1
/exports/homes pc1.hlan.your.domain(rw,no_subtree_check,sync,no_root_squash,nohide)
/exports/opt pc1.hlan.your.domain(ro,no_subtree_check,sync,no_root_squash,nohide)
/exports/VMs pc1.hlan.your.domain(rw,no_subtree_check,sync,no_root_squash,nohide)
/exports/SW pc1.hlan.your.domain(ro,no_subtree_check,sync,no_root_squash,nohide)
/exports/books pc1.hlan.your.domain(ro,no_subtree_check,sync,no_root_squash,nohide)
/exports/music pc1.hlan.your.domain(ro,no_subtree_check,sync,no_root_squash,nohide)
/exports/video pc1.hlan.your.domain(ro,no_subtree_check,sync,no_root_squash,nohide)
/exports/pictures pc1.hlan.your.domain(ro,no_subtree_check,sync,no_root_squash,nohide)
#
# NFSv3 exports
###############
#
# network boot root file systems
# currently not used
#
#/netboot/rootfs/pc2 pc2.hlan.your.domain(rw,no_root_squash,no_subtree_check,async)
Activate NFS with the new shares:
service nfs restart
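To verify the exports, list them on the server and, as a quick test from one of the clients (pc1 in the examples above), mount one of the NFSv4 shares; with fsid=0 on /exports the paths are relative to that pseudo-root:

exportfs -v
showmount -e server

# on the client, e.g. pc1
sudo mount -t nfs4 server:/homes /mnt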
NTP
Specify ntp servers:
nano /etc/ntp.conf
and add the following lines
server uk.pool.ntp.org
server uk.pool.ntp.org
server uk.pool.ntp.org
Send log messages to separate log file instead of the default /var/log/messages:
nano /etc/sysconfig/ntpd
OPTIONS="-u ntp:ntp -l /var/log/ntpd -p /var/run/ntpd.pid -g"
touch /var/log/ntpd
chown ntp.ntp /var/log/ntpd
service ntpd restart
If you want to modify the verbosity of the log, take a look at the logconfig directive:
man ntp_misc
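Once ntpd has been running for a few minutes, the selected peers and synchronisation status can be checked with:

ntpq -p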
System maintenance tasks
Automatic file system checks on boot were causing long delays, seemingly always at the worst possible time, so those checks were disabled. The RAID volume now has to be checked manually:
/etc/init.d/nfs stop
umount /exports/*
umount /mnt/raid/
fsck -f -C /dev/md0
mount -a
/etc/init.d/nfs start
Updates are only downloaded automatically and have to be installed manually:
yum update
If the normally excluded kernel packages are also to be updated:
yum --disableexcludes=all update
Install rsyncd
For this particular server, rsync will only be used for making backups, so there is no real benefit in running an rsyncd service; if anything, rsync over ssh is preferable from a security point of view.
If the server were sharing, for example, a software repository mirror, rsyncd might offer a performance advantage, but even then other forms of access may still be preferable.
Configure services
The commands below were required on my system; for your installation, a different set of similar commands may be needed.
chkconfig abrtd off
chkconfig abrt-ccpp off
chkconfig irqbalance off
chkconfig lvm2-monitor off
chkconfig kdump off
chkconfig haldaemon off
chkconfig rpcgssd off
chkconfig --level 2345 smartd on
chkconfig --level 2345 apcupsd on
chkconfig --level 2345 yum-cron on
chkconfig --level 2345 ntpdate on
chkconfig --level 2345 atd on
chkconfig --level 2345 network on
chkconfig --level 2345 certmonger on
chkconfig --level 2345 netfs on
chkconfig --level 345 ntpd on
chkconfig --level 2 ntpd off
chkconfig --level 345 nfs on
chkconfig --level 2 nfs off
chkconfig --level 345 nfslock on
chkconfig --level 2 nfslock off
chkconfig --list | sort --key=3,7 | tee >(wc -l)
; off
abrt-ccpp        0:off 1:off 2:off 3:off 4:off 5:off 6:off
abrtd            0:off 1:off 2:off 3:off 4:off 5:off 6:off
autofs           0:off 1:off 2:off 3:off 4:off 5:off 6:off
cgconfig         0:off 1:off 2:off 3:off 4:off 5:off 6:off
cgred            0:off 1:off 2:off 3:off 4:off 5:off 6:off
firstboot        0:off 1:off 2:off 3:off 4:off 5:off 6:off
haldaemon        0:off 1:off 2:off 3:off 4:off 5:off 6:off
ip6tables        0:off 1:off 2:off 3:off 4:off 5:off 6:off
ipsec            0:off 1:off 2:off 3:off 4:off 5:off 6:off
iptables         0:off 1:off 2:off 3:off 4:off 5:off 6:off
irqbalance       0:off 1:off 2:off 3:off 4:off 5:off 6:off
kdump            0:off 1:off 2:off 3:off 4:off 5:off 6:off
lvm2-monitor     0:off 1:off 2:off 3:off 4:off 5:off 6:off
netconsole       0:off 1:off 2:off 3:off 4:off 5:off 6:off
numad            0:off 1:off 2:off 3:off 4:off 5:off 6:off
oddjobd          0:off 1:off 2:off 3:off 4:off 5:off 6:off
psacct           0:off 1:off 2:off 3:off 4:off 5:off 6:off
quota_nld        0:off 1:off 2:off 3:off 4:off 5:off 6:off
rdisc            0:off 1:off 2:off 3:off 4:off 5:off 6:off
restorecond      0:off 1:off 2:off 3:off 4:off 5:off 6:off
rngd             0:off 1:off 2:off 3:off 4:off 5:off 6:off
rpcgssd          0:off 1:off 2:off 3:off 4:off 5:off 6:off
rpcsvcgssd       0:off 1:off 2:off 3:off 4:off 5:off 6:off
saslauthd        0:off 1:off 2:off 3:off 4:off 5:off 6:off
sssd             0:off 1:off 2:off 3:off 4:off 5:off 6:off
winbind          0:off 1:off 2:off 3:off 4:off 5:off 6:off
ypbind           0:off 1:off 2:off 3:off 4:off 5:off 6:off
; runlevel 3 - network services
nfs              0:off 1:off 2:off 3:on 4:on 5:on 6:off
nfslock          0:off 1:off 2:off 3:on 4:on 5:on 6:off
ntpd             0:off 1:off 2:off 3:on 4:on 5:on 6:off
rpcidmapd        0:off 1:off 2:off 3:on 4:on 5:on 6:off
; runlevel 2 - networked multi user
acpid            0:off 1:off 2:on 3:on 4:on 5:on 6:off
atd              0:off 1:off 2:on 3:on 4:on 5:on 6:off
auditd           0:off 1:off 2:on 3:on 4:on 5:on 6:off
certmonger       0:off 1:off 2:on 3:on 4:on 5:on 6:off
crond            0:off 1:off 2:on 3:on 4:on 5:on 6:off
cups             0:off 1:off 2:on 3:on 4:on 5:on 6:off
mdmonitor        0:off 1:off 2:on 3:on 4:on 5:on 6:off
messagebus       0:off 1:off 2:on 3:on 4:on 5:on 6:off
netfs            0:off 1:off 2:on 3:on 4:on 5:on 6:off
network          0:off 1:off 2:on 3:on 4:on 5:on 6:off
ntpdate          0:off 1:off 2:on 3:on 4:on 5:on 6:off
portreserve      0:off 1:off 2:on 3:on 4:on 5:on 6:off
postfix          0:off 1:off 2:on 3:on 4:on 5:on 6:off
rpcbind          0:off 1:off 2:on 3:on 4:on 5:on 6:off
rsyslog          0:off 1:off 2:on 3:on 4:on 5:on 6:off
smartd           0:off 1:off 2:on 3:on 4:on 5:on 6:off
sshd             0:off 1:off 2:on 3:on 4:on 5:on 6:off
yum-cron         0:off 1:off 2:on 3:on 4:on 5:on 6:off
; runlevel 1 - single user
apcupsd          0:off 1:on 2:on 3:on 4:on 5:on 6:off
blk-availability 0:off 1:on 2:on 3:on 4:on 5:on 6:off
cpuspeed         0:off 1:on 2:on 3:on 4:on 5:on 6:off
sysstat          0:off 1:on 2:on 3:on 4:on 5:on 6:off
udev-post        0:off 1:on 2:on 3:on 4:on 5:on 6:off
; 54 services in all
The default runlevel, as specified in /etc/inittab, is 3.
telinit 4
telinit 3
service --status-all
abrt-ccpp hook is not installed
abrtd is stopped
abrt-dump-oops is stopped
acpid is stopped
apcupsd (pid 17228) is running...
atd (pid 1526) is running...
auditd (pid 1588) is running...
automount is stopped
certmonger (pid 1538) is running...
Stopped [cgconfig]
cgred is stopped
cpuspeed is stopped
crond (pid 1515) is running...
cupsd (pid 1199) is running...
firstboot is not scheduled to run
hald is stopped
ip6tables: Firewall is not running.
IPsec stopped
iptables: Firewall is not running.
irqbalance is stopped
Kdump is not operational
lvmetad is stopped
mdmonitor (pid 17151) is running...
messagebus (pid 1182) is running...
netconsole module not loaded
Configured devices:
lo eth0
Currently active devices:
lo eth0
rpc.svcgssd is stopped
rpc.mountd (pid 17397) is running...
nfsd (pid 17462 17461 17460 17459 17458 17457 17456 17455) is running...
rpc.rquotad (pid 17393) is running...
rpc.statd (pid 1112) is running...
ntpd (pid 18365) is running...
numad is stopped
oddjobd is stopped
portreserve is stopped
master (pid 16932) is running...
Process accounting is disabled.
quota_nld is stopped
rdisc is stopped
restorecond is stopped
rngd is stopped
rpcbind (pid 1094) is running...
rpc.gssd is stopped
rpc.idmapd (pid 17448) is running...
rpc.svcgssd is stopped
rsyslogd (pid 1058) is running...
sandbox is stopped
saslauthd is stopped
smartd (pid 17210) is running...
openssh-daemon (pid 1393) is running...
sssd is stopped
winbindd is stopped
ypbind is stopped
Nightly yum update is enabled.
Firewall
Please note that the standard firewall (iptables) has been disabled above. If your server is connected directly to the internet, you should really be running a firewall.
Enable sudo
To enable sudo for one of the local users, edit the sudoers file:
nano /etc/sudoers
and include the lines
%wheel ALL=(ALL) ALL
Defaults mailto="system.administrator@your.domain"
Add the local users to the wheel group using
system-config-users
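Alternatively, a user can be added to the wheel group from the command line (‘user1’ again being a placeholder):

usermod -aG wheel user1
id user1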
In some cases, e.g. when connecting to CentOS sshd from an Ubuntu openssh client, the XAUTHORITY env variable is not set. If sudo is then used, connections to the X11 display are refused. In this case use the following:
XAUTHORITY=/home/username/.Xauthority sudo -s
Disable root remote login
Modify the sshd configuration:
sudo nano /etc/ssh/sshd_config
to include the line
PermitRootLogin no
Then restart the ssh server:
sudo service sshd restart
Install HW identification tools
wget http://pkgs.repoforge.org/lshw/lshw-2.16-1.el6.rf.i686.rpm
sudo yum install ./lshw-2.16-1.el6.rf.i686.rpm
sudo lshw > lshw_your.machine.id_20130417.txt
To do list
The following should still be documented:
- Is restorecond service needed?
- RAID and automatic file system checking / Ext4 journalling / write barriers
- Check RAID IO scheduler error message that appears at boot-time
- Fix hardware unsupported boot-time message
- Fix ‘eth0: excessive work at interrupt’ for via_velocity driver
- Compile and install a new kernel specific to my hardware
- …