Troubleshooting large IOWait spikes

30.08.2016 17:28

Changed the mount options in /etc/fstab:

  • Before: defaults,noatime,nodiratime,barrier=1,data=ordered,errors=remount-ro
  • After: relatime,barrier=0,errors=remount-ro
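The notes don't show the fstab edit itself. Below is a minimal sketch of applying the new options. It assumes a standard /etc/fstab layout, with the mount point in the second column and the options in the fourth. `set_mount_opts` is a hypothetical helper, not something from the original setup.

```sh
#!/bin/sh
# set_mount_opts FSTAB MOUNTPOINT OPTIONS  (hypothetical helper)
# Replace the options field (4th column) of the entry whose mount point
# (2nd column) equals MOUNTPOINT. Comments, blank lines and other entries
# are copied through unchanged; the edited line is re-joined with single spaces.
set_mount_opts() {
    fstab=$1 mnt=$2 opts=$3
    work=$(mktemp) || return 1
    awk -v m="$mnt" -v o="$opts" '
        /^[ \t]*#/ || NF < 4 { print; next }   # comments and short/blank lines
        $2 == m { $4 = o }                     # rewrite the options column
        { print }
    ' "$fstab" > "$work" && cat "$work" > "$fstab"   # cat keeps the original file and its permissions
    status=$?
    rm -f "$work"
    return "$status"
}
```

For example, after backing up /etc/fstab, run `set_mount_opts /etc/fstab / relatime,barrier=0,errors=remount-ro`, then remount or reboot. Keep in mind that barrier=0 disables ext4 write barriers. A power loss can then corrupt the filesystem if the drive's write cache is not power-loss protected.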

pveperf before:

root@pve-01:~# pveperf /
CPU BOGOMIPS:      19244.00
REGEX/SECOND:      959888
HD SIZE:           27.19 GB (/dev/dm-0)
BUFFERED READS:    195.02 MB/sec
AVERAGE SEEK TIME: 0.26 ms
FSYNCS/SECOND:     301.31
DNS EXT:           93.93 ms
DNS INT:           93.76 ms (sqweeb.net)

root@pve-01:~# pveperf /var/lib/vz
CPU BOGOMIPS:      19244.00
REGEX/SECOND:      978722
HD SIZE:           178.85 GB (/dev/mapper/pve-data)
BUFFERED READS:    197.73 MB/sec
AVERAGE SEEK TIME: 0.20 ms
FSYNCS/SECOND:     316.08
DNS EXT:           84.80 ms
DNS INT:           107.74 ms (sqweeb.net)

pveperf after:

root@pve-01:~# pveperf /
CPU BOGOMIPS:      19244.08
REGEX/SECOND:      953290
HD SIZE:           27.19 GB (/dev/dm-0)
BUFFERED READS:    176.81 MB/sec
AVERAGE SEEK TIME: 0.26 ms
FSYNCS/SECOND:     2128.05
DNS EXT:           113.88 ms
DNS INT:           106.68 ms (sqweeb.net)

root@pve-01:~# pveperf /var/lib/vz
CPU BOGOMIPS:      19244.08
REGEX/SECOND:      980483
HD SIZE:           178.85 GB (/dev/mapper/pve-data)
BUFFERED READS:    194.57 MB/sec
AVERAGE SEEK TIME: 0.20 ms
FSYNCS/SECOND:     2376.09
DNS EXT:           104.70 ms
DNS INT:           90.75 ms (sqweeb.net)
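The biggest change is in FSYNCS/SECOND. Two of the option changes are unlikely to explain it:

  • data=ordered is already ext4's default.
  • relatime updates access times more often than noatime did.

The gain most plausibly comes from barrier=0, which stops journal commits and fsync from forcing a flush of the drive's write cache. Below is a small sketch for extracting that value from saved pveperf output and computing the before/after ratio. `fsyncs` and `speedup` are hypothetical helper names.

```sh
#!/bin/sh
# fsyncs: read pveperf output on stdin, print the FSYNCS/SECOND value
fsyncs() {
    awk -F': *' '$1 == "FSYNCS/SECOND" { print $2 }'
}

# speedup BEFORE AFTER: print AFTER/BEFORE as a multiplier with 2 decimals
speedup() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.2fx\n", a / b }'
}

# Figures from the pveperf runs above
speedup 301.31 2128.05   # /            -> 7.06x
speedup 316.08 2376.09   # /var/lib/vz  -> 7.52x
```

For new runs, `pveperf /var/lib/vz | fsyncs` should print just that figure.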

I also found some rogue monitoring processes running after a host reboot: collectd, splunkd, and filebeat were all still running from previous installs. I stopped them and deleted every associated file I could find on the host and in the containers.

find / -name 'splunk'
find / -name 'collectd'
find / -name 'filebeat'
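`-name 'splunk'` only matches entries named exactly `splunk`, so files such as `splunkd` or `collectd.conf` slip through. Below is a sketch of a single, case-insensitive search for all three agents that skips the /proc and /sys pseudo-filesystems. `find_leftovers` is a hypothetical helper. The wildcard patterns are an assumption about how the leftover files are named.

```sh
#!/bin/sh
# find_leftovers [ROOT]  (hypothetical helper)
# List every file or directory under ROOT (default /) whose name contains
# splunk, collectd or filebeat, ignoring case. /proc and /sys are pruned.
find_leftovers() {
    find "${1:-/}" \( -path /proc -o -path /sys \) -prune -o \
        \( -iname '*splunk*' -o -iname '*collectd*' -o -iname '*filebeat*' \) \
        -print 2>/dev/null
}
```

Run `find_leftovers /` on the host, then again inside each container, for example from a shell opened with `pct enter <vmid>`. Review the list before deleting anything, since the wildcards can also match unrelated files.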

After these changes, the IOWait spikes were still occurring at roughly 15-20 minute intervals. Digging further, I found something interesting: /dev/sda appears to be the source of the delay. smartctl also shows that its firmware differs from that of another drive of the same model, /dev/sdc:

/dev/sda

root@pve-01:~# smartctl -a /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.2.6-1-pve] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SDSSDA120G
Serial Number:    154414402842
LU WWN Device Id: 5 001b44 eff65c91a
Firmware Version: Z22000RL
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      1.8 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Aug 31 10:17:57 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

/dev/sdc

root@pve-01:~# smartctl -a /dev/sdc
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.2.6-1-pve] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SDSSDA120G
Serial Number:    154400408149
LU WWN Device Id: 5 001b44 efe903e55
Firmware Version: U21010RL
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Aug 31 10:22:50 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
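With more than a couple of drives, it's easier to pull out only the fields being compared. Below is a small sketch that condenses `smartctl -i` output to model, firmware and serial number, one line per device. `drive_summary` and `fw_report` are hypothetical helper names. The `/dev/sd?` glob in the usage example assumes plain SATA device names.

```sh
#!/bin/sh
# drive_summary: read `smartctl -i` (or -a) output on stdin and print
# "model<TAB>firmware<TAB>serial" for that drive.
drive_summary() {
    awk -F': *' '
        $1 == "Device Model"     { model = $2 }
        $1 == "Firmware Version" { fw = $2 }
        $1 == "Serial Number"    { serial = $2 }
        END { printf "%s\t%s\t%s\n", model, fw, serial }'
}

# fw_report DEVICE...: one summary line per device, prefixed with its path
# (needs root and smartmontools installed).
fw_report() {
    for dev in "$@"; do
        printf '%s\t' "$dev"
        smartctl -i "$dev" | drive_summary
    done
}
```

`fw_report /dev/sd?` prints one line per drive. On this host it would show Z22000RL for /dev/sda and U21010RL for /dev/sdc.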

At this point it isn’t clear whether the firmware on these drives can be updated or changed at all.
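The notes don't record how /dev/sda was singled out as the source of the delay. One way to check is sketched below as an assumption, not the method used here:

  • Take two snapshots of /proc/diskstats a known interval apart.
  • Compare each device's cumulative "milliseconds spent doing I/O" counter, which is the basis of %util in `iostat -x`.

`disk_util` is a hypothetical helper.

```sh
#!/bin/sh
# disk_util OLD NEW SECONDS  (hypothetical helper)
# OLD and NEW are copies of /proc/diskstats taken SECONDS apart.
# Field 13 is the cumulative number of milliseconds the device spent doing
# I/O, so its growth divided by the interval is the busy percentage.
disk_util() {
    awk -v ms="$(($3 * 1000))" '
        NR == FNR { busy[$3] = $13; next }    # first file: remember counters
        ($3 in busy) { printf "%s %.1f%%\n", $3, ($13 - busy[$3]) * 100 / ms }
    ' "$1" "$2"
}
```

For example: `cat /proc/diskstats > /tmp/ds.1; sleep 10; cat /proc/diskstats > /tmp/ds.2; disk_util /tmp/ds.1 /tmp/ds.2 10`. The snapshots should be taken across one of the spikes. If sda shows a much higher figure than sdc, that would support the diagnosis.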

Further fstab tweaks to look into: http://www.howtogeek.com/62761/how-to-tweak-your-ssd-in-ubuntu-for-better-performance/

Update 9/1/16 - 10:35AM

After looking back over the documents above on SSD mount-option tweaks, I realized I had overlooked one important task: fstrim. Most posts recommend enabling the discard mount option for SSDs. Others report that continuous discard causes a large drop in performance and suggest a cron job that runs fstrim daily or weekly instead. I enabled a daily TRIM with the following script:

/etc/cron.daily/fstrim

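#!/bin/sh
# Daily TRIM of the SSD-backed filesystems, run by cron via run-parts.
# The file must be executable (chmod 755) and its name must not contain a dot.

PATH=/bin:/sbin:/usr/bin:/usr/sbin

# Best-effort I/O class (-c 2) at the lowest priority (-n 7); I/O priorities
# are only honored by schedulers that support them, such as CFQ.
ionice -c 2 -n 7 fstrim -v /
ionice -c 2 -n 7 fstrim -v /var/lib/vz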

I then ran the two commands manually to trim right away, and the improvement was immediate: I am no longer seeing 15-20% IOWait spikes every 20-30 minutes!
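To check that the spikes stay gone, system-wide IOWait can be sampled from the aggregate `cpu` line of /proc/stat, whose fifth value is time spent waiting for I/O. Below is a minimal sketch. `iowait_pct` and `watch_iowait` are hypothetical names. Summing only the user-through-steal counters is an assumption, based on guest time already being counted in user.

```sh
#!/bin/sh
# iowait_pct OLD NEW: OLD and NEW are two snapshots of the "cpu " line of
# /proc/stat; print the percentage of CPU time spent in iowait between them.
# Counters: user nice system idle iowait irq softirq steal guest guest_nice;
# guest time is already included in user, so only user..steal are summed.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user .. steal
            t[NR] = total; io[NR] = $6             # $6 = iowait
        }
        END { printf "%.1f\n", (io[2] - io[1]) * 100 / (t[2] - t[1]) }'
}

# watch_iowait [SECONDS]: print a timestamped IOWait percentage every
# SECONDS (default 60) until interrupted.
watch_iowait() {
    interval=${1:-60}
    prev=$(grep '^cpu ' /proc/stat)
    while sleep "$interval"; do
        cur=$(grep '^cpu ' /proc/stat)
        printf '%s %s%%\n' "$(date +%H:%M:%S)" "$(iowait_pct "$prev" "$cur")"
        prev=$cur
    done
}
```

Running `watch_iowait 60` for longer than the 20-30 minute window, both before and after the trim, should make the periodic spikes (or their absence) visible.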

Tags: archive
