SSD Partitioning, Partition Alignment, Optimal Configuration Settings and Performance Testing

Contributed by devil on Jan 09, 2012 – 03:34 PM

This information was collected over the past year (2010-2011) from multiple sources, plus my own experimentation on 4 different SSD models. Performance testing procedures are by devil, from his article on the German blog. I have sequentially numbered some of the paragraphs for ease of reference when discussing or correcting the information. Corrections and improvements are welcome.

Basic Drive Math

The storage architecture of SSDs is consistent with the legacy standards for modern hard disk drives, the so-called “Cylinders-Heads-Sectors” or “CHS” geometry, and is important to understand — even if only for the next 30 minutes.

512 bytes is one sector*, 2 sectors are 1024 bytes=1KB.
When you run "fdisk -l" and see the output in "blocks", those are typically kilobytes (KB) of 2 sectors each.
1 megabyte (MB) is 1KB x 1024, therefore 1MB is also (2x1024) 2,048 sectors.
1 gigabyte (GB) is 1MB x 1024, therefore 1GB is also (2048x1024) 2,097,152 sectors.

Size	Sectors
512 bytes	1
1 KB	2
512 KB	1024
1 MB	2048
1 GB	2097152

* “Advanced Format” for new large hard disk drives uses 4,096 bytes per sector. We can expect to see this value used in (distant) future SSD designs.

The definition of a logical “cylinder” is not fixed — in modern drives it is a theoretical construct of the number of logical heads and the number of logical tracks, however the number of logical tracks varies based on the drive size. The industry has settled upon a default configuration for most drives of 255 heads and 63 sectors per track, so the value for the number of logical cylinders on a given drive is the output of (total sectors/63)/255. For example, a 40GB OCZ Vertex2 SSD has 78,161,328 sectors, therefore it has 4865 cylinders using default heads and sectors.
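The drive math above can be checked with shell arithmetic. This is only a sketch, assuming 512-byte logical sectors; the function names kb_to_sectors and cylinders are my own and not part of any tool.

```shell
#!/bin/bash
# Sketch only: assumes 512-byte logical sectors, as in the table above.
SECTOR_BYTES=512

# Convert a size in KB (1 KB = 1024 bytes) to a sector count.
kb_to_sectors() {
    echo $(( $1 * 1024 / SECTOR_BYTES ))
}

# Logical cylinders for a drive: total sectors, heads, sectors per track.
# Heads and sectors default to the usual 255 and 63.
# Integer division drops the partial cylinder at the end of the drive.
cylinders() {
    local total=$1 heads=${2:-255} spt=${3:-63}
    echo $(( total / spt / heads ))
}

kb_to_sectors 512                  # 512 KB
kb_to_sectors $(( 1024 * 1024 ))   # 1 GB
cylinders 78161328                 # 40GB OCZ Vertex2, default geometry
```

The three calls print 1024, 2097152 and 4865, matching the table and the Vertex2 example.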

SSD Alignment

Most current SSDs use a 512KB erase block size, but verify that this is the case for your SSD. This alignment guide is written for 512K erase blocks, and would not be correct for other sizes. When data are deleted or overwritten, the operation is done in blocks of 512K that are defined in the SSD firmware. For longevity of the SSD, you would prefer that each deletion or overwrite use the minimum number of needed erase blocks. (You can do the research on the lifetime number of erases for each SSD memory block; it is a few thousand, more or less, depending on the technology.) Partitioning the SSD with default settings, using GParted or similar tools as you would for a hard drive, will set partition boundaries (and therefore block locations within the partition) that are not in alignment with the underlying SSD firmware, resulting in two erase blocks being involved when one erase block would be sufficient for the amount of data being manipulated. Further research on the reasons for this misalignment is left to the student and his google-fu; it is all out there.

So we want to set up our SSD with partitions (and therefore block locations within the partitions) that begin at the same sector where a 512K erase block begins — that is what we mean by “alignment”.

Methods to Make Aligned Partitions

There are multiple tools and approaches to make aligned partitions. Windows 7 and later align automatically during installation, and there are commercial software tools for Windows users, if they want to buy them. Here we will present 3 ways that are suitable for Linux users.

Method A — “fdisk with custom cylinder definition”
NOTE 1:
fdisk is a tool in the util-linux package. Between version 2.17.2-9 (found in Debian 6 “Squeeze” Live CD, aptosid “geras” Live CD, etc.) and version 2.19.1-2 (found in aptosid “Imera” and later, siduction 2011.1 and later, and other recent Live CDs) of the util-linux package, fdisk lost its ability to write a new CHS geometry to the partition table. You will need the version of fdisk from one of the earlier sources. A Hiren’s Boot CD or other legacy hard disk tool that supports DOS operations will probably contain a compatible fdisk version.

NOTE 2:
My experiments with the current version of cfdisk indicate that it follows the current version of fdisk and forces the user to use sector 2048 for the beginning of the first partition, and refuses to allow partitioning by custom-defined cylinders.

This Method A approach offers simplicity – we set both heads and sectors to 32, resulting in cylinders that are 512KB each (32×32=1024 sectors). Thus every cylinder is on an erase block boundary, and everything is automatically aligned on the SSD. The procedure will look like this (using Debian 6 Live CD, 8GB USB stick for example):

First, check the default configuration with “fdisk -lu” – on a SSD (or hard disk drive) the geometry will normally be 255 heads, 63 sectors per track. On my sample USB stick it is:

Disk /dev/sdb: 8029 MB, 8029470208 bytes
249 heads, 62 sectors/track, 1015 cylinders, total 15682559 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x421d1496

Device Boot Start End Blocks Id System
/dev/sdb1 62 15669569 7834754 b W95 FAT32

Next, as root start fdisk using the -H and -S options, and for the first partition you need to choose cylinder number 2 for the start, since the first cylinder is needed for the MBR and bootloader.

root@debian:/home/user# fdisk -H 32 -S 32 /dev/sdb

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').

Command (m for help): o
Building a new DOS disklabel with disk identifier 0xe5d396f1.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-15314, default 1): 2
Last cylinder, +cylinders or +size{K,M,G} (2-15314, default 15314): 7658

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 83

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (1-15314, default 1): 7659
Last cylinder, +cylinders or +size{K,M,G} (7659-15314, default 15314): 9574

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 82
Changed system type of partition 2 to 82 (Linux swap / Solaris)

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (1-15314, default 1): 9575
Last cylinder, +cylinders or +size{K,M,G} (9575-15314, default 15314): 
Using default value 15314

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): 83

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Now (after the write is finished) use “fdisk -lu” to verify your results:

Disk /dev/sdb: 8029 MB, 8029470208 bytes
32 heads, 32 sectors/track, 15314 cylinders, total 15682559 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe5d396f1

Device Boot Start End Blocks Id System
/dev/sdb1 1024 7841791 3920384 83 Linux
/dev/sdb2 7841792 9803775 980992 82 Linux swap / Solaris
/dev/sdb3 9803776 15681535 2938880 83 Linux

Each “Start” sector number can be divided by 1024 exactly, and thus aligns with the 512K erase block structure of the SSD.
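The divisibility check can be done in the shell instead of by hand. A minimal sketch, assuming 512-byte sectors and 512K erase blocks; is_aligned is a name I made up for illustration.

```shell
#!/bin/bash
# Sketch only: assumes 512-byte sectors and a 512 KB erase block (1024 sectors).
ERASE_BLOCK_SECTORS=1024

# Succeeds (exit status 0) when the start sector sits on an erase block boundary.
is_aligned() {
    (( $1 % ERASE_BLOCK_SECTORS == 0 ))
}

# Start sectors copied from the "fdisk -lu" listing above.
for start in 1024 7841792 9803776; do
    if is_aligned "$start"; then
        echo "$start aligned"
    else
        echo "$start NOT aligned"
    fi
done
```

On a running system the start sector of a partition can usually also be read from sysfs, for example /sys/block/sdb/sdb1/start, and passed to the same function.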

In the above example, I decided upon the end cylinder numbers based on my plan to make a 4GB OS partition, a 1GB swap space, and the balance for user data (a miniature Linux system!). Once I saw, in the first partition operation, that there were 15314 cylinders, and knowing my drive size of ~8GB, I simply divided 15314 in half, and then similarly estimated the number of cylinders needed for the swap, and then accepted the default end of the stick for the third partition. The numbers are a bit approximate. Since we’re using whole cylinders, it is only important to start each partition immediately after the end of the previous partition.
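To see how cylinder numbers turn into sector numbers under the 32/32 geometry, here is a small sketch. The helper names are hypothetical, and the only assumption is that fdisk counts cylinders from 1.

```shell
#!/bin/bash
# Sketch only: assumes the 32-head, 32-sector geometry from Method A,
# so one cylinder is 32 x 32 = 1024 sectors (512 KB).
SECTORS_PER_CYL=$(( 32 * 32 ))

# First sector of a cylinder; fdisk numbers cylinders from 1.
cyl_first_sector() { echo $(( ($1 - 1) * SECTORS_PER_CYL )); }

# Last sector of a cylinder.
cyl_last_sector() { echo $(( $1 * SECTORS_PER_CYL - 1 )); }

# Cylinder ranges typed into fdisk in the session above.
echo "partition 1: sectors $(cyl_first_sector 2) to $(cyl_last_sector 7658)"
echo "partition 2: sectors $(cyl_first_sector 7659) to $(cyl_last_sector 9574)"
echo "partition 3: sectors $(cyl_first_sector 9575) to $(cyl_last_sector 15314)"
```

The printed ranges agree with the Start and End columns of the “fdisk -lu” listing, which is why any whole-cylinder choice is aligned with this geometry.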

Method B — “fdisk with default CHS settings”

Or, we could call this method “person in a big rush, no old fdisk version handy”. But, this one requires a calculator, so better plan to exercise some patience, regardless of the rush.

We start again with an off-the-shelf USB stick for our example, “fdisk -lu” shows:

Disk /dev/sdb: 8029 MB, 8029470208 bytes
249 heads, 62 sectors/track, 1015 cylinders, total 15682559 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x421d1496

Device Boot Start End Blocks Id System
/dev/sdb1 62 15669569 7834754 b W95 FAT32

Using the newer version of fdisk found in util-linux 2.19.1-2, there is no use trying to change the heads and sector geometry because our desires will be ignored (contrary to what the man entry tells us), and we will be forced to partition by sectors anyway. When we get to the third step, where we set the beginning of the first partition, notice that we are offered only sector numbers, and the lowest beginning number offered is 2048. While this is a multiple of 512K and thus an aligned sector, we are forced to waste the preceding 1024 sectors which could be used (see the fdisk output from Method A).

root@e6500siduction:/home/don# fdisk /dev/sdb

Command (m for help): o 

Building a new DOS disklabel with disk identifier 0x9cc3361d. 
Changes will remain in memory only, until you decide to write them. 
After that, of course, the previous content won't be recoverable. 

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-15682558, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-15682558, default 15682558): 7840767

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): 83

Command (m for help): n
Partition type:
p primary (1 primary, 0 extended, 3 free)
e extended
Select (default p): p
Partition number (1-4, default 2): 2
First sector (7840768-15682558, default 7840768): 
Using default value 7840768
Last sector, +sectors or +size{K,M,G} (7840768-15682558, default 15682558): 9800703

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 82
Changed system type of partition 2 to 82 (Linux swap / Solaris)

Command (m for help): n
Partition type:
p primary (2 primary, 0 extended, 2 free)
e extended
Select (default p): p
Partition number (1-4, default 3): 3
First sector (9800704-15682558, default 9800704): 
Using default value 9800704
Last sector, +sectors or +size{K,M,G} (9800704-15682558, default 15682558): 
Using default value 15682558

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): 83

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Confirming that we achieved our desired result, we observe the output of “fdisk -lu” and confirm that each partition begins on a sector number that is evenly divisible by 1024:

Disk /dev/sdb: 8029 MB, 8029470208 bytes
249 heads, 62 sectors/track, 1015 cylinders, total 15682559 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x9cc3361d

Device Boot Start End Blocks Id System
/dev/sdb1 2048 7840767 3919360 83 Linux
/dev/sdb2 7840768 9800703 979968 82 Linux swap / Solaris
/dev/sdb3 9800704 15682558 2940927+ 83 Linux

Supporting calculations:
First partition:
Estimate a 4GB size in sectors: 15682558/2 = 7841279 sectors = ~4GB
Find the nearest whole number multiplier of 1024: 7841279/1024 = 7657.499xxx
Calculate number of sectors and partition ending sector: (7657×1024=7840768) – 1 = 7840767

Second partition (1GB swap)
Estimate a 1GB size in sectors: 15682558/8 = 1960319.75 sectors per GB
Find the nearest whole number multiplier of 1024: 1960320/1024 = 1914.375
Calculate number of sectors needed (1914×1024=1959936)
and add to the end of the preceding partition to get ending sector number: (7840767 + 1959936) = 9800703

Third partition therefore begins on sector 9800704 (evenly divisible by 1024) and ends at the end of the drive.
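If no calculator is handy, the shell can do the same supporting calculations. This is a sketch under the same assumptions (512-byte sectors, 512K erase blocks); round_down and the variable names are mine.

```shell
#!/bin/bash
# Sketch only: assumes 512-byte sectors and 1024-sector (512 KB) erase blocks.
ALIGN=1024

# Round a sector count down to a whole number of erase blocks.
round_down() { echo $(( $1 / ALIGN * ALIGN )); }

total=15682558                      # last sector offered by fdisk

# Partition 1: about half the stick; partition 2 must start on an aligned sector.
p2_start=$(round_down $(( total / 2 )))
p1_end=$(( p2_start - 1 ))

# Partition 2: about one eighth of the stick, in whole erase blocks.
swap_sectors=$(round_down $(( total / 8 )))
p2_end=$(( p1_end + swap_sectors ))
p3_start=$(( p2_end + 1 ))

echo "partition 1 ends at   $p1_end"
echo "partition 2 ends at   $p2_end"
echo "partition 3 starts at $p3_start"
```

This prints 7840767, 9800703 and 9800704, the same values entered in the fdisk session.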

Method C — “gdisk”

gdisk supports GPT partitioning. Debian packages are available in releases after Debian 6 “Squeeze”. For example, on siduction we have:

root@e6500siduction:/home/don# apt-cache policy gdisk
gdisk:
Installed: 0.8.1-1+b1
Candidate: 0.8.1-1+b1
Version table:
*** 0.8.1-1+b1 0
500 http://ftp.us.debian.org/debian/ unstable/main amd64 Packages
100 /var/lib/dpkg/status

The gdisk command is issued to the target device, and the procedure follows closely the “Method B” above, except we do not need the calculator:

root@e6500siduction:/home/don# gdisk /dev/sdb
GPT fdisk (gdisk) version 0.8.1

Partition table scan:
MBR: MBR only
BSD: not present
APM: not present
GPT: not present


***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format.
THIS OPERATION IS POTENTIALLY DESTRUCTIVE! Exit by typing 'q' if
you don't want to convert your MBR partitions to GPT format!
***************************************************************


Command (? for help): o
This option deletes all partitions and creates a new protective MBR.
Proceed? (Y/N): y

Command (? for help): n
Partition number (1-128, default 1): 
First sector (34-15682525, default = 34) or {+-}size{KMGTP}: 
Information: Moved requested sector from 34 to 2048 in order to align on 2048-sector boundaries.
Use 'l' on the experts' menu to adjust alignment
Last sector (2048-15682525, default = 15682525) or {+-}size{KMGTP}: +4G
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): 
Changed type of partition to 'Linux filesystem'

Command (? for help): n
Partition number (2-128, default 2): 
First sector (34-15682525, default = 8390656) or {+-}size{KMGTP}: 
Last sector (8390656-15682525, default = 15682525) or {+-}size{KMGTP}: +1G
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): 8200
Changed type of partition to 'Linux swap'

Command (? for help): n
Partition number (3-128, default 3): 
First sector (34-15682525, default = 10487808) or {+-}size{KMGTP}: 
Last sector (10487808-15682525, default = 15682525) or {+-}size{KMGTP}: 
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): 
Changed type of partition to 'Linux filesystem'

Command (? for help): p
Disk /dev/sdb: 15682559 sectors, 7.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): F4BA217F-6CBA-472E-93FB-A2A807EA3940
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 15682525
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number Start (sector) End (sector) Size Code Name
1 2048 8390655 4.0 GiB 8300 Linux filesystem
2 8390656 10487807 1024.0 MiB 8200 Linux swap
3 10487808 15682525 2.5 GiB 8300 Linux filesystem

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT).
The operation has completed successfully.

Note that we used “p” to check the partition table before writing it, and one can see that the partitions do begin on sectors evenly divisible by 1024. It is aligned as intended.
Back in the terminal we can take a look at what fdisk sees, but in the case of the GPT partitioned drive, it cannot see the partition table:

root@e6500siduction:/home/don# fdisk -lu

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdb: 8029 MB, 8029470208 bytes
210 heads, 31 sectors/track, 2408 cylinders, total 15682559 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot Start End Blocks Id System
/dev/sdb1 1 15682558 7841279 ee GPT 
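The same GPT layout could probably be scripted with sgdisk, the non-interactive companion to gdisk. I have not run this on the stick above, so treat it as a sketch: /dev/sdX is a placeholder, and the run wrapper and DRY_RUN variable are my own invention so that nothing is written by default.

```shell
#!/bin/bash
# Sketch only: prints the sgdisk commands instead of running them.
# Set DRY_RUN=0 only after checking that DISK really is the target drive.
DISK=${DISK:-/dev/sdX}      # placeholder device name
DRY_RUN=${DRY_RUN:-1}

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run sgdisk --clear "$DISK"                          # new empty GPT, like 'o'
run sgdisk --new=1:0:+4G --typecode=1:8300 "$DISK"  # 4 GiB Linux filesystem
run sgdisk --new=2:0:+1G --typecode=2:8200 "$DISK"  # 1 GiB Linux swap
run sgdisk --new=3:0:0 --typecode=3:8300 "$DISK"    # rest of the drive
run sgdisk --print "$DISK"                          # like 'p'
```

A start value of 0 asks sgdisk for the first available sector, which it aligns the same way gdisk does. Check the printed table before trusting it.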

This concludes the partitioning and partition alignment guidance. Now let’s see what tweaks we can make to maximize the performance of our system.

Optimizing SSD-based System Performance

The goal of these configuration settings, generally, is to enable TRIM, to minimize erase/write cycles that don’t add to system performance, and otherwise optimize the OS to provide a very responsive user experience.

1. Filesystem type and /etc/fstab configuration

We want to use ext4 and take advantage of the journaling feature, for data security, but we want to reduce the frequency of journal commits from the default 5 seconds to a slower rate, to extend the life of the SSD memory blocks. The “commit” mount option no longer controls the commit frequency in recent Debian distributions — instead we find the journal commit frequency is a setting in /usr/lib/pm-utils/power.d/journal-commit.

For the value “JOURNAL_COMMIT_TIME_AC=${JOURNAL_COMMIT_TIME_AC:”, I use “-120” or 2 minutes for my desktop. (The leading “-” belongs to the shell’s “:-” default-value syntax, so the interval itself is 120 seconds.) The default value for “JOURNAL_COMMIT_TIME_BAT=${JOURNAL_COMMIT_TIME_BAT:” is “-600”, or 10 minutes. That is way too long for my taste (I like my data), so I set it back down to “-120” also.

Use the “discard” mount option to enable the SSD’s TRIM capability, and change the “relatime” option to “noatime” to eliminate unnecessary disk writes when a file is read but not changed. As a result, the /etc/fstab line that mounts the OS will look like this:

UUID=bea3a748-3411-4024-acd0-39f3882ddaf9 / ext4 defaults,noatime,errors=remount-ro,discard 0 1
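After editing /etc/fstab it is easy to overlook a typo in the option list. The following sketch checks an option string for the two options we care about. The has_opt helper is hypothetical, and the option string is simply copied from the fstab line.

```shell
#!/bin/bash
# Succeeds when the comma-separated option list $1 contains option $2.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Option field from the fstab line above. On a running system the live
# options could be read with: findmnt -no OPTIONS /
opts="defaults,noatime,errors=remount-ro,discard"

for wanted in noatime discard; do
    if has_opt "$opts" "$wanted"; then
        echo "$wanted: set"
    else
        echo "$wanted: missing"
    fi
done
```

Wrapping the list in commas keeps “atime” from matching inside “noatime”.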

We want to mount selected filesystems as “tmpfs”, which lets the OS use memory rather than the SSD for logging and spooling. The wise user will wait for a reasonable period of time after initially installing on the SSD, before these changes are made to /etc/fstab, because until you are sure your system is stable, you should allow the logs to be written on the SSD, for later review. Logs written in memory will not survive a reboot. When you are satisfied that the system is stable and the logs can safely be lost at each reboot, add these lines to the end of /etc/fstab:

none /tmp tmpfs defaults,noatime,mode=1777 0 0
none /var/tmp tmpfs defaults,noatime 0 0
none /var/log tmpfs defaults,noatime 0 0
none /var/spool tmpfs defaults,noatime 0 0
none /run/shm tmpfs defaults,noatime 0 0

2. Outsource the browser cache to /run/shm

Since we now have the shared directory /run/shm in RAM, we can outsource the cache generated during browsing to memory, and eliminate many SSD writes. For example, in the Firefox/Iceweasel address bar we enter “about:config” and confirm the warning. Now right-click in the white space and choose “New ==> String” and we create a new entry called:

browser.cache.disk.parent_directory

After double-clicking the new string, we assign it the value:

/run/shm/firefox-cache

Now, as user, create the directory in the terminal:

mkdir /run/shm/firefox-cache

After a Firefox restart, browser caching happens in memory, not on the SSD.

For chromium-browser, the cache location is set with the “--disk-cache-dir=DIRNAME” launch command option. So to outsource the chromium-browser cache:

mkdir /run/shm/chromium-cache

Open the chromium-browser launch icon for editing, change to the “Application” tab, and edit the start command to read as follows:

/usr/bin/chromium --disk-cache-dir=/run/shm/chromium-cache %U

The new browser cache directory in /run/shm will not survive a reboot. To recreate it automatically at login, put the following “auto_browser_cache.sh” script in your ~/.kde/Autostart folder (for KDE users), and then “chmod +x” to make it executable:

#!/bin/bash
# Recreate the browser cache directories in RAM at login.
# Replace "username" with your login name. The paths must match
# the cache locations configured in each browser.
for browser in chromium firefox opera; do
    mkdir -p "/run/shm/username-${browser}-cache"
done
#end

Analogous cache outsourcing configuration can be made for other browsers, if they allow the user to specify the cache location, and the startup script can be adapted to add directories for each browser that the user wants to run.

For a desktop system that remains booted for long periods, and depending on the memory capacity and browsing activities, the outsourced browsing cache could grow to a problematic size and need to be manually cleared to avoid sending the system into swapping.
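To keep an eye on the cache size, something like the following sketch could be run now and then. The 512 MB limit is an arbitrary assumption, the function names are mine, and the directory names are the ones created earlier. It only reports, so clearing stays a manual decision.

```shell
#!/bin/bash
# Sketch only: the 512 MB limit is an illustrative assumption.
LIMIT_KB=$(( 512 * 1024 ))

# Print the size of a directory in KB (0 if it does not exist).
dir_kb() {
    [ -d "$1" ] || { echo 0; return; }
    du -sk "$1" | cut -f1
}

# Warn when a cache directory has grown past the limit.
check_cache() {
    local dir=$1 kb
    kb=$(dir_kb "$dir")
    if [ "$kb" -gt "$LIMIT_KB" ]; then
        echo "$dir uses $(( kb / 1024 )) MB, consider clearing it"
    fi
}

for cache in /run/shm/firefox-cache /run/shm/chromium-cache; do
    check_cache "$cache"
done
```

Nothing is printed while the caches stay under the limit.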

3. I/O Scheduler selection

Multiple sources that you can find with a google search indicate that, for SSDs, the “deadline” and “noop” schedulers perform better than the default “cfq” scheduler. I have not done any testing to determine which is faster on my SSD installations, so that will be an exercise left to the student. Set the scheduler by editing the line “GRUB_CMDLINE_LINUX_DEFAULT=xxx” in /etc/default/grub, then run “update-grub” as root and reboot for the change to take effect. For example:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=deadline"
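After the reboot you can confirm which scheduler is active. The kernel lists the available schedulers in /sys/block/sdX/queue/scheduler and marks the active one with square brackets. The sketch below parses such a line; the sample line is an assumption about what your system will show, and active_scheduler is a name I made up.

```shell
#!/bin/bash
# Print the active scheduler from a line such as "noop deadline [cfq]",
# the format used in /sys/block/<device>/queue/scheduler.
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' <<< "$1"
}

# Example line; on a running system use: cat /sys/block/sdX/queue/scheduler
active_scheduler "noop deadline [cfq]"
```

The example prints “cfq”. For a quick comparison without rebooting, writing a scheduler name into that same sysfs file as root should switch it for that one device until the next boot.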

4. Virtual memory settings

Depending on how much memory your system has, and how you use it, the same tweaks to vm (swappiness, vfs_cache_pressure, etc.) that you use for a hard disk drive installation can also be applied to a system installed on a SSD. Guidance is available via google search. Here are the lines added to /etc/sysctl.conf on one of my SSD installations:

vm.swappiness=1
vm.vfs_cache_pressure=25
vm.dirty_ratio=40
vm.dirty_background_ratio=3
#

References:
http://www.westnet.com/~gsmith/content/linux-pdflush.htm
http://www.cyberciti.biz/faq/linux-kernel-tuning-virtual-memory-subsystem/

Performance Testing (from Ferdinand Thommes’ article)

Before you spend time on performance testing and benchmarking your SSD, you need to determine the firmware version you have, and then check the OEM’s website and learn whether a more recent version is available. Significant performance improvements can result merely from updating your SSD firmware. Follow your OEM’s instructions to install updated firmware. To check your firmware version:

hdparm -iv /dev/sdx

5. Verify that TRIM is working

(after setting the “discard” mount option as shown in #1 above).

# cd to some directory on the SSD, then
dd if=/dev/urandom of=tempfile bs=512k count=100 oflag=direct
hdparm --fibmap tempfile
# here we read the sectors from the tempfile

From the output we copy the number immediately under “begin_LBA” and insert it in the next command:

hdparm --read-sector 1234567 /dev/sdx
# 1234567 replaced with the number from the previous command and /dev/sdx with your device ID

The output should be a long string of mostly non-zero data. Next:

rm tempfile
sync
hdparm --read-sector 1234567 /dev/sdx
# replace 1234567 and /dev/sdx with your values

The sectors will not be cleared instantly due to caching, so wait for some seconds. Then repeat the last command (hdparm --read-sector ...). It should (after a short while) come out all zeros. That means TRIM works! If you have problems with "discard" on your SSD and you have verified that your SSD does support TRIM, you can use fstrim which is in the current util-linux package (check "man fstrim"), or use the tool "DiskTrim" from http://disktrim.sourceforge.net/.
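Checking a full sector dump for zeros by eye is tedious, so here is a sketch of a helper that does it. It assumes hdparm prints the sector as groups of four hex digits after a header line. I believe that is the usual format, but verify against your own output. sector_is_zero is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: assumes the sector dump is printed as groups of four hex
# digits, preceded by header lines that are skipped here.
# Reads "hdparm --read-sector" output on stdin; succeeds if every word is 0000.
sector_is_zero() {
    local data
    # Keep only lines made up of 4-digit hex words, one word per output line.
    data=$(grep -E '^[0-9a-fA-F]{4}( [0-9a-fA-F]{4})* *$' | grep -oE '[0-9a-fA-F]{4}')
    # Need at least one word, and no word other than 0000.
    [ -n "$data" ] && ! grep -qv '^0000$' <<< "$data"
}

# Sample input standing in for real hdparm output.
printf '%s\n' 'reading sector 1234567: succeeded' '0000 0000 0000 0000' \
    | sector_is_zero && echo "sector is all zeros"
```

With a real drive, the output of the hdparm --read-sector command would be piped into sector_is_zero in place of the sample lines.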

6. Throughput Benchmarking

CAUTION: You can benchmark your SSD to a premature death by subjecting it to frequent comprehensive benchmark tests!

6a. Simple hdparm test

hdparm -tT /dev/sdx

Run it twice in rapid succession — normally the second run is fastest.

6b. hdparm with O_DIRECT kernel flag

hdparm --direct -tT /dev/sdx

6c. More reliable benchmark using dd

# cd to some directory on the SSD, then 
$ dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.18232 s, 492 MB/s

Now (as root) clear the buffer cache to force reading directly from disk:

# echo 3 > /proc/sys/vm/drop_caches 
$ dd if=tempfile of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 2.55234 s, 421 MB/s

The file is now in the buffer cache, so reading it again measures the speed of the cache:

$ dd if=tempfile of=/dev/null bs=1M count=1024 
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.122594 s, 8.8 GB/s

For the most accurate possible value for your SSD, re-run the last command 5 times and average the results.
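The averaging can be left to awk. The sketch below pulls the throughput figure out of dd’s summary line and averages a list of figures. The five sample numbers are invented for illustration, not measurements, and the helper names are mine. It also assumes every run reports in the same unit.

```shell
#!/bin/bash
# Extract the throughput figure from dd's summary line on stdin, e.g.
# "1073741824 bytes (1.1 GB) copied, 2.55234 s, 421 MB/s" -> 421
dd_rate() {
    awk -F', ' '/copied/ { split($NF, f, " "); print f[1] }'
}

# Average a list of throughput figures (one number per argument).
average() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR) printf "%.1f\n", sum / NR }'
}

echo "1073741824 bytes (1.1 GB) copied, 2.55234 s, 421 MB/s" | dd_rate
average 421 418 425 419 422    # made-up sample figures
```

This prints 421 and then 421.0. In practice each figure would come from a fresh dd run, whose summary line dd writes to stderr, so redirect it with 2>&1 before piping into dd_rate.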

6d. Other benchmarking tools

bonnie++ and compilebench. Have fun!

CREDITS:
This article was written by Don Boyd [dibl], inspired by my SSD article on the German blog. Thanks a lot for that!
