Hard Drive Metrics That Matter
How I bought a hard drive that’s great for backups, but awful for my use case.
I’m surprised at how difficult it is to find good information about hard drive performance. If you can find any performance numbers at all, it’s usually just maximum transfer speed. It turns out that in my case, what I really cared about was something called IOPS. Here, I’ll discuss what IOPS are, when they matter, and how to measure them.
First, some background. I’m working on a project that involves storing and processing a fair amount of data. I do some of the work on Amazon’s EC2 platform, and some on my laptop. My old laptop was getting a little crunched for space, and EC2 can get expensive, so I decided to purchase an external drive.
I spent several hours doing research. I wanted to make sure that whatever I wound up with wasn’t a downgrade from my laptop.
I wound up choosing a 2TB Western Digital Easystore that was available at Best Buy for $55. From what I’d been able to find online, it performed about the same as my laptop’s hard drive, and the price seemed like a good deal.
While I was working on my project, I started to notice some strange behavior. Some of the file processing that I was doing was taking far longer than I thought it should have. It turns out that it has to do with the difference between sequential access and random access. And, unfortunately, the WD Easystore, or at least the one that I purchased, performs very poorly on random access.
Anyone who’s shopped for hard drives is probably familiar with their primary performance metric: data transfer speed. In marketing material, vendors will claim that their drive can transfer 100 megabytes per second, or 400 megabytes per second, or whatever. It’s an easy metric for people to understand: faster is better.
The catch is that this transfer rate applies to sequential access only, that is, reading or writing large chunks of data at a time, like when you’re copying files or backing up a drive. Transfer speed is important, but it doesn’t tell the whole story, because copying files isn’t the only purpose of a hard drive.
In addition to sequential access, some applications require random access. This means jumping around from file to file, or jumping around within a certain file. My data analysis was jumping around within files, and this is what slowed it down.
The metric used to measure how well a drive performs on random access is called “IOPS,” which stands for “Input-Output Operations per Second.” It’s the number of reads or writes the drive can do in a second. When you read from or write to a disk, you read or write a block of data at a time, called a “buffer.” A typical buffer size is 512 bytes. Whether you write all 512 bytes or just 1 byte, it still counts as an IO operation.
That means that applications that make many small writes suffer a huge performance hit. If your drive is capable of 10,000 IOPS and you write 512 bytes at a time, you can create a 50 megabyte file in about ten seconds. If you write 1 byte at a time, the same file would take well over an hour.
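To make that arithmetic concrete, here’s a quick back-of-the-envelope sketch in Python, using the hypothetical numbers from above (a 10,000 IOPS drive and a 50 megabyte file):

iops = 10000                   # hypothetical drive: 10,000 I/O operations per second
file_size = 50 * 1000 * 1000   # a 50 megabyte file, in bytes

# 512-byte writes: about 98,000 operations
print(file_size / 512 / iops)  # roughly 10 seconds

# 1-byte writes: 50,000,000 operations
print(file_size / 1 / iops)    # 5,000 seconds, well over an hour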
Unfortunately, some applications can’t read or write 512 bytes at a time. They have to read a few bytes in this part of a file, write a few bytes in another part of the file, and so on. To perform well, they need a hard drive with a very high IOPS rating.
There are two types of commonly used drives: HDDs, which use spinning platters, and SSDs, which have no moving parts. One of the benefits of SSDs is that they generally have much better performance for random access. A typical HDD performs at around 100 IOPS, while SSDs might perform at 3,000, 30,000, or more. Unfortunately, as far as I can tell, the Easystore’s random access performance is about the same as a typical HDD’s.
I’m not an expert at benchmarking drive performance, so I could certainly be missing something. It’s also possible that my particular drive isn’t representative of Easystore drives as a whole. But, this is how I measured and what I found.
First, I measured sequential write performance by creating a 512mb file using the dd command.
# Performance on my 2015 MacBook Pro
dd if=/dev/zero of=dd.out bs=512 count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes transferred in 4.948839 secs (103458605 bytes/sec)

# Performance on my WD Easystore
dd if=/dev/zero of=dd.out bs=512 count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes transferred in 5.133201 secs (99742833 bytes/sec)
So far, so good. Running the command several times gives slightly different results, but these results are pretty typical of what I saw. If you’re not familiar with the dd command, it copies bytes from one file or device to another. The if option stands for “input file,” of stands for “output file,” bs stands for “block size,” and count specifies how many blocks to copy.
In this case, I’m copying 1,000,000 blocks of 512 bytes each from /dev/zero (which just returns blocks containing 0) to a file called dd.out. My MacBook transferred the data at 103mb/sec, and the Easystore at 100mb/sec. We can also determine sequential IOPS. The MacBook’s drive handled 1,000,000 write operations in about 5 seconds, or roughly 200,000 IOPS. The Easystore performed basically the same, at about 196,000 IOPS. The results varied enough from run to run that the difference between the two was negligible.
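For reference, here’s how those figures fall out of the dd output, using the MacBook’s run from above:

bytes_transferred = 512000000  # from the dd output
seconds = 4.948839             # from the dd output
operations = 1000000           # count=1000000: one write operation per 512-byte block

print(bytes_transferred / seconds)  # ~103,000,000 bytes/sec
print(operations / seconds)         # ~202,000 sequential write IOPS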
I executed a similar command to test read performance.
# Performance on my 2015 MacBook Pro
dd if=dd.out of=/dev/null bs=512
1000000+0 records in
1000000+0 records out
512000000 bytes transferred in 3.273717 secs (156397139 bytes/sec)

# Performance on my WD Easystore
dd if=dd.out of=/dev/null bs=512
1000000+0 records in
1000000+0 records out
512000000 bytes transferred in 4.813426 secs (106369143 bytes/sec)
Here, we’re reading from the 512mb file that we just wrote, and dumping it to /dev/null, which means we’re not writing at all. The Easystore performed about the same for both reads and writes. For some reason, it seems like my Mac’s SSD performs better for reads than for writes. This was consistent from run to run.
So far, we’ve been looking at performance on sequential access. Now let’s look at random access. I wrote a short Python script to make 100,000 very small writes of 16 bytes each, all over the 512mb file that we created earlier.
import os
import random

# Open the 512mb file from earlier for reading and writing, in binary mode
f = open('dd.out', 'r+b')
size = os.fstat(f.fileno()).st_size

for i in range(100000):
    # Jump to a random offset and write 16 bytes
    f.seek(random.randrange(0, size))
    f.write(b'stompstompstomp.')

f.close()
This is going to hop all around the file, writing 16 bytes at a time. I then timed the execution.
# Performance on my 2015 MacBook Pro - 100,000 writes
time python random-writes.py
python random-writes.py 1.87s user 2.84s system 32% cpu 14.368 total
My MacBook’s SSD was able to do 100,000 writes in 14.4 seconds, or 6,944 IOPS. That’s well within expectations for an SSD, and way above the 100 IOPS that you’d expect from an HDD.
Running the same test on my Easystore took so long that I reduced the iterations from 100,000 to 1,000.
# Performance for my WD Easystore - 1,000 writes
time python random-writes.py
python random-writes.py 0.12s user 0.19s system 3% cpu 9.278 total
Almost 9.3 seconds to do 1,000 writes, or about 108 IOPS. My WD Easystore SSD performs about as well as an HDD for a large number of small, random writes!
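For contrast, a sequential version of the same workload would look something like the sketch below: the same 100,000 sixteen-byte writes, just appended one after another instead of scattered around the file. (This is for illustration only, not a benchmark I ran.)

# Sequential contrast: 100,000 sixteen-byte writes, back to back
# instead of at random offsets, into a fresh scratch file.
f = open('sequential.out', 'wb')
for i in range(100000):
    f.write(b'stompstompstomp.')
f.close()

Because each write lands right after the previous one, the drive never has to seek between operations, which is why even tiny sequential writes stay close to the transfer rates we saw with dd.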
So, what did I learn from this exercise? I learned that hard drive manufacturers generally don’t provide you with enough information to evaluate hard drives for purposes other than copying files. I don’t feel misled by Western Digital, because the Easystore is marketed as a way to back up your system, and not as a drive that handles serious workloads. Two terabytes of storage for $55 is still a great deal.
If you really need an SSD that performs well for random access, you have to be willing to spend some money. Check out this 2tb 760p drive from Intel:
275,000 IOPS for random writes! Instead of 14 seconds, my 100,000 writes would have taken about a third of a second. It comes with a heavy price tag of $450. On the one hand, it’s about eight times the price of my WD Easystore. On the other hand, it can do over 2,500 times as many random writes per second!
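If you want to check that math, here’s the quick comparison in Python, using the 108 IOPS I measured on the Easystore and the 275,000 IOPS from the spec sheet:

easystore_iops = 108        # measured above with the random writes script
fast_ssd_iops = 275000      # advertised random write IOPS

print(100000 / fast_ssd_iops)          # ~0.36 seconds for 100,000 random writes
print(fast_ssd_iops / easystore_iops)  # ~2,500 times the Easystore's rate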