Hardware AES Showdown - VIA Padlock vs Intel AES-NI vs AMD Hexacore
Written by Grant
Back in the old Amiga days we knew the value of specialized processors for specialized jobs. The Amiga could do things that computers with 10x the power couldn't. However, when it came to something its specialized processors weren't designed for, it was slow again. The first time Intel put a special-purpose processor inside the CPU was in the i486 days, and that processor was the floating point co-processor. Before that integration, floating point math was handled by a separate chip on the motherboard or by software routines, and hardware floating point was somewhere around 20 times faster than software. Every once in a while Intel and AMD added instructions to speed up some particular function. Most of these were multimedia related (SSE, 3DNow!) and none of them affected my life much. Fast forward to 2005 or so and we see VIA, the underdog in the x86 race, add encryption instructions to its C3 processors, sold on boards built around chipsets such as the CN400. So for one particular function (AES encryption/decryption) this CPU/chipset combo was blazing fast, but for everything else it was still dog slow. An interesting idea, but perhaps poorly executed. I did build a mini-ITX router using the VIA platform. With VIA Padlock providing hardware encryption/decryption it's been great for SSH, OpenSSL and AES disk encryption via Cryptsetup. It doesn't, however, make a very fast desktop computer.
Intel has grabbed this idea of hardware AES support and added it to their new CPUs, CPUs that are fast enough in their own right to be useful, so I picked up a 2.5 GHz Core i5 system. My main desktop computer is an AMD Phenom II six-core system at 2.8 GHz, which I'll be using as a baseline. It's fairly fast but has no hardware crypt support. I'm going to run some typical benchmarks to show the general speed of each CPU, then focus on the hardware crypt support using OpenSSL, OpenSSH and Cryptsetup.
Before you get all excited because you have a recent Intel CPU and think it has AES-NI, you might want to check Intel's beta comparison chart (which seems impossible to find by googling) at http://ark.intel.com/MySearch.aspx?AESTech=true. You can also use it to check other CPU features like VT and VT-d. I keep this chart bookmarked and check it before I buy any Intel CPU; otherwise I'd end up with a bunch of stuff I don't want. Do NOT assume anything when dealing with Intel. For instance, the Core i5-460 does NOT have AES-NI but the Core i5-560 does, which sort of makes sense since the i5-560 is a bigger, more powerful CPU, right (as shown by the bigger model number)? Not necessarily, as a weirder example shows: the Core i5-650 has AES-NI and the Core i5-750 doesn't. Why would a more expensive, more powerful CPU with a higher number not include AES-NI? I don't know, ask Intel.
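If you already own the machine and it runs Linux, the kernel will tell you directly. The bash sketch below assumes the standard /proc/cpuinfo and /proc/crypto files; the function names are made up for illustration. It looks for the "aes" CPU flag (AES-NI) and the "ace"/"ace_en" flags the kernel uses for VIA Padlock's crypto engine (present/enabled), then checks whether a hardware-backed AES driver such as aes-aesni or aes-padlock is registered with the kernel crypto API, which is what dm-crypt (and therefore Cryptsetup) goes through.

```shell
#!/usr/bin/env bash
# Sketch: which hardware-AES features does this Linux box report?
# Assumes the standard /proc files; both functions accept an optional
# path so they can be pointed at a saved copy instead.

# CPU flags of interest:
#   aes    -> AES-NI instructions
#   ace    -> VIA Padlock crypto engine (ACE) present
#   ace_en -> VIA Padlock crypto engine enabled
hw_aes_flags() {
    local cpuinfo=${1:-/proc/cpuinfo}
    # One "flags" line per logical CPU: split into one flag per line,
    # keep only the interesting ones and drop duplicates.
    grep -E '^flags[[:space:]]*:' "$cpuinfo" \
        | cut -d: -f2 \
        | tr ' ' '\n' \
        | grep -E '^(aes|ace|ace_en)$' \
        | sort -u
}

# Kernel crypto drivers that look hardware-backed (e.g. aes-aesni from
# the aesni_intel module, aes-padlock from padlock_aes). dm-crypt, and
# therefore cryptsetup, can only use the hardware if one is registered.
hw_aes_drivers() {
    local crypto=${1:-/proc/crypto}
    grep -E '^driver[[:space:]]*:' "$crypto" \
        | awk '{ print $3 }' \
        | grep -E 'aesni|padlock' \
        | sort -u
}
```

Call hw_aes_flags with no argument to read /proc/cpuinfo; empty output means neither feature is reported (on some machines the BIOS can also switch AES-NI off). hw_aes_drivers only lists something once the relevant driver (aesni_intel or padlock_aes) is loaded or built in.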
To show the differences between hardware and software encryption I've prepared three computers without hardware crypt support and two with it. The company that started it all was VIA with its Padlock instructions, so I've included an Epia SP Mini-ITX motherboard (CN400 chipset) sporting a VIA C3 at 1.3 GHz and 1 GB of DDR RAM. As a reference I've included a basic MSI Wind netbook with an Intel Atom 230 at 1.6 GHz. I don't expect much out of this dog slow CPU, but I think it will show how much encryption impacts it: even if we choose not to do full disk encryption, we may connect to a VPN or log into HTTPS websites and feel the pain. The other CPU with hardware encryption instructions is the new Intel Sandy Bridge Core i5-2520 at 2.5 GHz. It has the aforementioned AES-NI instructions, which should speed things up nicely. The Core i5 is a dual core CPU, so for contrast I've included an AMD Athlon II X3, a triple core CPU running at 2.9 GHz. With another 400 MHz and an additional core it should give the Core i5 some competition. Lastly I've included an AMD Phenom II X6, a hexacore beast running at 2.8 GHz, which through brute force should be able to turn in some pretty good numbers.
VIA C3 @ 1333 MHz, 1 GB 400 MHz DDR RAM
Intel Atom 230 @ 1600 MHz, 1 GB DDR2 RAM
AMD Athlon II X3 435 @ 2900 MHz, 2 GB RAM
Intel Core i5-2520 @ 2500 MHz, 8 GB RAM
AMD Phenom II X6 @ 2900 MHz, 4 GB RAM
Baseline Benchmarking with Passmark
The first test is a general benchmark to demonstrate the overall performance of the systems. The PassMark statistics can be found at http://www.cpubenchmark.net.
The interesting thing to note about this chart is the incredible speed Intel is getting out of each of its i5 cores. The Core i5 has 2/3 the performance of the Phenom X6 with 1/3 the cores and at a lower clock speed, which works out to roughly twice the performance per core. This is very impressive. The Athlon II doesn't fare so well against the Core i5, and I think it's safe to say both the Atom and the C3 are dog slow. You don't realize how bad the Atom is until you compare it to a run-of-the-mill desktop CPU and realize it has about 1/10 the power. Granted, the Atom and C3 only have one core, but even so a single Core i5 core is about 6 times faster than the Atom's one core. Just for kicks I looked up the speed of an ancient AMD Athlon 2600+ from 8 years ago, a CPU we wouldn't even think about building a computer around now, and it trounces the Atom! The Atom uses less electricity and costs pennies to make, which is why it exists.
Our first encryption-specific test uses OpenSSL's speed benchmark. I tested both the aes-128-ecb and aes-128-cbc ciphers and have only posted the results for 8192-byte blocks.
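For anyone repeating the OpenSSL test: "openssl speed" prints a table in 1000s of bytes per second, one column per block size. The helper below is a hypothetical convenience (the name speed_8192_mbs is made up) that picks the 8192-byte column for one cipher and converts it to MB/s. The usage lines pass -evp because, as far as I know, on OpenSSL releases of this era the plain "openssl speed aes-128-cbc" form goes through the low-level software AES code and won't show AES-NI. For Padlock I'm assuming the padlock engine (-engine padlock), which only works on OpenSSL builds that include it.

```shell
#!/usr/bin/env bash
# Sketch: extract the 8192-byte column from "openssl speed" output and
# print it in MB/s (10^6 bytes per second). openssl reports results in
# 1000s of bytes per second with a trailing "k", one column per block
# size: 16, 64, 256, 1024, 8192 bytes (newer releases append a larger
# size after these, so the 8192 column stays in the same place).
speed_8192_mbs() {
    local cipher=$1
    # Result rows start with the cipher name (its case may differ between
    # OpenSSL versions); field 6 is the 8192-byte column.
    awk -v c="$cipher" 'tolower($1) == tolower(c) {
        v = $6
        sub(/k$/, "", v)
        printf "%.1f\n", v / 1000
    }'
}

# Usage (progress messages go to stderr, the results table to stdout):
#   openssl speed -evp aes-128-cbc 2>/dev/null | speed_8192_mbs aes-128-cbc
#   openssl speed -evp aes-128-ecb 2>/dev/null | speed_8192_mbs aes-128-ecb
# Assumption: an OpenSSL build that includes the VIA padlock engine:
#   openssl speed -engine padlock -evp aes-128-cbc 2>/dev/null | speed_8192_mbs aes-128-cbc
```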
Whoa! Our next-to-worthless VIA C3 puts a big lump on the Hexacore! In fact the VIA C3 beats the Hexacore in both tests, by as much as 50%. The numbers for the VIA C3 are VERY impressive: for a CPU with 1/30th of the power of the Phenom II X6, it's pretty fast at this one thing. The Intel Core i5 pulls ahead by a good margin using its hardware AES-NI instructions, which is expected, with a 2x to 10x speed improvement over the Hexacore.
Truecrypt 7.0a supports AES-NI but not Padlock, which shows in the VIA C3's horrible results; 10 MB/sec encryption speed isn't worth anything. I also included results for Twofish, which none of these CPUs accelerate in hardware, to show the advantage of having hardware crypt support. The Atom is dog slow as we'd expect, but almost usable. You could encrypt a USB thumb drive and not notice the impact too much, other than your CPU going to 100% when you write to it.
The Hexacore is about 3 times faster than the X3, which is interesting as it only has twice the number of cores. I can't really explain that one. The Core i5 pulls ahead by quite a large margin, but only because of AES-NI. Its AES speed is about 2.5x that of the Hexacore, which is VERY impressive for a 2-core CPU pulling off 1.9 GB/sec average encryption/decryption speed. However, with Twofish as the cipher the Hexacore's muscle shows, and it is roughly twice as fast as the Core i5.
Real World Benchmarks
Benchmarks are fun to look at, but what we really need is a real-world test. (The bars here are not to scale, I'm afraid, so you'll need to go by the numbers.) For a real-world test I could just encrypt a volume and write to it, but the results across this wide range of hardware would be irrelevant because the disk systems and drives themselves vary vastly in speed. So to level the playing field and focus on the encryption abilities, I created a 400 MB ramdisk, created a file inside it and attached that file to a loopback device using losetup. Once I have the loopback device I can format or encrypt it as I please. To limit human error I wrote a script named testluks.sh that automates this process on each machine. The script is available in the downloads section at http://grantmcwilliams.com/tech/programming/downloads/category/1-bash-scripts
The command line I used to do the actual encryption is as follows:
cryptsetup -q -c aes-cbc-essiv:sha256 -h ripemd160 luksFormat /dev/loop0 ./secure.key
You'll need to create the secure.key file yourself if you want to duplicate this benchmark. You can make a keyfile by reading data out of /dev/urandom. I won't get into how secure this data is here, as I'm mainly interested in testing speed. If you want to use Cryptsetup or Truecrypt in production you may want to research keyfiles more thoroughly. It should be noted that Truecrypt has a decent keyfile generator.
sudo dd if=/dev/urandom of=./secure.key bs=1 count=256
Once I had an encrypted device in RAM, I used dd to write a large file to it with the following command:
dd if=/dev/zero of=/media/cryptdisk/bigfile.bin bs=10240 count=25000
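Here is a minimal bash sketch of that ramdisk, loop device, LUKS, ext2 and dd sequence. It is not the author's testluks.sh; the tmpfs mount point, image size, mapper name "cryptdisk" and the DRY_RUN switch are assumptions for illustration. By default it only prints the commands; with DRY_RUN=0 and root privileges it runs them. The final dd adds conv=fdatasync, which is not in the original command: it makes dd include the final flush in its timing, so the reported rate covers data that has actually gone through dm-crypt rather than data still sitting in the page cache.

```shell
#!/usr/bin/env bash
# Sketch of the ramdisk -> file -> loop device -> LUKS -> ext2 -> dd test.
# NOT the author's testluks.sh: mount points, image size, the mapper name
# "cryptdisk" and the DRY_RUN switch are illustrative assumptions.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them
# and needs root.
DRY_RUN=${DRY_RUN:-1}
RAMDIR=${RAMDIR:-/mnt/ramdisk}   # hypothetical tmpfs mount point
IMG=$RAMDIR/crypt.img            # hypothetical backing file
MNT=${MNT:-/media/cryptdisk}     # matches the dd target in the text
KEY=${KEY:-./secure.key}

# Print or execute a command depending on DRY_RUN.
run() {
    if [[ $DRY_RUN == 1 ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_and_write() {
    local loopdev=${1:-/dev/loop0}
    run mkdir -p "$RAMDIR" "$MNT"
    run mount -t tmpfs -o size=400m tmpfs "$RAMDIR"   # 400 MB ramdisk
    run dd if=/dev/zero of="$IMG" bs=1M count=350      # file inside it
    run losetup "$loopdev" "$IMG"                       # attach to loop device
    run cryptsetup -q -c aes-cbc-essiv:sha256 -h ripemd160 luksFormat "$loopdev" "$KEY"
    run cryptsetup luksOpen --key-file "$KEY" "$loopdev" cryptdisk
    run mkfs.ext2 -q /dev/mapper/cryptdisk
    run mount /dev/mapper/cryptdisk "$MNT"
    # 10240 * 25000 = 256,000,000 bytes; conv=fdatasync is an addition so
    # the timing includes flushing through dm-crypt.
    run dd if=/dev/zero of="$MNT/bigfile.bin" bs=10240 count=25000 conv=fdatasync
}

teardown() {
    local loopdev=${1:-/dev/loop0}
    run umount "$MNT"
    run cryptsetup luksClose cryptdisk
    run losetup -d "$loopdev"
    run umount "$RAMDIR"
}
```

Source the file, call setup_and_write, time what you like, then call teardown (for real runs, as root: DRY_RUN=0 setup_and_write /dev/loop0). The 350 MB image leaves room for the LUKS header and ext2 metadata around the 256 MB test file.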
In my research I found mentions of Padlock needing its data aligned for the hardware encryption to do its job. I didn't have time to look into this, but if anyone has insight I'd like to hear it.
There are a couple of things to note here. The encryption/decryption numbers I got from the software-only CPUs were rock solid in that they varied very little. The numbers I got from the Core i5 varied by 100 MB/sec and part of the time even put the encrypted ramdisk speed higher than the unencrypted ramdisk speed. I'm not sure why this variance exists, so to get numbers as reliable as possible I ran the test a bunch of times and took the average for both the encrypted and unencrypted ramdisks. Maybe the variance has to do with immature drivers, or possibly bottlenecks in the CPU that don't always appear. At this point it's just speculation.
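Averaging many runs, as described above, is easy to script. This sketch assumes GNU dd (whose summary line contains "copied, N s, ...") and a target path on the mounted device; the function names are hypothetical. It computes MB/s (10^6 bytes per second) from the byte count and elapsed time itself, rather than relying on whichever unit dd chooses for its own rate.

```shell
#!/usr/bin/env bash
# Sketch: repeat a dd write and average the rate. Assumes GNU dd, whose
# summary line (on stderr) looks like
#   256000000 bytes (256 MB, 244 MiB) copied, 2.5 s, 102 MB/s
# The rate is recomputed as MB/s (10^6 bytes/s) from bytes and seconds.

# Parse a dd summary line from stdin and print MB/s.
dd_mbs() {
    awk '/copied,/ {
        for (i = 1; i < NF; i++)
            if ($i == "copied,") printf "%.1f\n", $1 / $(i + 1) / 1e6
    }'
}

# Arithmetic mean of the numeric arguments.
mean() {
    printf '%s\n' "$@" | awk '{ s += $1; n++ } END { if (n) printf "%.1f\n", s / n }'
}

# Hypothetical driver: write the 256 MB test file to $1, $2 times
# (default 5), removing it between runs, and print the average MB/s.
bench_writes() {
    local target=$1 runs=${2:-5} k r
    local rates=()
    for ((k = 0; k < runs; k++)); do
        # LC_ALL=C keeps the decimal point a '.' in dd's output.
        r=$(LC_ALL=C dd if=/dev/zero of="$target" bs=10240 count=25000 conv=fdatasync 2>&1 | dd_mbs)
        rates+=("$r")
        rm -f "$target"
    done
    mean "${rates[@]}"
}
```

For example, bench_writes /media/cryptdisk/bigfile.bin 10 on the encrypted mount, then the same against an unencrypted ramdisk mount, gives the two averages to compare.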
The other question these numbers bring up is: what happened to Padlock? We can write to our ramdisk at 111 MB/sec, but when it's encrypted we can only write at 31 MB/sec, even with hardware encryption. To get some answers I ran more tests on just the VIA C3, stressing the write speeds of the ramdisk, the mounted loop file in RAM and the encrypted loop file in RAM. This shows something very interesting: the C3 doesn't seem to have enough CPU power to handle the overhead of several layers of filesystems and loopbacks. This is a guess, of course, but you can see that we can write to the ramdisk at 111 MB/sec but to a mounted ext2-formatted file in the ramdisk at just 44 MB/sec. We lost 67 MB/sec in that step alone, so something is a little off. We only lost another 13 MB/sec in the encryption process.
I also tested writing to a hard drive partition, to a loop file inside a hard drive partition and to an encrypted loop file in a hard drive partition. According to hdparm the drive itself can read at about 30 MB/sec, so I wasn't expecting too much here. The result was that I can write to the hard drive at 19 MB/sec, to an unencrypted loopback file on the hard drive at 19 MB/sec and to an encrypted loopback file on the hard drive at 11 MB/sec. It looks like Padlock does its best work on network traffic such as SSH, rather than underneath layers of loopbacks and filesystems.
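Expressed as percentages, the VIA C3 figures above make the point more clearly. This small bash sketch only redoes that arithmetic with the MB/s numbers quoted in the text; pct_loss is a made-up helper.

```shell
#!/usr/bin/env bash
# Sketch: per-layer throughput loss on the VIA C3, using the MB/s figures
# quoted in the text. pct_loss is a made-up helper.
pct_loss() {
    # (before - after) / before, as a percentage with one decimal
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

echo "ramdisk -> ext2 loop file:         $(pct_loss 111 44)% lost"   # 60.4
echo "ext2 loop file -> encrypted:       $(pct_loss 44 31)% lost"    # 29.5
echo "hard drive loop file -> encrypted: $(pct_loss 19 11)% lost"    # 42.1
```

On the ramdisk, the loop file and filesystem step costs far more than the Padlock encryption step (67 vs 13 MB/sec); on the hard drive, where the loopback layer cost nothing, encryption took about 42%.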
I think the summary here is that VIA did a great job on Padlock and saddled it with a pathetic CPU.
AES-NI rocks, but equal results can be had by throwing a lot of horsepower at it. Soon I'll be working on an Intel hexacore with AES-NI and am excited to see what kind of numbers I can get out of it. I also think AMD has no choice but to jump on the AES bandwagon, because Intel is killing them. AES decryption on Intel's hexacore has been clocked at over 3 GB/sec. I'll be benchmarking one later in the summer.
I'd be interested in playing with VIA's new 1.4 GHz quad core Isaiah chips with Padlock in them. As a general purpose CPU it's posting faster numbers than AMD's new Brazos CPU, and I'd bet its AES performance is head and shoulders above any of the other CPUs here. Maybe my mini-ITX router board will get replaced in the near future...