The Advanced Encryption Standard New Instructions (AES-NI) are an extension to the x86 instruction set for microprocessors from Intel and AMD. The purpose of AES-NI is to improve the speed of applications performing encryption and decryption with the Advanced Encryption Standard (AES), such as the AES-128 and AES-256 ciphers. AES-NI was designed to provide 4x to 8x speed improvements when using AES ciphers for bulk data encryption and decryption.
AES-accelerated CPUs can increase efficiency and performance when setting up an SSL terminator for your HTTP web cluster, a VPN link, an sshfs file system mount, or when moving bulk data over an SSH connection using scp or rsync.
The following table lists the results of a quick study of various ciphers used on desktop, laptop and mobile devices. The benchmarks focus on the ciphers available to TLS v1.2 and TLS v1.3 connections made by HTTP/2 and HTTPS clients. The ChaCha20 cipher is used as our baseline. ChaCha20 is a 256-bit stream cipher which is not AES accelerated and relies on raw CPU processing power. The other ciphers are 128-bit and 256-bit AES ciphers which are accelerated by the CPU through AES-NI when AES-NI is enabled in the BIOS. LibreSSL (OpenSSL) is used to test all ciphers on the various CPUs we have access to. All numbers are in megabytes per second (MB/s) per single CPU core. Higher values are better.
AES Performance per CPU core for TLS v1.2 Ciphers
Higher score is better. Speeds are in megabytes per second per CPU core.

                    ChaCha20  AES-128-GCM  AES-256-GCM  AES-128-CBC  AES-256-CBC   Total Score
Intel Gold 5412U         719         3321         2957         1885         1381   = 10263
AMD Ryzen 7 1800X        573         3006         2642         1513         1101   =  8835
Intel W-2125             565         2808         2426         1698         1235   =  8732
Intel i7-6700            585         2607         2251         1561         1131   =  8135
Intel Silver 4410Y       519         2386         2123         1353          992   =  7373
Intel Gold 5217          598         2344         2018         1396         1014   =  7370
AMD EPYC 7702            410         2464         2175         1241          904   =  7194
Intel Silver 4215        566         2218         1910         1324          963   =  6981
AMD EPYC 7551            355         2213         1962         1114          811   =  6455
AMD EPYC 7402P           493         2478         2184         1244          907   =  6062
Intel i5-6500            410         1729         1520         1078          783   =  5520
Intel i7-4750HQ          369         1556         1353          688          499   =  4465
AMD FX 8350              367         1453         1278          716          514   =  4328
AMD FX 8150              347         1441         1273          716          515   =  4292
Intel E5-2650 v4         404         1479         1286          652          468   =  4289
Intel i7-2700K           382         1353         1212          763          552   =  4262
Intel i7-3840QM          373         1279         1143          725          520   =  4040
Intel i5-2500K           358         1274         1140          728          522   =  4022
AMD FX 6100              326         1344         1186          671          481   =  4008
AMD A10-7850K            321         1303         1176          685          499   =  3984
AMD A8-7600 Kaveri       306         1246         1108          648          470   =  3778
Intel E5-2640 v3         303         1286         1126          585          419   =  3719
AMD Opteron 6380         293         1203         1063          589          423   =  3571
AMD Opteron 6378         282         1138          986          561          406   =  3373
AMD Opteron 6274         232         1054          926          524          376   =  3112
Intel Xeon E5-2630       247          962          864          541          394   =  3008
Intel Xeon E5645         262          817          717          727          524   =  3047
Intel i7-2635QM          151          989          881          564          404   =  2989
Intel Xeon L5630         225          701          610          626          450   =  2612
Intel E5-2603 v4         236          866          754          382          274   =  2512
AMD Opteron 2382         249          651          485          215          150   =  1750
Intel i7-950             401          256          218          358          257   =  1490
Intel Xeon X5550         287          205          175          305          219   =  1191
AMD Phenom 965           404           84           63          282          198   =  1031
Intel Core2 Q9300        231          126          133          221          161   =   872
AMD X4 610e              225           59           44          198          139   =   665
Intel Core2 Q6600        173          141           79          108           77   =   578
Intel P4 3Ghz Will       109           26           23           55           43   =   256
Intel ATOM D525           98           51           43           28           20   =   240
Snapdragon S4 Pro        131           41            -            -            -   =   172
ARM Cortex A9             73           24            -            -            -   =    97

Testing Notes:
  AES-NI acceleration enabled if supported by BIOS and CPU
  Speeds in megabytes per second (MB/s) per real CPU core
  8192 byte blocks
  Five (5) test runs, the average speed reported
  Snapdragon and ARM Cortex values reported by Google Developers
Let's say we have a project with a 10 gigabit connection to the internet. 10 gigabits per second is 1,250 megabytes per second. The web page designers are expecting the web server to concurrently encrypt and decrypt enough data to saturate the 10 gigabit connection. Let's also say 100% of our clients are using the AES-128-GCM based cipher just to make it easier to compare numbers from the table above.
We will need a CPU which can process 1,250 MB/s of AES encrypted data per CPU core. Since we need to receive (decrypt) and send (encrypt) data, the CPU should have at least two (2) cores, each able to sustain 1,250 MB/s. From the test results above, any CPU with an AES-128-GCM score of 1,250 or higher meets that target, for example the "Intel E5-2640 v3" at 1,286 MB/s per CPU core. The "AMD Opteron 6380" falls just short at 1,203 MB/s per CPU core, about 96% of line rate. Note that the AMD Opteron 6380 is a 16 core CPU which leaves plenty of other CPU cores to do other work like network I/O, firewall rules or ZFS file system work.
In the real world the situation would be more complicated. Clients connect with a variety of ciphers and the system is not dedicated to just cipher processing. It is also possible that the cipher processing of multiple CPU cores can be added together to reach the desired speed. The "Intel Xeon L5630" has four cores and each core can process 701 MB/s of AES data for a total of around 2,804 MB/s; just enough speed for encrypting and decrypting data on a 10 gigabit link using AES-128-GCM.
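The sizing arithmetic above can be sketched as a pair of small shell functions. This is only a rough estimate: the function names are made up for illustration, and the core count assumes cipher work spreads evenly across cores, which real traffic may not do.

```shell
#!/bin/sh
# Convert a link speed in gigabits per second to megabytes per second.
# 1 gigabit = 1000 megabits, and there are 8 bits per byte.
link_mbps() {
    echo $(( $1 * 1000 / 8 ))
}

# Estimate how many cores are needed to encrypt AND decrypt at line rate.
# $1 = link speed in Gbit/s, $2 = per-core cipher speed in MB/s (from the table)
# Assumption: the load divides evenly across cores. The result is rounded up.
cores_needed() {
    total=$(( 2 * $(link_mbps "$1") ))
    echo $(( (total + $2 - 1) / $2 ))
}

link_mbps 10            # 10 gigabit link in MB/s
cores_needed 10 701     # Intel Xeon L5630, AES-128-GCM
cores_needed 10 3321    # Intel Gold 5412U, AES-128-GCM
```

With the table values, the Xeon L5630 needs all four of its cores while the Gold 5412U needs a single core to cover both directions.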
Note that AES-NI throughput scales with real CPU cores. Hyper-threaded (HT) or virtual cores share the AES hardware of a physical core, so they should not be counted as additional cipher capacity.
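Before benchmarking, it can help to confirm the operating system actually sees the AES instructions. The sketch below assumes an x86 Linux machine, where /proc/cpuinfo has a "flags" line listing "aes" when AES-NI is available. The function name is hypothetical, and other operating systems expose this information differently.

```shell
#!/bin/sh
# Read cpuinfo-style text on stdin and succeed (exit 0) if any "flags"
# line lists "aes" as a whole word. Flags such as "vaes" do not count.
has_aes_flag() {
    awk '/^flags/ { for (i = 1; i <= NF; i++) if ($i == "aes") found = 1 }
         END { exit !found }'
}

# Assumption: Linux on x86. Report what the running kernel sees.
if [ -r /proc/cpuinfo ] && has_aes_flag < /proc/cpuinfo; then
    echo "AES-NI flag present"
else
    echo "AES-NI flag not found (unsupported CPU, disabled in BIOS, or not Linux x86)"
fi
```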
Using the following commands, download and build LibreSSL. The build process statically builds the LibreSSL binaries and libraries in the local directory. No files are installed to the system. Once the build is done, run each of the cipher speed tests with a 10 second sleep in between to make sure the load on the machine has returned to zero (0). When you are done testing, delete the build directory and everything is cleaned up.
# NOTE: Ubuntu requires GCC and GNU Make to compile libressl
# sudo apt install gcc make

cd /tmp
wget -4 http://ftp.openbsd.org/pub/OpenBSD/LibreSSL/libressl-3.9.2.tar.gz
tar zxvf libressl-3.9.2.tar.gz
cd libressl-3.9.2
./configure && make && echo SUCCESS

./apps/openssl/openssl speed -elapsed -evp chacha
sleep 10
./apps/openssl/openssl speed -elapsed -evp aes-128-gcm
sleep 10
./apps/openssl/openssl speed -elapsed -evp aes-256-gcm
sleep 10
./apps/openssl/openssl speed -elapsed -evp aes-128-cbc
sleep 10
./apps/openssl/openssl speed -elapsed -evp aes-256-cbc
echo FINISHED
The LibreSSL (OpenSSL) cipher speed test will print out a few lines of output per test performed. The value we are interested in is on the last line under the label "8192 bytes". Our interests are focused on bulk data transfers and "8192 bytes" is the largest block test shown. The "8192 bytes" value is the amount of data the CPU can process using the cipher specified in thousands of bytes per second. Divide the value shown by one(1) thousand to get megabytes per second which is the same as our results in the table above.
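The divide-by-one-thousand step can be automated with a small filter. This is a minimal sketch under one assumption: the result line is the only line whose second field looks like "123456.78k", and the "8192 bytes" value is the sixth field on that line. The function name is made up for illustration.

```shell
#!/bin/sh
# Read "openssl speed" output on stdin and print "<cipher> <MB/s>" using
# the "8192 bytes" column. The value is in 1000s of bytes per second, so
# dividing by 1000 gives megabytes per second (truncated, as in the table).
speed_to_mbps() {
    awk '$2 ~ /^[0-9.]+k$/ { v = $6; sub(/k$/, "", v); print $1, int(v / 1000) }'
}

# Example using the i7-6700 result line shown in this article.
speed_to_mbps <<'EOF'
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     718192.32k  1694561.62k  2465708.29k  2580304.55k  2607478.10k
EOF

# With a real build, pipe the test straight into the filter:
#   ./apps/openssl/openssl speed -elapsed -evp aes-128-gcm | speed_to_mbps
```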
# use dmesg and search for the cpu type. for example,

$ dmesg | grep CPU0
[ 0.120426] smpboot: CPU0: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (fam: 06, model: 5e, stepping: 03)

# run the series of cipher speed tests, chacha is first...

$ ./apps/openssl/openssl speed -elapsed -evp chacha
You have chosen to measure elapsed time instead of user CPU time.
Doing chacha for 3s on 16 size blocks: 66892965 chacha's in 3.00s
Doing chacha for 3s on 64 size blocks: 25017290 chacha's in 3.00s
Doing chacha for 3s on 256 size blocks: 6502076 chacha's in 3.00s
Doing chacha for 3s on 1024 size blocks: 1692776 chacha's in 3.00s
Doing chacha for 3s on 8192 size blocks: 214511 chacha's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
chacha         356762.48k   533702.19k   554843.82k   577800.87k   585758.04k  <----

... the result is 585758.04k / 1000 = 585 MB/s

$ ./apps/openssl/openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 134661060 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 79432576 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 28895019 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 7559486 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 954887 aes-128-gcm's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm    718192.32k  1694561.62k  2465708.29k  2580304.55k  2607478.10k  <----

... the result is 2607478.10k / 1000 = 2,607 MB/s

$ ./apps/openssl/openssl speed -elapsed -evp aes-256-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-gcm for 3s on 16 size blocks: 125601150 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 64 size blocks: 75507034 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 25591359 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 1024 size blocks: 6547497 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 8192 size blocks: 824454 aes-256-gcm's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-gcm    669872.80k  1610816.73k  2183795.97k  2234878.98k  2251309.06k  <----

... the result is 2251309.06k / 1000 = 2,251 MB/s

$ ./apps/openssl/openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 250707357 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 71204109 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 18108237 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 4563775 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 571798 aes-128-cbc's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc   1337105.90k  1519020.99k  1545236.22k  1557768.53k  1561389.74k  <----

... the result is 1561389.74k / 1000 = 1,561 MB/s

$ ./apps/openssl/openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 185732038 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 51745988 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 13073843 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 3280738 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 414517 aes-256-cbc's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc    990570.87k  1103914.41k  1115634.60k  1119825.24k  1131907.75k  <----

... the result is 1131907.75k / 1000 = 1,131 MB/s
Tip: take a look at the Nginx server resource sizing guide for deploying Nginx on bare metal servers and the Nginx testing methodology. The guide shows graduated hardware configurations and how many requests per second, transactions per second and total throughput an https server could achieve.
AES Performance per CPU core for TLS v1.2 Ciphers
(Higher is Better, Speeds in Megabytes per Second)

Intel i5-6500       ChaCha20  AES-128-GCM  AES-256-GCM  AES-128-CBC  AES-256-CBC   Total Score
OpenSSL v1.1.1a         2762         4900         3554         1067          780   = 13063
BoringSSL v2017_12      1760         4455         3370          460          402   = 10447
LibreSSL v3.0.1          410         1729         1520         1078          783   =  5520

### ############### Testing Results ################## ###

dmesg | grep -i CPU
CPU: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (3192.14-MHz K8-class CPU)

cd /tmp
wget http://ftp.openbsd.org/pub/OpenBSD/LibreSSL/libressl-3.0.1.tar.gz
tar zxvf libressl-3.0.1.tar.gz
cd libressl-3.0.1
./configure && make && echo SUCCESS

./apps/openssl/openssl speed -elapsed -evp chacha
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
chacha         229894.55k   374728.51k   401326.42k   407606.34k   410545.95k
                                                                   ^^^

./apps/openssl/openssl speed -elapsed -evp aes-128-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm    578578.66k  1037298.77k  1496023.55k  1667607.21k  1729668.50k
                                                                  ^^^^

./apps/openssl/openssl speed -elapsed -evp aes-256-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-gcm    514792.29k   953548.57k  1340996.10k  1478150.01k  1520833.77k
                                                                  ^^^^

./apps/openssl/openssl speed -elapsed -evp aes-128-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc   1070909.28k  1059120.83k  1084207.69k  1090894.01k  1078315.69k
                                                                  ^^^^

./apps/openssl/openssl speed -elapsed -evp aes-256-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc    806110.46k   767273.81k   793146.46k   803538.08k   783499.41k
                                                                   ^^^

cd /tmp
wget https://www.openssl.org/source/openssl-1.1.1a.tar.gz
tar zxvf openssl-1.1.1a.tar.gz
cd openssl-1.1.1a
./config && make
cp /tmp/openssl-1.1.1a/libssl.so.1.1 /usr/local/lib/
cp /tmp/openssl-1.1.1a/libcrypto.so.1.1 /usr/local/lib/

./apps/openssl speed -elapsed -evp chacha20
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20       320078.35k   547365.25k  1287720.93k  2649847.21k  2762595.49k  2769084.88k
                                                                  ^^^^

./apps/openssl speed -elapsed -evp aes-128-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm    453159.25k  1215246.40k  2437021.95k  3909602.78k  4900248.28k  4996923.22k
                                                                  ^^^^

./apps/openssl speed -elapsed -evp aes-256-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm    397133.57k  1118061.03k  2050411.88k  3017616.18k  3554319.58k  3603072.56k
                                                                  ^^^^

./apps/openssl speed -elapsed -evp aes-128-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc    812677.93k  1037389.63k  1066182.04k  1068901.72k  1067816.15k  1074969.69k
                                                                  ^^^^

./apps/openssl speed -elapsed -evp aes-256-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc    720262.90k   757488.79k   775043.00k   776824.49k   780029.74k   792199.17k

git clone https://boringssl.googlesource.com/boringssl
cmake -GNinja -DCMAKE_BUILD_TYPE=Release .. && ninja
cd build/tools
./bssl speed
...
Did 544000 AES-128-GCM (8192 bytes) seal operations in 1000170us (543907.5 ops/sec): 4455.7 MB/s
Did 412000 AES-256-GCM (8192 bytes) seal operations in 1001476us (411392.8 ops/sec): 3370.1 MB/s
Did 215000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000321us (214931.0 ops/sec): 1760.7 MB/s
...
Did 57000 AES-128-CBC-SHA1 (8192 bytes) seal operations in 1014216us (56201.0 ops/sec): 460.4 MB/s
Did 50000 AES-256-CBC-SHA1 (8192 bytes) seal operations in 1018187us (49106.9 ops/sec): 402.3 MB/s
# cpu example type: AMD FX 6100

$ dmesg | grep -i cpu
[ 0.277326] smpboot: CPU0: AMD FX(tm)-6100 Six-Core Processor (fam: 15, model: 01, stepping: 02)

OpenSSL AES-NI = OFF

$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 11810234 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 3458208 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 2269863 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 612727 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 77820 aes-128-gcm's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     62987.91k    73775.10k   193694.98k   209144.15k   212500.48k

... the result is 212500.48k / 1000 = 212 MB/s

OpenSSL AES-NI = ON

$ openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 47814322 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 32192031 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 13198683 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 3757898 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 497117 aes-128-gcm's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm    255009.72k   686763.33k  1126287.62k  1282695.85k  1357460.82k

... the result is 1357460.82k / 1000 = 1,357 MB/s
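To put the two AMD FX 6100 runs side by side, the speedup is the AES-NI ON value divided by the AES-NI OFF value for the same block size. The helper below is a small sketch with a made-up function name. It takes the two "8192 bytes" numbers without the trailing "k".

```shell
#!/bin/sh
# Print how many times faster the first value is than the second.
# $1 = speed with AES-NI on, $2 = speed with AES-NI off (same units)
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1fx\n", fast / slow }'
}

# "8192 bytes" values from the AMD FX 6100 runs above
speedup 1357460.82 212500.48
```

For this CPU the ratio comes out at roughly 6.4x, which sits inside the 4x to 8x range AES-NI was designed to deliver.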
# Test calomel.org using the client/server negotiated cipher
echo -n | ./apps/openssl/openssl s_client -connect calomel.org:443

# Test calomel.org using the ChaCha cipher
echo -n | ./apps/openssl/openssl s_client -cipher ECDHE-ECDSA-CHACHA20-POLY1305 -connect calomel.org:443

# Test calomel.org using the AES-128-GCM cipher
echo -n | ./apps/openssl/openssl s_client -cipher ECDHE-ECDSA-AES128-GCM-SHA256 -connect calomel.org:443
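The s_client output is long, and usually the only part of interest is which cipher was agreed on. The filter below is a sketch that assumes the summary line has the form "New, <protocol>, Cipher is <name>", as printed by common OpenSSL and LibreSSL builds. The function name is hypothetical and the sample line is illustrative, not captured from a real session.

```shell
#!/bin/sh
# Read s_client output on stdin and print the negotiated cipher name.
# Assumption: the summary line contains "Cipher is <name>".
negotiated_cipher() {
    sed -n 's/^.*Cipher is \(.*\)$/\1/p' | head -n 1
}

# Illustrative sample line (not real captured output)
printf 'New, TLSv1.2, Cipher is ECDHE-ECDSA-AES128-GCM-SHA256\n' | negotiated_cipher

# With a real connection, for example:
#   echo -n | ./apps/openssl/openssl s_client -connect calomel.org:443 2>/dev/null | negotiated_cipher
```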