We experimented with several tools to benchmark our LBaaS (HAProxy) performance, among them Tsung, iperf, and ab. After some discussion and investigation, we settled on providing two metrics to our customers for LBaaS:
- Requests/second
- Throughput
To get optimal performance out of our Apache servers and LBaaS, we applied some TCP optimizations on the Apache servers and on the hypervisor where our LBaaS resides:
# /etc/sysctl.conf
# Increase system file descriptor limit
fs.file-max = 100000
# Discourage Linux from swapping idle processes to disk (default = 60)
vm.swappiness = 10
# Increase the ephemeral port range
net.ipv4.ip_local_port_range = 10000 65000
# Increase Linux autotuning TCP buffer limits
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Make room for more TIME_WAIT sockets due to more clients,
# and allow them to be reused if we run out of sockets
# Also increase the max packet backlog
net.core.netdev_max_backlog = 50000
net.ipv4.tcp_max_syn_backlog = 30000
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 1
# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0
# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
# Disable source routing and redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
# Log packets with impossible addresses for security
net.ipv4.conf.all.log_martians = 1
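These settings take effect without a reboot. A quick way to load the file and spot-check a couple of values (run as root; keys as listed above):

```shell
# Reload everything in /etc/sysctl.conf (requires root)
sysctl -p

# Spot-check that the values took effect
sysctl -n net.core.rmem_max        # should print 16777216
sysctl -n net.ipv4.tcp_tw_reuse    # should print 1
```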
We also observed during our tests that the default HAProxy configuration limits the maximum concurrent connections on the load balancer to around 4,000. We modified haproxy.cfg to raise the open-file limit and maxconn so the load balancer can handle a large number of concurrent connections:
global
    daemon
    user nobody
    group nogroup
    ulimit-n 200000
    maxconn 50000
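As a sanity check on those two numbers: HAProxy needs roughly two file descriptors per proxied connection (one client-side socket, one server-side socket), so ulimit-n should comfortably exceed twice maxconn. A back-of-the-envelope check (the factor of 2 is the usual rule of thumb, ignoring listeners and checks):

```shell
maxconn=50000
ulimit_n=200000
# ~2 fds per proxied connection (client socket + server socket)
fds_needed=$(( 2 * maxconn ))
echo "need roughly ${fds_needed} fds, ulimit-n allows ${ulimit_n}"
[ "$fds_needed" -le "$ulimit_n" ] && echo "headroom OK"
```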
We ran a series of tests with the Apache Benchmark tool (ab) using different payload sizes: the standard "It works" index.html page (177 bytes), 1KB, 10KB, and 100KB. We also varied the number of parallel/concurrent connections while measuring performance. The following tables show the results we achieved. The test servers were 16 vCPU, 30GB large Apache2 instances.
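A run like those in the tables below can be reproduced along these lines; the hostname is a placeholder for the LBaaS VIP, and the payload files are just zero-filled blobs served by Apache:

```shell
# Generate the fixed-size payloads (1KB, 10KB, 100KB)
head -c 1024   /dev/zero > /tmp/1k.bin
head -c 10240  /dev/zero > /tmp/10k.bin
head -c 102400 /dev/zero > /tmp/100k.bin

# Copy them into the Apache docroot, then from the client box, e.g.:
#   ab -n 1000000 -c 100 http://lb.example.com/1k.bin     # HTTP
#   ab -n 1000000 -c 100 https://lb.example.com/1k.bin    # HTTPS
```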
HTTP Testing:
1) Requests/second to the Apache server (across 1 million requests and 3 iterations):

| Response size \ Parallel conns | 10 | 100 | 1K | 5K | 10K | 20K |
|---|---|---|---|---|---|---|
| 177 bytes (index) | 37459 | 35022 | 39330 | 37600 | 35237 | 28000 |
| 1K | 28000 | 34000 | 37000 | 36000 | 30833 | 26922 |
| 100K | 11031 | 10562 | 4100 | 4670 | 4433 | 3240 |
2) Requests/second through LBaaS with 2 backend servers (across 1 million requests and 3 iterations):

| Response size \ Parallel conns | 10 | 100 | 1K | 5K | 10K | 20K |
|---|---|---|---|---|---|---|
| 177 bytes (index) | 20812 | 49200 | 40354 | 37000 | 37000 | 36000 |
| 1K | 20082 | 47800 | 42000 | 35069 | 31404 | 34415 |
| 100K | 5026 | 6471 | 5682 | 4716 | 4781 | 5148 |
SSL Testing:
1) HTTPS requests/second to the Apache server (across 1 million requests and 3 iterations):

| Response size \ Parallel conns | 10 | 100 | 1K | 5K | 10K | 20K |
|---|---|---|---|---|---|---|
| 177 bytes (index) | 21878 | 24596 | 24097 | 21088 | 20095 | 19034 |
| 1K | 19055 | 22517 | 20069 | 18657 | 18498 | 17633 |
| 100K | 2022 | 2322 | 2467 | 1967 | 1944 | 1768 |
2) HTTPS requests/second through LBaaS with 2 backend servers (across 1 million requests and 3 iterations):

| Response size \ Parallel conns | 10 | 100 | 1K | 5K | 10K | 20K |
|---|---|---|---|---|---|---|
| 177 bytes (index) | 20071 | 26510 | 19210 | 20633 | 19435 | 19735 |
| 1K | 18647 | 24895 | 19849 | 19344 | 18678 | 18498 |
| 100K | 1987 | 2452 | 2322 | 1988 | 2010 | 2068 |
Throughput Testing:
With the LBaaS sitting between the client and the server, there is inevitably some impact on the maximum throughput achievable over the wire. The following are the throughputs we observed:
1) Throughput between client and server without LBaaS (measured over 30s of iperf traffic):

| Parallel streams | 1 | 10 | 100 | 1000 | 10000 |
|---|---|---|---|---|---|
| Throughput | 8.55 Gbps | 8.81 Gbps | 7.38 Gbps | 6 Gbps | 4.6 Gbps |
2) Throughput between client and server with LBaaS (measured over 30s of iperf traffic):

| Parallel streams | 1 | 10 | 100 | 1000 | 10000 |
|---|---|---|---|---|---|
| Throughput | 3.60 Gbps | 3.43 Gbps | 3.42 Gbps | 3.09 Gbps | 3.90 Gbps |
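Runs like these would be along the lines of `iperf -s` on the server and `iperf -c <lb-vip> -t 30 -P <parallel>` on the client (hostnames as placeholders). From the single-stream numbers above, the cost of inserting the LBaaS works out to roughly:

```shell
# 8.55 Gbps direct vs 3.60 Gbps through LBaaS, scaled x100 for integer math
direct=855
lbaas=360
echo "single-stream throughput drop: $(( (direct - lbaas) * 100 / direct ))%"
```

That is, inserting the unmodified LBaaS costs over half the single-stream throughput, which motivates the nbproc change below.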
We did some tweaking of the HAProxy config (haproxy.cfg) and found that increasing the number of HAProxy processes improves throughput significantly. The config change was:
global
    daemon
    user nobody
    group nogroup
    ulimit-n 200000
    maxconn 50000
    nbproc 4
Throughput between client and server with LBaaS and nbproc 4 (measured over 30s of iperf traffic):

| Parallel streams | 1 | 10 | 100 | 1000 | 10000 |
|---|---|---|---|---|---|
| Throughput | 3.23 Gbps | 7.28 Gbps | 6.81 Gbps | 6.78 Gbps | 6.33 Gbps |
Please note that this config change is not yet in the code; we would need a change in the vrouter agent to handle this new setting.