How we fine-tuned HAProxy to achieve 2,000,000 concurrent SSL connections

If you look closely at the screenshot above, you will find two important pieces of information:

  1. This machine has 2.38 million TCP connections established, and
  2. The amount of RAM being used is around 48 Gigabytes.

Pretty cool, right? What would be even more awesome is if someone provided the components and the tunings required to achieve this kind of scale on a single HAProxy machine. Well, that's exactly what I'll do in this post ;)

This is the final part of the multi-part series on load testing HAProxy. If you have the time, I recommend you go and read the first two parts of the series first. These will help you get a hang of the kernel-level tunings required on all the machines in this setup.

Load Testing HAProxy (Part 1)

Load testing? HAProxy? If all this seems Greek to you, don't worry. I'll provide inline links to read up on what… (medium.com)

Load Testing HAProxy (Part 2)

This is the second part in the 3-part series on performance testing of the famous TCP load balancer and reverse proxy… (medium.com)

There are a lot of small components that helped us bring together the entire setup and achieve these numbers.

Before I tell you the final HAProxy configuration we used (if you're super impatient, you can scroll to the bottom), I want to build up to it by walking you through our reasoning.

What we wanted to test

The component we wanted to test was HAProxy version 1.6. This is what we are currently using in production on 4-core, 30 Gig machines. However, all of those connections are non-SSL.

From this exercise, we wanted to test two things:

  1. The percentage increase in CPU usage when we shift the entire load from non-SSL connections to SSL connections. CPU usage should definitely increase, owing to the lengthier 5-way handshake and then the packet encryption.
  2. Second, we wanted to test the limits of our current production setup in terms of the number of requests and the maximum number of concurrent connections that can be supported before performance starts to degrade.

We needed the first part because of a major feature rollout in full swing that is going to rely on communication over SSL. We needed the second part so that we could reduce the amount of hardware dedicated to HAProxy machines in production.

Components involved

  • Multiple client machines to stress the HAProxy.
  • A single HAProxy machine, version 1.6, in various setups

    * 4 cores, 30 Gig

    * 16 cores, 30 Gig

    * 16 cores, 64 Gig

  • Backend servers that would help support all these concurrent connections.

HTTP and MQTT

If you have gone through the first article in this series, you should know that our entire infrastructure is supported over two protocols:

  • HTTP i
  • MQTT.

We don't use HTTP 2.0 in our stack, and hence we don't have the functionality of persistent connections over HTTP. So in production, the maximum number of TCP connections we see is somewhere around (2 * 150k) on a single HAProxy machine (incoming + outgoing). Although the number of concurrent connections is fairly low, the number of requests per second is quite high.

MQTT, on the other hand, is a completely different mode of communication. It offers great quality-of-service parameters and persistent connections, so two-way continuous communication can happen over an MQTT channel. As for HAProxy supporting MQTT (underlying TCP) connections, we see somewhere around 600-700k TCP connections at peak on a single machine.

We wanted to do a load test that would give us precise results for both HTTP and MQTT connections.

There are a lot of tools that make it easy to load test an HTTP server, and many of them provide advanced features such as summarized results, converting text-based results into graphs, etc. However, we could not find any stress testing tool for MQTT. We do have a tool of our own that we had developed, but it was not stable enough to sustain this kind of load in the time frame we had.

So we decided to go with load testing clients for HTTP and to simulate the MQTT setup using the same ;) Interesting, right?

So read on.

Initial setup

This is going to be a long post, as I will be providing a lot of details that I think would be really helpful to someone doing similar load tests or fine tuning.

  • For the initial setup of HAProxy, we took a 16-core, 30 Gig machine. We did not go with our current production setup because we thought the CPU hit due to SSL termination happening at the HAProxy end would be tremendous.
  • For the server end, we went with a simple NodeJs server that responds with pong on receiving a ping request.
  • As for the client, we started out with Apache Bench. The reason we went with ab was that it is a very well-known and stable tool for load testing HTTP endpoints, and also because it provides beautifully summarized results that would help us a lot.

The ab tool provides many interesting parameters for load testing purposes, such as:

  • -c, concurrency Specifies the number of concurrent requests that would hit the server.
  • -n, no. of requests As the name suggests, specifies the total number of requests for the current load run.
  • -p POST file Contains the body of the POST request (if that is what you want to test.)

If you look at these parameters closely, you will notice that a lot of permutations become possible by tweaking all three. A sample ab command would look something like this

ab -S -p post_smaller.txt -T application/json -q -n 100000 -c 3000 http://test.haproxy.in:80/ping

A sample result of such a command looks something like this

The numbers we were interested in were

  • 99 percentile latency.
  • Time per request.
  • Number of failed requests.
  • Requests per second.

The biggest problem with ab is that it does not provide a parameter to control the number of requests per second. We had to adjust the concurrency level to get the desired requests per second, and that led to a lot of trial and error.

The all-powerful graph

We could not randomly do multiple load runs and keep getting results, because that would not give us any meaningful information. We had to perform these tests in some specific manner so that meaningful results could be obtained. So we followed this graph

This graph states that up to a certain point, if we keep increasing the number of requests, the latency will remain almost the same. However, beyond a certain tipping point, the latency will start to increase exponentially. That is the tipping point for the machine or the setup that we intended to measure.

Ganglia

Before I provide some of the test results, I would like to mention Ganglia.

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids.

Look at the following screenshot of one of our machines to get an idea about what ganglia is and what sort of information it provides about the underlying machine.

Pretty interesting, eh?

Moving on, we constantly monitored Ganglia for our HAProxy machine, keeping an eye on some important things.

  1. TCP established This tells us the total number of tcp connections established on the system. NOTE: this is the sum of inbound as well as outbound connections.
  2. packets sent and received We wanted to see the total number of tcp packets being sent and received by our HAProxy machine.
  3. bytes sent and received This shows us the total data that we sent and received by the machine.
  4. memory The amount of RAM being used over time.
  5. network The network bandwidth consumption because of the packets being sent over the wire.
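If you don't have a monitoring system like Ganglia handy, a rough spot-check of most of these numbers is possible straight from the shell. A minimal sketch (the interface name eth0 is an assumption; substitute your own):

# Established TCP connections, inbound + outbound
ss -s | grep -i estab

# Per-second packets and bytes sent/received per interface (eth0 assumed)
sar -n DEV 1 | grep eth0

# RAM in use, in gigabytes
free -g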

Following are the known limits found via previous tests/numbers that we wanted to achieve via our load test.

700k TCP established connections,

50k packets sent, 60k packets received,

10-15 MB of bytes sent as well as received,

14-15 Gigs of memory at peak,

7MB of network.

ALL these values are on a per second basis

HAProxy nbproc

Initially, when we started load testing HAProxy, we found that with SSL the CPU was getting hit pretty early in the process, while the requests per second were quite low. On investigating with the top command, we found that HAProxy was using only 1 core, while we had 15 more cores to spare.

Googling around for about 10 minutes led us to this interesting setting in HAProxy that lets it use multiple cores.

It's called nbproc, and for a better understanding of what it is and how to set it, check out this article:

//blog.onefellow.com/post/82478335338/haproxy-mapping-process-to-cpu-core-for-maximum
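To give a flavour of it (a minimal sketch, not the exact config we ran), enabling multiple HAProxy processes and pinning them to cores looks something like this in the global section:

global
    nbproc 4       # run 4 HAProxy processes instead of the default 1
    cpu-map 1 0    # pin process 1 to CPU core 0
    cpu-map 2 1
    cpu-map 3 2
    cpu-map 4 3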

Tuning this setting formed the basis of our load testing strategy moving forward, because HAProxy's ability to use multiple cores gave us the power to form multiple combinations for our load testing suite.

Load Testing with AB

When we had started out with our load testing journey, we were not clear on the things we should be measuring and what we need to achieve.

Initially we had only one goal in mind: to find the tipping point just by varying all of the parameters mentioned below.

I maintained a table of the results of all the various load tests we ran. All in all, I ran over 500 tests to get to the final result. As you can clearly see, there are a lot of moving parts to each and every test.

Single Client issues

We started seeing that the client was becoming a bottleneck as we kept increasing our requests per second. Apache bench uses a single core, and from the documentation it is evident that it does not provide any feature to use multiple cores.

To run multiple clients efficiently, we found an interesting Linux utility called GNU parallel. As the name suggests, it helps you run multiple commands in parallel while utilising multiple cores. Exactly what we wanted.

Have a look at a sample command that runs multiple clients using parallel.

cat hosts.txt | parallel 'ab -S -p post_smaller.txt -T application/json -n 100000 -c 3000 {}'
$ cat hosts.txt
http://test.haproxy.in:80/ping
http://test.haproxy.in:80/ping
http://test.haproxy.in:80/ping

The above command would run 3 ab clients hitting the same URL. This helped us remove the client side bottleneck.

The sleep and times parameters

We talked about some parameters in Ganglia that we wanted to track. Let's discuss them one by one.

  1. packets sent and received This can be simulated by sending some data as part of the POST request. This would also help us generate some network traffic, as well as the bytes sent and received numbers, in Ganglia.
  2. tcp_established This is something that took us a long, long time to actually simulate in our scenario. Imagine that a single ping request takes about a second; it would take about 700k requests per second to sustain our tcp_established milestone.

    Now this number might seem easier to achieve on production, but it was impossible to generate in our scenario.

What did we do, you might ask? We introduced a sleep parameter in our POST call that specifies the number of milliseconds the server needs to sleep before sending out a response. This would simulate a long-running request on production. So now, say we have a sleep of about 20 minutes (yep): it would take only around 583 requests per second (700,000 / 1,200 seconds ≈ 583) to reach the 700k mark.

Additionally, we introduced another parameter in our POST calls to the HAProxy: the times parameter. It specified the number of times the server should write a response on the TCP connection before terminating it. This helped us simulate even more data being transferred over the wire.
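For illustration, here is roughly what one such call would look like if made directly with curl (a hypothetical invocation; the sleep and times header names match the backend code shown later in this post):

# Ask the server to sleep up to 30s before responding and to write
# the response twice before closing the connection
curl -X POST http://test.haproxy.in:80/ping \
     -H "sleep: 30000" \
     -H "times: 2" \
     -H "Content-Type: application/json" \
     -d @post_smaller.txt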

Issues with apache bench

Although we got a lot of results with apache bench, we also faced a lot of issues along the way. I won't be mentioning all of them here, as they are not important for this post, and because I'll be introducing another client shortly.

We were pretty content with the numbers we were getting out of apache bench, but at one point, generating the required number of TCP connections just became impossible. Somehow apache bench was not handling the sleep parameter we had introduced properly, and it was not scaling for us.

Although running multiple ab clients on a single machine was sorted out by using the parallel utility, running this setup across multiple client machines was still a pain for us. I had not heard of the pdsh utility by then and was practically stuck.

Also, we were not focusing on timeouts either. There are default timeouts on HAProxy, the ab client and the server, and we had completely ignored these. We figured out a lot of things along the way and organized ourselves a lot on how to go about testing.

We kept talking about the tipping point graph, but we deviated a lot from it as time went on. Meaningful results, however, could only be found by focusing on it.

With apache bench, a point came where the number of TCP connections was just not increasing. We had around 40-45 clients running on 5-6 different client boxes but were not able to achieve the scale we wanted. Theoretically, the number of TCP connections should have jumped as we went on increasing the sleep time, but it wasn't working for us.

Enter Vegeta

I was searching for other load testing tools that might be more scalable and functionally better compared to apache bench when I came across Vegeta.

From my personal experience, I have found Vegeta to be extremely scalable and to provide much better functionality than apache bench. In our load test, a single Vegeta client was able to produce throughput equivalent to 15 apache bench clients.

Moving forward, I will be providing load test results that have been tested using Vegeta itself.

Load Testing with Vegeta

First, have a look at the command that we used to run a single Vegeta client. Interestingly, the command to put load on the backend servers is called attack :p

echo "POST //test.haproxy.in:443/ping" | vegeta -cpus=32 attack -duration=10m -header="sleep:30000" -body=post_smaller.txt -rate=2000 -workers=500 | tee reports.bin | vegeta report

Just love the parameters provided by Vegeta. Let’s have a look at some of these below.

  1. -cpus=32 Specifies the number of cores to be used by this client. We had to expand our client machines to 32-core, 64 Gig because of the amount of load to be generated. If you look closely above, the rate isn't much. But it becomes difficult to sustain such a load when a lot of connections are in a sleep state from the server end.
  2. -duration=10m I guess this is self explanatory. If you don’t specify any duration, the test will run forever.
  3. -rate=2000 The number of requests per second.

So as you can see above, we reached a hefty 32k requests per second on a mere 4 core machine. If you remember the tipping point graph, you will be able to notice it clearly enough above. So the tipping point in this case is 31.5k Non SSL requests.

Have a look at some more results from the load test.

16k SSL connections is also not bad at all. Please note that at this point in our load testing journey, we had to start from scratch because we had adopted a new client and it was giving us way better results than ab. So we had to do a lot of stuff again.

An increase in the number of cores led to an increase in the number of requests per second that the machine can take before the CPU limit is hit.

We found that there wasn't a substantial increase in the number of requests per second when we increased the number of cores from 8 to 16. Also, if we finally decided to go with an 8-core machine in production, we would never allocate all of the cores to HAProxy, or to any other process for that matter. So we decided to perform some tests with 6 cores as well, to see if we had acceptable numbers.

Not bad.

Introducing the sleep

We were pretty satisfied with our load test results so far. However, this did not simulate the real production scenario. That changed when we introduced a sleep time, which had been absent from our tests until now.

echo "POST //test.haproxy.in:443/ping" | vegeta -cpus=32 attack -duration=10m -header="sleep:1000" -body=post_smaller.txt-rate=2000 -workers=500 | tee reports.bin | vegeta report

So a sleep time of 1000 milliseconds would lead to the server sleeping for x milliseconds, where 0 < x < 1000, selected randomly. So on average, the above load test will give a latency of ≥ 500 ms.

The numbers in the last cell represent

TCP established, Packets Rec, Packets Sent

respectively. As you can clearly see, the maximum requests per second that the 6-core machine can support has decreased from 20k to 8k. Clearly, the sleep has its impact, and that impact is the increase in the number of TCP connections established. This is, however, nowhere near the 700k mark we set out to achieve.

Milestone #1

How do we increase the number of TCP connections? Simple: we keep on increasing the sleep time, and they should rise. We kept playing around with the sleep time and stopped at 60 seconds. That would mean an average latency of around 30 sec.

There is an interesting result parameter that Vegeta provides and that is % of requests successful. We saw that with the above sleep time, only 50% of the calls were succeeding. See the results below.

We achieved a whopping 400k TCP established connections, with 8k requests per second and a 60000 ms sleep time. The R in 60000R means Random.

The first real discovery we made was that Vegeta has a default call timeout of 30 seconds, which explained why 50% of our calls were failing. So we increased it to about 70s for our further tests and kept varying it as and when the need arose.
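For reference, bumping the client-side timeout is just another flag on the attack command. A sketch (the rate and sleep values here are illustrative):

echo "POST https://test.haproxy.in:443/ping" | vegeta -cpus=32 attack -duration=10m -timeout=70s -header="sleep:60000" -body=post_smaller.txt -rate=500 -workers=500 | tee reports.bin | vegeta report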

We hit the 700k mark easily after tweaking the timeout value from the client end. The only problem with this was that these were not consistent. These were just peaks. So the system hit a peak of 600k or 700k but did not stay there for very long.

However, we wanted something like this

This shows a steady state where 780k connections are maintained. If you look closely at the stats above, the number of requests per second is very high. In production, however, we have far fewer requests (somewhere around 300) on a single HAProxy machine.

We were sure that if we drastically reduced the number of HAProxy machines we have in production (which is somewhere around 30, meaning 30 * 300 ~ 9k connections per second), we would hit the machine limits on the number of TCP connections first, not the CPU.

So we decided to aim for 900 requests per second, 30 MB/s of network, and 2.1 million TCP established connections. We settled on these numbers as that would be 3 times our production load on a single HAProxy.

Plus, till now we had settled on 6 cores being used by HAProxy. We wanted to test with only 3 cores, because that is what would be easiest for us to roll out on our production machines. (Our production machines, as mentioned before, are 4 core, 30 Gig; rolling out changes with nbproc = 3 would be easiest for us.)

REMEMBER, the machine we had at this point in time was a 16 core, 30 Gig machine with 3 cores allocated to HAProxy.

Milestone #2

Now that we had max limits on requests per second that different variations in machine configuration could support, we only had one task left as mentioned above.

Achieve 3X the production load which is

  • 900 requests per second
  • 2.1 million TCP established and
  • 30 MB/s network.

We got stuck yet again, as the TCP established count was hitting a hard wall at 220k. No matter the number of client machines or the sleep time, the number of TCP connections seemed stuck there.

Let's look at some calculations. 220k TCP established connections and 900 requests per second gives 110,000 / 900 ≈ 120 seconds per connection. I took 110k because the 220k connections include both incoming and outgoing, so it's two-way.

Our suspicion that 2 minutes was a limit somewhere in the system was verified when we introduced logs on the HAProxy side. We could see 120000 ms as the total time for a lot of connections in the logs.

Mar 23 13:24:24 localhost haproxy[53750]: 172.168.0.232:48380 [23/Mar/2017:13:22:22.686] api~ api-backend/http31 39/0/2062/-1/122101 -1 0 - - SD-- 1714/1714/1678/35/0 0/0 {0,"",""} "POST /ping HTTP/1.1"
122101 is the timeout value. See the HAProxy documentation for the meanings of all these fields.

On investigating further we found out that NodeJs has a default request timeout of 2 minutes. Voila !

how to modify the nodejs request default timeout time?

I was using nodejs request, the default timeout of nodejs http is 120000 ms, but it is not enough for me, while my… (stackoverflow.com)

HTTP | Node.js v7.8.0 Documentation

The HTTP interfaces in Node.js are designed to support many features of the protocol which have been traditionally… (nodejs.org)
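On our end the fix was a one-liner in the backend server; Node's http server exposes a timeout property (you'll see it again in the full backend code further below):

// Raise Node's default 2-minute (120000 ms) socket timeout so that
// long-sleeping requests are not killed by the server
server.timeout = 3600000; // 1 hour, in milliseconds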

But our happiness was apparently short-lived. At 1.3 million connections, the HAProxy connections suddenly dropped to 0 and started increasing again. We soon checked dmesg, which provided us some useful kernel-level information about our HAProxy process.

Basically, the HAProxy process had run out of memory. So we decided to increase the machine RAM, and we shifted to a 16 core, 64 Gig machine with nbproc = 3. Because of this change, we were able to reach 2.4 million connections.

Backend Code

Following is the backend server code that was being used. We had also used statsd in the server code to get consolidated data on the requests per second being received from the clients.

var http = require('http');
var createStatsd = require('uber-statsd-client');
qs = require('querystring');

// statsd client used to report request-rate counters
// (the actual reporting calls are omitted in this snippet)
var sdc = createStatsd({
    host: '172.168.0.134',
    port: 8125
});

var argv = process.argv;
var port = argv[2];

// Random integer in [low, high]
function randomIntInc(low, high) {
    return Math.floor(Math.random() * (high - low + 1) + low);
}

// Write 'pong' on the connection `times` times, sleeping a random
// amount (bounded by the client-supplied sleep header) between writes
function sendResponse(res, times, old_sleep) {
    res.write('pong');
    if (times == 0) {
        res.end();
    } else {
        sleep = randomIntInc(0, old_sleep + 1);
        setTimeout(sendResponse, sleep, res, times - 1, old_sleep);
    }
}

var server = http.createServer(function(req, res) {
    headers = req.headers;
    old_sleep = parseInt(headers["sleep"]);
    times = headers["times"];
    sleep = randomIntInc(0, old_sleep + 1);
    setTimeout(sendResponse, sleep, res, times, old_sleep);
});

server.timeout = 3600000;
server.listen(port);

We also had a small script to run multiple backend servers. We had 8 machines with 10 backend servers EACH (yeah!). We literally took the idea of clients and backend servers being infinite for this load test seriously.

counter=0
while [ $counter -le 9 ]
do
    port=$((8282+$counter))
    nodejs /opt/local/share/test-tools/HikeCLI/nodeclient/httpserver.js $port &
    echo "Server created on port " $port
    ((counter++))
done
echo "Created all servers"

Client Code

As for the client, there was a limitation of 63k TCP connections per IP. If you are not sure about this concept, please refer to my previous article in this series.
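As a quick refresher (the earlier parts of this series cover it in detail), the limit comes from the ephemeral port range, which you can inspect and widen like so:

# Check the ephemeral port range available for outgoing connections
sysctl net.ipv4.ip_local_port_range

# Widen it to get close to the ~63k usable ports per client IP
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"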

So in order to achieve 2.4 million connections (two-sided, which means 1.2 million from the client machines), we needed somewhere around 20 machines. It's really a pain to run the Vegeta command on all 20 machines one by one, and even if you found a way to do that using something like csshx, you would still need something to combine all the results from all the Vegeta clients.

Check out the script below.

result_file=$1

declare -a machines=("172.168.0.138" "172.168.0.141" "172.168.0.142" "172.168.0.18" "172.168.0.5" "172.168.0.122" "172.168.0.123" "172.168.0.124" "172.168.0.232" "172.168.0.244" "172.168.0.170" "172.168.0.179" "172.168.0.59" "172.168.0.68" "172.168.0.137" "172.168.0.155" "172.168.0.154" "172.168.0.45" "172.168.0.136" "172.168.0.143")

bins=""
commas=""

for i in "${machines[@]}"; do bins=$bins","$i".bin"; commas=$commas","$i; done;

bins=${bins:1}
commas=${commas:1}

pdsh -b -w "$commas" 'echo "POST http://test.haproxy.in:80/ping" | /home/sachinm/.linuxbrew/bin/vegeta -cpus=32 attack -connections=1000000 -header="sleep:20" -header="times:2" -body=post_smaller.txt -timeout=2h -rate=3000 -workers=500 > ' $result_file

for i in "${machines[@]}"; do scp sachinm@$i:/home/sachinm/$result_file $i.bin ; done;

vegeta report -inputs="$bins"

There is this handy utility called pdsh that lets you run a command concurrently on multiple machines remotely. Additionally, Vegeta allows us to combine multiple results into one, and that's really all we wanted.

HAProxy Configuration

This is probably what you came here looking for: below is the HAProxy config we used in our load test runs. The most important parts are the nbproc setting and the maxconn setting. The maxconn setting allows us to specify the maximum number of TCP connections that HAProxy can support overall (one way).

Changes to the maxconn setting lead to an increase in the HAProxy process's ulimit. Take a look below

The max open files has increased to 4 million because of the max connections for HAProxy being set at 2 million. Neat !
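In sketch form, the two knobs discussed here sat in the global section roughly like this (illustrative values only, not the full config):

global
    nbproc 3           # 3 cores given to HAProxy, as decided earlier
    maxconn 2000000    # 2 million connections, one way; HAProxy derives its
                       # file-descriptor limit from this, hence the ~4M open files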

Check the article below for a whole lot of HAProxy optimisations that you can and should do to achieve the kind of stats we achieved.

Use HAProxy to load balance 300k concurrent tcp socket connections: Port Exhaustion, Keep-alive and…

I'm trying to build up a push system recently. To increase the scalability of the system, the best practice is to make… (www.linangran.com)

The http30 goes on to http83 :p

That's all for now, folks. If you've made it this far, I'm truly amazed :)

A special shout out to Dheeraj Kumar Sidana who helped us all the way through this and without whose help we would not have been able to reach any meaningful results. :)

Do let me know how this blog post helped you. Also, please recommend (❤) and spread the love as much as possible for this post if you think this might be useful for someone.