[noPoll] Stress Test -- Client connection timeouts

Kale, Rahul Rahul.Kale at barco.com
Fri Jun 3 00:35:19 CEST 2016



Hello,

I am facing scalability issues with running stress tests using a
simple client/server websocket application utilizing the noPoll Library.

The stress test code is available at:
https://github.com/rpkale/test-nopoll


When running multiple clients together using the nopoll_stress_test.sh wrapper
script, the time taken to establish connections increases as more and more
simultaneous connections are attempted. I could not reliably establish 100
connections. As you can see, I configure
nopoll_conn_wait_until_connection_ready() to wait for up to two minutes. Most
connections are established, but the time taken increases to 15-20 seconds.
Very often, a few of the connections are not established at all. Some
connections fail well before the two minutes are up; some fail only after
3 minutes.

Assuming that the 'accept' call is failing under stress in some fashion, I
tried increasing the 'backlog' used in the listen call from 5 to 128 (by
editing nopoll_ctx.c). This improved the situation somewhat. The connection
times were still long, but most of the time all connections would eventually
succeed. However, for 200 connections, some would still fail.
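
For reference, the backlog is the second argument to listen(). The sketch
below is plain POSIX and is not taken from nopoll_ctx.c; open_listener() is a
hypothetical helper name used only for illustration. Note that the kernel may
silently cap the requested value (on Linux, at net.core.somaxconn), so a
larger number in the source does not by itself guarantee a longer queue.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of noPoll: open a TCP listener on the
 * loopback interface with an explicit backlog. Pass port 0 to let the
 * kernel pick a free port. Returns the listening fd, or -1 on error. */
int open_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow quick restarts of the test server on the same port. */
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* The backlog bounds the queue of connections waiting for accept(). */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling open_listener(0, 128) corresponds to the 5 -> 128 change described
above, applied to a standalone socket.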

My next assumption was that the server is so busy sending data to already
connected clients that it is not 'accepting' new connections fast enough. So I
temporarily modified the code so that the client does not make any data
requests. After establishing the connection, it just sleeps for the given
duration and then exits. (See the commented-out code in the create_client()
function.)

This had a very strange result. Until one client exited, the server would
not even accept the next connection. Running 5 client instances
simultaneously for 5 seconds each took 25 seconds, with each client handled
sequentially by the server.

I started looking at the code to debug this issue and I believe I have
located the root cause. In function __nopoll_conn_accept_complete_common()
(near the end of nopoll_conn.c), the function first makes the newly accepted
connection socket blocking. For the TLS case it then does special handling
and reverts the socket to non-blocking. For the non-TLS case, however, the
socket is never reverted to non-blocking. In effect, all client connections
are handled in blocking fashion by the server.

To fix this issue, I added the following code to the end of the if block:

        } /* end if */
        else {
            nopoll_conn_set_sock_block (conn->session, nopoll_false);
        }

This fixed the sequential accept handling described above. It also explains
why I now see an occasional EAGAIN (11) return code from
nopoll_conn_send_binary() on the server side; before the fix I never saw it
(see the comment in the sendResponse() function).
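
To illustrate what switching a socket between blocking and non-blocking mode
means at the OS level, here is a minimal POSIX sketch using fcntl(). It is
not noPoll's implementation of nopoll_conn_set_sock_block(); set_blocking()
is a hypothetical stand-in written for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: block != 0 selects blocking mode, block == 0
 * selects non-blocking mode. Returns 0 on success, -1 on error. */
int set_blocking(int fd, int block)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;

    if (block)
        flags &= ~O_NONBLOCK;   /* I/O calls wait until they can proceed */
    else
        flags |= O_NONBLOCK;    /* I/O calls fail with EAGAIN instead */

    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}
```

Once a socket is non-blocking, a read or send that cannot proceed immediately
returns -1 with errno set to EAGAIN (or EWOULDBLOCK) instead of stalling the
caller, which is consistent with EAGAIN only showing up after the fix.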

After the above fix, with the client still just sleeping instead of making
requests, all the connections are established quickly. I have tested with
500 connections and they all take less than 25 ms.
However, after reverting the sleep() code so that the clients make normal
requests again, the performance is still not up to the mark. I believe this is
because the server is mostly busy sending data to clients, so handling new
connections gets lower priority. I believe a more scalable solution would be
for the application itself to run the socket select loop rather than depend
on nopoll_loop_wait(). This would be needed even if the application had only
the listener socket to consider. For the EAGAIN case, the application has to
retry sending data when the client is ready anyway.
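
The EAGAIN retry mentioned above could look roughly like the following. This
is a sketch on a raw non-blocking socket using select(), not on a noPollConn,
and it says nothing about how nopoll_conn_send_binary() should be retried
inside the library; send_all_nonblocking() and its timeout parameter are
hypothetical names chosen for the example.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to a socket, waiting with select()
 * whenever the send would block. Returns the number of bytes sent (len) on
 * success, or -1 on error or if the peer stays unwritable for timeout_sec. */
ssize_t send_all_nonblocking(int fd, const char *buf, size_t len,
                             int timeout_sec)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = send(fd, buf + sent, len - sent, 0);
        if (n > 0) {
            sent += (size_t)n;          /* partial writes are normal */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Kernel buffer is full: wait until the socket is writable. */
            fd_set wfds;
            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            struct timeval tv = { timeout_sec, 0 };
            int rc = select(fd + 1, NULL, &wfds, NULL, &tv);
            if (rc > 0 || (rc < 0 && errno == EINTR))
                continue;
            return -1;                  /* timeout or select() failure */
        }
        return -1;                      /* any other send() error */
    }
    return (ssize_t)sent;
}
```

In a real server the wait would not be done inline per client as it is here;
the writable descriptors would instead be added to the same select() set as
the listener, so that a slow client does not delay accepting new connections.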

Do you have any suggestions for this? Are there any hooks in the noPoll library
to make it more scalable without the application having to build a separate
select loop?

Regards,

Rahul

Rahul Kale

IP Video Systems
Barco, Inc
1287 Anvilwood Ave
Sunnyvale, CA  94089

Tel  +1 408 400 4238
