<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Hello,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I am facing scalability issues when running stress tests against a simple client/server WebSocket application built on the noPoll library.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The stress test code is available at:<o:p></o:p></p>
<p class="MsoNormal"><a href="https://github.com/rpkale/test-nopoll">https://github.com/rpkale/test-nopoll</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">When running multiple clients together using the nopoll_stress_test.sh wrapper script, the time taken to establish connections increased as more simultaneous connections were attempted, and I could not reliably establish 100 connections. As the code shows, I configure nopoll_conn_wait_until_connection_ready() to wait for up to two minutes. Most connections would be established, but the time taken grew to 15-20 seconds. Quite often, a few of the connections were not established at all: some failed well before the two-minute limit, and others failed only after 3 minutes.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Assuming that the accept() call was failing under stress in some fashion, I tried increasing the backlog passed to the listen() call from 5 to 128 (by editing nopoll_ctx.c). This improved the situation somewhat: connection times were still long, but most of the time all connections eventually succeeded. With 200 connections, however, some would still fail.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">My next assumption was that the server was so busy sending data to already-connected clients that it was not accepting new connections fast enough. So I temporarily modified the code so that the client makes no data requests: after establishing the connection, it just sleeps for the given duration and then exits (see the commented-out code in the create_client() function).<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">This had a very strange result: the server would not accept the next connection until an existing one had exited. Running 5 client instances simultaneously, each for 5 seconds, took 25 seconds in total, because the server handled the clients sequentially.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I started reading the code to debug this, and I believe I have located the root cause. In the function __nopoll_conn_accept_complete_common() (near the end of nopoll_conn.c), the newly accepted connection socket is first made blocking. In the TLS case, the function then performs the TLS-specific handling and switches the socket back to non-blocking. In the non-TLS case, however, the socket is never switched back to non-blocking. In effect, the server handles all non-TLS client connections in blocking mode.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">To fix this issue, I added an else branch to that if block, so that the non-TLS case also restores non-blocking mode:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<pre>    } /* end if */
    else {
        /* non-TLS case: restore non-blocking mode, as the TLS branch already does */
        nopoll_conn_set_sock_block (conn->session, nopoll_false);
    }</pre>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">This fixed the sequential accept handling described above. As a side effect of the fix, I now occasionally see an EAGAIN (11) return code from nopoll_conn_send_binary() on the server side; before the fix, I never saw it (see the comment in the sendResponse() function).<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">After this fix, with the client still just sleeping instead of making requests, all connections are established quickly. I have tested with 500 connections, and all of them are established in under 25 ms.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">However, after restoring the normal request code in place of the sleep(), performance is still not up to the mark. I believe this is because the server spends most of its time sending data to clients, so handling new connections gets lower priority. I think a more scalable solution would be for the application itself to run the socket select loop instead of depending on nopoll_loop_wait(). This would be needed even if the listener were the only socket the application had to manage. In the EAGAIN case, the application has to retry sending the data once the client is ready anyway.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Do you have any suggestions for this? Are there any hooks in the noPoll library to make it more scalable without the application having to build a separate select loop?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Regards,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Rahul<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Rahul Kale<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">IP Video Systems<o:p></o:p></p>
<p class="MsoNormal">Barco, Inc<o:p></o:p></p>
<p class="MsoNormal">1287 Anvilwood Ave<o:p></o:p></p>
<p class="MsoNormal">Sunnyvale, CA 94089<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Tel +1 408 400 4238<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>