[Vortex] Failure to establish connection between client and server on some machines running RedHat 6.1

Tue Aug 21 07:48:43 CEST 2012

Hi

We are developing and testing an application that consists of a Vortex
(version 1.1.8) based client and a server that uses a Java implementation of
BEEP. The application has been running fine on several platforms, including
Linux, AIX, HP, and Solaris. On one of the machines, which runs RedHat 6.1,
we are facing issues. The failure happens during connection establishment,
and the error message indicates that the socket connect operation is in
progress. The Vortex log messages follow. Please note that the line numbers
may not match the Vortex 1.1.8 code, since I added a few extra log messages
to the same file.

(debug) vortex_connection.c:1661 executing connection new in blocking mode to localhost:47720 id=1
(debug) vortex_connection.c:1274 detected connect timeout during 30 seconds (starting from: 1344323279)
(debug) vortex_connection.c:1294 __vortex_connection_wait_on (sock=4) operation finished, err=1, errno=115 (Operation now in progress), wait_for=2 (ellapsed: 0)
(warning) vortex_connection.c:1310 error level set on waiting socket
(debug) vortex_connection.c:1321 timeout operation finished, with err=-6, errno=115, ellapsed time=0 (seconds)
(warning) vortex_connection.c:1470 unable to connect to remote host (timeout)
(debug) vortex_connection.c:2373 closing a connection which is not opened, unref resources..

I looked at the function "__vortex_connection_wait_on", which sets the error
to "-6" and logs the first warning message. The function
"vortex_io_waiting_invoke_wait", which calls "epoll" (epoll is available in
this OS version), returns the number of writable FDs. If the return value is
greater than zero, Vortex calls "getsockopt" to get any pending error on the
socket. Here it returns "111" for SO_ERROR, which stands for "Connection
refused". Since the number of file descriptors ready for write is greater
than zero, I thought calling "getsockopt" was not necessary and removed that
call. With this change, if the number of FDs ready for write is greater than
zero, I assume the socket is ready and break out of the loop. This resolved
the issue we were hitting. I take this to mean that "getsockopt" returned the
wrong error code, and I am wondering whether that is because getsockopt is
called even in the success case.
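
For reference, the sequence under discussion can be sketched in plain C as
follows. This is a simplified illustration, not the Vortex code: it uses
poll(2) instead of the epoll-based "vortex_io_waiting_invoke_wait", and the
helper name "connect_with_timeout" and its return convention are hypothetical.
It follows the pattern described in the Linux connect(2) manual page, where a
non-blocking connect that returns EINPROGRESS is followed by a wait for
writability and then a read of SO_ERROR, with zero meaning the connect
succeeded and a non-zero value being the errno of the failure.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close the socket and report the failure as a negative errno value. */
static int fail(int fd, int err)
{
    close(fd);
    return -err;
}

/* Hypothetical helper, not part of Vortex.
 * Returns a connected socket (>= 0) on success, or -errno on failure. */
int connect_with_timeout(const char *ip, uint16_t port, int timeout_ms)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;

    /* Switch the socket to non-blocking mode before connecting. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return fail(fd, errno);

    /* The connect may complete immediately, fail immediately,
     * or report EINPROGRESS (errno 115 on Linux). */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return fd;
    if (errno != EINPROGRESS)
        return fail(fd, errno);

    /* Wait until the socket is reported writable or the timeout expires. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready == 0)
        return fail(fd, ETIMEDOUT);
    if (ready < 0)
        return fail(fd, errno);

    /* Read the pending socket error: 0 means the connect succeeded,
     * otherwise it is the errno of the failed connect (for example
     * ECONNREFUSED, 111 on Linux). */
    int so_error = 0;
    socklen_t len = sizeof so_error;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
        return fail(fd, errno);
    if (so_error != 0)
        return fail(fd, so_error);

    return fd;
}
```

A call such as connect_with_timeout("127.0.0.1", 47720, 30000) would return a
socket descriptor if something is listening on that port, and -ECONNREFUSED
if nothing is.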

Note that without the above-mentioned change the issue was consistent and
frequent, but with the change it was not observed at all.

Now I want to know whether the above-mentioned change is correct. I see a
comment in the "__vortex_connection_wait_on" function that mentions an
issue with not calling the getsockopt API on older versions of Linux. Is this
still applicable? I checked with a few colleagues who have several years
of experience with these APIs. They said that the call would have been
required on older versions but not on current versions of operating
systems.

So the question is: "Is it fine to skip the getsockopt call that checks
SO_ERROR when the number of file descriptors ready is greater than zero?"

Thanks
Subrahmanya