[Vortex] weird behavior or misunderstanding of beep/vortex?

Gustavo Sverzut Barbieri Gustavo.Barbieri at indt.org.br
Fri Jun 9 20:36:32 CEST 2006


On Friday 09 June 2006 11:51, Gustavo Sverzut Barbieri wrote:
> On Friday 09 June 2006 11:31, you wrote:
> > On Fri, 09-06-2006 at 09:26 -0300, Gustavo Sverzut Barbieri wrote:
> > Hi Gustavo!
> >
> > > Now I face a double-free, my backtrace follows. It seems that we have
> > > a race condition in the test:
> > >
> > >         if ((data != NULL) && (data->message != NULL))
> > >                 g_free (data->message);
> > >
> > >         if (data != NULL)
> > >                 g_free (data);
> > >
> > >
> > > As you can see, there are two problems here: the thread can be
> > > suspended right after the "data != NULL" check while data gets freed
> > > elsewhere, which causes a failure, or right after the "data->message"
> > > check, which fails the same way. In my case I hit the problem at
> > > data->message.
> > >
> > > My solution to this kind of problem is to use an atomic instruction to
> > > swap out the things I'm about to free, like:
> > >
> > >         message = xchg( &data->message, NULL );
> > >
> > > This sets data->message to NULL and returns its previous contents,
> > > which I can then check and free.
> > >
> > > But this is just my guess... I need to investigate further, which is
> > > quite difficult, since using Valgrind makes the problem go away :-D
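To make the xchg idea above concrete, here is a minimal sketch using C11
<stdatomic.h>. The seq_data struct and seq_data_release_message() are
hypothetical stand-ins, not the real Vortex types or API, and the sketch
only protects the message pointer, not the enclosing data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the sequencer data; NOT the real Vortex struct. */
typedef struct {
    _Atomic(char *) message;
} seq_data;

/*
 * Atomically detach the message pointer, then free whatever was there.
 * Only the caller that receives the non-NULL old value frees it, so two
 * callers can never both free the same buffer.
 * Returns 1 if this call freed the message, 0 if it was already detached.
 */
static int seq_data_release_message(seq_data *data)
{
    assert(data != NULL);

    /* Swap in NULL and get the previous pointer in one atomic step. */
    char *old = atomic_exchange(&data->message, (char *)NULL);
    if (old == NULL)
        return 0;

    free(old);
    return 1;
}
```

Calling seq_data_release_message() twice on the same data frees the buffer
once and makes the second call a no-op. It does nothing for the case where
"data" itself has already been freed, so it would not be a complete fix on
its own.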
> >
> > The race condition description and its possible solution look fine.
> > However, only one thread runs the vortex sequencer loop at any given
> > time, and it is the only entity that runs the
> > __vortex_sequencer_unref_and_clear function.
> >
> > This makes it difficult to produce a race condition inside that function,
> > even if the thread is suspended between the pointer check and the
> > pointer deallocation.
> >
> > Did the problem disappear once you applied the patch you propose? Did you
> > modify the source code to start more than one vortex sequencer?
> >
> > If the problem persists, maybe you can describe the steps to reproduce
> > it.
>
> I'll try to isolate it.
>
> From my investigation so far, the problem is that it free()s data->message
> but does not set it to NULL... it's the same thread, so it's not a race
> condition. I now memset( data, 0, sizeof(*data) ), but then it starts to
> break elsewhere.
>
> My debug logs from __vortex_sequencer_run are:
>
> ### 0) begin: data=0xb3804c48 [pid=-1231029328]
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4118
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4118
> 3) the end: data=(nil)
> ### 0) begin: data=0xb38050e8 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> connection_send_msg: 0x80c8be8 (127.0.0.1:59845), channel 0x80cb960 (#5)
> 0x80c1f60
>
> connection_send_msg: lock mutex 0x80c7438
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4122
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4122
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4122
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4122
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4123
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4123
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4123
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4123
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4123
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4123
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4123
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4123
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4123
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4123
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4123
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4123
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4123
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4123
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=4123
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 4123
> 3) the end: data=(nil)
> ### 0) begin: data=0xb3804928 [pid=-1231029328]
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=0x80ca6c8, the_size=880
> ### 3.1) channel#5 packet=0x80ca6b0
> ### .... packet: type=1, no.: 0, len.: 880
> 4) step3, data=0xb3804c48
> ### ... start_resequence: 0xb3804c48
> ### 3) step 2:
> ### ... a_frame=(nil), the_size=0
> ### 3.2) unref data=0xb3804c48
> *** glibc detected *** double free or corruption (fasttop): 0xb3804c48 ***
>
> Program received signal SIGABRT, Aborted.
> [Switching to Thread -1231029328 (LWP 14296)]
> 0xffffe410 in __kernel_vsyscall ()
>
>
>
> This is exactly the last frame/packet of a large message I'm sending.


OK, I couldn't fix the problem yet, but it looks like the problem is that both
"channel->pending_messages" and "vortex_sequencer_queue" hold references
to the same "data", which is freed at "4) step3, data=...", right before the
"goto start_resequence".

I'm still looking at how you free these; maybe it's free()d twice, maybe in
the wrong order, I don't know :-(
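To illustrate the suspicion above (two containers holding the same pointer),
here is a minimal sketch of one common way to make that safe: each holder
owns a reference, and only the last release frees the object. The
shared_data type and its functions are hypothetical; I have not checked how
Vortex actually tracks ownership. The count is a plain int, which assumes a
single thread touches it, as with the single sequencer thread discussed
earlier in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the shared "data"; NOT the real Vortex type. */
typedef struct {
    int   refs;     /* number of holders still pointing at this object */
    char *message;
} shared_data;

/* Create an object with one reference, owned by the caller. */
static shared_data *shared_data_new(const char *text)
{
    shared_data *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;

    size_t len = strlen(text) + 1;
    d->message = malloc(len);
    if (d->message == NULL) {
        free(d);
        return NULL;
    }
    memcpy(d->message, text, len);
    d->refs = 1;
    return d;
}

/* Take an extra reference for a second holder (e.g. a second queue). */
static shared_data *shared_data_ref(shared_data *d)
{
    assert(d != NULL && d->refs > 0);
    d->refs++;
    return d;
}

/*
 * Drop the reference held in *slot and clear the slot, so the same holder
 * cannot release twice. Returns 1 if the object was destroyed, 0 otherwise.
 */
static int shared_data_unref(shared_data **slot)
{
    shared_data *d = *slot;
    *slot = NULL;
    if (d == NULL)
        return 0;

    assert(d->refs > 0);
    if (--d->refs > 0)
        return 0;

    free(d->message);
    free(d);
    return 1;
}
```

With this shape, the queue and the pending list would each call
shared_data_ref() when they store the pointer and shared_data_unref() when
they drop it. The order of the two releases then no longer matters, and a
repeated release through an already cleared slot is harmless.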


-- 
Gustavo Sverzut Barbieri
------------------------
INdT, Recife, Brazil

Jabber: barbieri at gmail.com
   MSN: barbieri at gmail.com
  ICQ#: 17249123
 Skype: gsbarbieri
Mobile: +55 (81) 9927 0010
 Phone:  +1 (347) 624 6296; 08122692 at sip.stanaphone.com
   GPG: 0xB640E1A2 @ wwwkeys.pgp.net


