Program crashes in amqp_abort() on HP-UX 11.31

Haster
Hi there

My application uses the rabbitmq-c library, and it sometimes crashes on HP-UX 11.31 with the stack below:
#0 0xc00000000054bc70:0 in _waitpid_sys+0x30 () from /lib/hpux64/libc.so.1
#1 0xc000000000562b80:0 in waitpid ()
#2 0xc00000000a3d9e00:0 in <unknown_procedure> + 0xa0 ()
#3 0xc00000000a3da280:0 in <unknown_procedure> + 0x90 ()
#4 <signal handler called>
#5 0xc000000000211ab0:0 in _lwp_kill+0x30 () from /lib/hpux64/libpthread.so.1
#6 0xc000000000178810:0 in pthread_kill ()
#7 0xc0000000003f8140:0 in raise ()
#8 0xc000000000508d00:0 in abort ()
#9 0xc0000000043df860:0 in amqp_abort (
#10 0xc0000000043e06f0:0 in amqp_handle_input (state=0x600000000480bc60,
#11 0xc0000000043e1fe0:0 in wait_frame_inner (state=0x600000000480bc60,
#12 0xc00000000a9b4ad0:0 in RabbitMQ::RabbitMQQueueImpl::recv (
#13 0xc00000000a992ee0:0 in consumer_impl::execute (this=0x600000000428edf8)
#14 0xc00000000a436af0:0 in threads::thread_proc(void*)+0x1c0 ()
#15 0xc00000000013fb20:0 in __pthread_bound_body ()

In the log file I can see the following message:
Failed to start consumer with configurationInternal error: invalid amqp_connection_state_t->state 0

As I understand it, I received a corrupted packet and, as a result, abort() was called from amqp_abort().

I don't think it is a good idea to call abort() inside a library, because my application crashes and I can't
handle the situation (for example, clean up resources and reconnect).

Is it possible to remove the abort() call from the library?

Re: Program crashes in amqp_abort() on HP-UX 11.31

Tony Garnock-Jones-6
Hi,

On 8 July 2013 03:11, Haster <[hidden email]> wrote:
> Failed to start consumer with configurationInternal error: invalid
> amqp_connection_state_t->state 0

This is not necessarily the result of a corrupted packet, but it is the result of some kind of memory corruption or threading bug. (It could also be a bug in the library.)

There is nothing sensible you can do to recover: memory is already corrupted, possibly extensively. There's no way to know. Calling abort() is the right thing to do in this situation.

If you are using an amqp_connection_state_t from multiple threads, you should either mutex access to it (difficult) or create a separate connection for each thread (easier).

Regards,
  Tony
--
Tony Garnock-Jones
[hidden email]
http://homepages.kcbbs.gen.nz/tonyg/

_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Re: Program crashes in amqp_abort() on HP-UX 11.31

Haster
Tony, thanks for the reply.

The problem is that I use only one thread in my program (only one thread works with the connection to RabbitMQ).

I have heard that on HP-UX the TCP/IP stack API differs from other UNIX systems, and that to fix it I need to use libxnet and define _XOPEN_SOURCE_EXTENDED.

Do you think this could be the root of the problem?

Re: Program crashes in amqp_abort() on HP-UX 11.31

Tony Garnock-Jones-6
On 8 July 2013 09:17, Haster <[hidden email]> wrote:
> The problem is that I use only one thread in my program (only one thread
> works with the connection to RabbitMQ).

Is there only one thread in your program, or is there more than one thread but only one which uses librabbitmq? If there's more than one, perhaps another thread is chewing up memory somehow.
 
> I have heard that on HP-UX the TCP/IP stack API differs from other UNIX
> systems, and that to fix it I need to use libxnet and define
> _XOPEN_SOURCE_EXTENDED.

This doesn't sound like the problem, but I could be wrong.

From the error message you found in the logs, it looks to me like something is setting state->state to CONNECTION_STATE_IDLE (== 0) between amqp_connection.c lines 234 and 245. The obvious suspect is consume_data but unless things are already corrupted, it's hard to see what could be happening.
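
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the library's code: `toy_conn`, `toy_handle_input`, `reset_to_idle` and the hook parameter are hypothetical stand-ins, and only the name CONNECTION_STATE_IDLE and its value 0 are taken from the message above. The hook simulates something else touching the connection between the initial idle check and the state dispatch. The sketch returns -1 at the point where the real library would abort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical parser states. Only CONNECTION_STATE_IDLE (== 0) comes
 * from the thread; the other names are illustrative. */
enum toy_state {
  CONNECTION_STATE_IDLE = 0,
  TOY_STATE_HEADER,
  TOY_STATE_BODY
};

struct toy_conn {
  enum toy_state state;
};

/* Simulates another thread (or a stray write) touching the connection. */
typedef void (*toy_hook)(struct toy_conn *);

/* Returns 0 on success, -1 where the real library would call abort(). */
int toy_handle_input(struct toy_conn *c, toy_hook interfere) {
  assert(c != NULL);

  /* Step 1: an idle connection starts expecting a frame header. */
  if (c->state == CONNECTION_STATE_IDLE) {
    c->state = TOY_STATE_HEADER;
  }

  /* Step 2: the window in which an unsynchronised second user of the
   * same connection can change the state. */
  if (interfere != NULL) {
    interfere(c);
  }

  /* Step 3: dispatch on the state. IDLE cannot be seen here unless
   * step 2 changed it behind our back. */
  switch (c->state) {
    case TOY_STATE_HEADER:
      c->state = TOY_STATE_BODY;
      return 0;
    case TOY_STATE_BODY:
      c->state = CONNECTION_STATE_IDLE;
      return 0;
    default:
      fprintf(stderr, "invalid state %d\n", (int)c->state);
      return -1;
  }
}

/* What a concurrent user of the same connection might effectively do. */
void reset_to_idle(struct toy_conn *c) {
  c->state = CONNECTION_STATE_IDLE;
}
```

Calling `toy_handle_input(&c, NULL)` repeatedly cycles through the states without error. Passing `reset_to_idle` as the hook drives the dispatch into the default branch, which is the analogue of the "invalid amqp_connection_state_t->state 0" message.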

If you can run your program in a debugger, try breaking on the abort and examining state's memory at that time. It might give some clues as to what is going wrong.

Does the problem occur when the message rate is high, or low, or is it not correlated with either?

I should also mention I'm looking at git revision 31ecd4f52f17dbf9e189625cd1ba2ad08af29851. Which revision of the library are you using?

Tony
--
Tony Garnock-Jones
[hidden email]
http://homepages.kcbbs.gen.nz/tonyg/


Re: Program crashes in amqp_abort() on HP-UX 11.31

Haster
Tony, thanks for the reply.

I was wrong when I said that there is only one thread... I have two of them:

The first thread creates everything (the connection and the channels to the queue and exchanges), calls basic.consume, and starts a new thread that reads messages from the queue and processes them.

But the first thread can also call a method that adds some bindings and may declare new exchanges.

At the moment I don't synchronize these threads.

But can it be the root of the problem?
I use a separate channel for each exchange and queue used.

One more thing to consider: my program crashes only on HP-UX 11.31; on Windows, Solaris, and Red Hat Linux it works well.

Re: Program crashes in amqp_abort() on HP-UX 11.31

Tony Garnock-Jones-6
On 9 July 2013 07:30, Haster <[hidden email]> wrote:
> The first thread creates everything (the connection and the channels to the
> queue and exchanges), calls basic.consume, and starts a new thread that reads
> messages from the queue and processes them.

In principle, this is fine. I would still not do it, myself: instead, I would never let a connection "cross over" between threads. But that's probably just superstitious of me :-)
 
> But the first thread can also call a method that adds some bindings and may
> declare new exchanges.

Aha! OK. Yes, that is probably the cause of the problem.

Try making sure that each thread has its own private connection to the broker. Perhaps the thread that only occasionally needs to use a connection could create it when needed, and destroy it when it is finished.

Connections *must* not be shared between threads[1].
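
A minimal sketch of the "one private connection per thread" pattern, using POSIX threads. Everything named `toy_*`, `job`, `worker` and `run_private_connections` is hypothetical: `toy_conn` merely stands in for a real connection, which in an actual program each thread would create and destroy itself (for example with amqp_new_connection() and amqp_destroy_connection()). The point is only that the connection pointer never leaves the thread that opened it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a broker connection. */
struct toy_conn {
  pthread_t owner;  /* thread that created the connection */
  int operations;   /* number of operations performed on it */
};

static struct toy_conn *toy_open(void) {
  struct toy_conn *c = malloc(sizeof *c);
  if (c != NULL) {
    c->owner = pthread_self();
    c->operations = 0;
  }
  return c;
}

/* Every operation checks that the caller is the owning thread. */
static void toy_use(struct toy_conn *c) {
  assert(pthread_equal(c->owner, pthread_self()));
  c->operations++;
}

static void toy_close(struct toy_conn *c) { free(c); }

struct job {
  int ops_wanted;  /* input: how many operations to perform */
  int ops_done;    /* output: operations actually performed, or -1 */
};

/* Thread body: open a private connection, use it, close it. */
static void *worker(void *arg) {
  struct job *j = arg;
  struct toy_conn *c = toy_open();
  if (c == NULL) return NULL;
  for (int i = 0; i < j->ops_wanted; i++) toy_use(c);
  j->ops_done = c->operations;
  toy_close(c);
  return NULL;
}

/* Runs a "consumer" and a "binder" thread side by side, each with its
 * own connection. Returns the total operations, or -1 on failure. */
int run_private_connections(int consumer_ops, int binder_ops) {
  struct job consumer_job = { consumer_ops, -1 };
  struct job binder_job = { binder_ops, -1 };
  pthread_t consumer, binder;

  if (pthread_create(&consumer, NULL, worker, &consumer_job) != 0) return -1;
  if (pthread_create(&binder, NULL, worker, &binder_job) != 0) {
    pthread_join(consumer, NULL);
    return -1;
  }
  pthread_join(consumer, NULL);
  pthread_join(binder, NULL);

  if (consumer_job.ops_done < 0 || binder_job.ops_done < 0) return -1;
  return consumer_job.ops_done + binder_job.ops_done;
}
```

Because no state is shared between the two workers, no locking is needed around the connection itself. The ownership assertion in `toy_use` is one cheap way to catch an accidental cross-thread use during testing.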
 
> But can it be the root of the problem?
> I use a separate channel for each exchange and queue used.

There is *no* locking in librabbitmq, so *any* use of a connection across threads requires you to do the locking manually.
 
> One more thing to consider: my program crashes only on HP-UX 11.31; on
> Windows, Solaris, and Red Hat Linux it works well.

My guess is that this is just chance. If two threads are sharing a single librabbitmq connection without any mutexing, then the bug exists on all platforms, even if it doesn't manifest deterministically.

Regards,
  Tony

[1] Unless you do some locking yourself, which is difficult and error-prone, and not something I'd recommend.
--
Tony Garnock-Jones
[hidden email]
http://homepages.kcbbs.gen.nz/tonyg/


Re: Program crashes in amqp_abort() on HP-UX 11.31

Haster
Tony, thanks!

Is it enough if I protect message reading with a mutex and guarantee that I call the bind methods only when I am not calling the read method?

Re: Program crashes in amqp_abort() on HP-UX 11.31

Tony Garnock-Jones-6
On 9 July 2013 08:46, Haster <[hidden email]> wrote:
> Is it enough if I protect message reading with a mutex and guarantee that I
> call the bind methods only when I am not calling the read method?

I can't guarantee that, and it sounds difficult (= impossible, probably) to implement because what if you want to do a bind() while the other thread is blocked, reading from the socket?

Thinking harder about this, I suspect that only a very few limited kinds of concurrent use of a connection can be implemented by managing your own mutexing; to get the full range, there'd have to be very deep reorganisation of the library's structure to use threads internally. Which was an anti-goal when I wrote it initially.

My recommendation, then, is to never try to share a connection between threads.

Tony
--
Tony Garnock-Jones
[hidden email]
http://homepages.kcbbs.gen.nz/tonyg/


Re: Program crashes in amqp_abort() on HP-UX 11.31

Haster
Tony,

While one thread is blocked, the other will be blocked too, and it will call the bind method only after the first one releases the mutex.

All right: if I create a new connection in the thread that does the binding and call bind on that connection, will that be enough?

Re: Program crashes in amqp_abort() on HP-UX 11.31

Tony Garnock-Jones-6
On 9 July 2013 09:17, Haster <[hidden email]> wrote:
> While one thread is blocked, the other will be blocked too, and it will call
> the bind method only after the first one releases the mutex.

The thread wanting to bind() might be waiting for a very long time in that case; the consuming thread won't wake up until a frame arrives for it.
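
The waiting problem can be reproduced with a small POSIX sketch. All names here are hypothetical: the `frames` pipe stands in for the broker socket, `conn_lock` for a mutex guarding the shared connection, and `binder_try_while_consumer_blocked` for the thread that wants to bind. The consumer takes the lock and then blocks in read(); the binder's lock attempt fails until a "frame" is written to the pipe.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

struct shared {
  pthread_mutex_t conn_lock;  /* guards the (imaginary) shared connection */
  int frames[2];              /* pipe: consumer blocks reading frames[0] */
  int ready[2];               /* pipe: consumer reports it holds the lock */
};

static void *consumer(void *arg) {
  struct shared *s = arg;
  char byte;
  pthread_mutex_lock(&s->conn_lock);
  /* Announce that the lock is held, then block, still holding it,
   * until a "frame" arrives. */
  if (write(s->ready[1], "r", 1) == 1 && read(s->frames[0], &byte, 1) == 1) {
    /* a real consumer would process the frame here */
  }
  pthread_mutex_unlock(&s->conn_lock);
  return NULL;
}

/* Returns the result of the binder's first lock attempt (EBUSY while
 * the consumer waits for a frame), or -1 on a setup failure. Error
 * paths skip cleanup to keep the sketch short. */
int binder_try_while_consumer_blocked(void) {
  struct shared s;
  pthread_t t;
  char byte;

  if (pthread_mutex_init(&s.conn_lock, NULL) != 0) return -1;
  if (pipe(s.frames) != 0 || pipe(s.ready) != 0) return -1;
  if (pthread_create(&t, NULL, consumer, &s) != 0) return -1;

  /* Wait until the consumer holds the lock. */
  if (read(s.ready[0], &byte, 1) != 1) return -1;

  /* The binder's attempt fails for as long as no frame arrives. */
  int first_try = pthread_mutex_trylock(&s.conn_lock);
  if (first_try == 0) pthread_mutex_unlock(&s.conn_lock);

  /* Only an incoming frame lets the consumer release the lock. */
  if (write(s.frames[1], "f", 1) != 1) return -1;
  pthread_join(t, NULL);

  /* Now the binder can take the lock. */
  int second_try = pthread_mutex_trylock(&s.conn_lock);
  assert(second_try == 0);
  if (second_try == 0) pthread_mutex_unlock(&s.conn_lock);

  close(s.frames[0]);
  close(s.frames[1]);
  close(s.ready[0]);
  close(s.ready[1]);
  pthread_mutex_destroy(&s.conn_lock);
  return first_try;
}
```

With a plain pthread_mutex_lock() in place of the trylock, the binder would simply sleep until the broker happened to deliver something to the consumer, which on a quiet queue could be arbitrarily long.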
 
> All right: if I create a new connection in the thread that does the binding
> and call bind on that connection, will that be enough?

Yes: if you make sure that a connection is never used by more than one thread, the problem that caused this mail-thread should go away.

Tony
--
Tony Garnock-Jones
[hidden email]
http://homepages.kcbbs.gen.nz/tonyg/


Re: Program crashes in amqp_abort() on HP-UX 11.31

Haster
Tony Garnock-Jones-6 wrote:
> On 9 July 2013 09:17, Haster <[hidden email]> wrote:
> The thread wanting to bind() might be waiting for a very long time in that
> case; the consuming thread won't wake up until a frame arrives for it.

Anyway, in my test case I have to wait until all the bindings are done, so it isn't a problem for me
(I need to synchronize these two threads and continue getting messages only after the binding procedure).

Tony, thanks for your help!
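
The ordering described in this last message (finish all bindings first, only then let the consumer read) could be sketched with a condition variable. The names `bind_gate`, `consumer` and `run_gated_consumer` are hypothetical and no broker is involved; the counter merely stands in for the binding calls. This shows the hand-over only and does not make any later concurrent use of a shared connection safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical gate: the consumer does not start reading until the
 * setup thread reports that all bindings are in place. */
struct bind_gate {
  pthread_mutex_t lock;
  pthread_cond_t changed;
  bool bindings_done;
  int bindings;          /* bindings declared so far */
  int seen_by_consumer;  /* count observed when the consumer started */
};

static void *consumer(void *arg) {
  struct bind_gate *g = arg;
  pthread_mutex_lock(&g->lock);
  while (!g->bindings_done) {
    pthread_cond_wait(&g->changed, &g->lock);
  }
  g->seen_by_consumer = g->bindings;
  pthread_mutex_unlock(&g->lock);
  /* only from here on would the consumer start reading messages */
  return NULL;
}

/* Returns the number of bindings the consumer saw when it started,
 * or -1 on a setup failure. */
int run_gated_consumer(int bindings_to_declare) {
  struct bind_gate g;
  pthread_t t;

  assert(bindings_to_declare >= 0);
  g.bindings_done = false;
  g.bindings = 0;
  g.seen_by_consumer = -1;
  if (pthread_mutex_init(&g.lock, NULL) != 0) return -1;
  if (pthread_cond_init(&g.changed, NULL) != 0) return -1;
  if (pthread_create(&t, NULL, consumer, &g) != 0) return -1;

  /* Setup thread: declare every binding first. */
  for (int i = 0; i < bindings_to_declare; i++) {
    pthread_mutex_lock(&g.lock);
    g.bindings++;
    pthread_mutex_unlock(&g.lock);
  }

  /* Then open the gate and wake the consumer. */
  pthread_mutex_lock(&g.lock);
  g.bindings_done = true;
  pthread_cond_signal(&g.changed);
  pthread_mutex_unlock(&g.lock);

  pthread_join(t, NULL);
  pthread_cond_destroy(&g.changed);
  pthread_mutex_destroy(&g.lock);
  return g.seen_by_consumer;
}
```

The consumer always observes the full set of bindings because it cannot pass the gate before `bindings_done` is set. If bindings ever need to change while the consumer is already reading, a separate connection for the binding thread remains the approach recommended earlier in the thread.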