RabbitMQ server closing connection every few minutes

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

RabbitMQ server closing connection every few minutes

David Tinker
Hi

We have a Java application using RabbitMQ that keeps getting its
connection closed by the server every few minutes. On the client side
we see this:

2012-05-22 05:30:05,337 [AMQP Connection 127.0.0.1:5671] ERROR
pork.RabbitService  - Connection to Rabbit Exchange shutting down:
com.rabbitmq.client.ShutdownSignalException: connection error; reason:
java.net.SocketException: Connection reset
Reason: java.net.SocketException: Connection reset

On the server side we get this:

=INFO REPORT==== 22-May-2012::05:25:20 ===
accepting AMQP connection <0.32660.612> (127.0.0.1:45458 -> 127.0.0.1:5671)

=ERROR REPORT==== 22-May-2012::05:30:05 ===
closing AMQP connection <0.32660.612> (127.0.0.1:45458 -> 127.0.0.1:5671):
{timeout,running}

It happens more frequently when the system is busy (500-1000
messages/s) than when it is cruising (50 messages/s). The application
and RabbitMQ are on the same box so I don't think its a
networking/firewall problem.

The application only uses one connection and one channel. Messages are
received using Connection.basicConsume(..) and acked or nacked
sometime later on different threads. It can sometimes be several
minutes before a message is acked or nacked. We are using
Connection.basicQos(10000) to limit the number of outstanding messages
but its not often that reaches 10000 and the error happens anytime.
There is defensive code to make sure messages are not acked/nacked
more than once.

Anyone have any ideas? Is there any way to get more information on the
server or client side regarding the error?

RabbitMQ 2.8.1 on Gentoo Linux 64bit
Erlang R15B (erts-5.9) [source] [64-bit]
amqp-client-2.8.2 (also tried amqp-client-2.7.1)
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

Thanks
David
_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|

Re: RabbitMQ server closing connection every few minutes

Tim Watson-6
On 22/05/2012 07:34, David Tinker wrote:

> Hi
>
> We have a Java application using RabbitMQ that keeps getting its
> connection closed by the server every few minutes. On the client side
> we see this:
>
> 2012-05-22 05:30:05,337 [AMQP Connection 127.0.0.1:5671] ERROR
> pork.RabbitService  - Connection to Rabbit Exchange shutting down:
> com.rabbitmq.client.ShutdownSignalException: connection error; reason:
> java.net.SocketException: Connection reset
> Reason: java.net.SocketException: Connection reset
>
> On the server side we get this:
>
> =INFO REPORT==== 22-May-2012::05:25:20 ===
> accepting AMQP connection<0.32660.612>  (127.0.0.1:45458 ->  127.0.0.1:5671)
>
> =ERROR REPORT==== 22-May-2012::05:30:05 ===
> closing AMQP connection<0.32660.612>  (127.0.0.1:45458 ->  127.0.0.1:5671):
> {timeout,running}
>

Hi David. The second error message will appear in the logs when there
has been an un-handled exception in rabbit's 'reader process'. The
'{timeout, running}' state occurs when receiving data from the network
takes too long (which is where the 'timeout' tag originates) but the
AMQP connection state is 'running' - the second tag. When this condition
arises, the broker's reader process shuts down the connection.

So at first glance, it does look like a networking problem to me,
although TBH I'm not sure about the nature of it just yet. I suspect it
could have something to do with "... It can sometimes be several minutes
before a message is acked or nacked ...".

> It happens more frequently when the system is busy (500-1000
> messages/s) than when it is cruising (50 messages/s). The application
> and RabbitMQ are on the same box so I don't think its a
> networking/firewall problem.
>
> The application only uses one connection and one channel. Messages are
> received using Connection.basicConsume(..) and acked or nacked
> sometime later on different threads. It can sometimes be several
> minutes before a message is acked or nacked. We are using
> Connection.basicQos(10000) to limit the number of outstanding messages
> but its not often that reaches 10000 and the error happens anytime.
> There is defensive code to make sure messages are not acked/nacked
> more than once.
>
> Anyone have any ideas? Is there any way to get more information on the
> server or client side regarding the error?
>

Well the server error seems pretty clear and just to be 100% sure, I can
verify that the error message you're seeing is only generated in one
place in the whole code base:
https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L241.

The code handling the timeout condition is
https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L321,
the call site for which is
https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L284.
The origin of this is
https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_net.erl#L108 
in rabbit_net - what the code does is make the calling process wait for
a message from the Erlang networking layer. So it is the underlying
TCP/IP layer in Erlang which is noticing a timeout on the connection.

How this is mediated through the client's behaviour is another matter.

Hope this helps us get started figuring out what's wrong!

Cheers,
Tim

> RabbitMQ 2.8.1 on Gentoo Linux 64bit
> Erlang R15B (erts-5.9) [source] [64-bit]
> amqp-client-2.8.2 (also tried amqp-client-2.7.1)
> java version "1.6.0_31"
> Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
>
> Thanks
> David
> _______________________________________________
> rabbitmq-discuss mailing list
> [hidden email]
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|

Re: RabbitMQ server closing connection every few minutes

Tim Watson-6
On 22/05/2012 08:48, Tim Watson wrote:

> On 22/05/2012 07:34, David Tinker wrote:
>> Hi
>>
>> We have a Java application using RabbitMQ that keeps getting its
>> connection closed by the server every few minutes. On the client side
>> we see this:
>>
>> 2012-05-22 05:30:05,337 [AMQP Connection 127.0.0.1:5671] ERROR
>> pork.RabbitService - Connection to Rabbit Exchange shutting down:
>> com.rabbitmq.client.ShutdownSignalException: connection error; reason:
>> java.net.SocketException: Connection reset
>> Reason: java.net.SocketException: Connection reset
>>
>> On the server side we get this:
>>
>> =INFO REPORT==== 22-May-2012::05:25:20 ===
>> accepting AMQP connection<0.32660.612> (127.0.0.1:45458 ->
>> 127.0.0.1:5671)
>>
>> =ERROR REPORT==== 22-May-2012::05:30:05 ===
>> closing AMQP connection<0.32660.612> (127.0.0.1:45458 -> 127.0.0.1:5671):
>> {timeout,running}
>>
>
> Hi David. The second error message will appear in the logs when there
> has been an un-handled exception in rabbit's 'reader process'. The
> '{timeout, running}' state occurs when receiving data from the network
> takes too long (which is where the 'timeout' tag originates) but the
> AMQP connection state is 'running' - the second tag. When this condition
> arises, the broker's reader process shuts down the connection.
>
> So at first glance, it does look like a networking problem to me,
> although TBH I'm not sure about the nature of it just yet. I suspect it
> could have something to do with "... It can sometimes be several minutes
> before a message is acked or nacked ...".
>
>> It happens more frequently when the system is busy (500-1000
>> messages/s) than when it is cruising (50 messages/s). The application
>> and RabbitMQ are on the same box so I don't think its a
>> networking/firewall problem.
>>
>> The application only uses one connection and one channel. Messages are
>> received using Connection.basicConsume(..) and acked or nacked
>> sometime later on different threads. It can sometimes be several
>> minutes before a message is acked or nacked. We are using
>> Connection.basicQos(10000) to limit the number of outstanding messages
>> but its not often that reaches 10000 and the error happens anytime.
>> There is defensive code to make sure messages are not acked/nacked
>> more than once.
>>
>> Anyone have any ideas? Is there any way to get more information on the
>> server or client side regarding the error?
>>
>
> Well the server error seems pretty clear and just to be 100% sure, I can
> verify that the error message you're seeing is only generated in one
> place in the whole code base:
> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L241.
>
>

Except that this isn't quite as clear as I suggested initially. The
networking layer isn't going to send a timeout on a blocking receive
like this - the 'timeout' message is being sent to the reader from some
other process. I will investigate further! :)

> The code handling the timeout condition is
> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L321,
> the call site for which is
> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L284.
> The origin of this is
> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_net.erl#L108
> in rabbit_net - what the code does is make the calling process wait for
> a message from the Erlang networking layer. So it is the underlying
> TCP/IP layer in Erlang which is noticing a timeout on the connection.
>
> How this is mediated through the client's behaviour is another matter.
>
> Hope this helps us get started figuring out what's wrong!
>
> Cheers,
> Tim
>
>> RabbitMQ 2.8.1 on Gentoo Linux 64bit
>> Erlang R15B (erts-5.9) [source] [64-bit]
>> amqp-client-2.8.2 (also tried amqp-client-2.7.1)
>> java version "1.6.0_31"
>> Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
>> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
>>
>> Thanks
>> David
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> [hidden email]
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
> _______________________________________________
> rabbitmq-discuss mailing list
> [hidden email]
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|

Re: RabbitMQ server closing connection every few minutes

Tim Watson-6
On 22/05/2012 09:00, Tim Watson wrote:

> On 22/05/2012 08:48, Tim Watson wrote:
>> On 22/05/2012 07:34, David Tinker wrote:
>>> Hi
>>>
>>> We have a Java application using RabbitMQ that keeps getting its
>>> connection closed by the server every few minutes. On the client side
>>> we see this:
>>>
>>> 2012-05-22 05:30:05,337 [AMQP Connection 127.0.0.1:5671] ERROR
>>> pork.RabbitService - Connection to Rabbit Exchange shutting down:
>>> com.rabbitmq.client.ShutdownSignalException: connection error; reason:
>>> java.net.SocketException: Connection reset
>>> Reason: java.net.SocketException: Connection reset
>>>
>>> On the server side we get this:
>>>
>>> =INFO REPORT==== 22-May-2012::05:25:20 ===
>>> accepting AMQP connection<0.32660.612> (127.0.0.1:45458 ->
>>> 127.0.0.1:5671)
>>>
>>> =ERROR REPORT==== 22-May-2012::05:30:05 ===
>>> closing AMQP connection<0.32660.612> (127.0.0.1:45458 ->
>>> 127.0.0.1:5671):
>>> {timeout,running}
>>>
>>
>> Hi David. The second error message will appear in the logs when there
>> has been an un-handled exception in rabbit's 'reader process'. The
>> '{timeout, running}' state occurs when receiving data from the network
>> takes too long (which is where the 'timeout' tag originates) but the
>> AMQP connection state is 'running' - the second tag. When this condition
>> arises, the broker's reader process shuts down the connection.
>>
>> So at first glance, it does look like a networking problem to me,
>> although TBH I'm not sure about the nature of it just yet. I suspect it
>> could have something to do with "... It can sometimes be several minutes
>> before a message is acked or nacked ...".
>>
>>> It happens more frequently when the system is busy (500-1000
>>> messages/s) than when it is cruising (50 messages/s). The application
>>> and RabbitMQ are on the same box so I don't think its a
>>> networking/firewall problem.
>>>
>>> The application only uses one connection and one channel. Messages are
>>> received using Connection.basicConsume(..) and acked or nacked
>>> sometime later on different threads. It can sometimes be several
>>> minutes before a message is acked or nacked. We are using
>>> Connection.basicQos(10000) to limit the number of outstanding messages
>>> but its not often that reaches 10000 and the error happens anytime.
>>> There is defensive code to make sure messages are not acked/nacked
>>> more than once.
>>>
>>> Anyone have any ideas? Is there any way to get more information on the
>>> server or client side regarding the error?
>>>
>>
>> Well the server error seems pretty clear and just to be 100% sure, I can
>> verify that the error message you're seeing is only generated in one
>> place in the whole code base:
>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L241.
>>
>>
>>
>
> Except that this isn't quite as clear as I suggested initially. The
> networking layer isn't going to send a timeout on a blocking receive
> like this - the 'timeout' message is being sent to the reader from some
> other process. I will investigate further! :)
>

Ok so I think I've identified the source of this behaviour. The
'timeout' message is sent to the reader process by the AMQP heartbeat -
a separate out of band process that is monitoring for 'stale'
connections using heartbeat frames (see
http://www.rabbitmq.com/amqp-0-9-1-reference.html for details).

What I believe is happening is that as the server comes under load, the
heartbeat for this connection is de-prioritised at the network layer and
times out, causing the connection to be closed.

Set your heartbeat to 0 (turning it off) for now and see if it fixes the
problem? That's a short term investigative step though, obviously.

I'll file a bug for improving the log messages when timeout occurs.

>> The code handling the timeout condition is
>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L321,
>>
>> the call site for which is
>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L284.
>>
>> The origin of this is
>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_net.erl#L108
>>
>> in rabbit_net - what the code does is make the calling process wait for
>> a message from the Erlang networking layer. So it is the underlying
>> TCP/IP layer in Erlang which is noticing a timeout on the connection.
>>
>> How this is mediated through the client's behaviour is another matter.
>>
>> Hope this helps us get started figuring out what's wrong!
>>
>> Cheers,
>> Tim
>>
>>> RabbitMQ 2.8.1 on Gentoo Linux 64bit
>>> Erlang R15B (erts-5.9) [source] [64-bit]
>>> amqp-client-2.8.2 (also tried amqp-client-2.7.1)
>>> java version "1.6.0_31"
>>> Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
>>> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
>>>
>>> Thanks
>>> David
>>> _______________________________________________
>>> rabbitmq-discuss mailing list
>>> [hidden email]
>>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> [hidden email]
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
> _______________________________________________
> rabbitmq-discuss mailing list
> [hidden email]
> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
Reply | Threaded
Open this post in threaded view
|

Re: RabbitMQ server closing connection every few minutes

David Tinker
Thanks a stack for the quick responses. I did find mentions of
heartbeat when Googling this problem but was under the impression that
we were not using it. It turns out we were and disabling it sorted out
our issues.

Thanks again
David

On Tue, May 22, 2012 at 10:25 AM, Tim Watson <[hidden email]> wrote:

> On 22/05/2012 09:00, Tim Watson wrote:
>>
>> On 22/05/2012 08:48, Tim Watson wrote:
>>>
>>> On 22/05/2012 07:34, David Tinker wrote:
>>>>
>>>> Hi
>>>>
>>>> We have a Java application using RabbitMQ that keeps getting its
>>>> connection closed by the server every few minutes. On the client side
>>>> we see this:
>>>>
>>>> 2012-05-22 05:30:05,337 [AMQP Connection 127.0.0.1:5671] ERROR
>>>> pork.RabbitService - Connection to Rabbit Exchange shutting down:
>>>> com.rabbitmq.client.ShutdownSignalException: connection error; reason:
>>>> java.net.SocketException: Connection reset
>>>> Reason: java.net.SocketException: Connection reset
>>>>
>>>> On the server side we get this:
>>>>
>>>> =INFO REPORT==== 22-May-2012::05:25:20 ===
>>>> accepting AMQP connection<0.32660.612> (127.0.0.1:45458 ->
>>>> 127.0.0.1:5671)
>>>>
>>>> =ERROR REPORT==== 22-May-2012::05:30:05 ===
>>>> closing AMQP connection<0.32660.612> (127.0.0.1:45458 ->
>>>> 127.0.0.1:5671):
>>>> {timeout,running}
>>>>
>>>
>>> Hi David. The second error message will appear in the logs when there
>>> has been an un-handled exception in rabbit's 'reader process'. The
>>> '{timeout, running}' state occurs when receiving data from the network
>>> takes too long (which is where the 'timeout' tag originates) but the
>>> AMQP connection state is 'running' - the second tag. When this condition
>>> arises, the broker's reader process shuts down the connection.
>>>
>>> So at first glance, it does look like a networking problem to me,
>>> although TBH I'm not sure about the nature of it just yet. I suspect it
>>> could have something to do with "... It can sometimes be several minutes
>>> before a message is acked or nacked ...".
>>>
>>>> It happens more frequently when the system is busy (500-1000
>>>> messages/s) than when it is cruising (50 messages/s). The application
>>>> and RabbitMQ are on the same box so I don't think its a
>>>> networking/firewall problem.
>>>>
>>>> The application only uses one connection and one channel. Messages are
>>>> received using Connection.basicConsume(..) and acked or nacked
>>>> sometime later on different threads. It can sometimes be several
>>>> minutes before a message is acked or nacked. We are using
>>>> Connection.basicQos(10000) to limit the number of outstanding messages
>>>> but its not often that reaches 10000 and the error happens anytime.
>>>> There is defensive code to make sure messages are not acked/nacked
>>>> more than once.
>>>>
>>>> Anyone have any ideas? Is there any way to get more information on the
>>>> server or client side regarding the error?
>>>>
>>>
>>> Well the server error seems pretty clear and just to be 100% sure, I can
>>> verify that the error message you're seeing is only generated in one
>>> place in the whole code base:
>>>
>>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L241.
>>>
>>>
>>>
>>
>> Except that this isn't quite as clear as I suggested initially. The
>> networking layer isn't going to send a timeout on a blocking receive
>> like this - the 'timeout' message is being sent to the reader from some
>> other process. I will investigate further! :)
>>
>
> Ok so I think I've identified the source of this behaviour. The 'timeout'
> message is sent to the reader process by the AMQP heartbeat - a separate out
> of band process that is monitoring for 'stale' connections using heartbeat
> frames (see http://www.rabbitmq.com/amqp-0-9-1-reference.html for details).
>
> What I believe is happening is that as the server comes under load, the
> heartbeat for this connection is de-prioritised at the network layer and
> times out, causing the connection to be closed.
>
> Set your heartbeat to 0 (turning it off) for now and see if it fixes the
> problem? That's a short term investigative step though, obviously.
>
> I'll file a bug for improving the log messages when timeout occurs.
>
>
>>> The code handling the timeout condition is
>>>
>>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L321,
>>>
>>> the call site for which is
>>>
>>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_reader.erl#L284.
>>>
>>> The origin of this is
>>>
>>> https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_net.erl#L108
>>>
>>> in rabbit_net - what the code does is make the calling process wait for
>>> a message from the Erlang networking layer. So it is the underlying
>>> TCP/IP layer in Erlang which is noticing a timeout on the connection.
>>>
>>> How this is mediated through the client's behaviour is another matter.
>>>
>>> Hope this helps us get started figuring out what's wrong!
>>>
>>> Cheers,
>>> Tim
>>>
>>>> RabbitMQ 2.8.1 on Gentoo Linux 64bit
>>>> Erlang R15B (erts-5.9) [source] [64-bit]
>>>> amqp-client-2.8.2 (also tried amqp-client-2.7.1)
>>>> java version "1.6.0_31"
>>>> Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
>>>> Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
>>>>
>>>> Thanks
>>>> David
>>>> _______________________________________________
>>>> rabbitmq-discuss mailing list
>>>> [hidden email]
>>>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>>
>>>
>>> _______________________________________________
>>> rabbitmq-discuss mailing list
>>> [hidden email]
>>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>>
>>
>> _______________________________________________
>> rabbitmq-discuss mailing list
>> [hidden email]
>> https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
>
>



--
http://twoogleplus.com/ Online service to cross-post tweets to Google+
http://engage.calibreapps.com/ Engage - Web based goal setting and
performance management
_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss