Pause minority cluster with publisher confirms losing messages

Pause minority cluster with publisher confirms losing messages

Miguel Araujo Pérez
Hello,

We've set up a RabbitMQ 3.3.1-1 cluster of 3 nodes in pause-minority mode. We are running some tests to make sure we don't lose any messages when a node of the cluster goes down.

So I've set up a little Python script that uses py-amqp to publish messages; it uses publisher confirms for doing so. The queue is durable and mirrored to all nodes through a policy. I use the script to push to the 3 different nodes in a loop, running 3 separate processes, one message every second, each message containing information about the publisher that produced it. Once I am publishing to the 3 nodes separately, I log into node3 and write iptables rules to close the connection to the other 2 RabbitMQ nodes. It takes the cluster around a minute to decide that one node is down and for node3 to stop its Rabbit process. Publishers to nodes 1 and 2 keep working without issues; however, publisher3 blocks right after node3 blocks connections, as I would expect, since node3 cannot confirm the message when it cannot see the other 2 nodes.

The issue is that sometimes, after a while, publisher3 resumes and continues pushing messages and, according to the library, receiving acks for them. That goes on for a period of 6-8 seconds until an exception is raised because the connection is closed (node3 stops Rabbit). Those "acked messages" aren't, however, in the queue when I consume it later to see what's inside. Other times it works as I would expect and doesn't enqueue any other message after iptables takes effect.

So I thought this could be a library issue, and ported the code to PHP using the official php-amqplib, and the exact same thing happens. My theory is that node3, after trying to coordinate with the other 2 nodes, sometimes goes into a partition for some seconds; during those seconds it confirms messages, and then the pause-minority cluster policy kicks in and stops Rabbit.

To be honest, I'm open to suggestions on what to try. We cannot afford to lose messages in any situation.

Thanks, Cheers
Miguel

_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Re: Pause minority cluster with publisher confirms losing messages

Michael Klishin-2


On 4 June 2014 at 11:58:41, Miguel Araujo Pérez ([hidden email]) wrote:

> The issue is that sometimes after a while publisher3 resumes  
> and continues pushing messages and according to the library  
> receiving acks for them, that goes for a period of 6-8 seconds  
> until an exception is raised because connection is closed (node3  
> stops Rabbit). Those "acked messages" aren't however in the  
> queue when I consume it later to see what's inside. However, other  
> times it works as i would expect and doesn't enqueue any other  
> message after iptables takes place.
>  
> So I thought this could be a library issue, and ported the code  
> to PHP using official php-amqplib and exact same thing happens.  
> My theory is that sometimes node3 after trying to coordinate  
> with other 2 nodes goes into a partition for some seconds, in those  
> seconds it confirms messages and then pause minority cluster  
> policy kicks in and stops Rabbit.

Yes, it takes time for both RabbitMQ and client libraries to detect
connection failure. This is in part due to how TCP works. You can configure
the interval of inactivity for RabbitMQ nodes:

https://www.rabbitmq.com/nettick.html

and use a low (say, 1-3 seconds) heartbeat interval for client libraries.
This should make the exception be thrown much earlier (given that your client
supports it; Pika should) at the cost of having increased network traffic:

http://www.rabbitmq.com/reliability.html
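As a minimal sketch of the first suggestion (the value of 10 seconds is purely illustrative; the default is 60), the tick interval goes into the `kernel` section of the classic Erlang-terms config file:

```erlang
%% /etc/rabbitmq/rabbitmq.config (illustrative value)
[
  {kernel, [
    %% probe peer nodes roughly every 10s instead of the default 60s,
    %% so an unreachable node is noticed sooner
    {net_ticktime, 10}
  ]}
].
```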

Beyond that, your apps can re-publish the last N messages (redundantly) after a network
failure. If your consumers can de-duplicate them (e.g. every message has an id you can set),
that should work well.
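Since the original test script is Python, here is a broker-free sketch of that de-duplication idea; the message shape and the `Deduplicator` class are illustrative assumptions, and in production the seen-id store would need persistence and expiry:

```python
import uuid

def make_message(body):
    # Publisher side: attach a unique id so redundant re-publishes
    # after a network failure can be detected downstream.
    return {"id": str(uuid.uuid4()), "body": body}

class Deduplicator:
    """Consumer-side filter: process each message id at most once."""

    def __init__(self):
        self.seen_ids = set()  # in production: a persistent store with expiry

    def accept(self, message):
        # True if the message should be processed,
        # False if it duplicates one already seen.
        if message["id"] in self.seen_ids:
            return False
        self.seen_ids.add(message["id"])
        return True

# A message published twice (e.g. retried after a failure)
# is processed only once:
dedup = Deduplicator()
msg = make_message("payment-123")
results = [dedup.accept(msg), dedup.accept(msg)]
```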

If that's not the case, there is a trick that some companies use: they run a RabbitMQ
node local to the machine (which at least greatly reduces the probability of RabbitMQ becoming
unreachable), publish with publisher confirms and a low heartbeat interval to the local
node, and use Federation [1] or Shovel [2] to connect that node to the other nodes.
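As a sketch of that topology, a static Shovel definition of this era in rabbitmq.config looked roughly like the fragment below; the shovel name, broker URIs, and queue name are all illustrative assumptions:

```erlang
%% rabbitmq.config fragment (illustrative names and hosts)
{rabbitmq_shovel,
 [{shovels,
   [{local_buffer_shovel,
     [{sources,      [{broker, "amqp://localhost"}]},
      {destinations, [{broker, "amqp://central.example.com"}]},
      %% durable queue on the local node that buffers messages
      %% until the central cluster is reachable again
      {queue, <<"local-buffer">>}]}]}]}
```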

By the way, there are only two official clients: Java and .NET.

1. http://www.rabbitmq.com/federation.html
2. http://www.rabbitmq.com/shovel.html
--  
MK  

Software Engineer, Pivotal/RabbitMQ

Re: Pause minority cluster with publisher confirms losing messages

Miguel Araujo Pérez
Hi Michael,

Thanks for your fast reply.

To be honest, I don't mind that when a node goes down in a RabbitMQ cluster it takes a minute or more to decide that the cluster is broken and what to do. What I don't fully understand is why the fallen node stops confirming for a while, then suddenly resumes for some seconds (not always, just sometimes), and then stops the Rabbit process, closing the connection, with the confirmed messages lost.

It's my understanding that the node should do something like: I cannot see nodes 1 and 2 (the connection is broken), I'm by myself here, so I cannot confirm your publishes. Then it says: I've got to stop, because I'm in the minority. However, the fact that it is confirming messages for a small lapse of time feels like something is not completely working. Also, this doesn't always happen; sometimes it does it right, so it's not consistent.

To be honest, I like the trick of having a local RabbitMQ, but for us just a cluster would be simpler. Having a local RabbitMQ and maintaining some federation or shoveling would be a little overkill.

One more thing from all these tests: once, when flushing iptables on node3, it core dumped some Erlang trace. All the times before, it simply detected the network and rejoined the cluster without issues. Is this something I should report? How?

Thanks, cheers
Miguel



Re: Pause minority cluster with publisher confirms losing messages

Michael Klishin-2
On 4 June 2014 at 14:04:49, Miguel Araujo Pérez ([hidden email]) wrote:
> It's my understanding that the node should do something like,  
> I cannot see nodes 1 and 2 (connection is broken), I'm by myself  
> here so I cannot confirm your publishes. Then says I've got to  
> stop, because I'm in minority. However, the fact that is confirming  
> messages for a small lapse of time feels like something is not  
> completely working. Also this actually doesn't always happens,  
> sometimes it does it right, so it's not consistent.

While I'm not very familiar with how the pause process works, there is an inherent race
condition between the decision to pause itself and incoming messages that are confirmed.

Once a node decides to pause, there may be messages "in flight" that were already
read from the socket and parsed, and being delivered to queues. These processes
(in both general and Erlang sense) can run in parallel on machines with over 1 core.

I'm not sure there is a one-size-fits-all solution on the server end. Try publishing
batches of messages and wait for confirms for a batch (and not a single message).
Then you'll have to retry with batches, too, which means that if part of an earlier
batch was lost due to the race condition explained above, it will be retried.

And batching is a recommended practice with publisher confirms anyway. 
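A broker-free sketch of that batch-retry idea (the `FlakyChannel` stub stands in for a real confirm-mode channel, and all names here are illustrative): publish a batch, wait for the batch confirm, and on failure re-publish the whole batch, relying on downstream de-duplication for the part that did get through:

```python
class FlakyChannel:
    """Stand-in for a confirm-mode channel: the first confirm
    attempt fails, simulating a connection lost mid-batch."""

    def __init__(self):
        self.delivered = []
        self._fail_once = True

    def publish(self, msg):
        self.delivered.append(msg)

    def wait_for_confirms(self):
        if self._fail_once:
            self._fail_once = False
            raise ConnectionError("connection closed by broker")

def publish_batch_with_retry(channel, batch, max_retries=3):
    # On a failed confirm we cannot know which messages of the batch
    # survived, so the entire batch is re-published; consumers must
    # de-duplicate (e.g. by a per-message id).
    for attempt in range(max_retries):
        for msg in batch:
            channel.publish(msg)
        try:
            channel.wait_for_confirms()
            return attempt + 1  # number of attempts used
        except ConnectionError:
            continue  # re-publish the whole batch
    raise RuntimeError("batch could not be confirmed")

channel = FlakyChannel()
attempts = publish_batch_with_retry(channel, ["m1", "m2", "m3"])
# The first attempt fails, so all three messages are sent twice;
# a de-duplicating consumer collapses them back to one copy each.
```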
--  
MK  

Software Engineer, Pivotal/RabbitMQ

Re: Pause minority cluster with publisher confirms losing messages

Simon MacMullen-2
On 04/06/14 11:22, Michael Klishin wrote:
> While I'm not very familiar with how the pause process works, there is an inherent race
> condition between the decision to pause itself and incoming messages that are confirmed.
>
> Once a node decides to pause, there may be messages "in flight" that were already
> read from the socket and parsed, and being delivered to queues. These processes
> (in both general and Erlang sense) can run in parallel on machines with over 1 core.

Yes, that's correct. At the moment the window can be of a reasonable
size, since when minority detection kicks in we do a graceful shutdown. We
could reduce the size of the window by telling all channels to close
immediately before doing anything else, but we can't eliminate the
window altogether.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal

Re: Pause minority cluster with publisher confirms losing messages

Michael Klishin-2
In reply to this post by Miguel Araujo Pérez
On 4 June 2014 at 14:55:57, Miguel Araujo Pérez ([hidden email]) wrote:
> While doing all these tests. Once, when flushing iptables in  
> node3 it has core dumped some Erlang trace. All times before it  
> simply detects network and rejoins cluster without issues.  
> is this something i should report? how?

Miguel,

We have filed a bug for the general issue. Feel free to post the trace
you see to the list (unless you think it contains sensitive information, which
it probably doesn't).
--  
MK  

Software Engineer, Pivotal/RabbitMQ

Re: Pause minority cluster with publisher confirms losing messages

Miguel Araujo Pérez
Hi,

Thanks. Is there a URL I can access to follow the bug status?

> Once a node decides to pause, there may be messages "in flight" that were already
> read from the socket and parsed, and being delivered to queues. These processes
> (in both general and Erlang sense) can run in parallel on machines with over 1 core.

My understanding is that to confirm a message, a node in the cluster must see the other nodes and get confirmation from them. If that is the case, it makes sense that it doesn't confirm messages when the iptables rules are applied, yet that is exactly what happens after some seconds, when it resumes and starts confirming messages that are then lost. I'm not sure I follow how multiple cores make things harder here; I'm probably not seeing some concurrency issue.

I'm attaching the Erlang log from node3 here. If you look at it, you will see that the first thing it detects is that nodes rabbitmq-2 and rabbitmq-1 are not responding. Then it promotes mirrored queues from slave to master. The cluster minority status detection comes last.

I'm not an expert in RabbitMQ internals, but I've been reading the code paths that control this flow, and it feels like confirms could be paused until the node is sure things are OK. I mean, if node3 knows it's connected to 2 nodes (node2 and node1) and then sees both of them down, it looks like something is going wrong.

The part that strikes me most is that it takes 1 minute and 3 seconds to detect the minority when we know both nodes are down.

I will send another email with the Erlang trace.

Thanks, cheers
Miguel



node3.log (3K) Download Attachment

Re: Pause minority cluster with publisher confirms losing messages

Victor Bronstein
Hi Miguel!
I seem to be observing exactly the same behavior you describe: when the network partition happens, at first the sender is blocked waiting for an acknowledgement, then it finally gets it, but then there are 6-7 seconds before the partitioned node shuts itself down during which the sender can happily send messages to it and get immediate acknowledgements. Naturally, none of these messages enter the queue.
Have you been able to understand why it happens and how one could work around it?
Thanks,
Victor

Re: Pause minority cluster with publisher confirms losing messages

Simon MacMullen-2
On 31/07/14 17:02, victorbr wrote:
> I seem to be observing exactly the same behavior you describe - when the
> network partition happens, at first the sender is blocked waiting for
> acknowledgement, then it finally gets it but then there are 6-7 seconds
> before the partitioned node shuts itself down when the sender can happily
> send messages to it and get immediate acknowledgements. Naturally, nothing
> of these messages enter the queue.
> Have you been able to understand why it happens and how one could work
> around it?

This is a known bug - it is fixed in nightly builds and will go into
3.3.5. Look for 26225 / 26293 in the release notes.

Cheers, Simon