Exactly Once Delivery

Mike Petrusis
Greetings,

In reviewing the mailing list archives, I see various threads which state that ensuring "exactly once" delivery requires deduplication by the consumer.  For example the following:

"Exactly-once requires coordination between consumers, or idempotency,
even when there is just a single queue. The consumer, broker or network
may die during the transmission of the ack for a message, thus causing
retransmission of the message (which the consumer has already seen and
processed) at a later point."  http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2009-July/004237.html

In the case of competing consumers which pull messages from the same queue, this will require some sort of shared state between consumers to de-duplicate messages (assuming the consumers are not idempotent).  

Our application is using RabbitMQ to distribute tasks across multiple workers residing on different servers, which adds to the cost of sharing state between the workers.

Another message in the email archive mentions that "You can guarantee exactly-once delivery if you use transactions, durable queues and exchanges, and persistent messages, but only as long as any failing node eventually recovers."

As I understand it, the transaction only affects the publishing of the message into RabbitMQ and prevents the message from being queued until the transaction is committed.  If this is correct, I don't understand how the transaction would prevent a duplicate message in the previously mentioned scenarios, which cause a retransmission.  Can anybody clarify?

On a more practical level:

What's the recommended way to deal with the potential of duplicate messages?  
What do people generally do?
Is this a rare enough edge case that most people just ignore it?


Thanks,

Mike
_______________________________________________
rabbitmq-discuss mailing list
[hidden email]
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Re: Exactly Once Delivery

Matthew Sackman-3
Hi Mike,

On Tue, Aug 03, 2010 at 04:43:56AM -0400, Mike Petrusis wrote:

> In reviewing the mailing list archives, I see various threads which state that ensuring "exactly once" delivery requires deduplication by the consumer.  For example the following:
>
> "Exactly-once requires coordination between consumers, or idempotency,
> even when there is just a single queue. The consumer, broker or network
> may die during the transmission of the ack for a message, thus causing
> retransmission of the message (which the consumer has already seen and
> processed) at a later point."  http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2009-July/004237.html
>
> In the case of competing consumers which pull messages from the same queue, this will require some sort of shared state between consumers to de-duplicate messages (assuming the consumers are not idempotent).  
>
> Our application is using RabbitMQ to distribute tasks across multiple workers residing on different servers, this adds to the cost of sharing state between the workers.
>
> Another message in the email archive mentions that "You can guarantee exactly-once delivery if you use transactions, durable queues and exchanges, and persistent messages, but only as long as any failing node eventually recovers."

All the above is sort of wrong. You can never *guarantee* exactly once.
(There's always some argument about whether receiving duplicate messages
but relying on idempotency counts as achieving exactly once. I don't feel
it does, and why should become clearer further on...)

The problem is publishers. If the server on which RabbitMQ is running
crashes after committing a transaction containing publishes, it's
possible the commit-ok message may get lost. The publishers then still
think they need to republish, so they wait until the broker comes back
up and then republish. This can happen any number of times: the
publishers connect, start a transaction, publish messages, commit the
transaction and then the commit-ok gets lost and so the publishers
repeat the process.
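
The loop described above can be simulated with a toy broker. This is only a sketch in Python: FlakyBroker, publish_until_confirmed and the loss probability are hypothetical names and values chosen for illustration, and nothing here uses a real AMQP client. The point it shows is that every attempt is committed on the broker side, whether or not the publisher ever hears about it.

```python
import random


class FlakyBroker:
    """Toy broker (hypothetical): every commit succeeds, but the
    commit-ok reply may be lost on its way back to the publisher."""

    def __init__(self, loss_probability, rng):
        self.queue = []  # messages the broker has durably accepted
        self.loss_probability = loss_probability
        self.rng = rng

    def commit(self, batch):
        # The transaction is committed on the broker ...
        self.queue.extend(batch)
        # ... but the reply can vanish (crash, dropped connection).
        # True means the publisher actually saw commit-ok.
        return self.rng.random() >= self.loss_probability


def publish_until_confirmed(broker, batch, max_attempts=100):
    """Republish the same batch until a commit-ok is actually seen."""
    for attempt in range(1, max_attempts + 1):
        if broker.commit(batch):
            return attempt
    raise RuntimeError("no commit-ok received")


broker = FlakyBroker(loss_probability=0.5, rng=random.Random(1))
attempts = publish_until_confirmed(broker, ["task-42"])
print(f"publisher saw commit-ok after {attempts} attempt(s)")
print(f"broker holds {len(broker.queue)} copies of task-42")
```

The number of copies in the queue always equals the number of attempts, so each lost commit-ok leaves one more duplicate for the consumers to deal with.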

As a result, on the clients, you need to detect duplicates. Now this is
really a barrier to making all operations idempotent. The problem is
that you never know how many copies of a message there will be. Thus you
never know when it's safe to remove messages from your dedup cache. Now
things like redis apparently have the means to delete entries after an
amount of time, which would at least allow you to avoid the database
eating up all the RAM in the universe, but there's still the possibility
that after the entry's been deleted, another duplicate will come along
which you now won't detect as a duplicate.

This isn't just a problem with RabbitMQ - in any messaging system, if
any message can be lost, you cannot achieve exactly once semantics. The
best you can hope for is a probability of a large number of 9s that you
will be able to detect all the duplicates. But that's the best you can
achieve.

Scaling horizontally is thus more tricky because, as you say, you may
now have multiple consumers which each receive one copy of a message.
Thus the dedup database would have to be distributed. With high message
rates, this might well become prohibitive because of the amount of
network traffic due to transactions between the consumers.

> What's the recommended way to deal with the potential of duplicate messages?  

Currently, there is no "recommended" way. If you have a single consumer,
it's quite easy - something like tokyocabinet should be more than
sufficiently performant. For multiple consumers, you're currently going
to have to look at some sort of distributed database.

> Is this a rare enough edge case that most people just ignore it?

No idea. But one way of making your life easier is for the producer to
send slightly different messages on every republish (they would still
obviously need to have the same msg id). That way, if you detect a msg
with "republish count" == 0, then you know it's the first copy, so you
can insert async into your shared database and then act on the message.
You only need to do a query on the database whenever you receive a msg
with "republish count" > 0 - thus you can tune your database for
inserts and hopefully save some work - the common case will then be the
first case, and lookups will be exceedingly rare.

The question then is: if you've received a msg, republish count > 0 but
there are no entries in the database, what do you do? It shouldn't have
overtaken the first publish (though if consumers disconnected without
acking, or requeued messages, it could have), but you need to cause some
sort of synchronise operation between all the consumers to ensure none
are in the process of adding to the database - it all gets a bit hairy
at this point.
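
A rough sketch of the "republish count" idea, in Python. SharedStore is a hypothetical in-memory stand-in for the shared database, the message layout (id, republish_count, body) is an assumption, and the insert here is synchronous rather than async to keep the example short. It also only covers copies resent by the publisher.

```python
class SharedStore:
    """Hypothetical stand-in for the shared dedup database."""

    def __init__(self):
        self.seen = set()

    def insert(self, msg_id):
        self.seen.add(msg_id)

    def contains(self, msg_id):
        return msg_id in self.seen


def handle(store, msg, process):
    """Return 'processed', 'duplicate' or 'uncertain' for one delivery."""
    if msg["republish_count"] == 0:
        # Common case: first copy, so record it and act on it, no lookup.
        store.insert(msg["id"])
        process(msg)
        return "processed"
    if store.contains(msg["id"]):
        # A resend of something already handled.
        return "duplicate"
    # Resend with no record of the first copy: the hairy case. The
    # sketch only flags it; deciding what to do needs coordination.
    return "uncertain"


done = []
store = SharedStore()
first = {"id": "m1", "republish_count": 0, "body": "resize image 7"}
retry = {"id": "m1", "republish_count": 1, "body": "resize image 7"}
orphan = {"id": "m2", "republish_count": 1, "body": "resize image 8"}
results = [handle(store, m, done.append) for m in (first, retry, orphan)]
print(results)  # -> ['processed', 'duplicate', 'uncertain']
```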

Thus if your message rate is low, you're much safer doing the insert and
select on every message. If that's too expensive, you're going to have
to think very hard indeed about how to avoid races between different
consumers thinking they're both/all responsible for acting on the same
message.
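
For the low-rate case, one way to do the insert and the select as a single step is to let a uniqueness constraint decide which consumer gets to act. The following is a minimal sketch using Python's sqlite3 module; the table and function names are made up, and an in-memory SQLite database merely stands in for whatever shared database the consumers would really use.

```python
import sqlite3


def open_dedup_db(path=":memory:"):
    """Open the (stand-in) dedup database and make sure the table exists."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS seen (msg_id TEXT PRIMARY KEY)")
    db.commit()
    return db


def claim(db, msg_id):
    """Insert and check in one statement; True only for the first claimant."""
    cur = db.execute(
        "INSERT OR IGNORE INTO seen (msg_id) VALUES (?)", (msg_id,)
    )
    db.commit()
    # rowcount is 1 if the row was inserted, 0 if the insert was ignored
    # because the primary key already existed.
    return cur.rowcount == 1


db = open_dedup_db()
print(claim(db, "m1"))  # -> True, first copy: act on the message
print(claim(db, "m1"))  # -> False, duplicate: skip it
```

Note that claiming before processing trades duplicates for possible loss: a consumer that crashes after a successful claim but before finishing the work leaves the message marked as seen.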

This stuff isn't easy.

Matthew

Re: Exactly Once Delivery

John Apps
Matthew,
  an excellent response and thank you for it! Yes, difficult it is!

It raises a somewhat philosophical discussion around where the onus is placed in terms of guaranteeing such things as 'guaranteed once', i.e., on the client side or on the server side? The JMS standard offers guaranteed once, whereby the onus is on the server (JMS implementation) and not on the client. 

What I am trying to say is that, in my opinion, client programs should be as 'simple' as possible with the servers doing all the hard work. This is what the JMS standard forces on implementors and, perhaps to a lesser extent today, so does AMQP.

Note: the word 'server' is horribly overloaded these days. It is used here to indicate the software with which clients, producers and consumers, communicate.

Oh well, off to librabbitMQ and some example programs written in COBOL...

Cheers, John
On Thu, Aug 5, 2010 at 13:22, Matthew Sackman <[hidden email]> wrote:
[snip]



--
---
John Apps
(49) 171 869 1813


Re: Exactly Once Delivery

Tony Garnock-Jones-5
John Apps wrote:
> The JMS standard offers guaranteed once

What exactly do they mean by that? In particular, how do they deal with
duplicates? Do they report failure, or silently let a dup through in certain
situations? If you could point me to the part of the spec that sets out the JMS
resolution of these issues, that'd be really useful.

Tony


Re: Exactly Once Delivery

Tony Garnock-Jones-5
In reply to this post by Matthew Sackman-3
Matthew Sackman wrote:
> As a result, on the clients, you need to detect duplicates. Now this is
> really a barrier to making all operations idempotent. The problem is
> that you never know how many copies of a message there will be. Thus you
> never know when it's safe to remove messages from your dedup cache.

The other piece of this is time-to-live (TTL). Given a finite-length dedup
cache and message TTL, you can detect and report failure. (And if the ack
travels upstream to the publisher, you can report failures at the send end,
too.) Without the TTL, you have silent dups on rare occasions.

Tony


Re: Exactly Once Delivery

David Wragg-4
In reply to this post by Tony Garnock-Jones-5
Tony Garnock-Jones <[hidden email]> writes:
> John Apps wrote:
>> The JMS standard offers guaranteed once
>
> What exactly do they mean by that? In particular, how do they deal
> with duplicates? Do they report failure, or silently let a dup through
> in certain situations? If you could point me to the part of the spec
> that sets out the JMS resolution of these issues, that's be really
> useful.

As an API spec, it's quite easy for JMS to mandate something apparently
impossible, without hinting at how it might actually be implemented.

Most of the spec says that the PERSISTENT delivery mode gives
"once-and-only-once" delivery.  But section 4.4.13 (of JMS 1.1) admits
that there are a number of caveats to this.  So it's really
"once-and-only-once-except-in-some-corner-cases".

I think the wrinkle that might prevent us saying that RabbitMQ gives the
same guarantees is on the publishing side.  The caveats in JMS all seem
to apply only to the consuming side.  But what happens with an AMQP
producer if the connection gets dropped before a tx.commit-ok gets back
to the client?  In that case the client has to re-publish, leading to a
potential dup.  This can be avoided by a de-dup filter on published
messages in the broker.  I don't know if JMS brokers really go to such
lengths.

David

--
David Wragg
Staff Engineer, RabbitMQ
SpringSource, a division of VMware

Re: Exactly Once Delivery

Michael Bridgen-3
In reply to this post by Tony Garnock-Jones-5
> John Apps wrote:
>> The JMS standard offers guaranteed once
>
> What exactly do they mean by that? In particular, how do they deal with
> duplicates? Do they report failure, or silently let a dup through in certain
> situations? If you could point me to the part of the spec that sets out the JMS
> resolution of these issues, that's be really useful.

For consumers, JMS has client ack mode; the application acknowledges
messages, and the server must not resend a message that has been
acknowledged.

A failure in the connection may result in the server resending a message
which the application thinks it has acknowledged.  The spec suggests
"Since such clients cannot know for certain if a particular message has
been acknowledged, they must be prepared for redelivery of the last
consumed message.".  I.e., the client application has to have an
idempotency barrier.

For producers, duplicate publishing is simply prohibited.  As for
failure modes -- "A message that is redelivered due to session recovery
is not considered a duplicate message."

So JMS cannot magically do "exactly once" any more than anything else.


--Michael

Re: Exactly Once Delivery

Matthew Sackman-3
On Thu, Aug 05, 2010 at 03:17:28PM +0100, Michael Bridgen wrote:
> For producers, duplicate publishing is simply prohibited.

So that seems to suggest that every message is universally unique?

If this is correct, whose responsibility is it to add GUIDs (or some
such) to every message? Does the client library do that automatically?

Matthew

Re: Exactly Once Delivery

Tony Menges

The JMS provider sets the message id. It is supposed to be unique enough to be used for a "historical repository" but the scope of uniqueness is left to the provider. It is recommended that it should be at least unique for a given "installation". I don't think this helps on the publisher side since as you pointed out the notification of the completion of the publish might not make it back to the producer.

JMS requires the provider to set the redelivered flag (and optionally the delivery count) field if it thinks the message has been given to the application before. The application may or may not have seen it but this flag can be used to trigger the check for a duplicate by the application. The use of unique message ids helps on this end.

Tony Menges
VMware, Inc.


On 8/5/10 7:25 AM, "Matthew Sackman" <[hidden email]> wrote:

On Thu, Aug 05, 2010 at 03:17:28PM +0100, Michael Bridgen wrote:
> For producers, duplicate publishing is simply prohibited.

So that seems to suggest that every messages is universally unique?

If this is correct, who's responsibility is it to add GUIDs (or some
such) to every message? Does the client library do that automatically?

Matthew

Re: Exactly Once Delivery

John Apps
In reply to this post by Tony Garnock-Jones-5
From my possibly naive understanding of the spec, it means quite simply that a message will be delivered guaranteed once and only once; but I somehow do not think that that is quite what you were asking?

The nice part about JMS is that it is only an API spec and says nothing about implementation.
I would have to look into the spec to see what the answer is to the question: "...how do they deal with duplicates..." etc. If I find the time, I shall be happy to look at the odd JMS implementation and see what the various vendors do in cases such as that in question.
What I do know is that one can specify notification for when a message with "guaranteed delivery" simply cannot be delivered, for whatever reason. This can be to the client or, more likely, as a message from the 'server' to those that want to know.

A relatively unknown product called Reliable Transaction Router (RTR), architected and developed long ago by DEC and still maintained and developed by HP, issues a warning when it considers that a message *may* be a duplicate, i.e., has possibly been delivered previously. This is also the case when messages are being 'replayed' after a server has been brought down and is now receiving messages which flowed through the network whilst it was down.

There is much discussion around the word "guaranteed", the objection being that nothing can be "guaranteed". Of course it cannot, but if we take things to that extent, we may as well give up right away!

On Thu, Aug 5, 2010 at 15:48, Tony Garnock-Jones <[hidden email]> wrote:
John Apps wrote:
> The JMS standard offers guaranteed once

What exactly do they mean by that? In particular, how do they deal with
duplicates? Do they report failure, or silently let a dup through in certain
situations? If you could point me to the part of the spec that sets out the JMS
resolution of these issues, that's be really useful.

Tony




--
---
John Apps
(49) 171 869 1813


Re: Exactly Once Delivery

Matthew Sackman-3
In reply to this post by Tony Menges
On Thu, Aug 05, 2010 at 09:16:14AM -0700, Tony Menges wrote:
> The JMS provider sets the message id. It is supposed to be unique enough to be used for a "historical repository" but the scope of uniqueness is left to the provider. It is recommended that it should be at least unique for a given "installation". I don't think this helps on the publisher side since as you pointed out the notification of the completion of the publish might not make it back to the producer.
>
> JMS requires the provider to set the redelivered flag (and optionally the delivery count) field if it thinks the message has been given to the application before. The application may or may not have seen it but this flag can be used to trigger the check for a duplicate by the application. The use of unique message ids helps on this end.

Ahh interesting. It would thus seem that JMS requires slightly more of
the producer when publishing messages (more logic is required in the
client library there) and AMQP possibly requires more at the consumer
side.

Matthew

Re: Exactly Once Delivery

Mike Petrusis
Thanks all for the input. I've got a better understanding of the issues now and it sounds like the issue is the same regardless of the use of transactions.  

Matthew's idea of having producers add a "republish count" to the messages is a good suggestion to optimize the de-duplication of messages, but this only helps for messages resent by a producer.

Can messages get duplicated while they are propagating in the broker?  If duplicates are produced in the broker they will have the same "republish count" and this method won't work.  


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Matthew Sackman
Sent: Thursday, August 05, 2010 9:26 AM
To: [hidden email]
Subject: Re: [rabbitmq-discuss] Exactly Once Delivery

On Thu, Aug 05, 2010 at 09:16:14AM -0700, Tony Menges wrote:
> The JMS provider sets the message id. It is supposed to be unique enough to be used for a "historical repository" but the scope of uniqueness is left to the provider. It is recommended that it should be at least unique for a given "installation". I don't think this helps on the publisher side since as you pointed out the notification of the completion of the publish might not make it back to the producer.
>
> JMS requires the provider to set the redelivered flag (and optionally the delivery count) field if it thinks the message has been given to the application before. The application may or may not have seen it but this flag can be used to trigger the check for a duplicate by the application. The use of unique message ids helps on this end.

Ahh interesting. It would thus seem that JMS requires slightly more of
the producer when publishing messages (more logic is required in the
client library there) and AMQP possibly requires more at the consumer
side.

Matthew

Re: Exactly Once Delivery

Matthew Sackman-3
On Thu, Aug 05, 2010 at 10:28:17PM -0400, Mike Petrusis wrote:
> Can messages get duplicated while they are propagating in the broker?  If duplicates are produced in the broker they will have the same "republish count" and this method won't work.  

Well, a message that is sent to an exchange which then results in the
message going to several queues will obviously be duplicated. But
presumably in that case, your consumers consuming from the different
queues would be doing different tasks with the messages, hence the need
for the different queues in the first place.

That aside, no, within a queue, Rabbit does not arbitrarily duplicate
messages.

Matthew

Re: Exactly Once Delivery

Tim Fox
In reply to this post by David Wragg-4
On 05/08/10 15:16, David Wragg wrote:

> Tony Garnock-Jones <[hidden email]> writes:
>> John Apps wrote:
>>> The JMS standard offers guaranteed once
>>
>> What exactly do they mean by that? In particular, how do they deal
>> with duplicates? Do they report failure, or silently let a dup through
>> in certain situations? If you could point me to the part of the spec
>> that sets out the JMS resolution of these issues, that's be really
>> useful.
>
> As an API spec, it's quite easy for JMS to mandate something apparently
> impossible, without hinting at how it might actually be implemented.
>
> Most of the spec says that the PERSISTENT delivery mode gives
> "once-and-only-once" delivery.  But section 4.4.13 (of JMS 1.1) admits
> that there are a number of caveats to this.  So it's really
> "once-and-only-once-except-in-some-corner-cases".
>
> I think the wrinkle that might prevent us saying that RabbitMQ gives the
> same guarantees is on the publishing side.  The caveats in JMS all seems
> to apply only to the consuming side.  But what happens with an AMQP
> producer if the connection gets dropped before a tx.commit-ok gets back
> to the client?  In that case the client has to re-publish, leading to a
> potential dup.  This can be avoided by a de-dup filter on published
> messages in the broker.  I don't know if JMS brokers really go to such
> lengths.
>    
Some do. It's fairly common for JMS brokers to implement duplicate
detection on the server side, to get around the "lost commit-ok problem"
and give us as near as possible to once and only once, from the
publisher to the server at least.

The way we do it in HornetQ is we have a well defined header key
"_HQ_DUP_ID". The client can set this with some unique value of its
choice before sending (e.g. a GUID). When the server receives the
message, if the _HQ_DUP_ID header is set, it looks up the value in its
cache, and if it has seen it before it ignores the message. The cache
can optionally be persisted.

On the client side, the producer can resend the message/transaction if
it does not receive a confirmation-ok, so it effectively makes
sends/commits idempotent.
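
The general shape of server-side duplicate detection can be sketched as below, in Python. This is not HornetQ code: DedupBroker, the fixed cache size and the return value of publish are assumptions made for illustration, and only the header name is taken from the description above.

```python
import uuid
from collections import OrderedDict

DUP_ID_HEADER = "_HQ_DUP_ID"  # header name as given in the post above


class DedupBroker:
    """Toy broker that drops messages whose duplicate id it has seen.

    The bounded cache is an assumption of this sketch: once more than
    cache_size ids have been seen, the oldest id is forgotten.
    """

    def __init__(self, cache_size=1000):
        self.cache_size = cache_size
        self.recent_ids = OrderedDict()  # insertion order, oldest first
        self.queue = []

    def publish(self, headers, body):
        """Return True if enqueued, False if ignored as a duplicate."""
        dup_id = headers.get(DUP_ID_HEADER)
        if dup_id is not None:
            if dup_id in self.recent_ids:
                return False
            self.recent_ids[dup_id] = True
            if len(self.recent_ids) > self.cache_size:
                self.recent_ids.popitem(last=False)  # forget the oldest id
        self.queue.append(body)
        return True


broker = DedupBroker(cache_size=2)
headers = {DUP_ID_HEADER: str(uuid.uuid4())}
broker.publish(headers, "debit account 7")
broker.publish(headers, "debit account 7")  # resend after a lost confirmation
print(broker.queue)  # -> ['debit account 7']
```

With a bounded cache a very late resend can still slip through once its id has been evicted, so the cache size has to be chosen against how long a producer may keep retrying.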


--
Sent from my BBC Micro Model B

Tim Fox
JBoss

HornetQ - putting the buzz in messaging http://hornetq.org
http://hornetq.blogspot.com/
http://twitter.com/hornetq
irc://irc.freenode.net:6667#hornetq
[hidden email]


Re: Exactly Once Delivery

Matthew Sackman-3
On Fri, Aug 06, 2010 at 10:43:56PM +0100, Tim Fox wrote:
> The way we do it in HornetQ is we have a well defined header key
> "_HQ_DUP_ID". The client can set this with some unique value of it's
> choice before sending (e.g. a GUID). When the server receives the
> message if the _HQ_DUP_ID header is set, it looks up the value in
> it's cache, and if it's seen it before it ignores it. The cache can
> optionally be persisted.

How do you prevent the cache from growing without bound?

Matthew

Re: Exactly Once Delivery

John Apps
On Sat, Aug 7, 2010 at 13:50, Matthew Sackman <[hidden email]> wrote:
On Fri, Aug 06, 2010 at 10:43:56PM +0100, Tim Fox wrote:
> The way we do it in HornetQ is we have a well defined header key
> "_HQ_DUP_ID". The client can set this with some unique value of it's
> choice before sending (e.g. a GUID). When the server receives the
> message if the _HQ_DUP_ID header is set, it looks up the value in
> it's cache, and if it's seen it before it ignores it. The cache can
> optionally be persisted.

How do you prevent the cache from growing without bound?

Matthew

That's really like the piece of string question, no? Of course it can fill up, as can the DB where things are persisted for those cases where messages cannot be delivered.
Having a unique ID in every message is not something new and not restricted to messaging, of course. It is simply a very good idea!
TCP/IP claims to be a 'reliable' transport...The problem with that is that packets get 'lost' or 'dropped' or simply die of 'old age'. Similar, but more complex, problems exist with queuing.
What has not been touched on in this little discussion so far is the question of transactions, and I do not mean those in the 0.9.1 spec, but those described in the 1.0 spec. Here again, JMS is leading the way with something which in my mind is as necessary as guaranteed once (or at least once). Updating DBs from queues and posting the results of those updates to queues should be atomic; and if I want my debit/credit to happen once rather than many times or not at all, then a combination of transactions and guaranteed delivery becomes very attractive both to the designer and the developers. Yes, ACID comes to mind here...and it is indeed what I am referring to.
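
To make the debit/credit point concrete, here is a minimal sketch of a consumer that records the message ID, applies the debit and stages the outgoing result in one local database transaction. It uses Python's sqlite3 as a stand-in for whatever store is really involved; it is not the AMQP 1.0 or JMS transaction model, and the table names (accounts, processed, outbox) and the function apply_debit are made up for illustration.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical illustration tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE outbox (message_id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.commit()
    return conn


def apply_debit(conn: sqlite3.Connection, message_id: str, account: str, amount: int) -> bool:
    """Apply a debit at most once per message ID.

    Returns True if the debit was applied, False if the message ID had
    already been processed (a redelivered duplicate).
    """
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # The primary key on message_id rejects a second copy of the message.
            conn.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, account),
            )
            # Stage the result in the same transaction; a separate step
            # (not shown) would publish outbox rows to a queue.
            conn.execute(
                "INSERT INTO outbox (message_id, body) VALUES (?, ?)",
                (message_id, f"debited {account} by {amount}"),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The duplicate check, the balance update and the staged result either all commit or all roll back, so a redelivered message cannot debit the account twice. Publishing from the outbox to the broker is still at-least-once, which is exactly the gap that broker-level transactions are meant to close.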

It is great to participate in conversations of this nature. Thank you for putting up with my sometimes oblique ramblings :-)
 
---
John Apps
(49) 171 869 1813

_______________________________________________

Re: Exactly Once Delivery

Matthias Radestock-3
John,

John Apps wrote:
> That's really like the piece of string question, no? Of course it can
> fill up, as can the DB where things are persisted for those cases where
> messages cannot be delivered.
> Having a unique ID in every message is not something new and not
> restricted to messaging, of course. It is simply a very good idea!

I believe Matthew was simply trying to point out that many of the
supposed guarantees of messaging systems are a lot softer than most
people think. In reality a "guarantee" is little more than an increase
in the probability that the right thing will happen. Coming clean about
that is going to be important for cloud computing to succeed - improving
the probabilities does come at a price, and for systems at massive
scales the cost/benefit calculations look quite different.

So, for example, using publisher-supplied message ids for de-duping
simply does not scale. Think what a genuine cloud messaging system would
have to do to handle the case where a producer injects the same message
first in a node in Australia and then in New York.

> What has not been touched on in this little discussion so far is the
> question of transactions

Similar considerations apply here. XA in the cloud? Hmmm.


Regards,

Matthias.
_______________________________________________

Re: Exactly Once Delivery

Tony Garnock-Jones-5
Matthias Radestock wrote:
> So, for example, using publisher-supplied message ids for de-duping
> simply does not scale. Think what a genuine cloud messaging system would
> have to do to handle the case where a producer injects the same message
> first in a node in Australia and then in New York.

What is the problem you're thinking of? Would a setup like the following cope?

 - publishers choose a message ID
 - publishers choose a TTL
 - receivers dedup based on message ID
 - receiver's dedup buffer is expired by (some factor of) TTL
 - each delivery contains an address to which the ACK should be routed
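
A rough sketch of the receiver side of such a setup, in Python. The names TtlDedupBuffer and handle_delivery, the default factor of 2, and the process and send_ack callbacks are placeholders for illustration, not part of any RabbitMQ client API.

```python
import time
from typing import Callable, Dict


class TtlDedupBuffer:
    """Receiver-side duplicate filter whose entries expire after factor * TTL."""

    def __init__(self, factor: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self._factor = factor
        self._clock = clock
        # message ID -> time after which the entry may be forgotten
        self._expiry: Dict[str, float] = {}

    def _purge(self, now: float) -> None:
        """Drop every entry whose expiry deadline has passed."""
        expired = [mid for mid, deadline in self._expiry.items() if deadline <= now]
        for mid in expired:
            del self._expiry[mid]

    def first_sighting(self, message_id: str, ttl_seconds: float) -> bool:
        """Return True the first time an ID is seen within its expiry window."""
        now = self._clock()
        self._purge(now)
        if message_id in self._expiry:
            return False
        self._expiry[message_id] = now + self._factor * ttl_seconds
        return True


def handle_delivery(buffer: TtlDedupBuffer, message_id: str, ttl_seconds: float,
                    ack_address: str, process: Callable[[], None],
                    send_ack: Callable[[str, str], None]) -> None:
    """Process a delivery at most once, but acknowledge every copy."""
    if buffer.first_sighting(message_id, ttl_seconds):
        process()
    # Duplicates are acked too, so the sender can stop retransmitting.
    send_ack(ack_address, message_id)
```

The buffer only needs to remember an ID for as long as a retransmission of that message can still arrive, which is what ties its size to the TTL rather than to the total number of messages ever seen.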

Tony

_______________________________________________

Re: Exactly Once Delivery

Matthias Radestock-3
Tony,

Tony Garnock-Jones wrote:
> Would a setup like the following cope?
>
>  - publishers choose a message ID
>  - publishers choose a TTL
>  - receivers dedup based on message ID
>  - receiver's dedup buffer is expired by (some factor of) TTL
>  - each delivery contains an address to which the ACK should be routed

That's end-to-end dedup you are thinking of. Nothing wrong with that,
and it doesn't require the broker to do/know anything. The context of
the discussion here was a "broker dedups publishes" feature.

Matthias.
_______________________________________________

Re: Exactly Once Delivery

Alexis Richardson-5
In reply to this post by Matthew Sackman-3
On Sat, Aug 7, 2010 at 12:50 PM, Matthew Sackman <[hidden email]> wrote:
> On Fri, Aug 06, 2010 at 10:43:56PM +0100, Tim Fox wrote:
>> The way we do it in HornetQ is we have a well-defined header key
>> "_HQ_DUP_ID". The client can set this with some unique value of its
>> choice before sending (e.g. a GUID). When the server receives the
>> message, if the _HQ_DUP_ID header is set, it looks up the value in
>> its cache, and if it has seen it before it ignores it. The cache can
>> optionally be persisted.
>
> How do you prevent the cache from growing without bound?

AFAIK the normal approach with this system is to bound it arbitrarily.
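
As a rough illustration of what an arbitrary bound can look like (this is a generic sketch in Python, not HornetQ's actual implementation; the class name and max_entries parameter are invented), a fixed-size cache can simply forget the oldest IDs:

```python
from collections import OrderedDict


class BoundedDuplicateCache:
    """Remembers at most max_entries duplicate-detection IDs, oldest evicted first."""

    def __init__(self, max_entries: int):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._seen = OrderedDict()  # insertion-ordered set of IDs

    def is_duplicate(self, dup_id: str) -> bool:
        """Return True if dup_id is still remembered; otherwise remember it."""
        if dup_id in self._seen:
            return True
        self._seen[dup_id] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the oldest ID
        return False
```

The trade-off is that a duplicate arriving after its ID has been evicted is accepted as new, so the bound turns the guarantee into a probability that depends on cache size and retransmission delay.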



> Matthew