Unexplainable behaviour with shovel plugin.


Unexplainable behaviour with shovel plugin.

Claire Fautsch
Hello,

we are currently experiencing some (for us) unexplainable behaviour with the shovel plugin.

Before I enter into the details, let me first outline our setup.

We have a set of RabbitMQ servers hosted in the cloud, where incoming messages are queued. On those servers we have the shovel plugin running, which distributes the incoming messages to a set of RabbitMQ servers in our local datacenter.
The shovel configuration does not directly specify a RabbitMQ server as its destination, but a load balancer, which then distributes to the servers in our local datacenter.

For a few days now, we have been encountering the following problem:
The queues on our cloud servers reach (and pass) the memory watermarks because of unacknowledged messages. The number of ready messages is almost constantly 0 (i.e., messages are delivered as fast as they are published), but the number of unacknowledged messages keeps growing.
At the same time, the statistics on our local servers show that messages are confirmed almost as fast as they are published.

This means that on the destination servers we have only a couple of handfuls of messages that are not yet confirmed, while on the source servers we have millions of messages waiting for confirmation (acknowledgement)

We would expect, within some tolerance, that:
delivery rate on source = publish rate on destination (which is the case)
confirm rate on destination = acknowledge rate on source (which shows a considerable difference)

Does anyone have an idea or suggestion as to what could be the reason for this? Is it a bad idea to have a load balancer as the destination of the shovel, or should that work fine? A network issue?

Here are some more details of our shovel setup:
ack_mode=on_confirm
prefetch_count=0 (default)
reconnect_delay=5
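
For reference, here is a sketch of how such a static shovel entry is typically declared in rabbitmq.config (the shovel name, URIs, credentials and queue name below are placeholders, not our real ones):

    {rabbitmq_shovel,
     [{shovels,
       [{cloud_to_datacenter,
         [{sources,      [{broker, "amqp://user:password@cloud-broker"}]},
          %% The destination URI points at the load balancer, not at a
          %% specific broker behind it.
          {destinations, [{broker, "amqp://user:password@load-balancer"}]},
          {queue, <<"incoming">>},
          {ack_mode, on_confirm},
          {prefetch_count, 0},    %% the default: no prefetch limit
          {reconnect_delay, 5}]}]}]}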


Thanks in advance for any hint or discussion point,
Regards
Claire

Re: Unexplainable behaviour with shovel plugin.

Simon MacMullen
On 26/02/14 08:57, Claire Fautsch wrote:
> Hello,

Hi!

> This means that on the destination servers we have only a couple of
> handfuls of messages that are not yet confirmed, while on the source
> servers we have millions of messages waiting for confirmation
> (acknowledgement)
>
> We would expect, within some tolerance, that:
> delivery rate on source = publish rate on destination (which is the case)
> confirm rate on destination = acknowledge rate on source (which shows a
> considerable difference)
>
> Does anyone have an idea or suggestion as to what could be the reason for
> this? Is it a bad idea to have a load balancer as the destination of the
> shovel, or should that work fine? A network issue?

I doubt the load balancer is the problem. I think I have a reasonable
idea where the problem lies.

The issue is that the shovel does not enforce any form of flow control
other than (optionally) using prefetch limiting, which you are not using.

So your source servers are delivering messages into the shovel as fast
as they can, and your destination servers are accepting messages as fast
as *they* can, but they are ending up being a bit slower. Nothing is
creating any back pressure on the source servers, and so messages are
queuing up inside the shovel. Since you are using on_confirm ack mode,
these show as unacknowledged messages on the source.

> Here are some more details of our shovel setup:
> ack_mode=on_confirm
> prefetch_count=0 (default)
> reconnect_delay=5

I suspect that if you set prefetch_count to some high-but-not-insane
number (exactly how high depends on your message size + rate, but I might
start the bidding at 1,000), this might solve your problem.
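
In static-config terms that is a one-line change to the shovel entry sketched earlier in the thread (the exact value is purely illustrative):

    {prefetch_count, 1000}    %% rather than 0, i.e. unlimited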

Of course, if your destination servers are actually slower than your
source ones, then you might need to do something about that. But turning
on prefetch limiting would make the system better-behaved and make it
clearer where your issues are.

There might be another issue though - on all released versions of
RabbitMQ turning on prefetch limiting reduces performance somewhat. This
will get fixed in the next release.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal

Re: Unexplainable behaviour with shovel plugin.

Claire Fautsch
Hi Simon,

thanks for your feedback. This is also what we finally concluded. I actually made a small mistake in my comment above: delivery rate on source != publish rate on destination, while confirm and ack rates are equal (so the other way round).

So we are currently thinking of setting the prefetch_count.

This will probably not really solve the situation, as we will then see the messages waiting as "Ready" instead of "Unacknowledged"; but on the other hand, maybe it prevents the shovel connections to the destination brokers from getting into a "flow" or even "blocked" state, where publishes are limited. (Any opinion on this?)

Thanks,
Claire
--
Claire Fautsch
Server Developer
[hidden email]

Goodgame Studios
Theodorstr. 42-90, House 9
22761 Hamburg, Germany
Phone: +49 (0)40 219 880 -0
www.goodgamestudios.com

Goodgame Studios is a branch of Altigi GmbH
Altigi GmbH, District court Hamburg, HRB 99869
Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian Ritter


Re: Unexplainable behaviour with shovel plugin.

Simon MacMullen
On 27/02/14 12:11, Claire Fautsch wrote:
> This will probably not really solve the situation, as we will then see
> the messages waiting as "Ready" instead of "Unacknowledged"

Quite possibly. But it will keep memory use in your shovel much more
controlled, which has to be a good thing.

> but on the other hand, maybe it prevents the shovel connections to the
> destination brokers from getting into a "flow" or even "blocked" state,
> where publishes are limited. (Any opinion on this?)

Well, if the publishing connection is in "flow" state then that just
means that it would like to publish faster, but something (the
downstream queue probably) can't keep up.

Ultimately you probably need more capacity in your destination brokers,
if the source brokers are backing up.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal

Re: Unexplainable behaviour with shovel plugin.

mc717990
We saw similar behavior, mostly related to latency issues between remote sites, which actually caused some crashes due to memory and load (we had almost 3 million messages backlogged, and the shovel went "WTF" when we started it up). We ended up setting a prefetch of 1500 messages, and these problems went away. We also ended up using x-consistent-hash to multiplex the shoveling to a remote system, which also helped. We didn't have a capacity issue so much as a latency issue. So I HIGHLY recommend setting a prefetch that's not unlimited if you're doing WAN replication.
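
To make that concrete, here is a sketch of what one of those per-queue shovel entries can look like in rabbitmq.config, using the shovel's declarations list to set up the hash exchange and one of the eight queues (all names, URIs and the binding weight are invented for illustration; the x-consistent-hash type requires the rabbitmq-consistent-hash-exchange plugin):

    {rabbitmq_shovel,
     [{shovels,
       [{events_shard_1,
         [{sources,
           [{broker, "amqp://site-broker"},
            {declarations,
             [{'exchange.declare', [{exchange, <<"events_hash">>},
                                    {type, <<"x-consistent-hash">>},
                                    durable]},
              {'queue.declare', [{queue, <<"events_1">>}, durable]},
              %% for x-consistent-hash the binding key is a weight
              {'queue.bind', [{exchange, <<"events_hash">>},
                              {queue, <<"events_1">>},
                              {routing_key, <<"10">>}]}]}]},
          {destinations, [{broker, "amqp://enterprise-broker"}]},
          {queue, <<"events_1">>},
          {ack_mode, on_confirm},
          {prefetch_count, 1500}]}
        %% ...and likewise for events_shard_2 .. events_shard_8
       ]}]}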

Jason
--
Jason McIntosh
https://github.com/jasonmcintosh/
573-424-7612


Re: Unexplainable behaviour with shovel plugin.

Simon MacMullen
On 27/02/14 12:50, Jason McIntosh wrote:
> So I HIGHLY recommend setting a prefetch that's not unlimited if
> you're doing WAN replication.

So do I. Maybe we should change the default. Hmm.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal

Re: Unexplainable behaviour with shovel plugin.

Claire Fautsch
Great, thanks for your valuable comments.

We will for sure try this out, and I will provide some feedback on the outcome.

Cheers,
Claire

Re: Unexplainable behaviour with shovel plugin.

Laing, Michael P.
The default used to be 1000. I was surprised that it changed.

ml

Re: Unexplainable behaviour with shovel plugin.

Simon MacMullen
On 27/02/14 16:11, Laing, Michael wrote:
> The default used to be 1000. I was surprised that it changed.

Honestly, it didn't; here it is in RabbitMQ 2.0.0:

http://hg.rabbitmq.com/rabbitmq-shovel/file/rabbitmq_v2_0_0/ebin/rabbit_shovel.app

You might be thinking of Federation, which has always defaulted to 1000.

But yes, 0 is not a great default. So I changed it today:

http://hg.rabbitmq.com/rabbitmq-shovel/rev/94df30e8286f

and it will default to 1000 in 3.3.0.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal

Re: Unexplainable behaviour with shovel plugin.

Laing, Michael P.
Ah yes - I must have carried that thought over when we switched from federation to shovels :)

Re: Unexplainable behaviour with shovel plugin.

Ben Hood
On Thu, Feb 27, 2014 at 5:13 PM, Laing, Michael
<[hidden email]> wrote:
> Ah yes - I must have carried that thought over when we switched from
> federation to shovels :)

What was it about federation that made you switch to shovels?

Re: Unexplainable behaviour with shovel plugin.

Laing, Michael P.
Our 'wholesale' core topology is like a pancake, with many clusters spread out into regions of the world, currently 3.

The pancake has 3 communication layers: 'input' for swapping inputs for replicated processing, 'output' for distributing processing results, and the 'postoffice' for general communication.

I experimented a lot with federation a year ago and had a few working iterations but found it difficult to create a reliable, maintainable configuration for our use case - federation tries to do so much for you. 

So I turned to shovels for more simplicity and control at the expense of more difficult configuration.

Some of our core clusters support the 'retail' layer of instances that gateway to clients (candles?). We are introducing federation into one of these communication links because we want the propagation of client bindings from the gateway instance to the core - an excellent feature of federation and an important refinement for us.

Initially I had thought that the 'new' federation replaced the 'old' shovel, but this is not true - each tool has its place although their capabilities overlap.

With easier configuration in 3.3, the lowly shovel may get its due!

ml

Re: Unexplainable behaviour with shovel plugin.

Ben Hood
Michael,

On Fri, Feb 28, 2014 at 12:45 PM, Laing, Michael
<[hidden email]> wrote:
> So I turned to shovels for more simplicity and control at the expense of
> more difficult configuration.

Yes, it is quite a low level tool, but I guess sometimes your
requirements are intricate enough to need to reach down to the lower
layer.

> Some of our core clusters support the 'retail' layer of instances that
> gateway to clients (candles?). We are introducing federation into one of
> these communication links because we want the propagation of client bindings
> from the gateway instance to the core - an excellent feature of federation
> and an important refinement for us.

Using federation to implement an AMQP gateway seems like a common
pattern. One wonders why it didn't go into the AMQP spec ....

> Initially I had thought that the 'new' federation replaced the 'old' shovel,
> but this is not true - each tool has its place although their capabilities
> overlap.
>
> With easier configuration in 3.3, the lowly shovel may get its due!

It's interesting to see that the shovel still lives on, despite it
being quite an agricultural component. What sort of message volumes
are you guys processing with this, BTW?

Thanks for being so detailed about your experiences, it's much appreciated.

Cheers,

Ben

Re: Unexplainable behaviour with shovel plugin.

Laing, Michael P.
Our shovel volumes are quite variable, reflecting a high overall degree of variability in our message traffic.

Just looking over the last 24 hours, shovel volume ranged from 25/sec to 2,500/sec on our Oregon core cluster.

Best,

Michael

Re: Unexplainable behaviour with shovel plugin.

mc717990
On our systems, we've seen a consistent 400/sec on some queues, and during a heavy data load roughly 2,500/sec per queue (these are usually short-lived). Usually at that point flow control kicks in, as our consumers can't quite keep up. We use x-consistent-hash to get around network latency and shovel each queue in a hash. So publishers publish to a fanout exchange with a random routing key, which is bound to an x-consistent-hash exchange bound to 8 queues. Each of the 8 queues is shoveled independently with a 1500 prefetch. We've not been able to overload this mechanism easily - drive IO is typically our limiting factor, or the consumers on the remote side, as stated. And that's because we're doing persistent messages, publisher confirms, and a whole lot of checks to make sure we don't ever lose anything.
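
For illustration, the publishing side of that arrangement might look roughly like this with the Erlang amqp_client (host, exchange name and payload are invented; a real producer would keep the connection open rather than reconnecting per message):

    %% Publish one persistent message with a random routing key, so the
    %% x-consistent-hash exchange downstream spreads messages evenly
    %% across its eight queues.
    -include_lib("amqp_client/include/amqp_client.hrl").

    publish_with_random_key(Payload) ->
        {ok, Conn} = amqp_connection:start(#amqp_params_network{host = "site-broker"}),
        {ok, Ch} = amqp_connection:open_channel(Conn),
        Key = integer_to_binary(rand:uniform(1 bsl 31)),
        amqp_channel:cast(Ch,
                          #'basic.publish'{exchange    = <<"events_in">>,
                                           routing_key = Key},
                          #amqp_msg{props   = #'P_basic'{delivery_mode = 2},
                                    payload = Payload}),
        ok = amqp_connection:close(Conn).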

Jason

Re: Unexplainable behaviour with shovel plugin.

Laing, Michael P.
Interesting. We don't use persistent messages. In fact the proxy clusters, which stand between our internal clients and the core clusters, explicitly remove persistence in case our clients 'forget'. We rely on replication instead; our persistence requirements are 'outsourced' to a global Cassandra cluster. So no disk IO, hence no IO wait - our primo defense against network partitions in AWS/EC2, and a nice performance boost.

Although principles play a part too: idempotency - when in doubt, reconnect/resend/resubscribe; tolerate replicas. And we try to realistically engineer for 5 9's of reliability or more, not 100%, as we can decompose that target into realistic actions/costs.

ml

Re: Unexplainable behaviour with shovel plugin.

Ben Hood
On Sat, Mar 1, 2014 at 6:00 PM, Laing, Michael
<[hidden email]> wrote:
> We rely on replication
> instead; our persistence requirements are 'outsourced' to a global Cassandra
> cluster.

Interesting. So you're using Rabbit for event notification rather than
reliable(*) transfer of state?

(*) For some value of reliable.

Re: Unexplainable behaviour with shovel plugin.

Laing, Michael P.
Persistence might increase reliability when you plan to restart nodes and need to regain state. We don't do that.

We have clusters of 3 nodes across independent zones - forming an AWS region - and run with 3 regions, i.e. nine nodes in the core, replicating the processing of important messages across these core clusters. We have several other ancillary clusters for clients, proxies, etc., also in multiple regions.

We target queue lengths of zero and are close most of the time. Anything else stands out like a black spot on a white sheet.

So we never restart nodes that die. Just sync in new ones. Actually we have not yet had any core nodes die in production.

Our Cassandra cluster of 18 nodes, 6 per region, synchronizes globally in less than 1 sec, and persists all of our message traffic in multiple useful inversions. We use a replication factor of 3 per region so every message has nine copies; important ones have many more due to message replication. We push and pull a lot from this cache.

Our instances are ridiculously small and inexpensive to run. We rely on this global, headless, mutually supporting rabbit army for our reliability, paired with a small Cassandra horde.

ml

Re: Unexplainable behaviour with shovel plugin.

Simon MacMullen
On 28/02/2014 6:14PM, Ben Hood wrote:
> Using federation to implement an AMQP gateway seems like a common
> pattern. One wonders why it didn't go into the AMQP spec ....

I dunno, I think federation is really quite specific (I don't think
there are any other brokers which do it in the same way as RabbitMQ) - I
can see the spec authors not wanting to predict how people will want to
federate.

>> Initially I had thought that the 'new' federation replaced the 'old' shovel,
>> but this is not true - each tool has its place although their capabilities
>> overlap.
>>
>> With easier configuration in 3.3, the lowly shovel may get its due!
>
> It's interesting to see that the shovel still lives on, despite it
> being quite an agricultural component.

I think the concept ("I just want to move the damn messages!") makes a
lot of sense.

If I were creating RabbitMQ from scratch, I might rename the shovel to
something like "point-to-point federation" to make it a bit clearer that
it complements federation rather than being replaced by it.

Cheers, Simon

--
Simon MacMullen
RabbitMQ, Pivotal

Re: Unexplainable behaviour with shovel plugin.

mc717990
The big thing for us was that it is a push rather than a pull mechanism. In a distributed system, where we have a lot of nodes talking to an enterprise, it's much more efficient to have the nodes shovel to the enterprise than to have the enterprise know about every server connected to it.

Jason