In my organization, we run RabbitMQ in clustering mode with 3 servers per cluster. There are 2 such clusters: one runs RabbitMQ version 2.8.7 and the other runs version 3.3.3. On the 3.3.3 cluster, memory utilization sometimes grows without bound. The problem starts on one of the slave (mirror) nodes first, and after a considerable amount of time all the nodes become unresponsive. This has happened 4-5 times in the last month. After considerable analysis, we found that the memory attributed to "Queue Processes" climbs to a very high mark, whereas the total memory consumed by all the individual queues is far lower.
In the last incident, the memory of one of the slaves crossed the watermark limit of 13 GB. On checking 'rabbitmqctl status' we found that the memory used by Queue Processes was 11 GB, whereas the total memory used by the individual queues was not more than 100 MB. There was also very little load on any of the queues. As a remedy, we reduced the load on the queues to almost nil, but the memory utilization stayed above the watermark. After about an hour, the memory utilization of the second slave also started to grow rapidly, despite there being almost no load on the queues. Note that 'ha-sync-mode' is set to manual for all the queues as part of the default policy. The moment we ran 'rabbitmqctl stop_app' followed by 'rabbitmqctl start_app' on the first slave node, the memory utilization of both slave nodes returned to normal, without any loss of messages.
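For reference, the checks and the restart described above can be sketched with the commands below. This is only an outline of what we ran; it must be executed on the affected node against a live broker, so treat it as a sketch rather than a script.

```shell
# Per-category memory breakdown on the suspect slave. In the "memory"
# section, queue_procs is the figure that reached ~11 GB for us while
# the queues themselves held almost nothing.
rabbitmqctl status

# Memory attributed to each individual queue, for comparison with the
# queue_procs total from the status output.
rabbitmqctl list_queues name memory messages

# Restart only the Rabbit application on the faulty slave (the Erlang
# VM stays up, so the node rejoins the cluster afterwards). This is
# the step that brought memory back to normal without message loss.
rabbitmqctl stop_app
rabbitmqctl start_app
```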
Please suggest a solution, as this issue keeps recurring.
PS: I have attached the output of 'rabbitmqctl report'. The status of the faulty slave node was: