I encountered a similar issue once. The first indication that something was wrong was odd data corruption across a variety of services in our Kubernetes cluster. In particular, I focused on a service that consumed gzipped messages from a queue and was reporting that some messages could not be decompressed.
First I confirmed that I could pull the corrupt message from the queue and that it was in fact corrupt - so the problem was not in the consumer (which was throwing the error) or (probably) the queue, but rather in the producer that created the compressed message.
On a hunch, I took a corrupted message (about 64 KB in total) and wrote a quick program that went through the message bit by bit, retrying the decompression with just that one bit flipped. Sure enough, there was one bit at offset 13000 or so which, if flipped, made the message decompress and at least visually appear intact.
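For anyone who wants to try the same trick, here is a minimal sketch of that kind of brute-force check. It is not the original program: it assumes the payload is a standard gzip stream, uses Python's `gzip` module, and the function name `find_repairing_bit_flips` and the bit numbering (byte index times 8, plus the bit position counted from the least significant bit) are my own choices for illustration.

```python
import gzip
import zlib


def find_repairing_bit_flips(data: bytes) -> list[tuple[int, bytes]]:
    """Try every single-bit flip of `data` and return (bit_offset, output)
    for each flip that makes the gzip stream decompress without error.

    Hypothetical helper; bit_offset = byte_index * 8 + bit position (LSB = 0).
    """
    buf = bytearray(data)
    hits = []
    for bit_offset in range(len(buf) * 8):
        byte_index, bit_in_byte = divmod(bit_offset, 8)
        mask = 1 << bit_in_byte
        buf[byte_index] ^= mask  # flip one bit
        try:
            # gzip verifies the CRC32 and length in the trailer, so a
            # successful return is strong evidence the data is intact.
            output = gzip.decompress(bytes(buf))
        except (OSError, EOFError, zlib.error):
            pass  # still corrupt with this bit flipped
        else:
            hits.append((bit_offset, output))
        buf[byte_index] ^= mask  # restore the bit before trying the next one
    return hits
```

For a 64 KB message that is roughly half a million decompression attempts, which is slow but fine for a one-off. Because the gzip trailer carries a CRC32, a flip that decompresses cleanly is very unlikely to be a false positive, though it is still worth eyeballing the output as described above.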
Anyway, it turned out to be a single node with a hardware issue of some kind - rather than diagnose it fully we ended up just replacing the node. Repairing all the corrupted stuff that services on that node sent out was a much bigger concern.