
But… this does drop data? Only the start and end timestamps are preserved; the middle ones have no time. How can this be called lossless?

Genuinely lossless compression algorithms like gzip work pretty well.
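
For example, a round trip with Python's stdlib gzip module preserves every byte, timestamps included (the sample log lines here are invented for illustration):

    # Round-trip check: gzip is genuinely lossless, timestamps and all.
    import gzip

    log = b"\n".join(
        b"2021-06-01T12:00:%02d.000Z txn=42 step=%d status=ok" % (i, i)
        for i in range(60)
    )

    compressed = gzip.compress(log)
    assert gzip.decompress(compressed) == log  # every byte comes back
    print(f"{len(log)} -> {len(compressed)} bytes")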



Exactly my thoughts; the order of these events by timestamp is itself necessary for debugging.

If I want something like a per-transaction rollup of events into one log message, I build it and use it explicitly.


Was going to point out the same thing: the original article's solution loses timestamps and possibly ordering. They're also losing some compressibility by converting to a structured format (JSON). And if they actually include a lot of UUIDs (their diagram is vague on what transaction IDs look like), then good luck; those don't compress very well.

I worked at a Magnificent 7 company that compressed a lot of logs; after a lot of testing back in 2021, we found that zstd did the best all-around job.
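
A rough sketch of that kind of comparison, using the third-party zstandard Python bindings (pip install zstandard); the JSON logs here are synthetic:

    # Compare gzip vs. zstd ratios on repetitive JSON log lines.
    import gzip
    import json

    import zstandard

    lines = "\n".join(
        json.dumps({"ts": 1624000000 + i, "event": "request", "txn": i % 100})
        for i in range(10_000)
    ).encode()

    gz = gzip.compress(lines)
    zst = zstandard.ZstdCompressor(level=3).compress(lines)
    print(f"raw {len(lines)} bytes, gzip {len(gz)}, zstd {len(zst)}")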


We have a process monitor that basically polls ps output and writes it to JSON. We see ~30:1 compression using zstd on a ZFS dataset that stores these logs.

I laugh every time I see it.
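
Roughly the pattern, for the curious; the column set, output filename, and poll interval below are guesses, not the actual monitor:

    # Sketch of a ps-polling monitor writing JSON lines. Repeated field
    # names and near-identical records are why it compresses so well.
    import json
    import subprocess
    import time

    def snapshot():
        # comm goes last so split(None, 3) keeps command names with spaces
        rows = subprocess.run(
            ["ps", "-eo", "pid,%cpu,%mem,comm"],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()[1:]  # drop the header row
        ts = time.time()
        for row in rows:
            pid, cpu, mem, comm = row.split(None, 3)
            yield {"ts": ts, "pid": int(pid), "cpu": float(cpu),
                   "mem": float(mem), "comm": comm}

    while True:
        with open("ps.jsonl", "a") as f:
            for rec in snapshot():
                f.write(json.dumps(rec) + "\n")
        time.sleep(30)  # poll interval is a guess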


Agreed.

If you use something like sequential IDs (even in a UUID format), they can compress pretty well.
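
A quick stdlib illustration of the gap, on synthetic IDs of similar length:

    # Random UUIDv4s vs. sequential hex IDs: very different ratios.
    import uuid
    import zlib

    random_ids = "\n".join(str(uuid.uuid4()) for _ in range(10_000)).encode()
    sequential = "\n".join(f"{i:032x}" for i in range(10_000)).encode()

    print("random UUIDs:  ", len(zlib.compress(random_ids)))
    print("sequential IDs:", len(zlib.compress(sequential)))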


As a member of the UUIDv7 cheering squad let me say 'rah rah'! :D


Which zstd compression level gave the best balance between compression ratio and run time?
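
For anyone who wants to measure this themselves, a sketch of a level sweep with the zstandard bindings on synthetic input; real logs will shift the curve:

    # Level sweep: compression ratio vs. wall time per zstd level.
    import time

    import zstandard

    data = b"".join(
        b'{"ts":%d,"event":"request"}\n' % (1624000000 + i)
        for i in range(50_000)
    )

    for level in (1, 3, 6, 9, 12, 19):
        t0 = time.perf_counter()
        out = zstandard.ZstdCompressor(level=level).compress(data)
        dt = time.perf_counter() - t0
        print(f"level {level:>2}: {len(data) / len(out):5.1f}x "
              f"in {dt * 1000:6.1f} ms")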



