I wouldn't say it's a 'nightmare'. It's just more complicated than how regular folk think computers work when it comes to time sync. There's nothing nightmarish or scary about this; it's just a matter of using the best solution for your scenario, understanding the limitations, and adjusting expectations/requirements accordingly, perhaps relaxing consistency requirements.
I worked on the NTP infra for a very large organization some time ago, and the scariest thing I found was just how bad some of the clocks were on 'commodity hardware', but this just added a new parameter for triaging hardware for manufacturer replacement.
This is an ok article but it's just so very superficial. It goes too wide for such a deep subject matter.
Maybe. But I remember one game developer saying that they face an even more challenging problem: synchronization between players in real-time multiplayer games. Just imagine different users having significantly different network latencies in a multiplayer shooter where a couple of milliseconds can be decisive. Someone lands a headshot when the game state they see is already outdated. If you think about it, you can appreciate how complicated it is just to make the gameplay at least not awful...
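To make the stale-state problem concrete, here is a rough sketch of one common mitigation, usually called lag compensation or server-side rewind: the server keeps a short history of each player's positions and judges a shot against where the target was when the shooter actually saw it. This is not something the developer described to me. The names (`PositionHistory`, `is_hit`), the 1-D positions, and the latency figure are all made up for illustration, and it assumes the server already has a usable latency estimate for the shooter.

```python
import bisect


class PositionHistory:
    """Server-side record of one player's past positions (1-D for brevity)."""

    def __init__(self):
        self.times_ms = []   # server timestamps in ms, recorded in increasing order
        self.positions = []  # position at each timestamp

    def record(self, t_ms, pos):
        self.times_ms.append(t_ms)
        self.positions.append(pos)

    def position_at(self, t_ms):
        """Linearly interpolate the recorded position at time t_ms.

        Assumes at least one sample has been recorded; times outside the
        recorded range are clamped to the first or last sample.
        """
        i = bisect.bisect_right(self.times_ms, t_ms)
        if i == 0:
            return self.positions[0]
        if i == len(self.times_ms):
            return self.positions[-1]
        t0, t1 = self.times_ms[i - 1], self.times_ms[i]
        p0, p1 = self.positions[i - 1], self.positions[i]
        alpha = (t_ms - t0) / (t1 - t0)
        return p0 + alpha * (p1 - p0)


def is_hit(history, now_ms, shooter_latency_ms, aim_pos, hit_radius):
    """Judge the shot against the target position the shooter actually saw."""
    seen_at_ms = now_ms - shooter_latency_ms  # rewind by the (assumed) latency
    return abs(history.position_at(seen_at_ms) - aim_pos) <= hit_radius


# The target moves from 0.0 to 2.0 over 100 ms of server time.
target = PositionHistory()
for t_ms, pos in [(0, 0.0), (50, 1.0), (100, 2.0)]:
    target.record(t_ms, pos)

# A shooter with 75 ms of latency aims at 0.5, where the target appeared to be.
rewound_hit = is_hit(target, now_ms=100, shooter_latency_ms=75,
                     aim_pos=0.5, hit_radius=0.2)

# Without rewinding, the shot is compared to the current position (2.0).
naive_hit = abs(target.positions[-1] - 0.5) <= 0.2
```

The trade-off is that the target can get hit after they believe they are already behind cover, which is part of why it's hard to make this feel fair to everyone.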
I took to distributed systems like a duck to water. It was only much later that I figured out that while there are things I can figure out in one minute that take other people five, there are a lot of others that you have to walk them through step by step or they will never get there. That really explained some interactions I’d had when I was younger.
In particular, I don’t think the intuitions necessary to do distributed computing well would come to someone who snoozed through physics or who never took intro to computer engineering.
> I don’t think the intuitions necessary to do distributed computing well would come to someone who snoozed through physics
Yeah. I was a physics major and it really helped to have had my naive assumptions about time and clocks completely demolished early on by taking classes in special and general relativity. When I eventually found my way into tech, a lot of distributed systems concepts that are difficult for other people (clock sync, indeterminate ordering of events, consensus) came quite naturally because of all that early training.
I think it's no accident that distributed systems theory guru Leslie Lamport had written an unpublished book on General Relativity before he wrote the famous "Time, Clocks, and the Ordering of Events in a Distributed System" paper and the Paxos paper. In the former in particular, the analogy to special relativity is quite plain to see.
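For anyone who hasn't read the paper, the core idea fits in a few lines. Below is a minimal sketch of a logical clock in the spirit of that paper: each process keeps a counter, bumps it on every local event, and on receiving a message jumps past the timestamp the message carries. The class and method names are mine, not Lamport's, and real systems add things like process IDs for tie-breaking.

```python
class LamportClock:
    """Minimal logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, jump past both the local time and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()

a.tick()                       # local event on a, a.time is now 1
sent_at = a.send()             # a sends a message stamped 2
received_at = b.receive(sent_at)  # b moves to max(0, 2) + 1 = 3
```

The receive is always stamped later than the send, no matter what the wall clocks on the two machines say. That is the same "causality defines order" move you get from light cones: events with no message path between them simply have no inherent order.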
Sometimes hardware that has PTP support in the specs doesn't perform very well though, so if you do things at scale, being able to validate things like switches and network card drivers is useful too!
It's gotten to the point where timing server vendors I've spoken to have their own test labs where they validate network gear and then publish lists of recommended and tested configurations.
Even some older cards where you'd think the PTP issues would be solved still have weird driver quirks in Linux!