
There are lots of reasons to want to immediately respond to an external event besides building an eventually consistent data syncing system. Polling an API endpoint works fine for the latter case, but not much else.

A good platform should offer both of these and more (for example, Slack does webhooks, REST endpoints, websocket-based streaming, and bulk exports), and let the client pick what they want based on their use case.



Long-polling is the way to immediately retrieve events. It's more efficient and lower latency than waiting for a sender to initiate a TCP and TLS handshake.


A persistent connection has a cost. Your statement may be true in some circumstances, but definitely not all. Namely, for infrequent events it is much more efficient to be notified than to be asking nonstop. Sure, latency is lowest if the connection is already established, but for efficiency the answer is not cut and dried; it's a tradeoff decision based on the expected traffic patterns.


Also, there's the case of ISPs just dropping idle TCP connections. It can also take a while to determine that a TCP connection is broken.


Long-polling is usually configured to reset on both sides after a timeout chosen by the client (e.g. /events?t=30), long before any network effects kick in, e.g. 10-30 seconds. A client then simply spams requests in a loop, backing off only on HTTP errors. If you have some crazy firewall in between, just set “t” appropriately.
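A minimal client loop for this scheme might look like the following sketch (the URL and the `t` timeout parameter follow the convention described above; the endpoint itself is hypothetical):

```python
import time
import urllib.error
import urllib.request

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff, capped; attempt starts at 0."""
    return min(cap, base * (2 ** attempt))

def poll_events(url, timeout=30):
    """Long-poll in a loop. The server holds each request open for up
    to `timeout` seconds (passed as ?t=...), then responds; the client
    immediately re-issues the request, backing off only on errors."""
    errors = 0
    while True:
        try:
            with urllib.request.urlopen(f"{url}?t={timeout}",
                                        timeout=timeout + 5) as resp:
                errors = 0          # healthy response: reset backoff
                yield resp.read()
        except (urllib.error.URLError, TimeoutError):
            errors += 1
            time.sleep(backoff_delay(errors - 1))
```

The backoff keeps a misbehaving or overloaded server from being hammered, while a healthy long-poll cycle reconnects with no delay at all.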


What’s the issue with that? This will be discovered as soon as the endpoint tries to send an event, right? At which point the client will see that the connection has been closed, reconnect, and receive the event.


No, the server will try to send an event, and the server will notice the connection has dropped. The client will still have no idea until some sort of timeout is reached, as the client will usually not be sending any data over the connection, as the connection's sole purpose is for the server to send events to the client.

A way to fix this is to use an application-level keepalive (TCP keepalives are generally useless), but then that increases the load on the server and adds a scaling burden.
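A client-side staleness check for such an application-level keepalive could be as simple as the following sketch (the interval and grace values are illustrative assumptions, not from any particular protocol):

```python
import time

HEARTBEAT_INTERVAL = 15  # server promises a ping at least this often (s)
GRACE = 5                # allowance for jitter and network latency (s)

def connection_is_stale(last_received, now=None):
    """Client-side check: if nothing (event or heartbeat) has arrived
    within the heartbeat interval plus grace, assume the connection is
    silently dead and reconnect."""
    now = time.monotonic() if now is None else now
    return (now - last_received) > (HEARTBEAT_INTERVAL + GRACE)
```

The server-side cost the parent comment mentions is exactly those periodic heartbeats: every idle connection still generates regular writes.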

Meanwhile, unless the event stream is stateful (more overhead!), the client has lost all events since the connection has dropped, and the client can't even be sure when the connection actually dropped.

With webhooks, assuming the callback sending service has a generous retry policy, and the customer's receiving service does not return 200 unless the webhook has been completely processed, or persisted to storage, you won't lose events.
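A sketch of that receiving side, assuming a JSON payload and a caller-supplied `persist` function (both hypothetical); the key point is that anything short of durable storage yields a non-2xx status so the sender's retry policy redelivers the event:

```python
import json

def handle_webhook(body, persist):
    """Return an HTTP status code for a webhook delivery. Only ack
    with 200 after the event has been durably stored; any failure
    returns 500 so the sender retries the delivery later."""
    try:
        event = json.loads(body)
        persist(event)  # e.g. write to a database or durable queue
        return 200
    except Exception:
        return 500      # parse or storage failure: ask for redelivery
```

Note this implies the receiver must handle duplicate deliveries (retries can race with slow responses), so processing should be idempotent.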

I've been at Twilio for the past 10 years. We recently started offering an event stream service (that customers had been requesting for some time), but it's complicated to get right (on both the server and client side) and difficult to scale, and, frankly, webhooks have worked fine for most customers for a very long time.


> No, the server will try to send an event, and the server will notice the connection has dropped. The client will still have no idea until some sort of timeout is reached, as the client will usually not be sending any data over the connection, as the connection's sole purpose is for the server to send events to the client.

Exactly why MQTT has the ping (PINGREQ) packet for the client.


yeah. a(n improperly configured) firewall is going to start dropping packets if it thinks a connection has been idle for too long, so neither side ever sees an RST and neither realizes the connection's been terminated.


Why for the love of God does the firewall not send the RST when it drops the connection?


Because what usually happens is the connection is just forgotten from the NAT table. Both sides still see it as connected but the middle box will no longer forward any packets.


It doesn’t just “fall off” the NAT table. Some process in the firewall chose that entry in the NAT table to drop at that moment. It could use the entries from that NAT table to construct RST packets to both sides of the connection. This should be easy and obvious.


Exactly. Perhaps an event happens once or twice-ish a day per customer, and never on the weekends.


I've got an API where an event happens once a month (+/- 2 weeks) for a large percentage of our customers.


> A persistent connection has a cost.

are you sure? specifically, are you sure a persistent connection has _more_ of a cost than repeatedly re-establishing a connection & TLS, etc.?

in terms of energy costs alone (DNS resolution, establishing routes, generating cryptographic session keys, etc.), repeatedly reconnecting is definitely not as cheap.

in terms of today's computation power, the "memory" costs of maintaining a connection are minuscule, and the performance "penalties" are negligible.

example: let's say you have 50k event subscribers. if nothing happens, then, aside from a few TCP keepalives (which are not strictly speaking required, and can happen very infrequently), no traffic moves. if instead you have each client polling once an hour, then that's nearly ~13-14 connections a second, each one with at least 4 round trips of traffic. that's a measurable amount of load.
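for what it's worth, the ~13-14 connections/second figure falls out of 50k clients each polling once an hour (the round-trip count per fresh connection is a rough assumption covering the TCP handshake, TLS handshake, and request/response):

```python
subscribers = 50_000
poll_interval_s = 3600   # each client polls once an hour
rtts_per_poll = 4        # TCP + TLS handshakes + request/response

polls_per_second = subscribers / poll_interval_s
round_trips_per_second = polls_per_second * rtts_per_poll
print(round(polls_per_second, 1), round(round_trips_per_second))
# → 13.9 56
```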


One nice benefit of long polling is the built in catch-up-after-a-break functionality: When the client initiates the poll, it tells the server the state it knows about (timestamp, sequence number, hash, whatever), and the server either replies right away if it's different, or waits and replies once it's different.

With webhooks, as in the article, you only get state changes; you need some separate mechanism to achieve (or recover) the initial state.
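The catch-up handshake described above can be sketched on the server side, assuming state changes are tracked with a simple version counter (all names here are illustrative):

```python
import threading

class StateChannel:
    """Server-side helper for long polling with catch-up: the client
    passes the version it already knows; the call returns immediately
    if the server's version differs, or blocks until the state changes
    (or the poll times out and returns the unchanged state)."""

    def __init__(self, state, version=0):
        self._cond = threading.Condition()
        self._state, self._version = state, version

    def update(self, state):
        """Publish a new state and wake all waiting long-polls."""
        with self._cond:
            self._state = state
            self._version += 1
            self._cond.notify_all()

    def wait_for_change(self, known_version, timeout=30):
        """Block until the version differs from the client's, then
        return (version, state)."""
        with self._cond:
            self._cond.wait_for(lambda: self._version != known_version,
                                timeout=timeout)
            return self._version, self._state
```

A freshly restarted client simply calls `wait_for_change` with whatever version it last saw (or a sentinel), and gets the current state back right away.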


That's true, although it's also true of any `/events` endpoint that doesn't go back to the beginning of time. Stripe's endpoint only goes back 30 days, so you still need to solve for the initial state unless you launched all of your desired functionality at the very beginning of your Stripe account!


Hopefully if it's a system like payments where you not only need to know state, you also need to know the time and nature of all transitions, there's a way to query all of that information.

I'm thinking of simpler situations like my source host's CI spinner that seems to get stuck all the time due to missing the ping back from Jenkins about build statuses. In that case it really would be fine to always just say "I think the state is X, please answer me now or in the future whenever the state is other than X." I don't care about anything other than an up-to-date sync.


Someone has to maintain an always-running listener for `/events`. If a server does that, and triggers client calls, we call that webhooks. If a client does that, and triggers internal functions, it's what the op describes. I think that for APIs, `/events` should indeed be the fundamental feature, and "webhooks" should be a nice-to-have service on top of `/events`, for those who don't want to maintain a local subscriber.


If the webhook events are coming at some sort of a brisk pace, the sender well may be able to reuse an already-open connection. And if they're rather infrequent, is the efficiency or latency likely to be a significant concern?


Yes, it is - latency-sensitive but infrequent events are an extremely common use case.


In the general sense, yes, but your assertion rings false in my opinion when the situation presents only a choice between webhook or long polling.


I don't understand your statement? Rare but latency-sensitive events are a very common use case for webhooks or long polling.


If you're using HTTP use websockets or server-sent events, not long polling. Long polling is obsolete.


My understanding is that long polling is the thing that will reliably work at scale. Perhaps this changed in the past few years, but I’ve asked various companies like PubNub why they only use long polling and the answer was that there are too many incompatibilities out there in the wild for anything but that.


Server-Sent Events are very reliable. What you might be thinking of is the fact that you probably shouldn't rely just on server push. But that doesn't mean you should use long polling.

You should use normal short polling and Server-Sent Events.

Also it makes no sense to say long polling is more reliable than SSE, because SSE is essentially a non-hacky implementation of long polling.
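For reference, SSE's text/event-stream wire format is simple enough that a minimal parser fits in a few lines (this sketch handles only `data:` fields; the spec also defines `event:`, `id:`, and `retry:`). Comment lines, which start with ':', double as keepalive pings:

```python
def parse_sse(stream_lines):
    """Minimal text/event-stream parser: 'data:' lines accumulate, a
    blank line terminates an event, and ':' comment lines (commonly
    used as keepalives) are ignored."""
    data = []
    for line in stream_lines:
        if line == "":
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            continue  # comment / keepalive ping
        elif line.startswith("data:"):
            data.append(line[5:].lstrip())
```

That built-in framing (plus the `Last-Event-ID` reconnect header in real SSE) is what makes it the "non-hacky" version of the same idea.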


Websockets can cause issues, especially if you're not closing sockets properly or have too much activity on a small server. Livewire, for instance, accounts for this by just polling every 2 seconds for changes; this is much more performant than keeping 10,000 sockets open when people leave the page/app open but don't actually do anything.

Straight long-polling should be avoided, but intermittent polling is a good solution for performance when you don't want to use up all your socket capacity.



