It seems Gmail does globally filter some campaigns if enough users (or enough of...

mmt · on July 16, 2018

Even the success of such an experiment as you describe (which I believe I've previously run, unsuccessfully, i.e. it took more than one mark-as-spam of a popular, otherwise-legit sender to get it reliably into my Spam folder) wouldn't convince me that any such local, per-account or even per-domain (unless I'm paying for the Postini service) filters exist.

It would merely be evidence that there's a datum of "spam: From:LinkedIn To:mmt" going into the global filter database and that matching both From and To is sufficiently strong evidence to that global filter.

mcbits · on July 16, 2018

Huh? "To:mmt" would be a local filter (or a localized signal in the filter - implementation detail). At least, hopefully nobody else is receiving email addressed to you.

If you start marking someone's emails as spam and their emails start automatically landing in your spambox but not everyone else's, then obviously they do have localized filtering. They almost certainly do because some people actually like to receive "newsletters" and junk mail to clip the coupons or whatever.

mmt · on July 16, 2018

> "To:mmt" would be a local filter (or a localized signal in the filter - implementation detail)

You imply that this distinction doesn't matter, but I assert that it matters very much.

The difference is that a local filter can reasonably be expected to exclude any remote (to my domain, for example) "localized signal" when making a decision, whereas to a global filter, there's no such thing as "remote".

Look at it another way: imagine mmt.example.com with 10 users and mcbits.example.com with 800 users are both served by Gmail (and we're the first two early adopters). Then imagine none of my users ever mark From:LinkedIn mail as spam (and maybe even mark it as ham in some way), whereas 25% of your users mark From:LinkedIn as spam and the rest merely ignore it (never marking it as ham, though perhaps reading it, marking it as read, and/or labeling or archiving it, either manually or with a filter).

With two local filters, I would expect my users never to have to search their Spam folder for messages from LinkedIn. With a global filter, I would expect the relatively small quantity of relatively weak data indicating that From:LinkedIn is ham (which happens to be associated with the "localized signal" of instances of "From:@mmt.example.com") to be overwhelmed by the more numerous, stronger signals from your users that "From:LinkedIn" is spam, and for my users to have to check the Spam folder for messages from LinkedIn at some point.

The latter is what I have observed actually happens.
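A toy sketch of the scenario in this comment may make the difference concrete. It assumes a filter that simply compares the fraction of spam votes for a sender against a fixed threshold, and it assumes each of the 10 mmt users marks LinkedIn mail as ham once while 200 of the 800 mcbits users (25%) mark it as spam. `Votes`, `THRESHOLD`, `local_decision`, and `global_decision` are hypothetical names; nothing here describes how Gmail actually works.

```python
from dataclasses import dataclass


@dataclass
class Votes:
    spam: int = 0
    ham: int = 0

    def spam_fraction(self) -> float:
        total = self.spam + self.ham
        return self.spam / total if total else 0.0


# Hypothetical vote counts for From:LinkedIn mail, per recipient domain.
votes_by_domain = {
    "mmt.example.com": Votes(spam=0, ham=10),      # 10 users, each marks ham once (assumed)
    "mcbits.example.com": Votes(spam=200, ham=0),  # 25% of 800 mark spam; the rest never vote
}

THRESHOLD = 0.5  # assumed cutoff: bin the message if the spam fraction exceeds this


def local_decision(domain: str) -> bool:
    """Local filter: only votes from the recipient's own domain count."""
    return votes_by_domain[domain].spam_fraction() > THRESHOLD


def global_decision() -> bool:
    """Global filter: all votes are pooled; the recipient's domain is ignored."""
    pooled = Votes(
        spam=sum(v.spam for v in votes_by_domain.values()),
        ham=sum(v.ham for v in votes_by_domain.values()),
    )
    return pooled.spam_fraction() > THRESHOLD
```

With these numbers, `local_decision("mmt.example.com")` is False and `local_decision("mcbits.example.com")` is True, but `global_decision()` is True (200 spam votes out of 210), so under pooling mmt's users see LinkedIn mail binned despite never having marked it as spam.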

mcbits · on July 16, 2018

There's little doubt that they have global filtering informed by user flagging. The question is whether they also have localized filtering for items where users may disagree on what's spam. If so, I would expect email from LinkedIn to be far more likely to be automatically marked spam for those 25% of users who have previously marked those emails as spam. That seems to be what happens, at least from what I've heard. (Gmail is essentially my spambox already, so I don't bother to flag individual items.)

That doesn't preclude the global filters from also occasionally binning some of LinkedIn's email when the global spam score is so high that the local ham score fails to override it. And the same thing could happen whether they implement it as a hierarchy of filters or just one filter with localized signals.
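The two implementation options mentioned above can be sketched side by side. This is a hypothetical model, not Gmail's: `global_spam` is a pooled spam probability, `local_ham` in [0, 1] is how consistently this recipient has treated the sender as ham, and the weight, threshold, and override values are arbitrary. With the override set to threshold plus weight, both designs behave the same for a recipient with a fully consistent ham history: the local preference wins unless the global score is very high.

```python
THRESHOLD = 0.5     # assumed cutoff for binning a message
LOCAL_WEIGHT = 0.4  # assumed strength of the recipient's own ham history
OVERRIDE = THRESHOLD + LOCAL_WEIGHT  # global score above which local ham can't save a message


def single_filter(global_spam: float, local_ham: float) -> bool:
    """One filter with a localized signal: the recipient's ham history
    is just another term that pulls the combined score down."""
    score = global_spam - LOCAL_WEIGHT * local_ham
    return score > THRESHOLD


def hierarchical_filter(global_spam: float, local_ham: float) -> bool:
    """A hierarchy of filters: the local verdict normally decides, but a
    sufficiently high global score bins the message regardless."""
    if global_spam > OVERRIDE:
        return True   # global filter overrides the local one
    if local_ham > 0.5:
        return False  # recipient has mostly treated this sender as ham
    return global_spam > THRESHOLD


# A recipient who has always treated the sender as ham (local_ham = 1.0):
for g in (0.7, 0.95):
    print(g, single_filter(g, 1.0), hierarchical_filter(g, 1.0))
```

This prints `0.7 False False` and `0.95 True True`: a moderately spammy sender stays in the inbox under either design, while a very high global score bins it under both.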

mmt · on July 17, 2018

I'm still failing to understand the distinction you're trying to make with a local filter if it exists only in addition to, and is generally overridden by, the global filter.

> I would expect email from LinkedIn to be far more likely to be automatically marked spam for those 25% of users who have previously marked those emails as spam.

This is true regardless of whether the filter is global or local, though. That 25% provides the majority behavior for both a local filter and a global filter.

> That doesn't preclude the global filters from also occasionally binning some of LinkedIn's email when the global spam score is so high that the local ham score fails to override it.

It does preclude it, because with a truly local filter, those people are a majority (unanimous, even). Even occasional miscategorization as spam would be completely unexpected. It's only when the filter is global that they end up subject to tyranny-of-the-majority behavior.