Turn the Table on Content Filtering

The diagram below depicts two mail transmitters relaying mail on behalf of two users each, and a target MX receiving that mail for four recipients. The difference between the two transmitters is how they deal with content filtering.

Why do we run content filters at the recipient’s side? Paul Graham’s “A Plan for Spam” introduced them that way. After several years, we can say that plan doesn’t work very well. Email has become much less reliable. One way to recover reliability, at least between trusted parties, is to run filters at the sender’s side. Let’s look at the diagram in more detail.

Both senders and recipients are connected through authenticated and possibly encrypted connections. Some users are connected through the Internet, some directly to the relevant server.

The first MSA (?) relays according to current SMTP standards. The receiving server runs content filtering, but doesn’t know what to do in case an accepted message turns out to be spam. The amount of mumbo jumbo required for effective spam filtering is high, and may involve delays. If the message is considered spam, most times it will be silently dropped. This is where unreliability stems from.

The second MSA (!) relays according to the VHLO proposed SMTP extension. The sender knows how to handle spam, because it knows any required detail about the authenticated sender. The recipient trusts the sender, not because they have specific arrangements, but because the sender identifies itself, e.g. providing its domain registration reference, and relays for its own users only, at least for the illustrated session.

VHLO, Verified Hello, provides a reliable channel that can be used in parallel with existing EHLO traffic. It employs the authentication, authorization, and vouching techniques that have been developed during the past years, and allows postmasters to manage them, e.g. by learning what conditions a receiving MTA requires… But I’m not going to describe the protocol’s details, which are still being discussed. I want to ask: Are you ready to turn the table on spam?

By Alessandro Vesely, Tiny ISP and freelance programmer

Comments

no Carl Byington  –  Jul 3, 2009 10:27 PM

You will need some other property of that MTA (other than the fact that it issued VHLO rather than EHLO) to trigger the acceptance of mail from it. Otherwise, spammers will just use VHLO. Whatever property you need in addition to that VHLO verb could more easily be done with an EHLO extension. In particular, the AUTH extension seems to cover almost all of this case.

If that MTA is “trusted” and has a static ip address, then you could simply use a local whitelist on the receiver to get the same effect.

“The receiving server runs content filtering, but doesn’t know what to do in case an accepted message turns out to be spam”. In that case, the receiving server is broken. You should *almost never* actually accept (2xx response to DATA) a message and then make some spam/ham decision about it.  Modern systems make that decision *before* returning the status code to the DATA command.

Not quite The Famous Brett Watson  –  Jul 4, 2009 7:47 AM

“You will need some other property of that MTA (other than the fact that it issued VHLO rather than EHLO) to trigger the acceptance of mail from it.”
VHLO provides a means for the sending system to assert association with a particular domain name, providing information as to how the recipient may verify this assertion (through SPF records, PTR records, etc). A message is accepted without further filtering only if the domain identity in question is both verifiable and whitelisted (either locally, or by the recommendation of a trusted third party).
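
The acceptance rule in this reply can be sketched as a small decision function. This is an illustration only: the names `vhlo_disposition`, `verify`, `whitelist` and `toy_verify` are hypothetical, the verification step is a stand-in callable rather than a real SPF or PTR lookup, and nothing here reproduces the syntax of the VHLO draft.

```python
from typing import Callable, Set


def vhlo_disposition(
    claimed_domain: str,
    client_ip: str,
    verify: Callable[[str, str], bool],
    whitelist: Set[str],
) -> str:
    """Return how a receiver might treat a session that asserts a domain.

    Hypothetical sketch: `verify` stands in for whatever check the receiver
    uses (SPF, PTR, ...) to confirm that client_ip may speak for the domain.
    """
    domain = claimed_domain.lower().rstrip(".")
    if not verify(domain, client_ip):
        # Unverifiable assertion: treat like any ordinary EHLO client.
        return "filter"
    if domain not in whitelist:
        # Verified identity, but nobody vouches for it: still filter.
        return "filter"
    # Verified and whitelisted: accept without further content filtering.
    return "accept-unfiltered"


# Toy verifier for illustration: a fixed table of domain -> allowed addresses.
AUTHORIZED = {"example.org": {"192.0.2.10"}}


def toy_verify(domain: str, ip: str) -> bool:
    return ip in AUTHORIZED.get(domain, set())
```

Both conditions have to hold before filtering is skipped, so a spammer who merely issues the new verb, without a verifiable and vouched-for domain, is handled like any other client.
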
“In particular, the AUTH extension seems to cover almost all of this case.”
The AUTH extension is for authentication by prior arrangement. VHLO offers a weaker kind of authentication at the level of the domain name which does not require explicit prior arrangement or PKI certification.
“If that MTA is "trusted" and has a static ip address, then you could simply use a local whitelist on the receiver to get the same effect.”
True, particularly if third-party whitelists allow reference by IP address instead of domain name. The advantage of using a domain name identity is that it decouples identity from network addressing, which simplifies management: it means you don't run into trouble every time you fiddle with your outgoing pool of mail hosts. Whether or not this benefit is worth the effort of developing and implementing VHLO is open to question, of course.
“You should *almost never* actually accept (2xx response to DATA) a message and then make some spam/ham decision about it. Modern systems make that decision *before* returning the status code to the DATA command.”
That's an ideological stance. I used to agree with it. I'll agree that accepting and then bouncing a message is a bad idea (unless you can somehow verify that the owner of the bounce address has authorised the bounce -- a very special case indeed). As for classification during SMTP being "modern", however, Gmail begs to differ. I've done a more detailed analysis of spam classification and acceptance/rejection strategies in my PhD thesis (due for submission this month), and there are other points to be made in favour of accepting all mail (with post-classification of some kind). If there's any interest, I may publish an extract of that work as an article here. (Use CircleID's "send message" facility to express interest or see contact details here.)

new esmtp extension easier than new smtp verb Carl Byington  –  Jul 4, 2009 4:45 PM

“The AUTH extension is for authentication by prior arrangement.”

My point, poorly expressed, was that it will be *far* easier to gain widespread acceptance of an ESMTP/EHLO extension than it will be to gain widespread acceptance of another SMTP verb. We already have ESMTP which allows the receiver to advertise extensions that it understands.  Any data that you might want to pass with VHLO could be passed with something like “MAIL FROM: VHLO=arguments”.

“and there are other points to be made in favour of accepting all mail (with post-classification of some kind)”

The work of classification still needs to be done at some point. If you do that work during the SMTP session, you will either accept and deliver the message, or the SMTP client will get an unambiguous rejection indication, so the sender will know that their message was not delivered, and they can arrange alternative means of communication. *Any* system that does that classification work after accepting the message *will* be unreliable unless the classification mechanism is perfect. When it makes a mistake, and classifies a wanted message as spam, then the sender will think it has been delivered, and the receiver won’t see it. This is the entire reliability point that the OP was claiming that VHLO partially solves. But it can never be solved with post-acceptance spam classification schemes.

You might argue that it is difficult to arrange to do that work in-line during the SMTP transaction, but that would be incorrect. The sendmail milter mechanism allows one to run arbitrary code of your choice during the SMTP transaction. If the MTA of your choice does not allow that, then perhaps you need a more flexible MTA. My point is that *any* computation that you could do post-acceptance can also be done before you send the answer to the DATA command. Why not be polite and let the SMTP client know the results of that computation?
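
As a rough illustration of deciding before answering DATA, the sketch below maps a classifier verdict to the reply the server would send at end of data. It is not milter code: `reply_to_data` and `toy_score` are hypothetical names, the keyword score is a placeholder for a real filter, and the threshold is arbitrary. The reply codes 250 and 550 are standard SMTP.

```python
from typing import Callable, Tuple


def reply_to_data(
    message: str,
    score: Callable[[str], float],
    threshold: float = 0.9,
) -> Tuple[int, str]:
    """Classify the message first, then choose the reply to end-of-DATA.

    Because the verdict is computed before the reply is sent, the sending
    client always learns whether the message was accepted or rejected.
    """
    if score(message) >= threshold:
        # Permanent failure: the sender's MTA reports non-delivery.
        return 550, "5.7.1 Message rejected by content policy"
    # Only now does the server take responsibility for delivery.
    return 250, "2.0.0 Message accepted for delivery"


def toy_score(message: str) -> float:
    """Placeholder classifier: fraction of flagged words that appear."""
    flagged = ("viagra", "lottery", "winner")
    text = message.lower()
    return sum(word in text for word in flagged) / len(flagged)
```

With this arrangement a false positive produces a rejection the sender can see, rather than a message that vanishes after a 2xx reply.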

Again, not quite The Famous Brett Watson  –  Jul 5, 2009 8:33 AM

“My point, poorly expressed, was that it will be *far* easier to gain widespread acceptance of an ESMTP/EHLO extension than it will be to gain widespread acceptance of another SMTP verb.”
I have reservations about the design of VHLO, but this particular objection strikes me as odd. An SMTP extension is an SMTP extension: the question of whether the extension is best expressed as a new verb or as parameters to an existing verb is driven by the semantics of the extension. In this case, the identity being conveyed is potentially relevant to the entire SMTP session, not a single mail transaction within the session, so a new verb seems more appropriate than a parameter to MAIL. On what basis do you say it is so much easier to deploy parameter-based extensions than verb-based extensions?
“The work of classification still needs to be done at some point. If you do that work during the SMTP session, you will either accept and deliver the message, or the SMTP client will get an unambiguous rejection indication, so the sender will know that their message was not delivered, and they can arrange alternative means of communication.”
This is a bad thing when the sender is a spammer. I don't know what the current trend in spammer ratware is -- at one point it was notorious for ignoring the final post-DATA response code, and if it does, then post-DATA rejects are a good idea (so long as the software used by your preferred senders isn't similarly ratty). If spammers pay attention to such response codes, however, it gives them data on their delivery rates and whether changes to their text are improving delivery rates or not. Ultimately, if you tell a spammer you've rejected his mail, it's just an invitation for him to try, try again. The ideal way to treat spam is to accept and drop it. Explicit rejection is a better strategy in cases where the classification is less certain, however.
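
The strategy outlined in this reply, silent drop for near-certain spam and explicit rejection for less certain cases, can be written as a three-way policy. The thresholds and the name `disposition` are assumptions made for illustration; the comment itself gives no numbers.

```python
def disposition(spam_probability: float,
                drop_above: float = 0.99,
                reject_above: float = 0.70) -> str:
    """Map a spam probability to one of three hypothetical actions."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be within [0, 1]")
    if spam_probability >= drop_above:
        # Near-certain spam: accept and discard, giving the sender no feedback.
        return "accept-and-drop"
    if spam_probability >= reject_above:
        # Uncertain: reject in-session so a legitimate sender finds out.
        return "reject"
    return "deliver"
```
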
“This is the entire reliability point that the OP was claiming that VHLO partially solves. But it can never be solved with post-acceptance spam classification schemes.”
Given that the point of VHLO is to facilitate whitelisting, thereby eliminating some receiver-side classification efforts entirely, I'm not sure I see the substance of your complaint here. Perhaps the point you wanted to make was, "mail wouldn't get lost in the first place if recipients rejected spam in-line." Overlooking the issues I've already raised with explicit rejection, VHLO can still contribute to the mail process by facilitating whitelisting, thereby reducing server load and the incidence of false positives.
“You might argue that it is difficult to arrange to do that work in-line during the SMTP transaction, but that would be incorrect.”
It is not technically difficult, but it results in greater resource requirements than the alternative. In-line checking increases the per-session resource burden and reduces your maximum SMTP concurrency accordingly. If you take the checking out of line, then it need only keep pace with your average rate of message arrival, not your peak rate. How much difference this makes depends on message arrival patterns, of course.
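
A toy calculation makes the capacity argument concrete. The arrival counts below are invented for illustration, and the model is deliberately crude: in-line checking is assumed to need enough scanning capacity for the busiest interval, while out-of-line checking works through a queue at a constant rate. All function names are hypothetical.

```python
import math
from typing import List


def inline_capacity(arrivals: List[int]) -> int:
    """Messages per interval the scanner must handle when checks run in-session."""
    return max(arrivals)


def queued_capacity(arrivals: List[int]) -> int:
    """Smallest constant rate that keeps up with the average arrival rate."""
    return math.ceil(sum(arrivals) / len(arrivals))


def max_backlog(arrivals: List[int], rate: int) -> int:
    """Largest queue length seen when scanning `rate` messages per interval."""
    backlog = worst = 0
    for n in arrivals:
        backlog = max(0, backlog + n - rate)
        worst = max(worst, backlog)
    return worst


# Hypothetical messages per minute over ten minutes, with one burst.
arrivals = [20, 25, 30, 200, 180, 40, 25, 20, 30, 30]

peak = inline_capacity(arrivals)
avg = queued_capacity(arrivals)
print(peak, avg, max_backlog(arrivals, avg))
```

With these made-up numbers, in-line checking must be sized for 200 messages per minute, while a queue drained at 60 per minute peaks at a backlog of 260 messages. That backlog is the delay paid for taking the checks out of line.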

Yet another FUSSP Suresh Ramasubramanian  –  Jul 5, 2009 3:27 AM

FUSSP = http://www.rhyolite.com/anti-spam/you-might-be.html

As for running filters at the sending MTA - outbound filtering is widely recommended as a best practice.

However, most spam out there is sent through compromised hosts, botnets etc - that may not even be “regular” smtp servers at all. Or they might relay through the smtp server of their broadband provider etc - which is where the outbound filtering best practice comes in.

Then - if the entire world was following best practices we wouldnt have the sort of spam volumes that we’re seeing. So .. you’re left with inbound filtering.

You dont need VHLO, fancy maps, wild theories from Paul Graham etc (and by the way bayes is not the cureall - doesnt even scale all that well across a distributed mail flow, as I and a few other people had the “pleasure” of telling him back in late 2003 when he was a guest speaker at a SciAm meet the expert type session, at some pub or the other near the MIT campus)

Based on what I see here - I would personally never deploy VHLO with a bargepole.

Spam filtering to clean up a forwarded mail stream is not a bug, its a feature.

——
  This memo defines an extension to the SMTP service that provides
  protocol support for weak authentication of SMTP clients.  Weakly
  authenticated clients enjoy an intermediate level of trust: they have
  no relaying privileges, but can attempt to deliver mail to local
  users, are whitelisted from some filters, and may receive DSNs as
  needed.

  Note that this treatment is what SMTP recommends for all clients.
  However, most servers operate filters to limit spam, thereby
  affecting the reliability of the mail forwarding system.  Verified-
  Hello recovers that reliability by providing for uncensored mail
  transmission in a framework where authenticated domains are
  responsible for the messages they send.  In addition, support is
  provided for an extensible set of authentication mechanisms, so that
  they can be managed and branded.
——-

Just SSP Alessandro Vesely  –  Jul 5, 2009 8:45 AM

“FUSSP = http://www.rhyolite.com/anti-spam/you-might-be.html”
I'd have appreciated it if you had mentioned which item in Vernon's list would best identify VHLO as a FUSSP. It's not. Actually, it is neither final nor ultimate. To wit: its list of authentication/reputation mechanisms is extensible (not final), and its informative references point to further work to do (not ultimate). It is a possible solution to the spam problem, though. The current state of affairs is such that, just as it's not possible to agree on a definition of spam, it is also not possible to find a solution for it. The anti-anti-spam paradigm enforces preemptive rebuttal of any proposed solution. Thus, it turns out we need to find a solution to the anti-spam problem before we can proceed. I knew that; that's why I asked for readiness. Thank you for your answer.
“As for running filters at the sending MTA - outbound filtering is widely recommended as a best practice.”
So is the submit protocol. However, the advantages of running them are confined within the environment where they operate. An outbound MTA wishing to distinguish itself from unreliable transmitters has no verb in SMTP to express that it has authenticated the sender, scanned the content, found all fine, and takes accountability for what it is about to transmit.
“Then - if the entire world was following best practices we wouldnt have the sort of spam volumes that we're seeing. So .. you're left with inbound filtering.”
Vhlo does not require cooperation of the whole world. Although it aims at recovering the reliability that mail had in the 90s, it only operates in transmissions between trusted parties. The relevant alternative to inbound filtering is whitelisting. For example, an office having much activity with a given company may whitelist that company's outbound MTAs. By the same argument, one who trusts the outbound filtering operated by Gmail may wish to whitelist them. However, medium and small organizations can hardly afford whitelisting because it requires heavy maintenance: by the time one has built a significant list of hosts, it has to be rebuilt. In fact, most mail domains do not publish the IP addresses or FQDNs of their outbound MTAs in such a way that they can be retrieved automatically. Of course, as Brett noted, it is difficult to judge whether whitelisting by domain is relevant enough to deserve standardization. The authentication/reputation mechanisms considered in the current draft are flexible enough to allow for fine-grained definitions of the set of domains one wishes to whitelist, including the explicit definition of a list of vouched mail domains. Those mechanisms are standardized already. The adjective weak used to characterize them is meant in a precise technical sense, similar to other meanings it has in computing and science, not in the generic sense of something that doesn't work well.
“Spam filtering to clean up a forwarded mail stream is not a bug, its a feature.”
Since clean up means dropping messages at random, or according to non-specified and seldom predictable ways, that feature heavily affects reliability.

Did you read the FUSSP doc? Suresh Ramasubramanian  –  Jul 5, 2009 3:18 PM

The ones about senior-ietf-member-* and programmer-*?

Vhlo does not require cooperation of the Michael Hammer  –  Jul 9, 2009 1:40 PM

“Vhlo does not require cooperation of the whole world. Although it aims at recovering the reliability that mail had in the 90s, it only operates in transmissions between trusted parties.”
There is no such thing as a trusted party unless you are willing to accept the consequences of misplaced trust. Trust me, I know. What happens to your proposed system when the sending MTA is subverted and stops filtering? Of course, that would never happen. Where Dave speaks of trust models I start from a position of distrust. Must be the anti-abuse background in me coming out. It really is quite simple. Emitting hosts which cause problems (why get hung up on the word "SPAM"?) will find remote MTAs unwilling to accept mail from them or throttling throughput. Dynamic ranges of IPs increasingly are blocked from the start. Before throwing another acronym into the hopper I would prefer to see the impact of wider deployment (both sending and receiving) of DKIM/ADSP as well as receiver side respect of strong (-all) SPF assertions. I will grant that these approaches do not directly attack SPAM but do make it easier to sort the wheat from the chaff with regard to good actors from bad actors based on authentication.

Suresh is right Dave Crocker  –  Jul 5, 2009 5:41 PM

What is really difficult about proposals for new anti-spam techniques is not the technique but the global models they presume.  I think the requirement should be for folks proposing things to present their model in terms of trust, administration and operation in a way that uses no technical references.  Tell us how folks (users, operators) would interact, why anyone should believe that this will really happen, and what abuses are still possible.  What are the assumptions about motives and participation effort?

If the model sounds plausible in terms of that trust and operations overhead, then it might be worth considering the technical details for instantiating it.

/d
