Email in the World's Languages - Part III

John Levine

In the previous installments (Parts I and II), we discussed the various ways to encode non-ASCII character sets, of which UTF-8 is the winner, and some complex approaches that tried to make UTF-8 mail backward compatible with ASCII mail. After years of experiments, the perhaps surprising consensus is that if you're going to do international mail, you just do it.

The revisions to RFCs 5335 and 5336, currently nearing completion in the IETF's EAI working group, remove most of the complexity from the existing design and simply define a new international mail stream with UTF-8 text in mail headers and UTF-8 addresses in the SMTP session. (MIME and the 8BITMIME ESMTP option already handle UTF-8 message bodies.) A mail server advertises that it can handle internationalized mail by listing an ESMTP option in its EHLO response. The option currently has the ugly placeholder name UTF8SMTPbis, but it will presumably be changed to something snappier by the time the revised RFC is issued. I'll call it EAI here.
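
To make the capability announcement concrete, here is a minimal Python sketch of how a client might read a server's EHLO reply and look for the EAI option. It assumes the draft placeholder UTF8SMTPbis as the keyword, which is expected to change; the function names and the sample reply are hypothetical.

```python
# Placeholder keyword from the draft; the final RFC will likely use another name.
EAI_KEYWORD = "UTF8SMTPbis"


def parse_ehlo_keywords(reply: str) -> set[str]:
    """Collect the ESMTP keywords from a multi-line 250 EHLO reply.

    The first line is the server's greeting, not an extension. Every later
    line has the form "250-KEYWORD params" (or "250 KEYWORD params" on the
    last line). Keywords are case-insensitive, so they are upper-cased.
    """
    keywords = set()
    for line in reply.strip().splitlines()[1:]:
        text = line[4:].strip()  # drop the "250-" or "250 " prefix
        if text:
            keywords.add(text.split()[0].upper())
    return keywords


def server_supports_eai(reply: str) -> bool:
    """True if the server advertised the (placeholder) EAI keyword."""
    return EAI_KEYWORD.upper() in parse_ehlo_keywords(reply)


# Hypothetical EHLO reply from an EAI-capable server.
sample_reply = (
    "250-mail.example.com Hello\r\n"
    "250-8BITMIME\r\n"
    "250-UTF8SMTPbis\r\n"
    "250 SIZE 10485760\r\n"
)
print(sorted(parse_ehlo_keywords(sample_reply)))  # ['8BITMIME', 'SIZE', 'UTF8SMTPBIS']
print(server_supports_eai(sample_reply))  # True
```

In practice an SMTP library does this parsing for you; Python's smtplib, for example, records the advertised extensions after `ehlo()` and exposes them through `has_extn()`.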

If a mail client sees that a server announces EAI capability, it can put the EAI flag on the MAIL FROM command in the SMTP session to tell the server that this message is an internationalized one, and then send the message. Any mail server that supports EAI also supports 8BITMIME, so the message can contain any UTF-8 text without special encoding. The RFC also defines EAI options for the SMTP commands EXPN and VRFY, which accept and return e-mail addresses, and some new extended return codes for status messages. If a client doesn't use the EAI option, it sends legacy RFC 2821/2822 mail, same as always. The option is set or not for each message, so each message moving through the mail system is either an EAI message or a legacy message.
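
The per-message choice can be sketched in a few lines of Python. This assumes the server's keywords have already been collected from its EHLO reply and uses the draft placeholder UTF8SMTPbis as the flag; the function and exception names are hypothetical. Since the spec defines no standard downgrade, the sketch simply refuses to send an internationalized message to a server that lacks the option.

```python
EAI_KEYWORD = "UTF8SMTPbis"  # placeholder name from the draft


class EAINotSupportedError(Exception):
    """Raised when an internationalized message meets a non-EAI server."""


def mail_from_command(sender: str, internationalized: bool,
                      server_keywords: set[str]) -> str:
    """Build the MAIL FROM line for one message.

    The flag is decided per message: legacy messages get a plain MAIL FROM,
    internationalized ones get the EAI parameter appended.
    """
    if not internationalized:
        return f"MAIL FROM:<{sender}>"
    advertised = {k.upper() for k in server_keywords}
    if EAI_KEYWORD.upper() not in advertised:
        raise EAINotSupportedError(f"server does not offer {EAI_KEYWORD}")
    return f"MAIL FROM:<{sender}> {EAI_KEYWORD}"


print(mail_from_command("joe@example.com", False, {"8BITMIME"}))
# MAIL FROM:<joe@example.com>
print(mail_from_command("josé@example.com", True, {"8BITMIME", "UTF8SMTPBIS"}))
# MAIL FROM:<josé@example.com> UTF8SMTPbis
```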

The RFC for internationalized message headers allows UTF-8 nearly anywhere that ASCII can occur, except in a few places intended to be parsed by computers, such as Message-IDs and the time zones in date stamps. It defines a new MIME body type, message/global, for attached or included internationalized mail messages. And that's about it. All of the down-conversion machinery for turning EAI messages into RFC 2821/2822-compatible mail is gone.
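
As a rough illustration of that rule, the following Python sketch flags non-ASCII text in the two machine-parsed fields the paragraph names, Message-ID and Date, while accepting UTF-8 in other fields. It is a simplification under that assumption: the RFC's grammar, not this short list, decides where UTF-8 may appear, and the function name is hypothetical.

```python
# Fields named above as machine-parsed and therefore still ASCII-only.
# A simplification: the RFC's grammar is the real authority.
ASCII_ONLY_FIELDS = {"message-id", "date"}


def non_ascii_in_ascii_only_fields(headers: dict[str, str]) -> list[str]:
    """Return the names of ASCII-only fields that contain non-ASCII text."""
    return [
        name
        for name, value in headers.items()
        if name.lower() in ASCII_ONLY_FIELDS and not value.isascii()
    ]


headers = {
    "From": "José <josé@example.com>",   # UTF-8 allowed here
    "Subject": "Grüße aus Köln",          # and here
    "Message-ID": "<20110712.1234@example.com>",
    "Date": "Tue, 12 Jul 2011 23:04:00 -0700",
}
print(non_ascii_in_ascii_only_fields(headers))  # []
print(non_ascii_in_ascii_only_fields({"Message-ID": "<идентификатор@example.com>"}))  # ['Message-ID']
```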

As the diagram above illustrates, this defines a new EAI (or whatever we call it) mail stream that is parallel to the existing legacy SMTP stream. If a message has UTF-8 headers or a UTF-8 envelope address, it has to go in the EAI stream. If it doesn't, it can go in either the legacy stream or the EAI stream.
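
The routing rule in that paragraph amounts to an ASCII test over the envelope and headers. Here is a minimal Python sketch, with hypothetical function names; it deliberately ignores the body, since MIME and 8BITMIME already carry UTF-8 bodies.

```python
def needs_eai_stream(envelope_from: str, envelope_to: list[str],
                     headers: dict[str, str]) -> bool:
    """True if any envelope address or header contains non-ASCII text.

    The body is not examined: MIME and 8BITMIME already handle UTF-8 bodies.
    """
    pieces = [envelope_from, *envelope_to, *headers.keys(), *headers.values()]
    return not all(p.isascii() for p in pieces)


def allowed_streams(envelope_from: str, envelope_to: list[str],
                    headers: dict[str, str]) -> set[str]:
    """Streams a message may travel in under the rule described above."""
    if needs_eai_stream(envelope_from, envelope_to, headers):
        return {"EAI"}
    return {"EAI", "legacy"}


print(sorted(allowed_streams("a@example.com", ["b@example.org"], {"Subject": "hello"})))
# ['EAI', 'legacy']
print(sorted(allowed_streams("a@example.com", ["b@example.org"], {"Subject": "Grüße"})))
# ['EAI']
```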

The solid diagonal arrow shows that a legacy message can always move into the EAI stream, since the spec for the latter is a superset of the spec for the former. The dotted arrow shows that an EAI message can sometimes go into the legacy stream, if an MTA looks through the message and notices that it has an all-ASCII body and an all-ASCII envelope, something I've nicknamed Deep Message Inspection. (An MTA isn't required to do this, just allowed to.)
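
Deep Message Inspection can be approximated very conservatively: if every envelope address and every byte of the message, headers and body alike, is ASCII, the message can safely be relayed as legacy mail. The Python sketch below, with hypothetical names, also shows how an MTA holding an EAI message might pick a stream for the next hop, failing when the next hop lacks EAI and the message can't be demoted, since there is no standard downgrade.

```python
class NoDowngradeError(Exception):
    """An EAI message cannot be relayed to a server without EAI support."""


def can_demote(envelope_from: str, envelope_to: list[str], raw_message: bytes) -> bool:
    """Conservative Deep Message Inspection: everything must be ASCII."""
    addresses_ascii = all(a.isascii() for a in [envelope_from, *envelope_to])
    return addresses_ascii and raw_message.isascii()


def next_hop_stream(next_hop_supports_eai: bool, envelope_from: str,
                    envelope_to: list[str], raw_message: bytes) -> str:
    """Choose a stream for relaying an EAI message to the next hop."""
    if next_hop_supports_eai:
        return "EAI"
    if can_demote(envelope_from, envelope_to, raw_message):
        return "legacy"  # optional demotion; an MTA is allowed, not required, to do it
    raise NoDowngradeError("no standardized way to downgrade this message")


msg = b"Subject: hi\r\n\r\nHello.\r\n"
print(next_hop_stream(False, "a@example.com", ["b@example.org"], msg))  # legacy
```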

If a client wants to send an EAI message to a server that doesn't handle EAI, too bad: there's no standardized way to downgrade it.

This doesn't mean that mail systems are forbidden to downgrade messages, just that the IETF isn't trying to define a standard way to do so. It's easy to imagine useful special cases. For example, if an MSA (the program that accepts mail from user mail programs) is configured with backup legacy addresses for its EAI users, and it notices EAI mail from one of its users to a legacy address, it could rewrite the message's headers to replace the user's EAI address with her legacy address and MIME-encode the UTF-8 header text, turning the headers into ASCII, and send the message along as a legacy message. That sort of thing may turn out to be fairly popular, but at this point it's not well enough understood to standardize.
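
That MSA special case can be sketched with Python's standard `email` package. Everything here is an assumption for illustration: the LEGACY_BACKUP table stands in for however an MSA might store backup addresses, the function name is hypothetical, and only unstructured fields such as Subject are MIME-encoded, because an encoded-word cannot rescue a non-ASCII address.

```python
from email.header import Header, decode_header, make_header
from email.utils import formataddr

# Hypothetical MSA configuration: a backup legacy address for each EAI user.
LEGACY_BACKUP = {"josé@例え.jp": "jose@example.jp"}

# Unstructured fields whose UTF-8 text can be MIME-encoded (RFC 2047).
ENCODABLE_FIELDS = {"subject", "comments"}


def downgrade_headers(sender_name: str, sender_addr: str,
                      headers: dict[str, str]) -> dict[str, str]:
    """Return all-ASCII headers for mail from an EAI user to a legacy address.

    `headers` holds every field except From, which is rebuilt here.
    """
    legacy_addr = LEGACY_BACKUP[sender_addr]  # KeyError: no backup configured
    # formataddr MIME-encodes a non-ASCII display name and requires an
    # ASCII address, which the legacy backup address satisfies.
    out = {"From": formataddr((sender_name, legacy_addr))}
    for name, value in headers.items():
        if value.isascii():
            out[name] = value
        elif name.lower() in ENCODABLE_FIELDS:
            out[name] = Header(value, "utf-8").encode()
        else:
            raise ValueError(f"cannot downgrade non-ASCII {name} field")
    return out


result = downgrade_headers("José", "josé@例え.jp",
                           {"To": "pat@example.com", "Subject": "Grüße"})
print(result["From"])
print(str(make_header(decode_header(result["Subject"]))))  # Grüße
```

Any header that can't be fixed this way raises an error rather than being guessed at, which matches the point that such rewriting is a local policy, not a standard.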

There are mail communities where all the users read and write languages written in non-ASCII scripts, so it may turn out that everyone's mail supports EAI, and legacy compatibility doesn't matter. Or they may treat legacy mail as a special case, perhaps by using a different user mail program.

The EAI work has defined the core of a fully internationalized mail system, allowing users to send mail using their preferred language, encoded as UTF-8, in essentially all parts of e-mail messages. The EAI spec is conceptually straightforward, and the changes required for adequate, if not superb, support in MTAs are fairly simple. (Deep Message Inspection is harder, but it's optional.) The details of migrating from legacy mail to EAI mail are still a work in progress, as are the details of how EAI users will send mail to legacy mail users, but the imminent completion of the EAI work is a big step forward for mail in all the world's languages.

By John Levine, Author, Consultant & Speaker.

Comments

When? Avtal Meren  –  Jul 12, 2011 11:04 PM PDT

John,

Thank you for this interesting and informative series.  It led to quite a discussion on IDNForums, by the way.

Two questions:

1) When do you think this will start rolling out?

2) What do you think of Afilias's IDN email scheme?

Thanks,

Avtal

John Levine  –  Jul 13, 2011 4:10 AM PDT

1) In China and Japan, very soon. In Western Europe and North America, it's hard to say.

2) It's hard to tell from what's on their web site. It looks like it punycodes the mailbox, which, as I noted, has technical problems in the general case.
