The term Email Deliverability is used to describe how well a mail flow can reach its intended recipients. This has become a cornerstone concept when discussing quality metrics in the email industry and as such, it is important to understand how to measure it.
Email Deliverability is considered to be affected by a mythical metric, the reputation of the sender, which is a measure of that sender's behavior over time and of the recipients' reactions to its messages.
Like many industry veterans, over the years I've crafted a working definition for this metric. In my experience it works very well as a performance predictor for the well-behaved mailing operators that I've had the pleasure to work with, and it has spared me from dealing with the other type of mailing operator. I'll call this metric a pseudo-reputation.
A Working Model
Nowadays email is an instant communication medium in the sense that messages can be delivered to the recipient's mailbox almost instantly most of the time. However, not everybody is waiting by the mailbox to read every new message as it arrives — although the mobile penetration is changing this.
Based on the demographics of the specific mailing list, messages will be read and acted upon within a time window that typically starts with a peak near the actual delivery and has a long tail. Chances are that a large proportion of the messages will be seen or acted upon within the first business day after delivery, but it's wise to allow a little extra time.
Then there's the engagement: the idea that senders need to reach out to their audiences frequently to maintain their attention. Many senders resort to daily communications, so as to establish a repeated pattern of communication.
Different mail senders try to reconcile these facts as part of their very own secret sauce. In my case, I've found a mechanism that I call the "n-day Window" for lack of a better name. The n-day Window allows the calculation of the pseudo-reputation in an easy way. The idea is to look at the rolling averages in an n-day time window over a comparatively long period to identify peaks and changes in the metrics.
The n in n-day is chosen based on the time it takes for a sizable percentage of the recipients to receive their messages, ignoring comparatively long periods with no sending activity.
So, using this model with a 3-day window, a spreadsheet containing the total number of messages sent, the soft and hard bounces and the complaints received, broken down per day, can be easily put together. Then, for each day, the totals for the last 3 days are combined, producing an average that smooths the data over time.
While it's reasonably simple to have data for the hard and soft bounces as well as for the feedback loop data (a.k.a. complaints), the model might not benefit from this level of detail directly. In many scenarios, it's valid to simply sum the number of complaints and the hard bounces, ignoring the soft bounces. After all, well-maintained lists typically show small proportions of soft bounces.
This technique will produce a simple daily index that can easily be used to track the pseudo-reputation over time, correlating it with other metrics such as open rates, clicks, etc. This will allow the sender to fine-tune the model, find better values of n and so on.
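As a minimal sketch of the calculation described above, the daily index can be computed as the hard bounces plus complaints over the last n sending days, divided by the messages sent over those same days. The ratio form is my reading of the model, not a formula stated in so many words, and the names `DayStats` and `pseudo_reputation` are hypothetical. Skipping days with no sending activity is likewise an assumption about how idle periods should be ignored.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DayStats:
    """Per-day totals, as they might appear in one spreadsheet row."""
    sent: int
    hard_bounces: int
    complaints: int


def pseudo_reputation(days, n=3):
    """Return one index value per sending day.

    Each value is (hard bounces + complaints) / messages sent,
    with all totals taken over the last n sending days.
    Soft bounces are deliberately left out.
    """
    window = deque(maxlen=n)  # keeps only the n most recent sending days
    index = []
    for day in days:
        if day.sent == 0:
            # Assumption: days without sending activity are skipped,
            # so they neither fill the window nor produce a value.
            continue
        window.append(day)
        sent = sum(d.sent for d in window)
        bad = sum(d.hard_bounces + d.complaints for d in window)
        index.append(bad / sent)
    return index


history = [
    DayStats(sent=10_000, hard_bounces=1, complaints=0),
    DayStats(sent=10_000, hard_bounces=0, complaints=1),
    DayStats(sent=0, hard_bounces=0, complaints=0),
    DayStats(sent=20_000, hard_bounces=2, complaints=2),
]
values = pseudo_reputation(history, n=3)
```

Changing `n` and re-running the calculation over the same history is one simple way to compare candidate window sizes against open rates or clicks.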
Thresholds and Goals
A typical question is what the limit on the pseudo-reputation value is. The answer is not easy. In general, senders are targeting recipients across many ISPs, each with different policies, response times, etc.
The typical sender probably has a few large ISPs that concentrate a large proportion of the recipients and naturally the tools have been tuned to work well in those cases: Bounces from the big ISPs are parsed and understood correctly, feedback loops are in place and are analyzed, etc.
But chances are that problems are not restricted to a single ISP. A mailing list that becomes uninteresting to its audience will cause complaints and reactions all over the place, not only among the customers of a given ISP. However, chances are that due to the state of the tools or differences in ISP policies (think of this as the ISP response time), the sender will get the feedback through one of the ISPs first, while the others will likely remain normal.
Because of this, tracking pseudo-reputation per-ISP is a good practice, provided that actions are triggered by the worst metric in the bunch.
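A rough sketch of the "worst metric in the bunch" rule follows, assuming the latest pseudo-reputation value has already been computed separately for each ISP. The function name `worst_isp` and the ISP names are hypothetical, and a higher value is treated as worse.

```python
def worst_isp(per_isp_index):
    """Return the (isp, value) pair with the highest pseudo-reputation.

    per_isp_index maps an ISP name to its latest index value;
    the highest value is the worst one and should drive any action.
    """
    return max(per_isp_index.items(), key=lambda item: item[1])


# Hypothetical latest values, one per ISP.
latest = {
    "isp-a.example": 0.00004,
    "isp-b.example": 0.00012,
    "isp-c.example": 0.00002,
}
isp, value = worst_isp(latest)
```

Here the decision would be driven by `isp-b.example`, even though the other two ISPs still look normal.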
The specific maximum acceptable value for the pseudo-reputation is harder to assess. Abrupt responses such as blocking a sender are not that frequent for well-behaved senders. Chances are that changes will happen over a longer period of time. And changes will happen at different speeds.
The industry as a whole seems to have converged on a target for bounces or complaints of around 0.1% (1 bounce or complaint per 1,000 messages sent). But in practice, problems start way earlier.
My observations suggest that when tracking pseudo-reputation with the model discussed above, effects can be measured with as little as 0.01% (1 bounce or complaint per 10,000 messages sent). Of course not all cases are alike — but long streaks of pseudo-reputation levels above this 0.01% figure often predict the onset of deliverability issues.
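One possible way to watch for such streaks, sketched under the assumption that the daily index is available as a plain list of ratios (0.0001 meaning 0.01%). The function name `longest_streak_above` is hypothetical, and how many days count as a "long" streak is left to the sender.

```python
def longest_streak_above(index, threshold=0.0001):
    """Return the length of the longest run of consecutive daily
    values strictly above the threshold (0.0001 is 0.01%)."""
    longest = current = 0
    for value in index:
        # Extend the current run, or reset it when the value drops back.
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


# Hypothetical daily index values.
daily_index = [0.00005, 0.0002, 0.0003, 0.00005, 0.0002]
streak = longest_streak_above(daily_index)
```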
Simply put, there's no well-defined line in the sand. Instead, there's a wide gap. The higher the pseudo-reputation value, the faster your delivery probability decreases.
By Luis Muñoz