Digging Into IPv6 Traffic to Google: Is 28% Deployment Really the Limit?

By Christofer Flinta

After some years of accelerating IPv6 deployment, we are now into a period of slower growth and it's not clear where we are heading. It is therefore interesting to try to predict the future of IPv6 over the coming years. At Ericsson Research, we have been working on this topic since 2013, but just recently created a forecast model that seems to be quite accurate. However, it gives a disappointing message of a very low final level of IPv6 deployment at less than 30%!

The model is based on the commonly used data set, "percentage of users that access Google over IPv6", provided by Google, from which we use the time series for native IPv6 traffic. We assume this data set can be used as an approximate indicator of global IPv6 deployment, even though some countries, like China, are not properly represented, due to national regulations. Figure 1 shows a recent snapshot of the Google data set from 2008 to 2019.

Figure 1. Percentage of users that access Google over native IPv6.

The data set is quite noisy, and we also want to avoid impact from the well-known intra-week periodicity and from variations at the start and end of the months. Therefore, we sample the data monthly, using an average of two weeks in the middle of each month.
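As a minimal sketch of this sampling step, the following Python averages the daily values in a 14-day window in the middle of each month. The exact window (days 9 to 22) and the function name `monthly_samples` are assumptions; the text only specifies "two weeks in the middle of each month". The daily series is synthetic and serves only as an illustration.

```python
from datetime import date, timedelta
from statistics import mean

def monthly_samples(daily, window_days=14):
    """Average the daily values in a window centred on the middle of each month.

    `daily` maps datetime.date -> percentage. The window placement
    (days 9..22 for window_days=14) is an assumption.
    """
    start_day = 16 - window_days // 2          # 9 for a 14-day window
    buckets = {}
    for d, value in daily.items():
        if start_day <= d.day < start_day + window_days:
            buckets.setdefault((d.year, d.month), []).append(value)
    # One averaged point per (year, month), in chronological order.
    return {ym: mean(vals) for ym, vals in sorted(buckets.items())}


# Synthetic daily series with a weekend bump (illustrative, not Google's data).
daily = {}
d = date(2019, 3, 1)
while d <= date(2019, 4, 30):
    weekend_bump = 1.5 if d.weekday() >= 5 else 0.0
    daily[d] = 20.0 + weekend_bump
    d += timedelta(days=1)

samples = monthly_samples(daily)
print(samples)
```

Because any 14 consecutive days contain each weekday exactly twice, a two-week window gives the same average regardless of which weekday the month starts on, which is one plausible reason for choosing that length.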

We then create a growth model for the sampled data based on logistic growth. This type of model is common when describing the evolution of new technology, where there is an accelerating phase in the beginning and a decelerating phase at the end, forming an S-shaped curve that approaches a maximum level over time. The results from the model are shown in figure 2, where the predicted values are shown in red, and the real data is shown in blue. The last data point from May 2019 is at 24.5%.
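For readers unfamiliar with the curve, a logistic function can be sketched in a few lines of Python. The parameter names (`K` for the end level, `r` for the growth rate, `t0` for the midpoint) and the numeric values below are hypothetical and chosen only to show the S-shape; they are not the fitted values of the model.

```python
import math

def logistic(t, K, r, t0):
    """Logistic growth: value at time t, with end level K, rate r and midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))

# Hypothetical parameters for illustration only.
K, r, t0 = 28.0, 0.8, 2016.5   # end level in %, rate per year, midpoint year

for year in (2008, 2012, 2016.5, 2020, 2024):
    print(year, round(logistic(year, K, r, t0), 2))
```

The curve passes through half of the end level at the midpoint `t0`, grows almost exponentially well before it, and flattens towards `K` well after it.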

Figure 2. Forecast of the percentage of users that access Google over native IPv6. The red curve indicates the predicted values, while the blue curve shows monthly sampled data.

We can see in the figure that the predicted S-curve fits the data set quite well. Currently, the model predicts a surprisingly low final level of IPv6 deployment at only around 28%. According to the forecast, the IPv6 share will grow slower over the coming years and be close to this estimated end level in late 2022.

The predicted curve can be interpreted as a single step of growth, going from zero to 28% over a 15-year period. This is a bit unexpected since there has been a lot of hope that IPv6 would replace IPv4 quite soon and then 100% would be the obvious asymptotic end level. If our model is correct, IPv6 will not replace IPv4 - or even be the dominant network protocol - in the foreseeable future!

Model history

Considering the strange forecast, how much can we trust this model? We don't know, but the model has evolved in our lab and each time the fit of predictions to real data has become better. Let's look at the model history.

The first model was created back in 2013 when IPv6 deployment had been growing at an accelerating rate for some years. It was by then natural to create a model based on simple exponential growth. For a year, the predictions were quite accurate, but then the predicted and the real data started to deviate, so this model had to be abandoned. Also, from a theoretical point of view, exponential growth is not sustainable in the long run.
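A simple exponential model can be fitted with a straight-line regression on the logarithm of the data. The sketch below assumes that approach (the text does not say how the 2013 model was fitted) and uses synthetic data that doubles every year, purely as an illustration of why such a model cannot hold in the long run.

```python
import math

def fit_exponential(ts, ys):
    """Least-squares fit of y = a * exp(b * t) via a straight line through log(y)."""
    logs = [math.log(y) for y in ys]
    n = len(ts)
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    b = (sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
         / sum((t - t_mean) ** 2 for t in ts))
    a = math.exp(l_mean - b * t_mean)
    return a, b

# Synthetic early-phase data (not Google's numbers): the share doubles every year.
ts = [0, 1, 2, 3, 4]
ys = [0.25, 0.5, 1.0, 2.0, 4.0]
a, b = fit_exponential(ts, ys)

# Extrapolating the fit tells us when the curve would cross 100%.
years_to_100 = math.log(100.0 / a) / b
print(round(a, 3), round(b, 3), round(years_to_100, 2))
```

With this synthetic series the extrapolated curve crosses 100% after a little under nine years, after which the model predicts values that are impossible for a percentage.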

The next model was based on logistic growth, which is a commonly used model for all types of growth where there is an upper limit. In our first attempt, we expected the limit to be 100% and used that value as a fixed parameter in our prediction model.

However, the predictions from this model didn't make a perfect fit either — the real data tended to oscillate around the predicted curve. As a fix, we added a sine-wave oscillator around the logistic-growth curve, estimated from the sinusoidal difference between the growth model and the data. The idea was that if there is some feedback mechanism in the market that creates an oscillation, the model should be able to catch it. Both models are shown in figure 3, where the pure logistic growth model is shown in green and the model with an added oscillator is shown in yellow.
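One way to estimate such an oscillation is to fit a sine wave to the residuals (data minus logistic curve) by linear least squares. The sketch below assumes the period is already known, which is a simplification; the helper name `fit_sine` and all numeric values are hypothetical, and the data is synthetic.

```python
import math

def logistic(t, K, r, t0):
    return K / (1.0 + math.exp(-r * (t - t0)))

def fit_sine(ts, residuals, period):
    """Least-squares fit of residual = A*sin(w*t) + B*cos(w*t) for a known period."""
    w = 2.0 * math.pi / period
    s = [math.sin(w * t) for t in ts]
    c = [math.cos(w * t) for t in ts]
    ss = sum(x * x for x in s)
    cc = sum(x * x for x in c)
    sc = sum(x * y for x, y in zip(s, c))
    sr = sum(x * y for x, y in zip(s, residuals))
    cr = sum(x * y for x, y in zip(c, residuals))
    det = ss * cc - sc * sc          # determinant of the 2x2 normal equations
    A = (sr * cc - cr * sc) / det
    B = (cr * ss - sr * sc) / det
    return A, B

# Synthetic example: a logistic trend towards 100% plus a small oscillation.
period = 3.0                                   # years, assumed known here
ts = [m / 12.0 for m in range(120)]            # ten years of monthly points
trend = [logistic(t, 100.0, 0.5, 12.0) for t in ts]
data = [y + 0.5 * math.sin(2.0 * math.pi * t / period + 0.3)
        for t, y in zip(ts, trend)]

residuals = [d - y for d, y in zip(data, trend)]
A, B = fit_sine(ts, residuals, period)
amplitude = math.hypot(A, B)                   # A*sin + B*cos = amplitude*sin(w*t + phase)
phase = math.atan2(B, A)
print(round(amplitude, 3), round(phase, 3))
```

On this noise-free synthetic series the fit recovers the amplitude and phase used to build the data. On real data the period would also have to be estimated, and nothing guarantees that the residuals are sinusoidal at all.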

Figure 3. Different growth models for the percentage of users that access Google over native IPv6. Green and yellow curves show a logistic model with a final level of 100%, without and with an added oscillation. Red curve shows a logistic model with an estimated end level of around 28%. Blue curve shows monthly data.

This oscillating model seemed to work for some years, but at the beginning of 2018, the forecasts again started to deviate too much from real data, so this model also had to be abandoned. We had assumed growth in one big step from zero to 100%, but apparently, this is not a correct assumption. Furthermore, the sine-wave correction was just an ad-hoc fix, not based on any specific market mechanism. From figure 3 it is obvious that a model based on a single logistic step up to 100%, with or without sine-wave corrections, is not compatible with the current growth trajectory of real IPv6 data, shown as a blue curve.

In 2018 we therefore decided to drop the idea of a fixed terminal level of 100% and instead tried to fit the data to a pure logistic growth curve with a terminal level not known in advance. The end level is thus estimated from the data set. As we can see from the red curve in figure 3, the new model gives quite a good fit to historical data without any need for corrections. In fact, the previous oscillations can be fully captured by this logistic model having the end level at around 28% instead of at 100%.
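The text does not describe the fitting procedure, so the following is only one possible sketch of estimating the end level from data, using the Python standard library. It scans candidate end levels `K`; for each candidate the logistic curve becomes a straight line after a logit transform, so the rate and midpoint follow from ordinary linear regression. A nonlinear least-squares routine would be the more usual tool, and the logit transform weights errors differently from a direct fit. All names and numbers are hypothetical, and the data is synthetic.

```python
import math

def logistic(t, K, r, t0):
    return K / (1.0 + math.exp(-r * (t - t0)))

def fit_logistic(ts, ys, k_grid):
    """Estimate (K, r, t0) by scanning candidate end levels K.

    For a fixed K:  log(y / (K - y)) = r * (t - t0),
    which is linear in t. The K with the smallest squared error
    in the original scale is selected.
    """
    n = len(ts)
    t_mean = sum(ts) / n
    stt = sum((t - t_mean) ** 2 for t in ts)
    y_max = max(ys)
    best = None
    for K in k_grid:
        if K <= y_max:
            continue  # the logit transform needs every y below K
        z = [math.log(y / (K - y)) for y in ys]
        z_mean = sum(z) / n
        r = sum((t - t_mean) * (zi - z_mean) for t, zi in zip(ts, z)) / stt
        t0 = t_mean - z_mean / r
        sse = sum((y - logistic(t, K, r, t0)) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, K, r, t0)
    return best[1], best[2], best[3]

# Synthetic monthly series (time in years since the start), not Google's data.
ts = [m / 12.0 for m in range(129)]
ys = [logistic(t, 28.0, 0.8, 8.0) for t in ts]

k_grid = [20.0 + 0.5 * i for i in range(161)]   # candidate end levels 20% .. 100%
K_hat, r_hat, t0_hat = fit_logistic(ts, ys, k_grid)
print(K_hat, round(r_hat, 3), round(t0_hat, 3))
```

The key design point is that 100% is just one candidate among many; the data decides which end level fits best.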

Model stability

Our new model seems to be quite stable over time. Experiments were performed to see how long a training period is needed to get good predictions. For all experiments, the time series is split up in one training period and one test period, both with varying lengths. The training period always starts in September 2008 but ends in different months, while the test period is the remaining part of the data set.

It turns out that, for all experiments having the training period ending in any of the last 13 months (March 2018 or later), the predicted end levels are confined within a very small interval between 27.5% and 28.5%. Even shorter training periods give similar results, but with a larger spread — for training periods ending in any of the last 24 months (May 2017 and later), we get predicted end levels in the interval between 24% and 32%. Our conclusion is that during the last two years, the model is not very sensitive to the length of the training period, indicating a stable logistic growth, with a final level of around 28%.
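The experiment described above can be sketched as an expanding-window loop: the training period always starts at the first sample and ends at a different month each time, and the estimated end level is recorded for each run. The fitting routine below is a simple grid scan over end levels and is an assumption, not the method used for the article; the data is synthetic, with a small artificial ripple added.

```python
import math

def logistic(t, K, r, t0):
    return K / (1.0 + math.exp(-r * (t - t0)))

def fit_end_level(ts, ys, k_grid):
    """Return the end level K whose logit-linear fit has the smallest squared error."""
    n = len(ts)
    t_mean = sum(ts) / n
    stt = sum((t - t_mean) ** 2 for t in ts)
    y_max = max(ys)
    best_sse, best_K = float("inf"), None
    for K in k_grid:
        if K <= y_max:
            continue  # the logit transform needs every y below K
        z = [math.log(y / (K - y)) for y in ys]
        z_mean = sum(z) / n
        r = sum((t - t_mean) * (zi - z_mean) for t, zi in zip(ts, z)) / stt
        t0 = t_mean - z_mean / r
        sse = sum((y - logistic(t, K, r, t0)) ** 2 for t, y in zip(ts, ys))
        if sse < best_sse:
            best_sse, best_K = sse, K
    return best_K

def end_levels_by_training_end(ts, ys, last_n, k_grid):
    """Fit on ts[:end] for each of the last `last_n` possible training end points."""
    results = []
    for end in range(len(ts) - last_n + 1, len(ts) + 1):
        # ts[end:] would be the matching test period; it is not used for fitting.
        results.append((ts[end - 1], fit_end_level(ts[:end], ys[:end], k_grid)))
    return results

# Synthetic monthly data with a 1% multiplicative ripple (not Google's data).
ts = [m / 12.0 for m in range(129)]
ys = [logistic(t, 28.0, 0.8, 8.0) * (1.0 + 0.01 * math.sin(2.1 * t)) for t in ts]
k_grid = [20.0 + 0.25 * i for i in range(321)]   # candidate end levels 20% .. 100%

levels = end_levels_by_training_end(ts, ys, 13, k_grid)
print(min(K for _, K in levels), max(K for _, K in levels))
```

The printed minimum and maximum give the spread of the estimated end level over the last 13 training end points, which is the quantity the stability argument rests on.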

The statistical metric R² is consistently very high for all experiments during the last 13-month period: 0.99 for the training sets and around 0.85 for the test sets. R² can be interpreted as how large a part of the variance of the data set the model explains, where a higher value is better. The high values in our experiments indicate a good fit of the model to the data.
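For reference, the R2 metric (R squared, the coefficient of determination) can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up numbers, not values from the experiments.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only.
observed = [20.0, 21.0, 22.0, 23.0, 24.0]
predicted = [20.5, 21.0, 22.0, 23.0, 23.5]
score = r_squared(observed, predicted)
print(round(score, 3))
```

Note that on a test set the total variance is computed over a much shorter and flatter stretch of data than on the training set, so the same absolute prediction error gives a lower score; the value can even become negative if the predictions are worse than simply using the mean.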

We also tried to see if the sampling method affected the predicted final level, but it seems to be quite independent of data points being sampled daily, monthly or quarterly.

The future

So, this is it? Will the great hope for the future of the Internet be stalling at a mere 28% of global deployment? Perhaps this is not the end of the story - there is always a possibility that the growth of IPv6 takes place in steps, like the evolution of many other technologies. One scenario is that, after a couple of years with an IPv6 deployment level of around 28%, there might be a start of a new period of accelerating growth, leveling out at a higher percentage. So far, there is no sign of any such next step, but even if there were soon a new boost of IPv6 rollout, we would probably have to wait for a long time before IPv6 deployment gets close to 100%.

By Christofer Flinta, Senior Researcher at Ericsson Research


Comments

A final state of 30% worldwide but with existing deployments over 90%... George Michaelson  –  May 29, 2019 4:18 PM PDT

I also have looked into this space, https://blog.apnic.net/2017/06/06/five-years-ipv6-whither-next-five/

What I find unsatisfying is that a single-figure model needs its error bars and margins better defined. We already have Jio/Reliance (mobile) and Sky (broadband) over 95% and they represent large populations. There is also evidence that, whilst slow, there is a reactive market element, and in both India and the UK, large-scale deployment by competitors is happening. So, whilst a global IPv6 deployment average might fit 30% on the projections, the spread includes significant-scale economies by GDP and population which have much, much larger deployment levels.

I would however agree with your conclusion. Waiting for 100% is not sensible. Deciding how to plot a future in the next 3/5/10 years, of a mixed-protocol world is unavoidable.

I believe that what is going on is a bifurcation into two models of TCO. In one, mixed-protocol costs are assessed as a better fit, and a CGN is deployed but with higher retained levels of IPv4, and the cost burden is acquisition of small amounts of globally routable IPv4 but larger than the other model. The second model is to deploy IPv6 aggressively, almost to single-stack, but accept the smaller burden of a small CGN to cope with what is now a legacy protocol cost. The IPv4 acquisition cost is far lower, and the actual network operation cost is lower, but the conversion cost is very probably higher if you have a large V4 legacy network. Therefore this suits either clean (new) deployment, or an aggressive re-capitalisation.

I think what I'm saying is that the rate of IPv6 deployment reflects the rate and nature of capital and operational investment in the platform.

Jordi Palet Martinez  –  May 30, 2019 1:01 AM PDT

I think this sentence says it: "there is always a possibility that the growth of IPv6 takes place in steps ...", and I would even say in small jumps.

I think it is clear from several studies, and my personal experience, that IPv6 is happening in each country at a different rate, and basically it starts with one of the major ISPs deploying it, and most of the time the other ones following. This means that there may be no increase in the deployment in several weeks or even months, but then you have a sudden change, because even a single ISP can mean hundreds of thousands or even millions of customers.

Especially on the residential side, when an ISP deploys IPv6, typically over 65% (up to 85%) of the traffic goes to CDNs/caches, which are already IPv6 enabled, so those leaps are on that order.

there is no sign of any such next step ... Abraham Chen  –  Jun 12, 2019 2:24 PM PDT

This is a very intriguing report. It uses a pure mathematical curve-fitting technique to cut through a lot of the confusion created by opinions and interpretations expressed by different interests. When I was first exposed to this Google data around 2014-2015, it did look very much ready to take off exponentially. Then, I got quite puzzled by the increasingly noisy curve, whose slope tended to decrease with time. Thanks for a model that filters out the noise and arrives at an asymptotic prediction. This provides the baseline for a very concise visual of the trend.

A.  Instead of using this Google data, whose source is somewhat specific, may I recommend that you apply your technique to the IPv6 / IPv4 comparison statistics by AMS-IX, which you likely know about, at

https://stats.ams-ix.net/sflow/ether_type.html

This data is more general because it is from their peering business serving users across different categories. Although, there would be some other factors influencing the exact meaning of the information, as well. Of course, the challenge will be whether your model has the resolution capability to see the trend of this nearly stationary data, although also chronologically recorded in fine detail.

B.  The reason that we have been keeping an eye on these types of statistics is that we are working on an approach that may relieve many of the IPv4-related issues and concerns, thus affecting future IPv6 traffic. Appendix C of the following IETF Draft outlines a snapshot of the current status of our efforts. I believe that its implications are worth your review and comment.

https://tools.ietf.org/html/draft-chen-ati-adaptive-ipv4-address-space-05

Paul Wilson  –  Jun 14, 2019 8:49 AM PDT

A parameter of the Sigmoid function is its “maximum value”, which in the absence of a specific limiting factor can only be 100%.  It is a fundamental mistake, I believe, to assume an arbitrary lower value, just because it fits the data.  The ceiling you have derived simply doesn’t exist in reality.

This is amply demonstrated by the fact that the USA already exceeds 50% deployment:

https://stats.labs.apnic.net/ipv6/US

There is NOTHING stopping the world average from likewise exceeding 30%, then 50%, and then more. I suggest you try another curve, or else persist with the S-curve, with a maximum value of 100%, and watch how we track into an unpredictable future.  That may yet be an interesting analysis.

Percentage of IPv6 traffic David Crowe  –  Jul 11, 2019 7:38 PM PDT

The statement that IPv6 represents "28% of global deployment" is almost certainly an overestimate. This is just Google traffic, and in many countries Google is not as dominant as in English-speaking countries. The statistics from the Amsterdam Internet Exchange (https://www.ams-ix.net/ams) are probably more representative, and they show that IPv6 traffic on July 5, 2019 was 2.3%. Unfortunately they don't supply historical data, but I have obtained some from the Internet Archive. This shows that IPv6 traffic rose from about 0.5% in late 2013 to 2.2% in late 2017, but since then has stalled. The pattern of the curve may be similar, but at about 1/10th of the Google curve.

A lot of IPv6 traffic is off-exchange George Michaelson  –  Jul 11, 2019 8:02 PM PDT

David, I am led to believe that quite a lot of traffic in IPv6 flows in direct private peering. The APNIC measure is capability: if given an IPv6 webblot, can you fetch it. When we say at aggregate 20-25% can do this, that's how we account for uptake.

It wouldn't surprise me if both things are true: many people are capable of IPv6, and use IPv6 when in direct contact with a source, but for public-routing packet exchange, the figures strongly suggest a historical overarching IPv4 traffic volume.

Mobile segments moving aggressively to IPv6 George Michaelson  –  Jul 11, 2019 8:07 PM PDT

As a distinct class of behaviour, IPv6 is now at high levels of penetration in the mobile sector, such as Reliance, and there are signs of active competition in provision of IPv6 in the Indian telephony sector.  This traffic isn't going to show up at an Exchange very much because the kind(s) of engagement people do in mobile don't flow on public paths. They tend to go to caches and direct/embedded service models. Given the deployment of a native IPv6 mobile with CGN either as overlay or dualstack, I could believe the "deployment" figure because by market share, India as mobile is huge, and by market share, Reliance in India is huge.

Almost all the significant uptick in IPv6 in China is coming from the mobile sector. Because the federated states model inside China has distinct ASNs, it doesn't show as a single-line aggregate, and again, peering in China is opaque and unlikely to show true levels of traffic because the modalities of usage are just different.

Akamai's figures closely track APNIC and Google, for most ASNs. Akamai is measuring IPv6 fully independently, without reference to Google's figures. APNIC's measurement does use Google advertising, but it is run separately from any numbers Google publishes directly, and the tested clients are distributed worldwide across mobile, tablet and PC/desktop. They aren't a measurement of fetching Google assets; they are adverts placed by Google/DoubleClick in general-purpose websites, games, and other ad-revenue apps.

Let's Focus on Some Principles Abraham Chen  –  Jul 15, 2019 12:52 PM PDT

We need to focus on principles to avoid getting into details that distract the focus leading to divergence in the discussion:

A. Sigmoid function: This is a mathematical equation used to model many phenomena and events. It has asymptotic maximum and minimum limits which are commonly normalized to 0 to 100% or +/- 1. By itself, its curve does not have much meaning, until physical quantities of a subject matter are associated with it.

If a product is to address a particular field, it can treat the maximum possible demand from that field as the 100% target. If such a field is part of an industrial sector with more than one field, the product can not expect to fill the entire sector, thus the projected maximum demand for this product has to be less than 100%. Since IPv6 is only one of a few protocols that are currently carrying Internet traffic, it is not unreasonable to accept that IPv6 will handle less than the entire traffic, unless we have definitive knowledge that other protocols will fade out at a certain future time. With the Dual-Stack scheme expected to be in operation for a long time to come, it is clear that IPv6 can not assume 100% of the Internet traffic at least for the same duration.

B. Deployment vs. Traffic: The APNIC statistics cited above are a chronological record of equipment readiness. They are fundamentally different from the AMS-IX traffic data. The former can be expected to reach 100% someday when all IoTs are IPv6 capable. The latter is normally shared among several protocols. So, IPv6 traffic can not be 100%. In addition to Dual-Stack, IPv6's backward incompatibility will discourage adoption. Therefore, the percentage of Internet traffic carried by IPv6 will be capped at an even lower level. Until such handicaps are removed, we should not assume that the IPv6 target can be 100%. On the other hand, if some new event comes along that has a negative impact on IPv6, this cap may become lower still.

A possible exercise, as suggested in my initial comment, is to multiply the IPv6 % value in the AMS-IX statistics by a factor of 10. The resulting numbers would then be in the 20s, which is the same range as those in the Google data used by the author of this article. It would be interesting to see whether the two curves have a similar shape and projection.

C. Backbone Peering: One argument often presented in favor of IPv6 is that it is being deployed very fast and significantly in various sectors. This may be true but is hard to debate unless the relevant worldwide data is fully disclosed. On the other hand, an article about an IPv6 peering dispute among backbone router businesses sheds an interesting light:

https://www.theregister.co.uk/2018/08/28/ipv6_peering_squabbles/

In essence, this article reasoned that because the peering arrangements for IPv6 were not as mature as those for IPv4, a larger portion of IPv6 traffic compared to that of IPv4 was being diverted to the IXs, such as AMS-IX. In other words, if peering arrangements for both IPv4 and IPv6 were about the same, the IPv6 traffic seen in the AMS-IX statistics would be even lower!

Abe (2019-07-15 15:51)