
Thoughts on the Open Internet - Part 3: Local Filtering and Blocking

Geoff Huston

Public policy measures in the area of content filtering and blocking are intended to prevent users within a country from accessing certain online content. The motives for such policies vary from a desire to uphold societal values through to concessions made to copyright holders to deter the unauthorised redistribution of content. This chapter will not evaluate the motives for content filtering, but will look at the technology that can be used to support filtering and the potential side effects of such approaches.

Content filtering, or preventing users from accessing certain online content, can be achieved in a number of ways, including route filtering, DNS name resolution filtering and traffic interception.

Route Filtering

Route filtering takes the IP address of the service to be filtered and creates specific routing forwarding rules so that all packets directed to this address are prevented from reaching their intended destination.

This can be achieved in a number of ways. One way is for all ISPs and transit operators to take a list of routes to be filtered and use it to filter the routing information learned by their routers. This is not altogether effective, in so far as a set of addresses may be encompassed within an aggregate route object. Removing the entire route object may have unintended consequences for third parties whose services are addressed in the span of addresses covered by the aggregate announcement, while leaving the aggregate route object in place may support continued access to the addresses to be filtered. This approach is only effective if uniformly applied, of course.
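The collateral-damage problem with aggregate routes can be sketched in a few lines using Python's standard `ipaddress` module. All prefixes here are illustrative RFC 5737 documentation addresses, not real deployments:

```python
import ipaddress

# Hypothetical aggregate route object and a single address to be blocked.
aggregate = ipaddress.ip_network("203.0.113.0/24")   # aggregate announcement
filtered = ipaddress.ip_address("203.0.113.10")      # address to be filtered

# Third-party services that happen to be numbered out of the same aggregate.
third_parties = [ipaddress.ip_address("203.0.113.20"),
                 ipaddress.ip_address("203.0.113.99")]

# Removing the whole aggregate to block one address blackholes the rest too.
collateral = [str(ip) for ip in third_parties if ip in aggregate]
print(collateral)
```

Because the blocked address and the third parties all fall inside the one /24 announcement, there is no route-object-level operation that removes one without affecting the others.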

Another approach to blocking at the routing level is to advertise specially crafted false routes for the IP addresses in question as "more specific" routes. Local routers pick up the filtering route advertisement and, due to longest-prefix-match route selection, will prefer this route over the route to the original content service location. Aggregate routes that span this filtered route will still be learned by the local router, so access to third parties remains unaltered. The new route for the filtered addresses can lead to a "null route", which causes the router to discard the traffic, or it can redirect the traffic to a traffic interceptor, as described below. A concern with this use of the routing protocol itself to convey the addresses to be blocked is that the propagation of the false route needs to be carefully contained. A further concern is that this filtering route is indistinguishable from a routing attack, in so far as it resembles a third party attempting to inject a route into the network that causes a drop in reachability to a particular destination. Where service providers already deploy routing systems intended to detect and drop efforts to inject bogus routes into the network, these "negative" routes will be filtered and dropped unless each ISP has specific measures in place to detect and accept these particular routes.
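The longest-prefix-match behaviour that makes the more-specific route win can be illustrated with a toy route selection function. The prefixes and next-hop labels are invented for the example:

```python
import ipaddress

def best_route(dest, routes):
    """Pick the longest-prefix (most specific) matching route, as routers do."""
    matches = [(net, nh) for net, nh in routes if dest in net]
    return max(matches, key=lambda m: m[0].prefixlen, default=(None, None))

routes = [
    (ipaddress.ip_network("203.0.113.0/24"), "genuine origin"),  # aggregate
    (ipaddress.ip_network("203.0.113.0/28"), "null route"),      # injected more-specific
]

# Traffic to the filtered address follows the injected /28 and is discarded...
net, next_hop = best_route(ipaddress.ip_address("203.0.113.10"), routes)
print(net, "->", next_hop)

# ...while a third party outside the /28 still follows the genuine aggregate.
print(best_route(ipaddress.ip_address("203.0.113.200"), routes))
```

This also shows why the more-specific approach, unlike dropping the aggregate, leaves third parties in the rest of the /24 unaffected.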

The Pakistan YouTube Incident

In early 2008 engineers at Pakistan Telecom created a couple of false routes that were intended to prevent Internet users within Pakistan from accessing the YouTube video-sharing web site. The technique used was a common one, namely the creation of "more specific" route advertisements, so that local routers would prefer these synthetic routes over the genuine routes advertised by YouTube itself.

As soon as the routes were created they quickly permeated beyond Pakistan and were heard through much of the Internet, causing YouTube to be inaccessible for users well beyond the intended scope.

The ease with which the false routes were created, and the speed at which they propagated well outside of their intended scope, caused others to respond by revitalising the work on securing the routing infrastructure, such that routers would be incapable of believing such synthetic route advertisements.

The ongoing exhaustion of the supply of IPv4 addresses has meant that there is no longer a clear association of an IP address with a particular service, a particular content publisher, or particular content. Web hosting providers use a technique of "virtual name hosting" to place the content from many different sites behind a service portal that uses a single public IP address. Blocking that IP address at the routing level not only blocks the intended service point, but also blocks the entire set of sites that exist behind the same IP address. Such collateral damage limits the efficacy of address-based content filtering.
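A toy model of virtual name hosting makes the collateral damage of an address-level block concrete. The hostnames and addresses below are invented for illustration:

```python
# Many sites, one public address: a toy virtual-hosting table.
hosting = {
    "scam.example":       "192.0.2.1",
    "university.example": "192.0.2.1",
    "blog.example":       "192.0.2.1",
    "other.example":      "198.51.100.7",
}

def collateral_of_ip_block(blocked_ip):
    """All site names that a block on this single address would take down."""
    return sorted(name for name, ip in hosting.items() if ip == blocked_ip)

# Blocking the address of one target site takes every co-hosted site with it.
print(collateral_of_ip_block("192.0.2.1"))
```

In the ASIC incident described below, the equivalent table held more than 1,200 names behind the one blocked address.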

The Australian Melbourne Free University Incident

An Australian federal government agency, the Australian Securities and Investments Commission, used powers under the Australian Telecommunications Act to require that Australian ISPs block access to a particular IP address in February 2013. The Commission told a Senate hearing that "Recently, in targeting a scam investment website, we received information that another site [Melbourne Free University] that shared the same internet address was also blocked. ASIC was unaware the IP address was shared by other websites." Evidently more than 1,200 different web sites were hosted on the same IP address at the time.

Not only is there increasing use of shared IP addresses for hosting content, but content itself is increasingly agile across IP addresses, as publishers add further hosts for their content. Sites blocked at the IP level can readily circumvent such interception mechanisms by shifting their content to other hosting agencies. When content is passed into a widely distributed Content Distribution Network (CDN) the content is no longer associated with a fixed set of IP addresses, but is often served from the CDN provider's addresses, along with all the other content hosted by the CDN. Such measures negate the effectiveness of content filtering by IP address by removing the stable relationship between content and address. At the same time, end users can readily circumvent localised IP route filtering by using public Virtual Private Network (VPN) services to perform a "virtual relocation" of their point of interconnection to a location where the local route blocking no longer applies.

Name Filtering

The DNS is also used as a means of enforcing filtering of content. In the simplest form of name filtering a list of proscribed DNS names is circulated to Internet Service Providers, and this list is used to configure their user-facing DNS resolvers, so that queries directed to these resolvers for the filtered names result in a synthetic response.

If the user uses the default configuration settings as provided, their DNS resolution function will direct queries to the ISP-provided resolvers which, in turn, apply the filter to the proscribed names list. There are a number of potential DNS responses, and operational practice varies. Some providers elect to return a response to the name query, but provide a private (unrouted) address in the response. Some elect to provide an address that points to a resource that describes why the name has been redirected. Others elect to return a DNS response code indicating that the name does not exist (NXDOMAIN).
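These three response policies amount to a simple decision step inside the filtering resolver. A minimal sketch, with invented names and addresses, might look like:

```python
# Sketch of a filtering resolver's response choices; names and addresses
# are invented for illustration.
PROSCRIBED = {"blocked.example"}
WALLED_GARDEN = "192.0.2.53"   # page explaining why the name was redirected
PRIVATE_SINK = "10.0.0.1"      # private, unrouted address

def resolve(name, real_lookup, policy="nxdomain"):
    """Return a (response code, address) pair for a query."""
    if name in PROSCRIBED:
        if policy == "nxdomain":
            return ("NXDOMAIN", None)          # claim the name does not exist
        if policy == "explain":
            return ("NOERROR", WALLED_GARDEN)  # redirect to an explanation page
        return ("NOERROR", PRIVATE_SINK)       # answer with an unreachable address
    return ("NOERROR", real_lookup(name))      # unfiltered names resolve normally

print(resolve("blocked.example", lambda n: "203.0.113.8"))
print(resolve("ok.example", lambda n: "203.0.113.8"))
```

Whichever policy is chosen, the response is synthetic, which is precisely what DNSSEC validation (discussed below) is designed to detect.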

Such name filtering operations are readily circumvented, and many users appear to learn of the availability of unfiltered open DNS resolvers, such as those operated by Google (Google's Public DNS), OpenDNS or Level 3. By replacing the reference to the ISP's resolver with a reference to one or more of these open resolvers in their devices, the user effectively restores a complete view of the Internet's name space and bypasses the locally imposed name filter.

However, such local efforts of remediation are not without their own downsides. The use of these non-local resolvers can reduce the overall perception of the performance of the Internet, particularly when the non-local resolver is located far from the user's device, as the DNS transaction to resolve the domain name takes longer for more distant resolvers.

The use of non-local resolvers also impacts on content distribution systems and content localisation. A number of content systems direct the user to different content depending on the supposed location of the user, and one way this is done is by assuming that the user is located in the same locale as their DNS resolver. Because the response is tailored to the locale of the resolver asking the DNS query, use of a non-local DNS resolver effectively teleports the user into the locale of that remote resolver. Sometimes this is undertaken deliberately by the user as a countermeasure against content blocking, and in other cases it may be the cause of user complaint about inadvertent exposure to inappropriate content.

The use of non-local DNS resolvers also leads to information leakage. A DNS query is the precursor to almost all forms of transactions on the Internet, and the sequence of DNS queries made by a user can be analysed to provide insights into user behaviour, both in the aggregate and, with appropriate data volume and analysis, in data profiles that can potentially close in on individual users. While national regulatory frameworks may safeguard the collection, storage and use of data as it relates to a country's citizens, the safeguards that relate to non-local users may not be present.

It may also be that the name filtering function includes traffic inspection to block attempts to use non-local DNS resolvers. Again, this is readily circumvented. The most obvious form of circumvention is to place the IP address of the blocked site in the user's local hosts configuration file. The user's applications can then perform the name-to-address translation without using the DNS, and thereby circumvent the DNS block. More general circumventions against DNS blocking encompass techniques that encapsulate DNS queries within other commonly used protocols to bypass the traffic inspection and blocking function (tunnelling within HTTP secure access, TCP port 443, is a common method). This form of circumvention is less commonly used at present, as there is a slightly higher technical barrier to using such DNS tunnelling solutions, but as with other circumvention methods, the more widespread the blocking, the more widespread the tools and techniques for circumvention. Tunnelling DNS queries through the traffic filter makes the user's DNS queries completely opaque to the local network operator, and hides the user's behaviour behind a widely used conventional form of payload encryption. Providing further incentives for users to deliberately obfuscate their online behaviours may be seen in a positive light by some interests, but as a highly retrograde step by others.
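The hosts-file circumvention relies on a very simple static lookup performed before any DNS query is made. A minimal sketch of that lookup, over an invented hosts-file fragment, shows why no DNS traffic is generated at all:

```python
# Hosts-file style entries: "address name [name ...]", '#' starts a comment.
# The names and addresses here are invented for illustration.
HOSTS = """\
# local overrides
203.0.113.8  blocked.example www.blocked.example
198.51.100.7 mirror.example
"""

def local_lookup(name, hosts_text=HOSTS):
    """Resolve a name from static hosts entries, bypassing the DNS entirely."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and name in fields[1:]:
            return fields[0]
    return None   # no override: fall through to normal DNS resolution

print(local_lookup("blocked.example"))
```

Because the answer comes from a local file, a DNS-level filter never sees a query for the blocked name.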

These DNS-based name filtering interventions also rely on changing the result of the DNS query and providing the user with a synthetic result that is not the original DNS data. The adoption of DNSSEC, the security framework for the DNS, can prevent this synthetic data from being accepted by the user. If the blocked name is signed using DNSSEC, and the user is performing validation of DNS responses, then the attempt to substitute a synthetic response will be flagged by DNSSEC validation as an attempt to subvert the DNS, and the response will be withheld from the application. The user is then aware that the name resolution has been tampered with by an intermediary, and the substitute response is not accepted by the user's DNSSEC-validating resolver.
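The validation outcome can be illustrated with a deliberately simplified analogy. Real DNSSEC uses chained public-key signatures over resource record sets; here a keyed HMAC stands in for the zone's signature purely to show why a substituted answer fails to validate:

```python
import hmac
import hashlib

ZONE_KEY = b"zone-signing-key"   # toy stand-in for the zone's DNSSEC key material

def sign(name, addr):
    """Toy 'RRSIG': a keyed digest over the name/address binding."""
    return hmac.new(ZONE_KEY, f"{name}={addr}".encode(), hashlib.sha256).digest()

def validate(name, addr, sig):
    """A validating resolver accepts the answer only if the signature checks out."""
    return hmac.compare_digest(sign(name, addr), sig)

name, real_addr = "signed.example", "203.0.113.8"
sig = sign(name, real_addr)

print(validate(name, real_addr, sig))    # the authoritative answer validates
print(validate(name, "10.0.0.1", sig))   # a filter's substituted answer does not
```

A filter that rewrites the answer cannot produce a matching signature without the zone's key, so the validating resolver withholds the synthetic response rather than pass it to the application.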

The diverse and uncoordinated nature of the application of name filters makes the subsequent task of undoing the name filters extremely challenging. If a name has been hijacked and used for purposes that trigger the imposition of name filters, then once the name is restored the subsequent task of identifying if and where such filters have been applied is extremely daunting.

Name filtering can be an expeditious and efficient way of blocking access to inappropriate content. However it is readily circumvented, and the mechanisms for circumvention often lead to an outcome where users are further motivated to adopt technologies that are intended to hide their existence and their actions from the local network. Conventional tasks such as attribution, localisation, and customisation are impeded by such steps.

Facebook and China

An example of name filtering can be found in use in China, where there is widespread use of DNS interception and synthetic responses to DNS queries, providing unrouted addresses in response to queries to resolve blocked web sites (https://cyber.law.harvard.edu/filtering/china/...).

There have been instances where the redirection via these DNS interceptors has used a third party's IP address. In one reported case it led to a denial of service attack, and in another it led to a form of cross-border information leakage.

http://www.theguardian.com/technology/2015/jan/23/what-happens-when...
http://www.potaroo.net/presentations/2013-10-15-facebook.pdf

Traffic Interception

The technique of route redirection can be coupled with traffic interception in order to address some of the shortfalls of IP address filtering. This approach uses some form of routing-level interception to direct the traffic to a traffic interceptor. The interception agent functions like a conventional web proxy, undertaking the initial protocol exchange with the end user as a proxy and then receiving the URL sought by the user. At this point the interceptor can determine whether the URL is on a blocked list, in which case the connection is terminated by the agent, or whether the fetch request can be forwarded to the intended destination as a conventional proxy.
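The interceptor's decision step operates on the full URL rather than the IP address, which is what lets it block a single page while forwarding everything else on the same address. A minimal sketch, with an invented blocklist, might look like:

```python
from urllib.parse import urlsplit

# Invented blocklist of (hostname, path) pairs for illustration.
BLOCKED = {("shared.example", "/bad-page")}

def intercept(url):
    """Return ('block', response) or ('forward', url) for a proxied request."""
    u = urlsplit(url)
    if (u.hostname, u.path) in BLOCKED:
        return ("block", "404 Not Found")   # terminate the session towards the user
    return ("forward", url)                 # act as a conventional proxy

print(intercept("http://shared.example/bad-page"))
print(intercept("http://shared.example/ok-page"))
```

Note that only the one listed page is blocked; other URLs on the same host, and other hosts behind the same IP address, pass through untouched.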

This form of traffic interception has fewer side effects. Where a single IP address is used by multiple web sites, the initial routing redirection still operates at the level of an individual address, but the traffic interceptor works at the application level and applies the content access policy only to the particular web sites that are the subject of the policy.

Many web sites use SSL (Secure Sockets Layer) tunnelling to prevent certain forms of traffic interception and eavesdropping. With an appropriate traffic cipher algorithm, the interactions of the application are impervious to application-level interception, as the steps taken to encrypt the application session happen at the level of the initial protocol connection (immediately following the TCP handshake) rather than as a function performed within the application-level interaction. However, this does not prevent all forms of interception; it just prevents interception of the application-level exchange. As the server's name is transmitted in the clear during the session setup, an interception engine can proxy the initial setup of the SSL session, read the server name, and decide whether or not to terminate the session at that point. Once the session proceeds past the initial crypto handshake, visibility into the session is lost. This is equivalent in functionality to the name filtering approach examined above.
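The server name the interceptor reads is carried in the TLS server_name (SNI) extension defined in RFC 6066. A sketch of a parser for just the extension body (not a full ClientHello parse, which involves considerably more framing) shows how little work is needed to recover the name from the cleartext handshake:

```python
import struct

def parse_sni_extension(data):
    """Parse the body of a TLS server_name extension (RFC 6066) and
    return the host_name entry, or None if absent/malformed."""
    if len(data) < 2:
        return None
    (list_len,) = struct.unpack("!H", data[:2])
    pos, end = 2, 2 + list_len
    while pos + 3 <= end:
        name_type = data[pos]                             # 0 = host_name
        (name_len,) = struct.unpack("!H", data[pos+1:pos+3])
        pos += 3
        if name_type == 0:
            return data[pos:pos + name_len].decode("ascii")
        pos += name_len
    return None

# Craft an extension body carrying "blocked.example" and recover it.
host = b"blocked.example"
ext = struct.pack("!HBH", len(host) + 3, 0, len(host)) + host
print(parse_sni_extension(ext))
```

An interceptor reading this field needs no cryptographic capability at all; it simply inspects the cleartext handshake bytes before the encrypted channel is established.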

Cleanfeed: UK Traffic Interception

A hybrid approach of routing and traffic interception is used by the "Cleanfeed" system used within the United Kingdom. Route redirection is used to redirect client traffic destined to IP addresses associated with filtered content to a web proxy unit. This web proxy completes the initial protocol handshake and session setup up to the point of the HTTP GET command, which contains the URL that the user is attempting to fetch. If the URL is on the list of filtered content, then the proxy session is terminated towards the user by responding with an HTTP 404 response (Page Not Found). Otherwise, the proxy performs as a conventional proxy: it initiates a new session with the requested content server, relays the content request, and passes the result back to the user.

There have been reports of a number of issues and shortcomings with this approach, including that the blocking filter operates only on the domain name part of the URL and not the trailing locator string, which has meant that large content sites (such as Wikipedia) have been filtered by Cleanfeed in the past, as Cleanfeed reportedly was unable to distinguish served content beyond the simple domain name. Another reported limitation is that Cleanfeed is unable to proxy SSL connections, so sites served over a non-Server Name Indication SSL connection cannot be filtered in this manner by Cleanfeed, as the GET command is performed within the encrypted channel. If, however, the site uses SSL with Server Name Indication, then the DNS part of the URL is exposed in the clear prior to the start of the channel encryption process.

https://wiki.openrightsgroup.org/wiki/Cleanfeed
http://thenextweb.com/insider/2012/01/04/as-the-sopa-debate-rages...

By Geoff Huston, Author & Chief Scientist at APNIC. (The above views do not necessarily represent the views of the Asia Pacific Network Information Centre.)