Upgrading Infrastructure With Agile Principles Using Dark Architecture

At Dyn, we've grown. A lot. Growth requires constant evolution of every piece of your company, not the least of which is your technical infrastructure. When your business is the business of Internet Infrastructure as a Service (namely managed DNS and email delivery), you had better get good at it.

We typically upgrade our infrastructure and system architecture for one of three reasons:

  • For scale (targeting 10x or 100x current capacity)
  • For performance (trying to remove a system bottleneck to process work faster)
  • For decoupling (for greater reliability, maintainability, and future scaling efforts)

Historically, these efforts are planned as large forklift upgrades. Folks come up with a three-month plan, begin working diligently on the new system, and hope to flip a switch three months later when they're ready to go live. Often what actually happens is that three months turns into six (things inevitably take longer than expected), and the business is forced through a high-risk "all or nothing" flag day exercise to migrate over to the new system.

Throughout those six months, the business has no flexibility to pursue other efforts or change priorities. The forklift upgrade is an all or nothing proposition, and until the team finishes the effort and migrates over to the new system, there is no business value delivered.

We needed to find a better way, so we adopted a Dark Architecture approach.

Dark Architecture is a way of thinking about, and technical approach to, solving the scale/performance/coupling problems while enabling the business to succeed and maintaining the sanity of your staff. We do this by:

  • Prioritizing migration of "flows" through a system rather than components of a system
  • Running legacy and dark architectures in parallel
  • Sending system inputs to both systems, collecting two outputs, comparing values of outputs, but throwing one away

Let's walk through a hypothetical example of a legacy approach to forklift upgrading a system.

On project start, we have inputs going to our legacy system and outputs leaving our legacy system, and 100% of the functionality (or "flows" of information) is serviced by the legacy system. Off to the side, we've started day 1 of getting a new system built.

Teams work diligently for weeks or months to complete the system and eventually emerge victorious with a 100% functionally equivalent implementation. The time has come to migrate systems on a carefully chosen flag day. Likely, the project has slipped behind schedule and folks were forced to power on to completion to get the system live. Along the way, the organization had little flexibility to shift gears to other priorities without completely shelving the upgrade effort.

There is no delivered customer value until the system is fully deployed and cut-over. As flag day arrives, the systems are cut-over, and 100% of the functionality is now serviced by the new system, and the legacy system is sent off to the farm to live out its days.

There are specifically three things that we want to improve about this approach:

  • Deliver value sooner to our customers. (We don't want to have to wait for the whole new system to be completed and put live before we deliver any value.)
  • Reduce the risk of failure on introduction to production. (We want to avoid an "all or nothing" migration plan.)
  • Offer flexibility to the business to switch priorities and at least have delivered the most critical value. (If we need to switch gears, we want to know that we've solved and delivered the important solutions first.)

Here's how this hypothetical example would play out at Dyn applying a Dark Architecture approach:

Before we touch code and systems, we begin by prioritizing the "flows" of data through the system in order of pain, opportunity, business value, or whatever metric makes sense for your business. Rather than speaking in component terms (e.g., swap the reporting database backend from MySQL to Cassandra), we think in flow terms (e.g., rendering a graph of wildcard queries for customer X is taking 40 seconds, while all other graph types for this customer render perfectly quickly, and this graph type for all other customers renders perfectly quickly). This exercise will force you to hone scope to exactly where the pain is so you can focus on delivering the solution to this pain first and save others for later.

Once we have our priority flow through the system (let's assume it's 2% of the overall functionality), we don't begin by building that functionality. Instead, we first build the scaffolding around it that lets our overall system feed the same input to both implementations and collect both outputs (comparing the outputs, logging when they differ, but throwing one output away).

Practically speaking, this might be duplicating web service calls (one legacy, one new) or duplicating database interaction calls (one legacy, one new), and then comparing the return values and logging to a file or server or message bus when they're different.
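As a rough illustration of that scaffolding, here is a minimal Python sketch. The function name `call_with_dark`, the logger name, and the choice to log through Python's standard `logging` module are assumptions made for this example, not a description of Dyn's actual code. It sends the same input to both implementations, logs any difference, and always returns the legacy output.

```python
import logging

# Hypothetical logger name; a file, server, or message bus handler could be attached to it.
logger = logging.getLogger("dark_architecture")


def call_with_dark(legacy_fn, dark_fn, *args, **kwargs):
    """Send the same input to both systems and return only the legacy output."""
    legacy_result = legacy_fn(*args, **kwargs)
    try:
        dark_result = dark_fn(*args, **kwargs)
    except Exception:
        # A failure in the dark path must never affect the production output.
        logger.exception("dark path raised for args=%r kwargs=%r", args, kwargs)
        return legacy_result
    if dark_result != legacy_result:
        # Record the mismatch for later inspection; the dark output is discarded.
        logger.warning(
            "output mismatch: legacy=%r dark=%r args=%r kwargs=%r",
            legacy_result, dark_result, args, kwargs,
        )
    return legacy_result
```

Callers keep receiving exactly what the legacy system returns, so the wrapper can be introduced before any new functionality exists behind `dark_fn`.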

With the input/output scaffolding in place, we’re now ready to start writing functionality. We’re going to implement that most painful 2% flow of the system, and as soon as it’s ready, we’re going to push it to the Dark Architecture in production. It will receive production input, and we will throw away its output after comparing it to the legacy system’s output.

If they differ, we’ll log it so we can inspect. We’ll also instrument the new system to measure its performance improvement and gain operational experience in working with it.

Side note: Have you caught on to why we call it Dark Architecture yet? It’s “dark” because it’s receiving production input but not being used for production output until we’ve gained confidence in its implementation and operation. Only once we’re confident the system behaves the way we expect do we turn it “on” for producing production outputs.

Once we’re confident this new system works the way we expect it to and delivers the desired performance or scalability improvement, we’ll switch which output gets thrown away, thereby realizing the value of the new system for solving the most painful 2% of the functionality, while still relying on the legacy system to service the remaining 98% of the functionality.
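Switching which output gets thrown away can be as small as a per-flow setting. The following Python sketch assumes a hypothetical routing table `PRIMARY_BY_FLOW` and a hypothetical `serve` function; the flow names are invented for illustration. Flows that are not listed, or that have no new implementation yet, continue to be served by the legacy system.

```python
import logging

logger = logging.getLogger("dark_architecture")  # hypothetical logger name

# Hypothetical per-flow switch: which system's output is actually served.
# Any flow not listed here defaults to the legacy system.
PRIMARY_BY_FLOW = {"wildcard_query_graph": "dark"}


def serve(flow, request, legacy_fn, dark_fn=None):
    """Run both systems where a dark implementation exists; return the primary output."""
    legacy_out = legacy_fn(request)
    if dark_fn is None:
        # This flow has not been migrated yet, so legacy serves it alone.
        return legacy_out
    dark_out = dark_fn(request)
    if legacy_out != dark_out:
        logger.warning(
            "flow=%s mismatch: legacy=%r dark=%r", flow, legacy_out, dark_out
        )
    if PRIMARY_BY_FLOW.get(flow, "legacy") == "dark":
        return dark_out   # the legacy output is now the one thrown away
    return legacy_out     # the dark output is thrown away
```

Because both systems keep running in parallel, setting a flow back to `"legacy"` in the table acts as a rollback without a redeploy of either system.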

At this point, our teams have successfully delivered to the customers a solution to the most painful 2% of system functionality in a fraction of the time it would take to re-implement 100% of the functionality.

Benefits?

  • Morale is high! (What technologists don’t love seeing their work put to use?)
  • Customers are cheering! (Their pain is solved.)
  • The business assumed little risk! (The functionality was de-risked by running in production with one output thrown away.)

Let’s assume there are a few more high priority flows to tackle, representing 20% of the overall system flows. Following a Dark Architecture approach, the business will soon find itself with a choice:

  • Continue upgrading flows of functionality until 100% has been migrated, or
  • Assess the remaining 80% of functionality against other business priorities.

This is a powerful difference between the legacy approach to upgrading infrastructure and a Dark Architecture approach. The business now has a choice partway through the effort. Some circumstances may warrant completing 100% of the functionality migration. Some may warrant shelving future migration altogether and accepting two systems operating in parallel as a perfectly reasonable solution (not ideal technically or operationally, but it’s business!). Others may warrant slowly migrating the remaining functionality as technical debt while also pursuing more pressing endeavors.

That opportunity for choice is a cornerstone of an agile process, and having it in our toolbox for evolving our systems has been pivotal for achieving our scale.

See also Dave Connors’ GigaOM/Structure speech on Dark Architecture.

About Dyn

Dyn solutions are at the core of Internet Performance. Through traffic management, message management and performance assurance, Dyn is connecting people through the Internet and ensuring information gets where it needs to go, faster and more reliably than ever before. Incorporated in 2001, Dyn's global presence services more than four million enterprise, small business and personal customers. Visit dyn.com to learn more about how Dyn delivers.
