At Dyn, we've grown. A lot. Growth requires constant evolution of every piece of your company, not the least of which is your technical infrastructure. When your business is the business of Internet Infrastructure as a Service (namely managed DNS and email delivery), you better get good at it.
We typically upgrade our infrastructure and system architecture for one of three reasons:
Historically, these efforts have been planned as large forklift upgrades. Folks come up with a three-month plan, begin working diligently on the new system, and hope to flip a switch three months later when they're ready to go live. What often happens instead is that three months turn into six (things inevitably take longer than expected), and the business is forced through a high-risk "all or nothing" flag day exercise to migrate to the new system.
Throughout those six months, the business has no flexibility to pursue other efforts or change priorities. The forklift upgrade is an all-or-nothing proposition: until the team finishes the effort and migrates to the new system, no business value is delivered.
We needed to find a better way, so we adopted a Dark Architecture approach.
Dark Architecture is both a way of thinking about, and a technical approach to, solving scale, performance, and coupling problems while enabling the business to succeed and preserving the sanity of your staff. We do this by:
Let's walk through a hypothetical example of a legacy approach to forklift upgrading a system.
On project start, we have inputs going to our legacy system and outputs leaving our legacy system, and 100% of the functionality (or "flows" of information) is serviced by the legacy system. Off to the side, we've started day 1 of getting a new system built.
Teams work diligently for weeks or months to complete the system and eventually emerge victorious with a 100% functionally equivalent implementation. The time has come to migrate systems on a carefully chosen flag day. Likely, the project has slipped behind schedule and folks were forced to power on to completion to get the system live. Along the way, the organization had little flexibility to shift to other priorities without completely shelving the upgrade effort.
There is no delivered customer value until the system is fully deployed and cut over. When flag day arrives, the systems are cut over, 100% of the functionality is now serviced by the new system, and the legacy system is sent off to the farm to live out its days.
There are specifically three things that we want to improve about this approach:
Here's how this hypothetical example would play out at Dyn applying a Dark Architecture approach:
Before we touch code and systems, we prioritize the "flows" of data through the system in order of pain, opportunity, business value, or whatever metric makes sense for your business. Rather than speaking in component terms (e.g., swap the reporting database backend from MySQL to Cassandra), we think in flow terms (e.g., a graph of wildcard queries for customer X takes 40 seconds to render, while all other graph types for this customer render quickly, and this graph type renders quickly for all other customers). This exercise forces you to narrow scope to exactly where the pain is, so you can focus on solving that pain first and save the rest for later.
Once we have our priority flow through the system (let's assume it's 2% of the overall functionality), we do not begin by building that functionality. Instead, we first build the scaffolding around it that allows two inputs and two outputs to our overall system: the outputs are compared and any differences are logged, but one output is thrown away.
Practically speaking, this might be duplicating web service calls (one legacy, one new) or duplicating database interaction calls (one legacy, one new), and then comparing the return values and logging to a file or server or message bus when they're different.
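The duplicated-call scaffolding described above might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Dyn's actual implementation: `dark_call`, `legacy_fn`, and `new_fn` are hypothetical names, and the standard `logging` module stands in for whatever file, server, or message bus receives the mismatch reports.

```python
import logging

logger = logging.getLogger("dark_architecture")


def dark_call(legacy_fn, new_fn, *args, **kwargs):
    """Send the same input to both systems; return only the legacy output.

    legacy_fn and new_fn are hypothetical callables wrapping the duplicated
    web service or database calls.
    """
    legacy_result = legacy_fn(*args, **kwargs)
    try:
        new_result = new_fn(*args, **kwargs)
    except Exception:
        # A failure in the dark path must never affect production output.
        logger.exception("dark path raised for args=%r kwargs=%r", args, kwargs)
        return legacy_result
    if new_result != legacy_result:
        logger.warning(
            "output mismatch: legacy=%r new=%r args=%r kwargs=%r",
            legacy_result, new_result, args, kwargs,
        )
    # The new system's output is thrown away; callers still see legacy output.
    return legacy_result
```

Wrapping the dark path in a `try`/`except` is a design choice of this sketch: it keeps an unfinished new system from ever breaking the production response.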
With the input/output scaffolding in place, we’re now ready to start writing functionality. We’re going to implement that most painful 2% flow of the system, and as soon as it’s ready, we’re going to push it to the Dark Architecture in production. It will receive production input, but we will throw away its output after comparing it to the legacy system’s output.
If the outputs differ, we’ll log the difference so we can inspect it. We’ll also measure the performance improvement the new system delivers and gain operational experience in working with it.
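Measuring both paths on the same production input could look something like the sketch below. The helper names `timed` and `compare_timings` are hypothetical, and wall-clock latency is only one of many metrics you might choose to record.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_timings(legacy_fn, new_fn, *args, **kwargs):
    """Run both systems on one input; report whether outputs match and how long each took."""
    legacy_result, legacy_seconds = timed(legacy_fn, *args, **kwargs)
    new_result, new_seconds = timed(new_fn, *args, **kwargs)
    return {
        "match": legacy_result == new_result,
        "legacy_seconds": legacy_seconds,
        "new_seconds": new_seconds,
    }
```

A report like this, collected over real traffic, gives you evidence of the improvement before any customer depends on the new system.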
Side note: Have you caught on to why we call it Dark Architecture yet? It’s “dark” because it’s receiving production input but not being used for production output until we’ve gained confidence in its implementation and operation. Only once we’re confident the system behaves the way we expect do we turn it “on” for production output.
Once we’re confident this new system works the way we expect it to and delivers the desired performance or scalability improvement, we’ll switch which output gets thrown away, thereby realizing the value of the new system for solving the most painful 2% of the functionality, while still relying on the legacy system to service the remaining 98% of the functionality.
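Switching which output gets thrown away, one flow at a time, might be sketched as below. `FlowRouter` and the flow names are hypothetical; output comparison and error handling are omitted here for brevity.

```python
class FlowRouter:
    """Serve each named flow from the legacy or the new system (hypothetical sketch)."""

    def __init__(self, legacy_fn):
        self.legacy_fn = legacy_fn  # handles every flow: legacy_fn(flow, request)
        self.new_handlers = {}      # flow name -> new-system handler(request)
        self.promoted = set()       # flows whose new-system output is returned

    def register(self, flow, handler):
        """Add a new-system implementation for a flow; it starts out dark."""
        self.new_handlers[flow] = handler

    def promote(self, flow):
        """Switch which output is thrown away for this flow."""
        if flow not in self.new_handlers:
            raise KeyError(f"no new-system handler registered for {flow!r}")
        self.promoted.add(flow)

    def handle(self, flow, request):
        legacy_out = self.legacy_fn(flow, request)
        handler = self.new_handlers.get(flow)
        if handler is None:
            # Flow not yet migrated: the legacy system services it alone.
            return legacy_out
        new_out = handler(request)
        # Both systems ran; return one output and discard the other.
        return new_out if flow in self.promoted else legacy_out
```

Because promotion is a per-flow switch, removing a flow from `promoted` would return that flow to the legacy output without touching any other flow.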
At this point, our teams have successfully delivered to the customers a solution to the most painful 2% of system functionality in a fraction of the time it would take to re-implement 100% of the functionality.
Let’s assume there are a few more high priority flows to tackle, representing 20% of the overall system flows. Following a Dark Architecture approach, the business will soon find itself with a choice:
This is a powerful difference between the legacy approach to upgrading infrastructure and a Dark Architecture approach. The business now has a choice partway through the effort. Some circumstances may warrant completing 100% of the functionality migration. Others may warrant shelving further migration altogether and accepting two systems operating in parallel as a perfectly reasonable solution (not ideal technically or operationally, but it’s business!). Still others may warrant slowly migrating the remaining functionality as technical debt while also pursuing more pressing endeavors.
That opportunity for choice is a cornerstone of an agile process, and having it in our toolbox for evolving our systems has been pivotal for achieving our scale.
Watch Dave Connors’ GigaOM/Structure speech on Dark Architecture: