Protecting an Enterprise from Cyber Catastrophe

We are suffering an epidemic of cyberattacks while in a viral pandemic. This post is for those who have responsibility for assuring that the IT-based services offered by their enterprise can recover quickly from a successful cyberattack or other disaster.

University of Vermont Medical Center (UVMMC) is an excellent hospital. I owe my life to treatment there and am grateful for both the skill and the kindness of UVMMC staff. They have been devastated by a cyberattack.

It took a full month for UVMMC to recover the use of its patient database after the attack, and the institution recently blamed its failure to report COVID cases on the attack's after-effects. It is not possible to avoid all disasters; it is possible to recover quickly, but only if recovery has been planned and practiced in advance. There are several lessons in UVMMC’s travails for every organization and every business with a critical database.

At this point, it would be reasonable and prudent for readers to ask whether I’m qualified to give this advice. I blog about many things, like education, politics, and economics, in which I’m not an expert. You don’t want to rely on amateur advice for service security.

At Microsoft in the early 90s, I was responsible for the development of server-based products, including Outlook and Exchange. Later I led the development and rollout of AT&T’s first ISP, AT&T WorldNet Service. ITXC, which my wife Mary and I founded, had a network that spanned 200 countries and provided a VoIP service despised by most of the world’s telcos and quite a few governments. It had to be hacker resistant. NG Advantage, which we also founded, has an extensive Internet of Things (IoT) network. I’m a nerd, so I was deeply involved in the technology of all these products and services. More boasting here.

I’m no longer an expert in how to prevent a hacker attack, although I did write a novel called hackoff.com. The technologies for intrusion and for intrusion detection and prevention change so rapidly that only those active in the field have any hope of keeping up. Fortunately, the principles of preparing for and accomplishing catastrophe recovery are largely the same no matter what tools mother nature or a hacker group used to bring your servers and your services down. This post is about preparing for recovery, a subject quite separate from preventing attacks.

  1. Recovery planning starts with the assumption that there will be a disaster which renders all your organization’s computers unusable. Could be a fire, a flood, a cyberattack or something else. UVMMC and the Green Mountain Care Board, which is their regulator, have been citing attacks on other hospitals and the continuing arms race between black-hat hackers and defenders. If you know that there is a possibility of a successful attack, there is no excuse for not having and rehearsing a recovery plan. Even the “unsinkable” Titanic didn’t put to sea without lifeboats.
  2. Recovery capability requires an off-premise backup of ALL critical data. In the olden days, we used to truck magnetic tapes with backup data to places like Iron Mountain in New York. Now the backup data can move over the Internet, but the principle is the same. The backup data must not be on the same premises or, equally important, on the same network as the servers which are being used to provide the service.
  3. The off-premise backup data must be current. For many operations, including running a hospital, restoring the data as it was a month or even a week before the disaster struck means a significant loss of function. Even though it is only practical to back up an entire huge database periodically, changes to the database can also be sent offsite. Ideally, these changes are applied to a shadow copy of the database so that almost all data can be restored immediately when required. The process of updating the shadow database must also be off-premise and off-network and not rely on any of the software used for the day-to-day service. (A sketch of shipping backups and change files offsite follows this list.)
  4. Recovery of function must not depend on use of the original hardware. During Tropical Storm Irene, the State of Vermont’s computers in the basement of the Waterbury complex drowned. In the UVMMC disaster, whatever malware had been loaded on to the computers apparently took a month to eradicate. There didn’t use to be a good solution to the problem of quick access to replacement servers.
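
To make points 2 and 3 concrete, here is a minimal sketch of shipping a nightly full dump plus frequent change files to off-premise object storage. It assumes an S3-compatible bucket reachable with Python's boto3 library; the bucket name, file paths, and credential policy are hypothetical illustrations, not a description of any particular organization's setup.

```python
# A minimal sketch (not any hospital's actual setup): ship a nightly full dump
# plus frequent change (transaction-log) files to an off-premise, off-network
# object-storage bucket. Bucket name, paths, and key layout are hypothetical.
import datetime
import boto3

BACKUP_BUCKET = "example-offsite-backups"  # lives in a separate account and network

def ship_offsite(local_path: str, prefix: str) -> None:
    """Upload one backup artifact under a timestamped key."""
    # Ideally these are write-only credentials, so a compromised production
    # network cannot delete or encrypt the old backups.
    s3 = boto3.client("s3")
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    filename = local_path.rsplit("/", 1)[-1]
    s3.upload_file(local_path, BACKUP_BUCKET, f"{prefix}/{stamp}-{filename}")

# Nightly: the full database dump.
ship_offsite("/backups/db-full.dump", "full")

# Every few minutes: the change files, so a shadow copy elsewhere can replay
# them and stay nearly current with production.
ship_offsite("/backups/changes/000000010000000A00000042", "changes")
```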

Now, getting new server hardware up and running immediately sounds hard and expensive, but it is actually cheap and almost trivially easy. As long as preparation has been made in advance, it is possible to spin up a practically unlimited amount of computing power from cloud providers like Amazon, Microsoft, or IBM within minutes. There is no significant standby cost for this capability. Once the cloud equipment is no longer needed, it can be shut down, and the cloud billing meter stops.
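
To illustrate how little standby infrastructure this requires, here is a hedged sketch using boto3 against AWS EC2; the image ID, instance type, region, and tags are placeholders, and any major cloud provider offers an equivalent API.

```python
# A hedged sketch of "spin up replacement capacity in minutes": launch a
# recovery server from a pre-built image, then terminate it when the emergency
# or drill is over so the billing meter stops. IDs and sizes are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def start_recovery_server() -> str:
    """Launch one cloud server from a pre-baked recovery image."""
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # hypothetical image with the application stack installed
        InstanceType="m5.xlarge",
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "purpose", "Value": "disaster-recovery"}],
        }],
    )
    return resp["Instances"][0]["InstanceId"]

def stop_recovery_server(instance_id: str) -> None:
    """Shut the server down once the original environment is trustworthy again."""
    ec2.terminate_instances(InstanceIds=[instance_id])
```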

Apparently, the desktop computers and laptops (and possibly tablets) which are used at UVMMC to access data were also infected and unusable. Recovery of function cannot depend on restoring the access devices any more than it can rely on restoring the servers. In practice, this means that access to all essential functionality must be possible from a web browser on any properly authenticated laptop, computer, or smartphone. There must be a small backup supply of devices to restore key functionality immediately. New ones can be purchased and placed in service in days so long as they don’t have to be loaded with special software.

  5. Recovery must be practiced frequently and after any change to the IT environment. Experience says that a recovery plan which has not been practiced before an emergency can be counted on to fail when disaster strikes. Lifeboat drill is mandatory. If an organization’s servers are not already in the cloud (as most should be), the organization must periodically practice bringing up its applications and restoring its data on cloud computers; a sketch of such a drill follows this list. Losing a few minutes’ data is excusable; losing access for up to an hour may be unavoidable. Losing access for a month means recovery has not been sufficiently planned or practiced.
  6. The functional recovery team must be separate from the hardware recovery team in order to restore function as quickly as possible. As soon as the environment has been compromised by disaster, the recovery team swings into a well-rehearsed routine of restoring data from the offsite backup to backup servers in the cloud (if it is not already being replicated there) and providing any new access devices and passwords needed. If the original hardware does end up coming back soon, there is a small expense for renting cloud servers, but this is immaterial compared to the cost of not having access to critical data.
  7. The post-mortem which follows every disaster must separately determine why the organization was vulnerable and how well the recovery worked. The two issues are different.
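
A rehearsal only proves something if it is timed and measured. Below is a minimal, hypothetical drill runner; restore_latest_backup and newest_record_time stand in for whatever restore tooling and data checks an organization actually uses, so they are passed in rather than specified here.

```python
# A hypothetical drill runner: restore the latest offsite backup into a scratch
# cloud environment, time it, and report how stale the restored data is. The
# two callables stand in for an organization's own restore tooling and checks.
import datetime
import time
from typing import Callable

def run_recovery_drill(
    restore_latest_backup: Callable[[], None],
    newest_record_time: Callable[[], datetime.datetime],
) -> None:
    started = time.monotonic()
    restore_latest_backup()  # e.g. spin up cloud servers and load the shadow copy
    restore_minutes = (time.monotonic() - started) / 60

    data_age = datetime.datetime.now(datetime.timezone.utc) - newest_record_time()
    print(f"Restore took {restore_minutes:.1f} minutes; "
          f"newest restored record is {data_age.total_seconds() / 60:.1f} minutes old.")
```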

Anyone responsible for critical systems in the public or private sector should be asking their own IT people two simple questions: When was the last successful rehearsal of our functional recovery plan? How long did it take to restore functionality in the rehearsal?

By Tom Evslin, Nerd, Author, Inventor

His personal blog ‘Fractals of Change’ is at blog.tomevslin.com.
