Why The Airline Industry Could Keep Suffering System Failures Like Delta's

Aug 9, 2016
Originally published on August 10, 2016 8:47 am

Delta canceled about 530 flights on Tuesday in addition to about 1,000 canceled a day earlier after a power outage in Atlanta brought down the company's computers, bringing the airline's operations virtually to a halt.

Seth Kaplan, who follows the airline industry, asks the question on everyone's mind: "If every small business on the corner can manage to keep its website running through a cloud-based server and all those sorts of things, why can't Delta Air Lines with all its resources manage to do that?"

And in fact, Delta isn't the first airline to be downed by computer malfunction — add Southwest, United and others to the list — and unfortunately, this meltdown is unlikely to be the last.

"It's a fair criticism," says Kaplan, managing partner of Airline Weekly, an independent publication that follows the industry. But, he says, airlines aren't like other businesses.

"Because they have to worry so much about safety and security, they are constrained in ways that other businesses aren't," he says. "Delta can't just host its systems on Joe Blow's cloud server somewhere else in the way that another business might be able to do."

Kaplan says if Delta and other airlines distribute their computing to many different locations, it will make them more vulnerable to, say, hackers or terrorists. In other words, given a choice between more backup systems and more security, airlines are picking security.

Delta is still investigating what happened. CEO Ed Bastian has apologized for the mess and the company is offering vouchers to affected customers.

The local utility, Georgia Power, says the outage was caused by the failure of a Delta "switchgear" at the airline's Atlanta data center. That is a piece of equipment that connects Delta's computers to the power grid and to the company's backup generators, according to Bob Mann, a former airline executive who is now an aviation consultant.

He says this was a rare malfunction with a part that is usually reliable. "They had Georgia Power available at the site," Mann says. "They had their own generators and batteries available at the site. But the automated transfer switch seems to have failed in a way that allowed them to use neither of those systems."

And unfortunately, even if Delta finds a way to prevent another problem with the switchgear, Mann believes airline computer failures are likely to happen in the future.

The systems are increasingly complex. The computers are interacting with a myriad of outside systems, such as travel agents and Web apps. Plus, with the spate of mergers, various separate systems have had to be consolidated.

"So the size of the networks, the number of devices on those networks continue to expand," Mann says, "and even the most reliable hardware have some probability of failure, however random and small that is. Thus, having more of them around creates greater potential that any one of them could fail."
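Mann's point can be made concrete with a little arithmetic. The sketch below is an illustration only: it assumes every device fails independently with the same probability p over some period, and the value of p used here is hypothetical, not a figure reported about Delta's hardware. Under those assumptions, the chance that at least one of n devices fails is 1 - (1 - p)^n, which climbs toward certainty as n grows.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n devices fails.

    Assumes each device fails independently with the same
    probability p over the period of interest (a simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # All n devices survive with probability (1 - p)^n,
    # so at least one fails with the complement of that.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # hypothetical per-device failure probability
    for n in (10, 100, 1000, 10000):
        print(f"{n:>6} devices: {prob_any_failure(p, n):.3f}")
```

Even with a very small per-device probability, the combined risk grows quickly as the network expands, which is the effect Mann describes.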

The lesson for travelers may be: Don't check the bag with your toothbrush, in case you get stuck.

Copyright 2018 NPR. To see more, visit http://www.npr.org/.

ARI SHAPIRO, HOST:

Delta Air Lines canceled more than 500 flights today. That's on top of a thousand flights it canceled yesterday after a power outage brought down Delta's computers. Delta's not the first airline to be grounded by a computer malfunction, and as NPR's Laura Sydell reports, it is not likely to be the last.

LAURA SYDELL, BYLINE: It was an epic fail, as networks like ABC reported.

(SOUNDBITE OF ARCHIVED RECORDING)

UNIDENTIFIED MAN #1: Delta, the country's second-largest airline, and its passengers - grounded.

UNIDENTIFIED MAN #2: Everything's down right now.

UNIDENTIFIED MAN #3: Everything's down.

UNIDENTIFIED MAN #2: Yeah.

UNIDENTIFIED MAN #1: A computer meltdown that has already delayed more than a third of the airline's 6,000 flights...

SYDELL: Delta's CEO has apologized for the mess. The local utility, Georgia Power, says it was caused by a failed Delta switchgear at its Atlanta data center. Seth Kaplan imagines you may be wondering.

SETH KAPLAN: If every small business on the corner can manage to keep its website running through a cloud-based server and all those sorts of things, why can't Delta Air Lines, with all its resources, manage to do that? And obviously it's a fair criticism.

SYDELL: Kaplan is the managing partner of Airline Weekly, an independent publication that follows the airline industry. Kaplan says airlines aren't like other businesses.

KAPLAN: Because they have to worry so much about safety and security, they are constrained in ways that other businesses aren't. Delta can't just host its systems on Joe Blow's (ph) cloud server or somewhere else in the way that another business might be able to do.

SYDELL: Kaplan says if Delta and other airlines distribute their computing to many different locations, it will make them more vulnerable to, say, hackers or terrorists. He says if given a choice between more backup systems and more security, airlines are picking security.

Delta is still investigating what happened. Bob Mann says the failed switchgear was used to connect Delta's computers to the power grid and to its own backup generators. Mann is a former airline executive who is now an aviation consultant.

BOB MANN: They had Georgia Power available at the site. They had their own generators and batteries available at the site. But the automated transfer switch seems to have failed in a way that allowed them to use neither of those systems.

SYDELL: Mann says this is a rare malfunction with a part that is usually reliable. Unfortunately, even if airlines find a way to prevent another problem with the switchgear, Mann believes other computer failures are likely to happen.

The systems are increasingly complex. Part of it is that the computers are interacting with outside systems - travel agents, apps. And there have been a lot of mergers where two systems have started working together.

MANN: So the size of the networks, the number of devices on those networks continue to expand, and even the most reliable hardware have some probability of failure, however random and small that is. Thus having more of them around creates greater potential that any one of them could fail.

SYDELL: Delta's computer failure was the second airline fail just this summer. In July, Southwest computers were down for about 12 hours, but flights were canceled for days afterward as the airline struggled to get planes and crews where they were needed. Last summer, United Continental had computer failures that also left passengers stranded. The lesson for travelers may be, don't check the bag with your toothbrush. It may come in handy if you're stuck. Laura Sydell, NPR News.

Transcript provided by NPR, Copyright NPR.