Major IT outage brings businesses around the world to a standstill – expert explains what’s happened and why

A major IT outage has hit businesses across the world, grounding planes as well as affecting banks and the healthcare sector.

Author

Alan Woodward

Professor, Department of Computer Science, University of Surrey

George Kurtz, CEO of IT security firm Crowdstrike, said it had traced the issue to an update for the security software it provides for the Microsoft Windows operating system on computers.

Microsoft said the issue was caused by an update from a third party and that the problem had now been fixed.

The Conversation spoke to Professor Alan Woodward, an expert in cybersecurity at the University of Surrey, about what went wrong and how the problem could be resolved.

Can you explain what’s happened here?

I think there are two things. First, Microsoft seems to have had a problem with its Azure cloud computing platform. It’s a bit unclear, but there was a degree of degradation in that service starting in the evening of 18 July. However, it didn’t fail altogether.

But by far the bigger problem seems to be an update that appears to have been done in the late evening of 18 July for [IT security company] Crowdstrike’s Falcon product – a computer threat checker. Falcon works by having some “agent” software deeply embedded in the operating system of every PC, which monitors that computer and “calls home” if there’s a problem. It also receives updates on what to look out for if there’s a threat. It’s used a lot by large organisations throughout the world, which have a huge number of PCs to police.

I’m sure Crowdstrike are urgently investigating what happened. This piece of software is designed to protect people from cyber attacks and the like. From the latest information I’ve seen, it looks like a system file in the update was somehow released in an incorrect format.

The Windows operating system gets to this update and it doesn’t know how to cope, so it crashes. That’s why people have been getting the “blue screen of death” [a computer screen with an error message indicating a system crash].
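To illustrate the general idea of a badly formatted update file causing trouble, here is a minimal Python sketch of a loader that checks a file's structure before using it and keeps the previous rules if the check fails. The file layout (a `UPD1` signature followed by a record count and 8-byte records) and the function names are invented for illustration. This is not Crowdstrike's actual file format or code.

```python
import struct

# Hypothetical layout: 4-byte signature, 4-byte record count, then 8-byte records.
MAGIC = b"UPD1"
HEADER = struct.Struct("<4sI")
RECORD_SIZE = 8


def parse_update(data: bytes) -> list[bytes]:
    """Return the records in an update blob, or raise ValueError if malformed."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for header")
    magic, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad signature")
    body = data[HEADER.size:]
    if len(body) != count * RECORD_SIZE:
        raise ValueError("record count does not match file length")
    return [body[i * RECORD_SIZE:(i + 1) * RECORD_SIZE] for i in range(count)]


def load_update(data: bytes, current_rules: list[bytes]) -> list[bytes]:
    """Apply an update only if it parses cleanly; otherwise keep the old rules."""
    try:
        return parse_update(data)
    except ValueError:
        # A malformed file is rejected instead of being trusted blindly.
        return current_rules
```

The point of the sketch is only that a file which fails its checks is refused and the software carries on with what it had before, rather than crashing.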

And the big problem is, you can’t fix this issue remotely. You have to go into every machine separately and put it into “safe” or “recovery” mode to isolate the software. From there, you should be able to reboot the machine and get it up and running again. But if you’re a big global company with a large distributed IT estate, that’s going to take a long time.

Why has this outage had such wide-ranging effects?

Crowdstrike has been a great success – its security software is used by hundreds of thousands of major clients around the world. So airlines, airports, railways, hospitals, stock exchanges … they’re all going down.

It started in Australia when they got up for business on Friday. The update had clearly been sent out last night UK time, and it has just rippled around the world.

With deliberate ransomware attacks, they’ll typically take out one or two targets at a time. But in this case, it’s happened to thousands of organisations at once. We’ve not had anything like this before.

How Crowdstrike will fix the software is yet to be determined. As I’ve explained, it’s clear how companies can work around the issue. But for some very large organisations, this could affect their critical infrastructure and business for a long time yet – it’s going to take them days to physically work round all those machines.

Can security companies ensure this doesn’t happen again?

Security software is very intertwined with a computer’s operating system – it’s buried deep in there. There has to be a way that if something is found to be corrupted, it doesn’t just keep crashing the system – this may have to be done in cooperation with Microsoft, which owns the Windows operating system.
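One way such a safeguard could work, purely as a sketch, is for the software to count how many times in a row the machine has failed to start cleanly and to fall back to the last update that worked once a threshold is reached. The Python below models that logic in memory. The names `AgentState`, `on_start`, `on_healthy` and the threshold of two failed starts are assumptions made for illustration, and a real implementation would need to store this state on disk and coordinate with the operating system.

```python
from dataclasses import dataclass

MAX_FAILED_STARTS = 2  # assumed threshold before rolling back


@dataclass
class AgentState:
    active: str            # update currently in use
    last_known_good: str   # update that last ran without crashing
    failed_starts: int = 0


def on_start(state: AgentState) -> str:
    """Called early at each boot; returns the update that should be loaded."""
    if state.failed_starts >= MAX_FAILED_STARTS:
        # Too many crashes in a row: revert to the update that last worked.
        state.active = state.last_known_good
        state.failed_starts = 0
    # Assume this start fails until on_healthy() reports otherwise.
    state.failed_starts += 1
    return state.active


def on_healthy(state: AgentState) -> None:
    """Called once the machine has been running normally for a while."""
    state.failed_starts = 0
    state.last_known_good = state.active
```

With this logic, a machine that crashes twice on a new update would load the previous one on its third start, without anyone having to visit it.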

There’s got to be some way of backing out of it, and there is. However, most people trying to log into their blank PCs don’t know how to put their PCs into safe mode and revert to a previous state.

At the moment, it looks like it’s one corrupted file that’s producing a global problem. Computers download updates all the time, so how Microsoft prevents that from happening with this update, I don’t know. It’s not immediately obvious. And the million-dollar question is: how did this corrupted file get released in the first place?

How long before this problem is fully resolved?

It’s certainly going to take days, if not weeks. It’s like those hospitals in London that suffered a cyber attack. They’re still suffering – there’s a very long tail on these things.

And in this case, it’s not just a long tail but a very broad swathe of global organisations in transport, health and everywhere else. I don’t think we’ve seen anything like this before.

On X, formerly Twitter, George Kurtz, co-founder and CEO of Crowdstrike, said: “The issue has been identified, isolated and a fix has been deployed. We refer customers to the support portal for the latest updates.”

Alan Woodward does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.

Courtesy of The Conversation.
