A major part of the value proposition of software over paper processing is its ability to integrate new services without friction. Thus, supply chains for software can be longer than for physical products, but the resulting distance to the end user can inadvertently create situations of overconcentration.
Risks related to IT and cybersecurity frequently concern “who” just as much as “what.” Windows, for example, is a well-known, apparently stable operating system that powers a multitude of daily functions around the world, but this assessment does not necessarily account for the multitude of other entities in its ecosystem. On July 19, 2024, the security provider CrowdStrike pushed a faulty update to its Falcon endpoint security software for Windows, causing computers around the world to display the so-called “blue screen of death,” a protective state triggered by low-level errors.
A variety of industries, most notably aviation, suffered sudden stoppages, although Asia was less affected, ironically because adoption of such cutting-edge security tooling there is lower. The remedy required physical access to machines that might otherwise have been administered remotely, which in many cases meant strenuous manual work. The incident illustrates the gap between operations and security, and how differently the two areas think about risk.
Failing ungracefully
The CrowdStrike bug involved no malicious action, but it still caused damage more substantial than most cyberattack scenarios. Code running in the kernel bypasses the error-handling routines that protect ordinary user applications; developers could in principle provide an alternative mechanism at this level that fails gracefully, but that would require actively building a dedicated system. Advanced software like this, operating at the kernel level within the highly sensitive internals of the Windows operating system, has been called “malware for the good guys.”
Aiming to provide the fastest possible response to threat intelligence, CrowdStrike had been pushing up to a dozen updates a day, disregarding the software industry practice of staggered deployment, in which an update is first monitored on a small number of machines before reaching the wider fleet, along with a number of other industry-standard testing procedures.
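The staggered-deployment practice mentioned above can be sketched as a ring-based rollout: an update reaches progressively larger cohorts only while the failure rate among machines already updated stays below a threshold. The ring sizes, threshold, and function names here are illustrative, not any vendor's actual process.

```python
# Illustrative sketch of a staggered (ring-based) update rollout.
# Ring fractions and the failure threshold are hypothetical parameters.

RINGS = [0.01, 0.10, 0.50, 1.00]   # fraction of the fleet per ring
MAX_FAILURE_RATE = 0.001           # halt if >0.1% of updated hosts fail

def roll_out(fleet, deploy, health_check):
    """Deploy to progressively larger rings, halting on excess failures.

    Returns (completed, hosts_updated)."""
    deployed = 0
    for fraction in RINGS:
        target = int(len(fleet) * fraction)
        for host in fleet[deployed:target]:
            deploy(host)
        deployed = target
        failures = sum(1 for host in fleet[:deployed] if not health_check(host))
        if failures / max(deployed, 1) > MAX_FAILURE_RATE:
            return False, deployed   # stop the rollout for investigation
    return True, deployed
```

Had even the first ring been monitored, a crash affecting every updated machine would have halted the rollout at a tiny fraction of the fleet.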
From a legal perspective, it might seem reasonable to pursue Microsoft, which granted this access to CrowdStrike, for damages. Microsoft did approve the driver code loaded into the OS, which is what Adam Meyers, Senior Vice President at CrowdStrike, was referring to when he testified to the US Congress that “The configuration update that occurred, the content update, was not code.” Nevertheless, it was the updates themselves that introduced the risk, and IT experts almost unanimously blame CrowdStrike for the incident.
In fact, the decision to grant CrowdStrike kernel access in the first place was not entirely up to Microsoft, although the discussion at the time centered on competition in the market for such services rather than incident risk. A 2009 EU decision forced Microsoft to give third-party cybersecurity products the same access to the kernel as its own competing services, which also operated there.
Had the faulty software been an application in user space, culpability for an incident like this would clearly have belonged to Microsoft, since the operating system is supposed to contain failures at that level. Microsoft has in fact been gradually moving kernel services, such as drivers and anti-cheat modules for games, into user space over the years. In September, in response to the July incident, it announced a renewed effort to migrate security-related APIs out of the kernel, reflecting the security-by-design mindset that consumers of an operating system would expect. Nevertheless, given the special role of this software, some functions will still need to remain at a lower level. Due in part to the practices of third parties, it will never be possible to evaluate the security of a single software product in isolation.
Third, fourth parties and beyond
Vendor risk can be broader in scope than the traditional definition of cybersecurity, extending to a variety of legal, regulatory, and even geopolitical considerations. In light of the growing attention on third-party risk management, in 2021 the US introduced a requirement for Software Bills of Materials (SBoMs) in government procurement - a sort of “nutritional label” disclosing the full contents of a program. Because the format is computer-readable, SBoMs can facilitate further automation of vendor management. They can also be used after news of a compromise, allowing users to quickly ascertain their degree of exposure.
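The breach-triage use case can be sketched with a few lines of code. The snippet below assumes a CycloneDX-style JSON SBoM (one of the common SBoM formats); the component list and the advisory are invented for illustration.

```python
import json

# Minimal sketch: scan a CycloneDX-style SBoM for a compromised component.
# Both the SBoM contents and the advisory below are invented examples.

sbom_json = """
{
  "bomFormat": "CycloneDX",
  "components": [
    {"name": "openssl", "version": "3.0.7"},
    {"name": "log4j-core", "version": "2.14.1"}
  ]
}
"""

# Hypothetical advisory: the affected package and its vulnerable versions
ADVISORY = {"name": "log4j-core", "versions": {"2.14.1", "2.15.0"}}

def affected_components(sbom, advisory):
    """Return the SBoM components that match the advisory."""
    return [
        c for c in sbom.get("components", [])
        if c["name"] == advisory["name"] and c["version"] in advisory["versions"]
    ]

sbom = json.loads(sbom_json)
print(affected_components(sbom, ADVISORY))
```

Run against a full inventory of SBoMs, a check like this turns the question “are we exposed?” from a days-long vendor survey into a query.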
Outside of general cybersecurity measures, financial regulators around the world have also paid increasing attention to vendor management. On January 17, 2025, the EU will start enforcing the Digital Operational Resilience Act (DORA), which harmonizes resilience policies for all financial entities. In 2023, the US Federal Reserve, Federal Deposit Insurance Corporation, and Office of the Comptroller of the Currency jointly issued the Interagency Guidance on Third-Party Relationships: Risk Management, with a 2024 update specific to community banks (which may need to outsource more functions due to their smaller size). Also in 2024, the Basel Committee on Banking Supervision, hosted at the Bank for International Settlements (BIS), released the Principles for the Sound Management of Third-Party Risk, updating a document dating all the way back to 2005. Following this trend, in the Asia-Pacific, Hong Kong, Australia, India, and Singapore have also recently issued relevant regulations.
The BIS document emphasizes a broader conception of risk than what would have been applied in the past: “Banks’ [third-party service provider] arrangements often involve dependencies on nth parties in the supply chain for delivery of services because of a variety of factors (eg specialisation, different types of innovation). Such chains may be lengthy and complex, resulting in additional or increased risks to banks. Banks should have appropriate risk management processes to identify and manage the supply chain risks, proportionate to the criticality of the services being provided.”
The limits of security by design
One frequent issue when considering operational risk is its interaction with cybersecurity. The CrowdStrike bug demonstrated this tradeoff on a large scale, but another, more mundane example is cloud migration.
The cloud frequently helps financial institutions standardize their data governance and other processes, but it can also increase the risk of service interruptions. In a recent survey by the consultancy Forrester, 65% of Asia-Pacific executives said that cloud migration has a negative impact on operational resilience, owing both to service availability and to various compliance issues. It may thus be impossible to find a single configuration that is optimal on all dimensions; in light of its findings, Forrester recommends a more decentralized hybrid setup.
The two types of risk call for mitigation strategies drawing on quite different skills and working cultures. The fundamental difference might be traced back to the divide between agile development - the well-known “move fast and break things” ethos of the tech sector - and the more traditional waterfall method.
The “security by design” approach allows software developers to rule out whole classes of bugs from the design phase onward, but it can be impractical for the most complex systems, such as operating systems. That exception aside, many aspects of cybersecurity move quickly and are well suited to agile development, which deprioritizes over-engineered solutions to one-time problems. Sectors with a higher regulatory burden, meanwhile, tend to prefer a more stepwise approach, fully documenting each step and the assumptions behind it before acting. As an element of critical infrastructure, banking may seem better suited to the second approach, but unlike utilities, for instance, multinational financial institutions are also exposed to cross-border regulatory changes that require constant updates.
The financial sector will need to carefully communicate its risk tolerances to both technology partners and regulators to balance the convenience of centralized control with the risks of a security monoculture. Concentration is inherent to the most powerful software, but this sometimes means trading indirect risks for direct ones.