Defining the Threat Model for Embedded Systems: What Needs to be Secured?
To establish a security posture for our systems and ensure we’re enabling the right security protections, we need to clearly define our threat model. A threat model guides us in selecting appropriate system configurations and options, in determining whether we have enough security in place, and in determining whether that security protects against actors with a given level of access.
Depending on our system, we may already have a threat model handed to us that we can’t change. As an example, FIPS 140-2 provides a certification standard for multiple types of cryptographic modules. Looking at the extremes, a Level 1 module mostly only has code and data integrity requirements. Level 1 is the lowest of the four levels, and Level 1 modules are mostly intended for lower-security devices and for use on general purpose operating systems. On the other end of the scale, a Level 4 module must provide physical tamper detection and, potentially, the ability to invoke responses such as zeroizing a key store. Level 4 modules are intended for higher-security situations where an attacker has or may have physical access to the device. While the solutions used in a Level 1 module may also be applied to a Level 4 module, it would be overkill to use the physical protections from Level 4 in a Level 1 module, which does not need physical protections or responses. This is a threat model: clearly defined potential threats with the solutions needed to mitigate them.
Cyber security standards, which are primarily focused on basic cyber hygiene, often imply a threat model of a remote, logical over-the-wire attacker. These standards don’t specifically identify the threat model, but it can be inferred based on the focus paid to firewalls and other network-based approaches. These standards also provide the ability for an end user to tailor controls and solutions for their specific threat environment. But before we can do that, we need to define what our threat environment is.
Defining the Threat Model
So how do we define our threat model? The first step is to identify what data/applications need to be secure. To do this, there are several questions we can ask to narrow it down.
Does an attacker have physical access?
Are we concerned with logical or over-the-wire attackers?
How do we handle fielded updates?
What kind of storage does the device have? Is it read-only?
Is the system updated regularly, or is it expected to be static over its lifetime?
Does the system have performance restrictions?
Does the system have specific security requirements?
Asking questions like these helps narrow the scope of possible threats for your system by eliminating irrelevant threats. It also helps you to have a deep understanding of the system by defining what risks there are and aren’t. That way you can be confident you’ve mitigated all applicable risks and threats to the system.
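The questions above can be captured as a small checklist in code. The sketch below is only an illustration: the field names, the `candidate_concerns` helper, and the mapping from answers to concerns are all assumptions made for this example, not a complete or authoritative methodology.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreatAnswers:
    """Answers to a few of the scoping questions (field names are illustrative)."""
    physical_access: bool
    remote_attacker: bool
    field_updates: bool
    read_only_storage: bool


def candidate_concerns(a: ThreatAnswers) -> List[str]:
    """Map answers to concerns worth reviewing; a starting point, not a verdict."""
    concerns = []
    if a.physical_access:
        # Assumption: physical access is treated as implying privileged access.
        concerns.append("assume attacker can gain privileged access")
        concerns.append("secure boot tied to hardware")
    if a.remote_attacker:
        concerns.append("reduce network attack surface")
        concerns.append("protect data in transit")
    if a.field_updates:
        # Assumption: fielded updates need to be authenticated before install.
        concerns.append("authenticate update packages")
    if not a.read_only_storage:
        concerns.append("verify integrity of data at rest")
    return concerns


# Two hypothetical systems, loosely modeled on a cloud service and a robot.
cloud_app = ThreatAnswers(physical_access=False, remote_attacker=True,
                          field_updates=True, read_only_storage=False)
robot = ThreatAnswers(physical_access=True, remote_attacker=False,
                      field_updates=False, read_only_storage=False)

for name, answers in (("cloud_app", cloud_app), ("robot", robot)):
    print(name, "->", candidate_concerns(answers))
```

Writing the answers down this way makes it easier to see which threats have been ruled out and to revisit the list when an answer changes.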
Let’s demonstrate the basics with examples of different systems. We’ll walk through two designs and the thought process used to define a likely threat model for each.
Example #1:
We’re building an application that will be hosted in AWS and provide data analytics for customer healthcare data.
Does an attacker have physical access to our “application”? No – The application is hosted in the cloud, which already assumes we have co-tenancy. We could extend this to include Amazon’s secure data centers, which have physical, personnel and network security policies applied to them.
Solution commentary – While skipping it is not a great practice, we don’t strictly need a secure boot solution here (Note: HIPAA or similar regulations may require it). If we choose to use one, we may have to use something like GRUB with a password and stored hashes of the kernel / boot components, as we only have limited hardware support. A lack of physical access also doesn’t absolve us of the need to provide basic cyber hygiene: implementing allowlisting, forcing module signing, enabling secure boot where available, removing unneeded attack surface, and restricting our application / services. Depending on the attacker’s goal, these basic concepts work to inhibit or restrict privilege escalation and lateral movement within a network, and they support various data security requirements.
Do we care about data in transit? Yes – We have a network-based application that processes customer healthcare data.
Solution commentary – It may be highly desirable to use mandatory access control (MAC) policies here to ensure that only the webserver has access to the TLS keys and certificates used. This would prevent an attacker who exploits the application or system and elevates their privileges from getting access to the TLS keys and decrypting previous traffic or intercepting new traffic. This allows us to use basic cyber hygiene to help backstop the security protections we have implemented, which in this case would be TLS/SSL.
Do we care about data at rest? Yes – We store customer healthcare data.
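To make the idea concrete, here is a toy model of a mandatory access control (MAC) decision. It is not the syntax of any real MAC system; the labels (`webserver_t`, `tls_key_t`, and so on) and the `is_allowed` helper are hypothetical and exist only to show the default-deny behavior described above.

```python
# Toy model of a mandatory access control (MAC) policy: access is denied
# unless an explicit (subject, object, permission) rule allows it.
# All labels here are hypothetical and chosen for illustration only.
ALLOW_RULES = {
    ("webserver_t", "tls_key_t", "read"),
    ("webserver_t", "tls_cert_t", "read"),
    ("analytics_t", "health_data_t", "read"),
}


def is_allowed(subject: str, obj: str, permission: str) -> bool:
    """Default deny: only explicitly listed triples are permitted."""
    return (subject, obj, permission) in ALLOW_RULES


# The web server may read its key. A compromised analytics process, or a
# privileged shell running under a label with no rule, may not.
print(is_allowed("webserver_t", "tls_key_t", "read"))   # True
print(is_allowed("analytics_t", "tls_key_t", "read"))   # False
print(is_allowed("root_shell_t", "tls_key_t", "read"))  # False
```

The point of the sketch is that the decision depends on the label of the process rather than on whether the user is an administrator, which is what lets the policy hold even after a privilege escalation.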
Solution commentary – Great! How do you protect the keys? What happens if an attacker gets privileged access to the system? Do you need to be concerned with side channel attacks because of co-tenancy?
This demonstrates that our threat model is not static. It evolves as we add and remove security functionality and consider different threats to the system.
Do we care about network access? Yes – We provide a network application which only needs to use specific ports and protocols.
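One way to act on "only specific ports and protocols" is to compare what a host is actually listening on against an allowlist. The sketch below assumes a hypothetical allowlist of a single HTTPS listener and uses made-up observation data; how the observed listeners are gathered is left out.

```python
# Hypothetical allowlist: the only (protocol, port) pairs the service needs.
ALLOWED_LISTENERS = {("tcp", 443)}


def unexpected_listeners(observed):
    """Return observed (protocol, port) pairs that are not on the allowlist."""
    return sorted(set(observed) - ALLOWED_LISTENERS)


# Made-up example input, standing in for a socket listing taken on the host.
observed = [("tcp", 443), ("tcp", 22), ("udp", 5353)]
print(unexpected_listeners(observed))  # [('tcp', 22), ('udp', 5353)]
```

Anything the check reports is either attack surface to remove or a requirement that was missing from the threat model.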
Does our attacker have root-level access? No – But root access may become a pivot point for an attacker who gains execution context.
Solution commentary – This is often an area that is overlooked, but most systems require some form of a highly privileged user or administrator to perform critical tasks, so gaining administrative access is very likely on an attacker’s path to success. Many of the standard cyber hygiene controls and best practices can be used to help prevent this pivot to “root”.
What is our attacker’s goal? Repurpose our “application”? Steal customer health data? Use it as a beachhead into the rest of our network? Execute a denial-of-service attack? Deploy ransomware?
This is a bit of a trick question, as there is no right answer since each attacker may have different goals and end games. In this case, we may be able to eliminate “repurpose device”, because even if we disappeared as a company or stopped supporting the application, once we stop paying the AWS bill the compute resources will be released. Basic cyber hygiene can greatly reduce the attack surface and inhibit an attacker from accomplishing their goal without us needing security elements from a FIPS 140-2 Level 4 module where physical access restrictions, detections and responses are required.
Example #2:
We’re building an ARM-based system for robotic control of an industrial process.
Does an attacker have physical access to our “application”? Yes – We’re building a physical device.
Solution commentary – While the attacker has physical access, we may be able to eliminate some protections because the device will be in a more secured area (employees only). If this were a consumer device, we would face a much greater threat. We should still implement secure boot for this situation. Because we’re creating a physical product, we have various options available to tie this back to hardware. Physical access to the device also forces us to assume that an attacker has privileged (i.e. administrator or root) level access to the system (the ways to achieve this with physical access are essentially unlimited). Now that we’ve identified that an attacker essentially has privileged access to the system, we need to be even more laser-focused on our application of basic cyber hygiene principles.
Do we care about data in transit? No – This is an isolated system in a physical plant.
Do we care about data at rest? No...I think?
Solution commentary – Wrong! At a bare minimum we absolutely care about the integrity of our data at rest. We need to be able to verify that, while the system was off, an attacker did not modify our control algorithm or configuration data to make the robot chase employees while wielding a welder.
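As a minimal sketch of checking integrity at rest, the example below tags a configuration blob with HMAC-SHA256 and verifies it before use. The key, the configuration contents, and the helper names are placeholders invented for illustration. The sketch also assumes the key is kept somewhere an attacker with physical access cannot read (for example, hardware-backed storage), which is a significant assumption in its own right.

```python
import hashlib
import hmac


def compute_tag(key: bytes, data: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the data."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, expected_tag: bytes) -> bool:
    """Recompute the tag and compare it to the stored one in constant time."""
    return hmac.compare_digest(compute_tag(key, data), expected_tag)


# Placeholder key and configuration; both are invented for this example.
key = b"example-key-not-for-production"
config = b"max_speed=0.5\nwelder_enabled=false\n"
tag = compute_tag(key, config)

# Simulate an offline modification of the stored configuration.
tampered = config.replace(b"welder_enabled=false", b"welder_enabled=true")

print(verify(key, config, tag))    # True
print(verify(key, tampered, tag))  # False
```

A check like this only detects modification. It does nothing for confidentiality, and it is only as strong as the protection around the key and the code that performs the check.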
Do we care about network access? No, at least not in the traditional sense.
Solution commentary – We do care about ensuring that our robot can do exactly what we specified and nothing else (so a MAC-based policy and the ability to strictly control access to specific buses and devices). A robot wielding a welder and chasing our employees probably isn’t great for anybody. A network here might include the RS-485 bus that the robot uses to drive servos and engage its welder.
Does our attacker have root-level access? No
Solution commentary – Wrong! Think again! The attacker has full physical access to the system, which lets them gain privileged or administrator access through a variety of mechanisms.
There is no easy way to define the threat model for a system, and as we’ve shown, sometimes it can seem like a “choose your own adventure” type of effort. It doesn’t have to be complicated, though. What we need is a way to identify our systems, the threats, and the amount of security we need for a given use case and scenario. Start with the series of questions we listed above and see where it takes you. By the end of the exercise, you’ll have a much better idea of what protection you need and where you need it. We can see some of this starting to take shape in the questions we asked and the analysis we worked through in the examples above.
To learn how to define the threat model based on where security is needed, read Part 2 of this blog: Defining the Threat Model for Embedded Systems (Part 2): Where Do you Need Security?
At Star Lab, we have developed a methodology that we use to analyze your application, determine where it lives, and as a result, how to further protect it. For further reading, and to see if Star Lab’s Kevlar Embedded Security can complete your security scheme, check out the resources below: