Linux and Hypervisor Hardening — Applying a Secure by Design Philosophy
Star Lab has long believed that Secure by Design is the best strategy for approaching security problems. To understand why this principle guides our work, you first need to understand the difficult class of problems we hope to address.
A Challenging Threat Model
Star Lab works in the lower levels of the software stack—firmware, hypervisor, and operating system—sometimes known as the Trusted Computing Base (TCB). We protect systems in operational environments that are inherently risky, such as when devices may not have sufficient physical controls, or when they are not regularly patched. These systems need to resist attackers that are smart and dedicated.
Because we know systems deployed with our protections will face skilled attackers, our threat model considers operating in an environment that others would consider extreme or hopeless. For example, we consider how to protect the system when the attacker a) may manipulate hardware or memory, b) has local execution privileges, or c) has a root shell.
We choose this threat model because our customers have systems that may not be connected to the internet and, worse, may physically be in the hands of adversaries who want to attack or reverse engineer them; systems like:
Plant controllers
Telecommunications equipment
Mission computers
Electricity generation systems
Industrial controllers
Robotics
Automotive controllers
These are systems you cannot fully control because they often aren't in your physical possession. But you will also notice that they aren't typical general-purpose computers running arbitrary workloads for users. Instead, they are custom-tailored embedded systems. The security goal is to lock them into a particular configuration of hardware and software, allowing them to execute their specialized purpose while protecting the confidentiality and integrity of the system data and configuration.
This threat model differs significantly from the standard IT/IA threat model that informs security controls for normal rackmount or cloud servers in corporate infrastructure. Under this "standard IA philosophy," several good practices still apply to our threat model, such as encrypting data both in transit and at rest and using firewalls to reduce the network attack surface to essential ports. However, we also see many practices that don't work under our threat model. The first is the everyday patching and updating we are all familiar with. The second is a broad category of tools that exist to detect and respond to malicious behavior: virus/malware detection systems, intrusion detection systems, and heuristic or AI-based behavior monitoring, to name a few.
Why Traditional Strategies are Insufficient
Given the threat model laid out above, why are "detection and response" and "continuous patching" bad strategies? Starting with the "detection and response" strategy:
You must trust that the detection has an exceedingly low false-negative rate or a clever and adaptive human attacker will bypass it.
Your detection system must not require a constant stream of data, such as fingerprints for new viruses, to identify attacks. An attacker with physical control of the device's operating environment will not allow it to connect home to acquire these fingerprints.
Your detection system must respond on its own without help. For example, intrusion detection systems typically alert administrators who resolve the attack. Without trusted personnel in control of the device and without any guarantee that the adversary will allow your device a network route to report the problem, no human or other system can be involved in the response.
Even if everything above is true, the monitoring system is in a race with the malicious code: it must shut that code down before the code can escalate its privileges and disable the monitor.
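The first point in this list can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that the attacker can retry freely and that each attempt is missed independently with the same false-negative rate. Real detectors and real attackers are not this tidy, and the function name is ours.

```python
def evasion_probability(false_negative_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent attack
    attempts goes undetected, given a per-attempt false-negative rate."""
    if not 0.0 <= false_negative_rate <= 1.0:
        raise ValueError("false_negative_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Each attempt is caught with probability (1 - rate). The attacker
    # succeeds unless every single attempt is caught.
    return 1.0 - (1.0 - false_negative_rate) ** attempts


if __name__ == "__main__":
    for attempts in (1, 10, 100, 1000):
        p = evasion_probability(0.01, attempts)
        print(f"{attempts:>5} attempts: {p:.1%} chance of an undetected attack")
```

Under these assumptions, even a detector that misses only 1% of attempts gives a persistent attacker better-than-even odds after roughly 70 tries.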
From here, it is obvious why "continuous patching" does not work. An attacker with such a high degree of control over the device will not apply any patches that improve the security posture of the system.
The result is that the security posture you ship with is the only one the attacker struggles against. This realization is a crucial motivation for embracing our "secure by design" philosophy.
What Does Secure by Design Mean (To Us)?
Under such a stringent threat model, you effectively get only one shot at security, making it imperative to get it right the first time. An attacker needs only a single successful attack path into the system. Yet there are typically millions of lines of application, daemon, and system code, with tens of thousands of vulnerabilities reported each year. Fortunately, there are some valuable secure-by-design strategies that can help meet this challenge.
Prove It
There is no better way to make sure you get your code correct the first time than with a mathematical proof. There are decades of work on formal verification, and it has begun to bear fruit in projects such as the CompCert compiler and the seL4 high assurance microkernel. These tools allow you to trust the essential parts of your TCB (trusted computing base). Unfortunately, these stringent tools do not support concurrency well, so it is often impractical, for performance reasons, to use them on a modern CPU. Also, it is not clear that the detailed and correct hardware model that must be assumed for these formal methods holds for advanced CPUs such as modern Intel chips where a hidden CPU component may bypass the TCB and write directly to memory. For these reasons, Star Lab does not make extensive use of formal methods.
However, some tools provide weaker proofs than formal verification while still meeting performance goals. The Rust programming language, for example, guarantees both type and memory safety for code written outside of unsafe blocks. Star Lab loves Rust and uses it extensively. The ability to have the compiler assure you that your safe code cannot contain buffer overflows or similar memory-safety faults gives you and your customers a high level of well-founded trust in the software.
Build Your Citadel
It is essential to spend significant effort reducing the attack surface of the hypervisor and other elements of the TCB when following the secure by design philosophy. This is because you want to build the smallest and most hardened TCB that you can and enforce your security controls strictly from within that TCB.
That is the line you draw in the sand and say, "This far, but no further." Assume everything above the TCB can potentially be compromised. There is just too much code in user-land to verify. Some privileged processes will eventually fall to the attacker.
Though removing what you can is a basic attack surface reduction exercise, it's one that Star Lab takes seriously. When appropriate, we use Yocto and other tools to build tiny systems that contain a minimal set of essential packages. In fact, some of the first Star Lab-funded contributions to Xen came from our friend and former colleague Doug Goldstein, who added Kconfig build functionality to Xen, allowing us to more easily configure out large parts of the hypervisor and reduce the attack surface.
Design and Code Securely
Using a security development life-cycle of some type is helpful. However, creating a team that takes security seriously and designs and thinks accordingly is more important.
A team that values security seeks to understand anomalies and potential security vulnerabilities. They don't attempt to wave away a potential security issue. They treat all potential attacks as severe and work to correct them. When they cannot correct them, they know what they are and work to mitigate them in other ways. And, most importantly, they take attacks seriously given the threat model. Instead of saying, "Well, this attack would require them to have root access already, so it isn't worth addressing because the attacker has already won," they take the privileged-user attack against the TCB seriously and work to provide security guarantees even in that case.
What Does This Look Like in Practice?
Enough philosophy. What kind of a system can stand up, even for a bit, against the kind of threat model described above? What concrete practices do we value as providing real security?
Mandatory Access Control
Most systems have discretionary access control systems. Those are the file permissions we are all familiar with. Mandatory access control systems such as SELinux, FLASK, and those implemented in our Titanium Technology Protection product can provide permissions on files or other system objects that not even the root user can modify.
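The difference between the two models can be shown with a toy sketch. This is not how SELinux or any other real MAC system is implemented; the labels, policy table, and function names below are hypothetical. The only point is that the discretionary check has a root bypass and the mandatory check does not.

```python
from dataclasses import dataclass, field

# Hypothetical fixed policy: (subject label, object label) -> allowed actions.
# In this toy model it is defined once and nothing at runtime rewrites it.
MAC_POLICY = {
    ("sensor_daemon_t", "sensor_data_t"): frozenset({"read", "write"}),
    ("admin_shell_t", "system_log_t"): frozenset({"read"}),
}


@dataclass
class FileObject:
    owner: str
    label: str                                 # MAC label of the object
    mode: dict = field(default_factory=dict)   # DAC: user -> set of actions


def dac_allows(user: str, obj: FileObject, action: str) -> bool:
    """Discretionary check: root bypasses it entirely."""
    if user == "root":
        return True
    return action in obj.mode.get(user, set())


def mac_allows(subject_label: str, obj: FileObject, action: str) -> bool:
    """Mandatory check: only the fixed policy decides, with no root bypass."""
    return action in MAC_POLICY.get((subject_label, obj.label), frozenset())


def access_allowed(user: str, subject_label: str, obj: FileObject, action: str) -> bool:
    # Both layers must agree before access is granted.
    return dac_allows(user, obj, action) and mac_allows(subject_label, obj, action)


if __name__ == "__main__":
    log = FileObject(owner="root", label="system_log_t")
    # root in the admin_shell_t domain passes DAC but is still held to policy.
    print(access_allowed("root", "admin_shell_t", log, "read"))
    print(access_allowed("root", "admin_shell_t", log, "write"))
```

In the sketch, a root shell confined to the hypothetical admin_shell_t domain can read the log but cannot write to it, because no policy entry grants that action.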
Reduced Attack Surface and Reduced TCB
As discussed above, the fewer doors there are to your citadel, the less likely it is that someone leaves one unlocked. Every interface, executable file, and line of code that you can remove is one the attacker cannot use to compromise the system.
Encrypt Everything
While we assume the attacker is powerful, they are not so powerful that they can bypass well-respected cryptographic algorithms. In particular, it can be useful to chain encryption results. If a certain security subsystem succeeds, it can decrypt a key that enables another piece to continue.
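One way to picture chaining is a sequence of keys in which each stage's key depends on the stage before it. The sketch below is a stand-in, not a description of any Star Lab product: it uses HMAC-SHA256 key derivation over a hash of each stage's code rather than decrypting wrapped keys, and the stage names and all-zero root key are placeholders.

```python
import hashlib
import hmac
from typing import List, Sequence


def next_stage_key(current_key: bytes, next_stage_image: bytes) -> bytes:
    """Derive the next stage's key from the current key and a SHA-256
    measurement of the next stage's code."""
    measurement = hashlib.sha256(next_stage_image).digest()
    return hmac.new(current_key, measurement, hashlib.sha256).digest()


def derive_chain(root_key: bytes, stage_images: Sequence[bytes]) -> List[bytes]:
    """Return the key released to each stage, in boot order."""
    keys = []
    key = root_key
    for image in stage_images:
        key = next_stage_key(key, image)
        keys.append(key)
    return keys


if __name__ == "__main__":
    root = bytes(32)  # placeholder; a real root key would be held in hardware
    good = derive_chain(root, [b"bootloader v1", b"hypervisor v1", b"kernel v1"])
    bad = derive_chain(root, [b"bootloader v1", b"hypervisor TAMPERED", b"kernel v1"])
    print("stage 1 keys match:", good[0] == bad[0])
    print("stage 3 keys match:", good[2] == bad[2])
```

Because every key feeds the next derivation, changing one stage changes the key for that stage and for every stage after it, so data encrypted under the expected keys stays out of reach.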
Secure Hardware Wherever Possible
Hardware Security Modules (HSMs) keep all your cryptographic operations out of main system memory and make the attacker's life much more difficult. More advanced processors may also have a way to do computation without an attacker observing it, such as Intel's SGX reverse sandbox or ARM's TrustZone Trusted Execution Environment (although, in practice, it can be difficult to load your security code into those specialized environments).
Hardware Enforced Isolation All the Way Down
Modern CPUs have several levels of page tables that allow code running at a particular level to have an individualized view of memory and not worry about or interfere with other code. Most engineers will be familiar with the virtual memory system that separates individual processes from one another. However, modern CPUs also have lower-level page tables managed by the hypervisor that prevent virtual machines from interfering with one another.
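The two layers of translation can be modeled in a few lines. This is a deliberately simplified sketch: each page table is flattened into a dictionary of page numbers, whereas real hardware walks multi-level tables, and the function names and page numbers are ours.

```python
PAGE_SIZE = 4096


class TranslationFault(Exception):
    """Raised when an address has no mapping at some stage."""


def translate(page_table: dict, address: int) -> int:
    """Translate one address through a flat table of page numbers."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in page_table:
        raise TranslationFault(f"no mapping for page {page:#x}")
    return page_table[page] * PAGE_SIZE + offset


def guest_virtual_to_host_physical(guest_table: dict, hypervisor_table: dict,
                                   guest_virtual: int) -> int:
    # Stage 1: the guest OS maps process-virtual to guest-physical.
    guest_physical = translate(guest_table, guest_virtual)
    # Stage 2: the hypervisor maps guest-physical to host-physical.
    return translate(hypervisor_table, guest_physical)


if __name__ == "__main__":
    vm_a_stage2 = {0: 0x100, 1: 0x101}   # controlled by the hypervisor only
    guest_table = {0x10: 1}              # controlled by the guest kernel
    print(hex(guest_virtual_to_host_physical(guest_table, vm_a_stage2, 0x10 * PAGE_SIZE + 0x2A)))

    # A compromised guest can rewrite its own table, but not the hypervisor's.
    guest_table[0x10] = 0x200
    try:
        guest_virtual_to_host_physical(guest_table, vm_a_stage2, 0x10 * PAGE_SIZE)
    except TranslationFault as fault:
        print("blocked:", fault)
```

Even with full control of its own page table, the guest in this model can only name guest-physical pages, and the hypervisor's table decides which host pages those actually reach.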
Hardware isolation can also help mitigate some types of side-channel attacks. For example, by pinning each virtual machine to its own CPU core, you can prevent different VMs from leaking information to their neighbors via CPU usage or low-level cache usage.
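A pinning plan of this kind is easy to check mechanically before it is applied with the hypervisor's own tooling. The helper below is a hypothetical sketch that only computes the assignment. It treats each entry as a full physical core; on CPUs with simultaneous multithreading, sibling hardware threads share a core and would need to be grouped first.

```python
def plan_exclusive_pinning(vm_names, available_cores):
    """Assign each VM its own core, and fail rather than share one."""
    vm_names = list(vm_names)
    cores = sorted(set(available_cores))
    if len(set(vm_names)) != len(vm_names):
        raise ValueError("VM names must be unique")
    if len(vm_names) > len(cores):
        raise ValueError(
            f"{len(vm_names)} VMs need dedicated cores but only {len(cores)} are available"
        )
    # zip stops at the shorter sequence, so surplus cores stay unassigned.
    return dict(zip(vm_names, cores))


if __name__ == "__main__":
    # Hypothetical layout: core 0 is left for the control domain.
    plan = plan_exclusive_pinning(["control-app", "network", "legacy-daemon"], range(1, 4))
    for vm, core in plan.items():
        print(f"{vm} -> core {core}")
```

Refusing to produce a plan when cores run out is the point of the sketch: silently sharing a core would reintroduce the leak the pinning was meant to prevent.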
When you use isolation as a security feature, don't just think of the traditional uses. Your system may require code you don't trust—a weird old daemon or a flakey driver for an odd piece of hardware. Can that daemon become a unikernel application running under the hypervisor so that when the attacker compromises it, they find themselves trapped in a tiny virtual machine unable to affect the main operating system? Similarly, could that flakey driver live in a driver domain?
Deprivilege Anything You Can
Similar to reducing the attack surface, reducing privileges pays security dividends. Does your system need a root/admin user? If so, do they need all those privileges, or could those privileges be divided up among several users using SELinux policies?
Think similarly about privileged processes and VMs. They likely don't need all the authorities they have by default, either. Make every part of the system that the attacker compromises less valuable to them than they had hoped.
We have discussed several different strategies for deprivileging in the past, including LSMs vs. SECCOMP and Titanium Technology Protection vs. SELinux.
Go Forth and Don't Be Conquered
Having a philosophy about what protections are valuable and what are merely speedbumps to a sophisticated attacker allows you to think clearly about how to best deploy your limited security resources.
I hope that, even if our take on "secure by design" seems too stringent given your threat model, it can still inform your thinking and help you make better decisions.
For more information about securing Linux with our Titanium Technology Protection, tactical virtualization with our Crucible Hypervisor, or booting securely with our True Boot products, download our Software Security by Design whitepaper below.