Can You Trust the Data Coming Into Your SIEM?

Format and normalize audit events from each source; synchronize clocks between all event sources; establish filter rules to get rid of the daily noise. Wait… did we just filter out too much noise? Was that event we just ignored important? We can probably all agree that the hardest and most complicated parts of audit logging are ensuring you are logging only what you need and performing the actual analysis on the aggregated audit entries. These tasks are generally the job of a Security Information and Event Management (SIEM) system and human analysts, but before we can even get to log analysis, we need to carefully consider something on the back end: can we trust our log and audit data?

Determining whether we can trust our log / audit data is no small task. The naive approach is to say “Well, the log data came from our devices, and we aggregated it, so it must be trusted, right?” This approach falls apart as soon as we start asking questions about our overall design, architecture, and threat model:

  • What if the data was modified in transit? 

  • What if the data was forged by an attacker? 

  • Do we still have positive control of the device we’re collecting from? 

  • What if someone redirects our log data to a sink they control? 

The principles of Zero Trust tell us we shouldn’t blindly accept our log data, wherever it comes from; we need to make sure the data has not been tampered with.

Traditionally, and likely still in some very specialized cases, log or audit data was duplicated to a line printer under the assumption that it’s much harder to manipulate both physical and digital logs. In most situations where an embedded device may be deployed, this approach isn’t viable for a variety of reasons including the operating environment, the complexity of the added analysis, and the need to have printers everywhere. Modern cyber security approaches generally dictate that we aggregate all our log data to a central source or repository for analysis, which leaves us to determine how we can trust the data coming into this central collection source. 

The concept of trust as applied to audit data is complex, and there are a variety of things that can be done to increase and/or verify the trustworthiness of audit / log data. We’re going to pick a few to highlight that can be addressed through the configuration and use of existing capabilities in a Linux environment:

  1. How do we verify our applications and systems generating audit logs themselves haven’t been tampered with or modified? 

  2. How do we securely transmit audit and log data to a central collector? 

  3. How do we verify that our log data is destined for the expected collector and isn’t being rerouted? 

While we’re not going to look at them here, other areas that matter from a trustworthiness perspective include a synchronized time source. Time synchronization is commonly handled by NTP (often via the chrony daemon), and log timestamps are generally recorded in a common timezone, most likely GMT/UTC, so that logs across multiple locations can be seamlessly aggregated and analyzed. Additional considerations include whether we have a mechanism to enforce strict typing and formatting of the log data, and whether the logs properly identify their context (user, system, SELinux context).
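As a quick illustration of the time synchronization piece (the server names are placeholders and the exact file location varies by distribution), a minimal chrony configuration that keeps every node pointed at the same trusted sources might look like this:

    # /etc/chrony.conf (sketch): point every node at the same trusted sources
    server ntp1.example.internal iburst
    server ntp2.example.internal iburst
    # Step the clock during the first few updates if it is badly off at boot
    makestep 1.0 3
    # Keep the hardware clock in sync as well
    rtcsync

Note that timezone handling is separate from clock synchronization; recording timestamps in UTC is typically enforced by the logging configuration or the system timezone settings rather than by chrony itself.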

Verify Application and Logging Integrity 

All sorts of malice and mischief can be caused by modifying an application that generates audit logs: audit messages can be changed or removed entirely. Similarly, if an attacker were to modify the system’s auditd or syslog daemon, they could limit transfer to a remote host, drop messages, or modify the content of messages.

In order to prevent our applications, the syslog daemon, and the relevant configuration files from being modified or tampered with, we need to implement some form of filesystem integrity solution, which may also enforce a read-only filesystem. Depending on the implementation, most of the existing Linux solutions will prevent a file from being accessed if it has been modified, and some may also force a system reboot or trigger some form of recovery. Note that not all of the available solutions cover data files out of the box; some only apply to executables / libraries.

The approaches we generally have available in Linux are: 

IMA 

Pros: 

  1. Has basically been supported in the kernel forever (it’s the oldest of the solutions) 

  2. Supported by nearly every Linux distribution 

Cons: 

  1. Notoriously difficult to set up and configure 

  2. Requires a filesystem with dnotify / inotify support which may exclude some embedded filesystems 

  3. Verifies the entire file at once (may lead to performance degradation) 

dm-verity 

Pros: 

  1. Works underneath the filesystem at the block-level 

  2. Some support for automatic recovery (based on use case and environment) 

  3. Applies to all files, whether they are executable or not 

Cons: 

  1. Requires a relatively modern kernel 

  2. Potentially forces more partitions / disks to be used or the use of overmounts 

fapolicyd 

Pros: 

  1. Highly configurable 

  2. Significant documentation 

Cons: 

  1. Best supported on RHEL and Ubuntu 

  2. Intended primarily for an Enterprise environment
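As a rough sketch of what the dm-verity option above looks like in practice (the device paths, volume name, mount point, and root hash are placeholders), the read-only partition is hashed once at build time and then opened with that recorded root hash at boot:

    # Build time: generate the hash tree; veritysetup prints the root hash
    veritysetup format /dev/mmcblk0p3 /dev/mmcblk0p4
    # Boot time (e.g., from the initramfs): open the verified device using the
    # recorded root hash; reads fail if any block has been modified
    veritysetup open /dev/mmcblk0p3 verified-usr /dev/mmcblk0p4 <root-hash>
    mount -o ro /dev/mapper/verified-usr /usr

In a complete design, the root hash itself would be signed or baked into a verified kernel command line so an attacker can’t simply substitute their own.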

Whichever integrity verification solution you use, it’s important that it be integrated into a secure boot chain. Without that integration, an attacker has an avenue not only to compromise our logging / auditing infrastructure, but also to make changes to the rest of the system. Similarly, using an integrity verification solution has implications for updates that may need to be addressed during system design.

While not strictly related to the integrity of our application stack and logging infrastructure, it may be desirable to take this a step further and implement both mandatory access controls (e.g., around access to the syslog sockets / IPCs, logging infrastructure, etc.) and restrictions on the system calls available to various applications. Both of these mechanisms can help increase our confidence in, and the overall trustworthiness of, our audit / log data. MAC policies can be implemented with a variety of Linux Security Modules, such as SELinux or AppArmor. Similarly, system calls can be restricted through the application of seccomp filters, either via systemd or within the application itself. Restricting system calls reduces the attack surface and helps prevent an attacker from abusing other applications or trying to influence the logging and auditing subsystems. 
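As a hedged sketch (the unit name, paths, and syscall set all depend on your distribution and on which logging daemon you actually run), a systemd drop-in can restrict what the syslog daemon is allowed to do:

    # /etc/systemd/system/rsyslog.service.d/hardening.conf (sketch)
    [Service]
    # Permit only the broad "system service" syscall set; anything else fails
    SystemCallFilter=@system-service
    SystemCallErrorNumber=EPERM
    NoNewPrivileges=yes
    # Mount most of the filesystem read-only for this service
    ProtectSystem=strict
    ReadWritePaths=/var/log /var/spool/rsyslog

An SELinux or AppArmor policy confining the same daemon would complement this by controlling which sockets and files it may touch.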

Securely transmit log / audit data 

We have two complementary goals for the secure transmission of audit / log data: first, we want to ensure the data has not been modified in transit from our edge node to our central collector; second, we want confidentiality of the data as it transits from the edge to our collector and analysis environment. Luckily, TLS can address both of these goals for us. 

System loggers such as rsyslogd and syslog-ng can be configured to transfer logs using TLS. In order to support TLS, however, we need to introduce certificates, and this is where the complexity comes in. In support of Zero Trust principles, we want mutual authentication between the client (our edge device or source of log events) and the server (our log collector). In other words, we want both the client and the server to be able to verify each other and establish trust before transferring log data. Both syslog-ng and rsyslogd can be configured for mutual authentication using certificates. 
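As one hedged example (the certificate paths and collector hostname are placeholders), rsyslog’s gtls stream driver can be configured on the edge device to require TLS and to verify the collector’s certificate before forwarding anything:

    # /etc/rsyslog.d/50-forward-tls.conf (client side, sketch)
    # Use the GnuTLS network stream driver with our CA and this node's keypair
    $DefaultNetstreamDriver gtls
    $DefaultNetstreamDriverCAFile /etc/pki/tls/logging-ca.pem
    $DefaultNetstreamDriverCertFile /etc/pki/tls/edge-node.pem
    $DefaultNetstreamDriverKeyFile /etc/pki/tls/edge-node.key
    # Require TLS and validate the collector's certificate name
    $ActionSendStreamDriverMode 1
    $ActionSendStreamDriverAuthMode x509/name
    $ActionSendStreamDriverPermittedPeer collector.example.internal
    # Forward everything to the collector over TCP (@@) on the syslog-tls port
    *.* @@collector.example.internal:6514

The collector side needs a matching TLS listener configured to verify client certificates as well; that is what makes the authentication mutual rather than one-way.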

We really have two paths available for TLS certificates, and which path we choose may be dictated, at least in part, by our environment and the kind of IP addresses we use. If we’re primarily using private addresses, we’ll probably want to create our own private CA (with a self-signed root) and issue certificates to the clients and servers. Just think what would happen if one of the public CAs were to issue IP-based certificates and you suddenly became authoritative for 10.0.0.0/8; you would be able to influence TLS connections on almost every network out there (sounds like an attacker’s dream). Using our own CA requires some additional configuration, namely adding our CA to each node as a trusted CA during provisioning. We also have to consider how we protect the CA and its root keys / certificates to keep them out of the hands of an attacker. 
The other route is commercial certificates, issued by a CA that is likely already trusted in your environment. Of course, certificate management, the ability to perform mutual authentication, and provisioning are greatly influenced by CONOPS and the operating environment; as an example, some devices may be deployed on a customer’s network where the customer, not the device vendor, does the auditing and log collection. Situations such as this require a different strategy for mutual authentication and certificate provisioning. Regardless of which source of certificates we use, we want to make sure we can do mutual authentication, and we’ll need to consider several other factors: 

  • Do we need to use a certificate revocation list (CRL)? This would enable us to stop trusting an edge device (or even a server) if it were compromised or reached end of life. Of course, using a CRL requires us to have either permanent or at least intermittent connectivity (which presumably we already have for transferring audit logs).  

  • What kind of lifetime do we put on our certificates? A shorter lifetime will increase the maintenance burden, but it will also provide more opportunities to identify compromised endpoints, update certificate algorithms to the latest standards (e.g., moving from MD5 to SHA-256), and possibly provide maintenance / update windows. 

  • What algorithms and key sizes do we use with our certificates? 

One thing not specifically mentioned here, but which we need to consider, is how we protect the confidentiality and integrity of the certificates (and keys) on both devices. Do we have controls in place to prevent unauthorized access to these certificates and keys on both ends? What happens if an attacker gains root-level execution on the systems? What happens if the device is lost or stolen? The use of a CRL can help mitigate some of these challenges, as it gives us an established mechanism to revoke a certificate and stop trusting a device. 
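To make the private-CA path above a little more concrete, here is a deliberately simplified openssl sketch; the file names and subject names are placeholders, and a real deployment would add proper certificate extensions, protected key storage, and a revocation mechanism:

    # Create the private CA: a self-signed root valid for ten years
    openssl req -x509 -newkey rsa:4096 -nodes -days 3650 \
        -keyout ca.key -out ca.pem -subj "/CN=Example Logging CA"

    # Generate a key and certificate signing request for the log collector
    openssl req -newkey rsa:4096 -nodes -keyout collector.key \
        -out collector.csr -subj "/CN=collector.example.internal"

    # Sign the collector's certificate with the CA (one-year lifetime);
    # the same steps are repeated for each edge device's client certificate
    openssl x509 -req -in collector.csr -CA ca.pem -CAkey ca.key \
        -CAcreateserial -days 365 -out collector.pem

The ca.pem file is what gets provisioned to every node as the trusted root, while ca.key stays offline or in protected key storage.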

Routing of log data to the expected destination 

Now that we have a way to securely transfer our log data and protect the integrity of our application and logging stack, we can look at how to ensure we’re sending log data to the correct and expected log collector. To help with this, we can look at a few configuration options available in Linux: 

  • DNSSEC 

  • DNS over HTTPS 

  • General System configuration 

  • Firewall Configuration 

DNSSEC lets us cryptographically sign our DNS zones, so we can ensure they have not been changed or modified. DNSSEC requires support from the local resolver (to check the zone signatures), as well as the upstream DNS server and the top-level domain resolvers. Through the use of DNSSEC, we can verify that nobody has added entries to our zones (perhaps to redirect traffic or insert a round-robin entry for log collectors) or tried to become authoritative for our zone. One thing DNSSEC does not provide, however, is encryption of the DNS queries themselves. 

This is where DNS over HTTPS (DoH) comes in; optionally using the same certificates we already established and deployed for audit offload and transfer, we can now encrypt the contents of our DNS queries. This protects our DNS queries from being observed or intercepted and helps provide some privacy protection during operation. By itself, DoH is mostly security through obscurity. However, in the larger interest of defense in depth, ensuring integrity and confidentiality of records, and helping to establish trust across our entire infrastructure (i.e., following Zero Trust principles), we shouldn’t trust our neighboring networks and traffic, and should encrypt everything we can. 

Much like the certificates used for encrypting the log offloading, we can revoke certificates for compromised devices and enforce mutual authentication (i.e., is this really our DNS server?) as another line of defense, avoiding a single point of failure. DoH to our own servers can also be used to mitigate DNS being abused for command & control or other malicious activity, and it gives us control over which DNS servers are used and what types of queries are accepted. DoH also gives us another avenue for enforcing firewall restrictions and opens up the possibility of application-level proxies. 
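As a hedged example of the client side (the resolver address is a placeholder), systemd-resolved can require DNSSEC validation and encrypt queries using DNS over TLS; where DoH specifically is required, a DoH-capable forwarder such as dnscrypt-proxy could fill the same role:

    # /etc/systemd/resolved.conf (sketch)
    [Resolve]
    # Use only our own resolver
    DNS=10.20.30.53
    # Refuse responses that fail DNSSEC validation
    DNSSEC=yes
    # Encrypt queries to the configured resolver (DNS over TLS)
    DNSOverTLS=yes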

General system configuration can be used to limit the games that can be played with IP addresses and other tricks used to intercept traffic or bypass restrictions. Some configurations to examine include: 

  • IPv4/6 ICMP Redirects (net.ipv[46].conf.all.accept_redirects=0) => Prevent a neighbor from forcing traffic through themselves or answering for another IP 

  • IPv4 Source Routing (net.ipv4.conf.all.accept_source_route=0) => Prevent an attacker from trying to provide explicit routes for traffic, potentially enabling interception. 

  • IPv6 Router Advertisements (net.ipv6.conf.all.accept_ra=0) => Prevent the host from accepting router advertisements that add new IPv6 routes, potentially enabling redirection or interception 

This is only a small sampling of the system configuration options that should be disabled. The sysctl files that set these values can then be protected with the system’s integrity verification capabilities to prevent persistent changes. 
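A minimal sketch of enforcing the settings above with a sysctl drop-in (the file name is arbitrary):

    # /etc/sysctl.d/99-logging-hardening.conf (sketch)
    net.ipv4.conf.all.accept_redirects = 0
    net.ipv6.conf.all.accept_redirects = 0
    net.ipv4.conf.all.accept_source_route = 0
    net.ipv6.conf.all.accept_ra = 0

The settings take effect at boot and can be applied immediately with sysctl --system.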

Lastly, while firewall rules are already considered a best practice, we should ensure our deployed systems (both the edge devices and the log collectors themselves) are only permitted to send audit log data over TCP port 6514 (syslog-tls) to our specific collectors. Similarly, the DoH servers should be configured to only accept and forward DoH traffic, and our clients or edge devices can be configured to only allow DoH to our DNS servers. The firewall rules themselves can then be protected with the system integrity verification capabilities to prevent persistent changes. 
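As a sketch of what those egress restrictions might look like with nftables on an edge device (the collector and resolver addresses are placeholders, and a real deployment would need additional rules for NTP, management traffic, and so on):

    # /etc/nftables.conf fragment (sketch)
    table inet logging {
        chain output {
            type filter hook output priority 0; policy drop;
            # Local and established / related return traffic
            oifname "lo" accept
            ct state established,related accept
            # Syslog over TLS, only to the designated collector
            ip daddr 203.0.113.10 tcp dport 6514 accept
            # Encrypted DNS (DoT 853 / DoH 443), only to our own resolver
            ip daddr 203.0.113.53 tcp dport { 853, 443 } accept
        }
    }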

Putting it all together 

As we can see, and have previously discussed, security is not a point solution; it can’t be applied after the fact, and there is no single solution that solves all security challenges. Even though we may be required (or desire) to have centralized logging and auditing of our devices, the mere act of centralizing logging / auditing does not, by itself, achieve what we’re really after in the first place. We need to think through security holistically, and work through what we’re really trying to do and the impacts of each individual security component or requirement. 

Commercial solutions such as our own Kevlar Embedded Security can help to mitigate some of these threats and concerns, but we’ll be the first to admit that Kevlar Embedded Security does not solve all of your security challenges. Kevlar Embedded Security can help configure allowlisting and filesystem integrity verification, it can simplify the deployment of certificates for secure logging, and it can enable DoH. However, Kevlar Embedded Security can’t (at least currently) address the sysctl tweaks or firewall rules. Similarly, there are numerous commercial solutions for centralized logging and log transport, but we need to ask ourselves how those systems are secured and what new concerns, if any, they introduce. Commercial solutions for log aggregation may not include mutual authentication for log transfer or other mitigations that we desire for both trust and security.

As our attempt to establish trust in remote logging and the data being fed into our SIEM shows, security is hard, really hard. We also see just how important it is to include security in our system design, as it impacts provisioning, updates, and the device lifecycle. Before we can effectively have security (and, as we highlight above, trust in our data), we need to have a threat model and implement defense in depth. 

Jonathan Kline