Linux Symbolic Links: Convenient, Useful, and a Whole Lot of Trouble

Star Lab’s Kevlar Embedded Security Suite takes a defense-in-depth approach to securing Linux systems. We assume that the system we’ve been tasked with securing is going to have lurking bugs, no matter how many we can find and fix. And even if there aren’t any now, bugs are always just one software update away. So we take a layered approach to security: At various levels of the system, we apply hardenings, isolation, and even custom LSMs to ensure that the system remains resilient to attack, even as bugs are found. This means that we spend a lot of time thinking about the dusty corners of Linux and what sort of trouble we can find there. This post is a deep dive into one such corner that came up during development of our Data Protection feature. It gives you an idea of the types of considerations that go into fully securing a system, and it’s an interesting topic in its own right. 

Symbolic links, or symlinks, are a feature pretty familiar to anyone with a *nix background. These file shortcuts have been around in Unix since the early 1980s and have been a part of POSIX since its inception. Today they are supported almost everywhere: Linux, macOS, Android, every flavor of Unix, and even Windows. They are ubiquitous, convenient, useful…and occasionally dangerous. 

This post will discuss some of the tricky issues with symlinks, and we’ll cover the more recent tools that Linux gives us to securely handle them in an application. While we’ll be focusing on Linux specifically in this post, we’ll talk a little bit about general POSIX-compatible symlink handling as well. 

Anatomy of a Symlink

In short, a symlink is a special file type that holds a path to a target file. There’s no special structure to the symlink: It’s just a bit of text. If the target is a relative path, it is interpreted relative to the symlink’s parent directory. You can set the target to a real file, a file that you don’t have permissions to read, or even a file that doesn’t exist at all. When a symlink’s target doesn’t exist, we call that a dangling symlink.

$ ln -s i-dont-exist dangling 
$ ls -l dangling 
lrwxrwxrwx 1 user user 12 May 28 13:19 dangling -> i-dont-exist 
$ cat dangling 
cat: dangling: No such file or directory 

Listing 1: Creating a dangling symlink 

$ ln -s /root/.ssh/authorized_keys innocent.txt 
$ ls -l innocent.txt 
lrwxrwxrwx 1 user user 26 May 28 13:19 innocent.txt -> /root/.ssh/authorized_keys 
$ cat innocent.txt 
cat: innocent.txt: Permission denied 

Listing 2: Creating a symlink to a file we can’t access 

In some cases, the target can be completely made up: Linux uses a lot of what are called magic links in special filesystems (notably /proc) that may or may not have completely nonsensical targets. Some good examples of such magic links are the process namespace file descriptors in /proc/<pid>/ns. They’re not dangling: they point to a special file type that can be opened like any other file, but the link target text is merely informative. 

$ ls -l /proc/self/ns/mnt 
lrwxrwxrwx 1 user user 0 May 28 13:19 /proc/self/ns/mnt -> 'mnt:[4026531841]' 
$ ls -l /proc/self/cwd 
lrwxrwxrwx 1 user user 0 May 28 13:19 /proc/self/cwd -> /home/user 

Listing 3: Example of magic links with real and nonsensical names 

Symlinks can point to other symlinks, and when accessing, the operating system will follow them recursively. Nothing is stopping you from making a symlink that points to itself. However, when trying to access a path, Linux will give up after following 40 links, wherever they appear, returning ELOOP: 

$ ln -s one two 
$ ln -s two one 
$ ls -l one two 
lrwxrwxrwx 1 user user 3 May 28 13:19 one -> two 
lrwxrwxrwx 1 user user 3 May 28 13:19 two -> one 
$ cat one 
cat: one: Too many levels of symbolic links 

Listing 4: Circular symbolic links 

$ ls -l /usr/bin/cc 
lrwxrwxrwx 1 root root 20 Feb 18  2022 /usr/bin/cc -> /etc/alternatives/cc 
$ ls -l /etc/alternatives/cc 
lrwxrwxrwx 1 root root 12 Feb 18  2022 /etc/alternatives/cc -> /usr/bin/gcc 
$ ls -l /usr/bin/gcc 
lrwxrwxrwx 1 root root 6 Aug  5  2021 /usr/bin/gcc -> gcc-11 
$ ls -l /usr/bin/gcc-11 
lrwxrwxrwx 1 root root 23 May 13  2023 /usr/bin/gcc-11 -> x86_64-linux-gnu-gcc-11 
$ ls -l /usr/bin/x86_64-linux-gnu-gcc-11 
-rwxr-xr-x 1 root root 928584 May 13  2023 /usr/bin/x86_64-linux-gnu-gcc-11 

Listing 5: Multi-level symbolic links 

Symlink Permissions and Ownership 

In the above listings, you probably also noticed a bit of symlink weirdness that has alarmed everyone at some point: The rwxrwxrwx (777) permissions. While it looks like the symlink has been left wide open to the whole world, in Linux the permission bits on symlinks are ignored entirely and always display as 777, an unsettling choice. It is true that anyone can see the symlink target path, but that doesn’t grant any special access to the target itself. And symlinks can’t be modified: You must delete and recreate the symlink, which depends on the permissions of the directory, not the file. The owner and group of the symlink are used only when the symlink is being deleted or renamed in a sticky directory.


 



The Trouble with Symlinks 

The issues with symlinks stem from the fact that I don’t need any special permission to link to a file I have no access to myself. I can link myfile.txt to /etc/shadow just fine, even though I can’t read or write it. But if I can pass myfile.txt to someone who can operate on the target, I might be able to trick them into doing work for me. This has spawned a whole class of vulnerabilities, described and tracked as CWE-61 and CWE-59, and generates a steady stream of CVEs each year. (At the time of this writing, some recent ones are CVE-2024-32021 in git and CVE-2024-1933 in TeamViewer.) 

Symlink vulnerabilities also show up frequently when dealing with untrusted archives. Imagine you are downloading and unpacking tar archives from potentially untrusted sources. A typical tar file might contain entries like the following, in order: 

A textbook way of implementing this would be to traverse the files in order. We would create dir1, then dir1/file1.txt, then dir2/file2.txt. We would also be careful to guard against path traversal vulnerabilities, so we would reject a filename like ../../../../etc/shadow. But what happens when we receive a tar file like the following? 

The algorithm described above would attempt to write to dir1/shadow, which resolves to /etc/shadow if an earlier entry made dir1 a symlink to /etc. In the same way it will be tricked into writing to ../../../../.ssh/authorized_keys, which our attacker hopes points somewhere useful. We get very much the same effect as a path traversal vulnerability if symlinks aren’t validated as well. This has been an issue in Docker, where a crafted container image could use this trick to make the privileged Docker daemon write arbitrary files.
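To make the gap concrete, here is a minimal sketch of the purely lexical path traversal check described above. The function name is_lexically_safe is hypothetical and not taken from any real archive tool. It rejects absolute names and names containing a ".." component, yet it happily accepts dir1/shadow, because nothing in the name itself reveals that dir1 is a symlink.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the entry name is relative and
 * contains no ".." component. This check is purely lexical; it never
 * looks at the filesystem, so it cannot see symlinks. */
bool is_lexically_safe(const char *name)
{
    const char *p = name;

    /* Reject empty names and absolute paths. */
    if (name[0] == '\0' || name[0] == '/')
        return false;

    while (*p) {
        /* Length of the current component, up to the next slash. */
        size_t len = strcspn(p, "/");

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return false;

        p += len;
        while (*p == '/')
            p++;
    }
    return true;
}
```

An extractor relying only on a check like this would still need to verify, on disk, that no directory component of the destination is a symlink.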

 

Magic links present another variation of symlink vulnerability. If I can get a program to accept /proc/self/fd/7, I can get it to read or write data to one of its open files, sockets, pipes, etc. If I can convince a privileged process to write data to /proc/1002/cwd/.ssh/authorized_keys, I might be able to alter important files that belong to another user. Magic links were the root of a recent flaw in runc (Docker’s container runtime) that allowed container escapes and an older flaw that allowed a container to overwrite the runc binary itself.

 

So…how do we securely handle symlinks? 

Secure Symlink Strategy: Always Follow 

Given the subject of this blog post, you’d probably expect us to make you feel bad for having ever followed a symlink, but in fact, blindly following links is often the most sensible choice for less security sensitive applications. Users use symlinks to make their lives easier, and we shouldn’t get in the way of that without a good reason. A text editor, for example, would be expected to open whatever file the user tells it to, symlink or no. We could certainly spin tales of scenarios where a user is tricked into editing the wrong file, but in practice such cases rarely justify blocking users who wish to use symlinks for their own purposes.

This is the “do nothing” strategy, so there’s not a lot to talk about from an implementation point of view. The tricky part to doing this correctly (and it is unfortunately very tricky) is to understand when your application is “less security sensitive” and when it isn’t. This is part of why Star Lab uses a defense-in-depth approach, because it’s not reasonable to tell our customers to just always get it right or rewrite everything. 


 



Secure Symlink Strategy: Never Follow 

This is the most extreme option but the safest for security sensitive applications. Of course it comes with a cost: your application may surprise users and be more difficult to use. Completely disallowing symlinks is disruptive enough that it is generally recommended that this be configurable, so that your users can opt back into symlink following if they need to.

POSIX provides a few tools to prevent following symlinks, but with a huge caveat: They only apply to the last component of a path, not the intermediate directories that get resolved along the way. Foremost among these tools is the openat system call. When using this function, we can use the O_NOFOLLOW flag to prevent opening links: 

int fd = openat(AT_FDCWD, "path/to/file", O_RDONLY | O_NOFOLLOW); 
 
/* If "file" was a symlink, errno will be ELOOP. Note that if "path" 
 * or "to" are links, the openat will still succeed. */ 
if (fd < 0) { 
    perror("Couldn't open file"); 
    exit(1); 
} 

Functions that change file metadata, such as chmod and chown, follow symlinks as well. To prevent following symlinks in these cases, we can open the file as above and use the f- variations of the POSIX functions on the open file descriptor: 

fchmod(fd, 0644); 
fchown(fd, 1000, 1000); 
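
Putting the two steps together, a minimal sketch might look like the following. The helper name chmod_nofollow is hypothetical, and the sketch assumes the caller has read permission on the file, since it opens the file with O_RDONLY. Like O_NOFOLLOW itself, it only protects the final path component.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: change permissions without following a symlink
 * in the final path component. Returns 0 on success, -1 with errno set
 * on failure (ELOOP on Linux if path is a symlink). */
int chmod_nofollow(const char *path, mode_t mode)
{
    int fd;
    int rc;
    int saved_errno;

    fd = open(path, O_RDONLY | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;

    /* Operate on the descriptor, so the path is never resolved again. */
    rc = fchmod(fd, mode);

    /* Preserve the fchmod error across close(). */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return rc;
}
```

Because the change is applied to the already-open descriptor, an attacker who swaps the path for a symlink after the open gains nothing.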

Luckily we don’t have to do anything special to guard against deleting files: When given a symlink, the standard unlink system call will remove the symlink itself, not the file it points to. Moving or renaming files also never follows symlinks. 

Intermediate Symlinks

As noted above, symlinks in the middle of a path will be silently resolved by the operating system regardless of what we wish. (There’s another minor symlink detail here: /path/to/dir and /path/to/dir/ are different if the dir component is a symlink. In the former, dir is treated like a symlink, but in the latter, dir is an intermediate symlink and automatically resolved.) How do we deal with symlinks in the middle of the path? On POSIX, we need to walk the path one component at a time: 

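/* Open without following any symlinks. We assume the path has already 
 * been split into components. */ 
int open_nofollow(size_t num_components, 
                  const char **components, 
                  int flags, 
                  int mode) 
{ 
    int dirfd = AT_FDCWD; 
    int nextfd; 
    int fd; 
    int saved_errno; 
    size_t i; 
 
    if (num_components == 0) { 
        errno = EINVAL; 
        return -1; 
    } 
 
    /* Walk each directory component, refusing symlinks at every step. */ 
    for (i = 0; i + 1 < num_components; i++) { 
        nextfd = openat(dirfd, 
                        components[i], 
                        O_RDONLY | O_DIRECTORY | O_NOFOLLOW); 
        saved_errno = errno; 
        if (dirfd != AT_FDCWD) 
            close(dirfd); 
        if (nextfd < 0) { 
            errno = saved_errno; 
            return -1; 
        } 
        dirfd = nextfd; 
    } 
 
    /* Open the final component; keep its errno across close(). */ 
    fd = openat(dirfd, components[i], flags | O_NOFOLLOW, mode); 
    saved_errno = errno; 
    if (dirfd != AT_FDCWD) 
        close(dirfd); 
    errno = saved_errno; 
    return fd; 
} 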

If we’re on a fairly recent version of Linux (5.6 or newer), we have a better option: the openat2 system call, which gives us extra flags that we can use to truly disable symlink processing. There is no glibc function for this yet, so we need to invoke the syscall indirectly: 

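int open_nofollow(const char *path, int flags, int mode) 
{ 
    struct open_how how = { 
        .flags = flags, 
        /* Unlike open, openat2 fails with EINVAL if mode is nonzero 
         * and flags contains neither O_CREAT nor O_TMPFILE. */ 
        .mode = ((flags & O_CREAT) || 
                 (flags & O_TMPFILE) == O_TMPFILE) ? mode : 0, 
        .resolve = RESOLVE_NO_SYMLINKS, 
    }; 
 
    return syscall(SYS_openat2, AT_FDCWD, path, &how, sizeof(how)); 
} 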

On Linux 5.10 and newer, there is one final option worth mentioning. This is more of a system configuration than an application behavior. We can mount certain portions of the filesystem using the nosymfollow (MS_NOSYMFOLLOW) mount option. This option disables following symlinks located on that mount, much like nodev (MS_NODEV) disables accessing special files. At Star Lab, we have more control over system configuration than customer application code, so this is a useful option for us. 

Secure Symlink Strategy: Follow When Safe 

In this scenario, we validate all symlinks before we follow them. If the symlink is unsafe, we can return an error or somehow normalize it. This is a great idea, but implementation is much easier said than done. 

 

Here’s a simple example that we might try:

typedef bool (*validate_func)(const char *); 
 
int validate_open(const char *path, 
                  int flags, 
                  int mode, 
                  validate_func validate) 
{ 
    /**************************************** 
     * WARNING: Bad code below! Do not use. * 
     ****************************************/ 
 
    ssize_t n; 
    char buffer[256]; 
 
    n = readlink(path, buffer, sizeof(buffer)); 
    if (n < 0) { 
        if (errno == EINVAL) { 
            /* Must have been a file */ 
            return open(path, flags, mode); 
        } 
        return -1; 
    } 
 
    buffer[n] = '\0'; 
    if (!validate(buffer)) { 
        errno = EINVAL; 
        return -1; 
    } 
    return open(buffer, flags, mode); 
} 

There are many problems with this code. Take a minute and see if you can spot them all!

Problem 1: Readlink 

The first problem is with readlink itself. Readlink is a particularly cumbersome API even by C standards. It does not null terminate the buffer. It will happily truncate the returned target if the buffer is too small. It returns the number of bytes written, with no indication of how large the buffer needs to be. In the above code, a symlink whose target is greater than or equal to 256 characters will be silently truncated and cause an out-of-bounds write to boot.

There are three strategies for using readlink properly. The first and easiest is just to allocate a “large enough” buffer, usually PATH_MAX + 1 bytes. (On Linux, PATH_MAX is 4096 bytes. On most other Unix systems, including macOS, it is 1024.) This isn’t, strictly speaking, portable, as relying on PATH_MAX has plenty of caveats. However, it works fine in most practical circumstances. Python, for example, has been using exactly this method for years without incident in standard library functions, and internally as well. Using this method, we can rewrite the above code as:

int validate_open(const char *path, 
                  int flags, 
                  int mode, 
                  validate_func validate) 
{ 
    /************************************************************* 
     * WARNING: Less bad (but still bad) code below! Do not use. * 
     *************************************************************/ 
 
    ssize_t n; 
    char buffer[PATH_MAX + 1]; 
 
    n = readlink(path, buffer, PATH_MAX); 
    if (n < 0) { 
        if (errno == EINVAL) { 
            /* Must have been a file */ 
            return open(path, flags, mode); 
        } 
        return -1; 
    } 
 
    buffer[n] = '\0'; 
    if (!validate(buffer)) { 
        errno = EINVAL; 
        return -1; 
    } 
    return open(buffer, flags, mode); 
} 

The second method for using readlink is the most portable and most reliable: If the buffer is too small, reallocate and try again. This method is used in the Rust standard library. As a small bonus you will save a little bit of space: most symlinks are much smaller than PATH_MAX. A readlink wrapper function using this method might look like this:

char *read_symlink(const char *path) 
{ 
    ssize_t n; 
    size_t bufsize = 256; 
    char *buffer = NULL; 
    char *newbuffer; 
 
    while (1) { 
        newbuffer = realloc(buffer, bufsize); 
        if (!newbuffer) 
            goto error; 
        buffer = newbuffer; 
 
        n = readlink(path, buffer, bufsize); 
        if (n < 0) { 
            goto error; 
        } else if ((size_t)n == bufsize) { 
            bufsize *= 2; /* possibly truncated: try again */ 
        } else { 
            buffer[n] = '\0'; 
            return buffer; 
        } 
    } 
 
error: 
    free(buffer); 
    return NULL; 
} 

The final strategy is to use the lstat system call to figure out how big the buffer should be before calling readlink. While this is the example given in the Linux readlink man page, it has the most gotchas. First, you must guard against a potential TOCTTOU issue (discussed below) should someone change the file on you between lstat and readlink. Second, this method doesn’t work for Linux magic symlinks, which all report a zero length.
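
For completeness, here is a minimal sketch of that lstat-based approach with both gotchas handled. The function name read_symlink_lstat and the choice of EAGAIN for the "link changed underneath us" case are assumptions for illustration, not taken from the man page. When st_size is zero, as it is for magic links, the sketch falls back to a PATH_MAX-sized buffer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: read a symlink target, sizing the buffer from
 * lstat(). Returns a malloc'd, NUL-terminated string that the caller
 * must free, or NULL with errno set. */
char *read_symlink_lstat(const char *path)
{
    struct stat sb;
    size_t bufsize;
    ssize_t n;
    char *buffer;
    int saved_errno;

    if (lstat(path, &sb) < 0)
        return NULL;
    if (!S_ISLNK(sb.st_mode)) {
        errno = EINVAL;
        return NULL;
    }

    /* Magic links report st_size == 0, so fall back to PATH_MAX. The
     * extra byte holds the terminator and lets us detect truncation. */
    bufsize = (sb.st_size > 0 ? (size_t)sb.st_size : PATH_MAX) + 1;
    buffer = malloc(bufsize);
    if (!buffer)
        return NULL;

    n = readlink(path, buffer, bufsize);
    if (n < 0) {
        saved_errno = errno;
        free(buffer);
        errno = saved_errno;
        return NULL;
    }
    if ((size_t)n == bufsize) {
        /* The target grew between lstat() and readlink(): someone
         * replaced the link. Report it instead of truncating. */
        free(buffer);
        errno = EAGAIN;
        return NULL;
    }

    buffer[n] = '\0';
    return buffer;
}
```

A caller could retry on EAGAIN, at which point this strategy starts to resemble the reallocation loop above.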

Problem 2: TOCTTOU 

Lots of things are always happening concurrently on a system, and time-of-check to time-of-use (TOCTTOU) is a type of race condition vulnerability that can occur when we forget this. Regarding symlinks, TOCTTOU issues occur when files change on us after we have validated them to make sure everything is safe. We alluded to this above, but we have a more serious issue than that in the original code. When readlink fails, we assume that the file was not a link and open it directly, but what if between readlink and open an attacker is able to convert the path to a symlink? In that case, our validation is completely bypassed! 

We can modify our code to mitigate this issue: 

int validate_open(const char *path, 
                  int flags, 
                  int mode, 
                  validate_func validate) 
{ 
    /************************************************************* 
     * WARNING: Less bad (but still bad) code below! Do not use. * 
     *************************************************************/ 
 
    ssize_t n; 
    char buffer[PATH_MAX + 1]; 
 
    while (1) { 
        n = readlink(path, buffer, PATH_MAX); 
        if (n < 0) { 
            if (errno == EINVAL) { 
                /* Must have been a file */ 
                int fd = open(path, flags | O_NOFOLLOW, mode); 
                if (fd < 0 && errno == ELOOP) { 
                    continue; 
                } 
                return fd; 
            } 
            return -1; 
        } else { 
            break; 
        } 
    } 
 
    buffer[n] = '\0'; 
    if (!validate(buffer)) { 
        errno = EINVAL; 
        return -1; 
    } 
    return open(buffer, flags, mode); 
} 

Problem 3: Multi-Level Symlinks

Remember that symlinks can point to other symlinks. With the code above, if I want to sneak a link past validation, I can use this to my advantage. Let’s say our validation function doesn’t allow absolute symlinks. I can create the following in a directory:

lrwxrwxrwx 1 user user 22 May 29 10:55 invalid -> /illegal/symlink/value 
lrwxrwxrwx 1 user user  7 May 29 10:55 valid -> invalid 

If we pass the “invalid” symlink to the function above, it will fail the check. But if we instead pass “valid,” which points to “invalid”, it will pass validation and open the “invalid” link without further checks. 

We can solve this with good old-fashioned recursion:

int validate_open(const char *path, 
                  int flags, 
                  int mode, 
                  validate_func validate) 
{ 
    ... 
    return validate_open(buffer, flags, mode, validate); 
} 

But now we have a new problem: Symlinks can be circular. We will need to track the symlink depth to avoid overflowing the stack.
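
As a minimal sketch of depth tracking on its own, the following follows a chain of links by hand and gives up with ELOOP after a fixed number of hops. It uses a loop with a counter rather than recursion, which sidesteps the stack concern entirely. The name count_link_hops and the MAX_LINK_DEPTH constant are hypothetical (the value 40 simply mirrors the kernel limit mentioned earlier), and the sketch assumes every link target can be resolved from the current directory, so it does not address the problems that follow.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

#define MAX_LINK_DEPTH 40 /* assumption: mirrors the kernel's limit */

/* Hypothetical helper: follow a chain of symlinks manually. Returns the
 * number of links followed (0 if path is not a symlink), or -1 with
 * errno set. Circular chains fail with ELOOP instead of looping. */
int count_link_hops(const char *path)
{
    char current[PATH_MAX + 1];
    char target[PATH_MAX + 1];
    ssize_t n;
    int depth = 0;

    if (strlen(path) > PATH_MAX) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(current, path);

    for (;;) {
        n = readlink(current, target, PATH_MAX);
        if (n < 0) {
            /* EINVAL means we reached something that is not a link. */
            return errno == EINVAL ? depth : -1;
        }
        if (depth == MAX_LINK_DEPTH) {
            errno = ELOOP;
            return -1;
        }
        target[n] = '\0';
        memcpy(current, target, (size_t)n + 1);
        depth++;
    }
}
```

In a real validator, the validation callback would run on each target inside the loop before moving on to the next hop.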

Problem 4: Relative Symlinks 

The code above will not be able to resolve a symlink like /proc/self -> 11258. The solution is to (recursively!) track the directory of the symlink and use openat to resolve relative link targets. Our solution is already getting pretty involved, so we won’t produce a complete example here.
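
Still, the core idea for a single level can be sketched briefly. Assuming we already hold a descriptor for the directory containing the link, readlinkat reads the target and openat resolves it relative to that same directory. The function name open_relative_target is hypothetical, and this is only a partial illustration: it does not track the directory across nested links, and it does nothing about intermediate components.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: open the target of the symlink `name`, which
 * lives in the directory referred to by dirfd. A relative target is
 * resolved against dirfd rather than the current working directory.
 * Returns a file descriptor, or -1 with errno set. */
int open_relative_target(int dirfd, const char *name, int flags)
{
    char target[PATH_MAX + 1];
    ssize_t n;

    n = readlinkat(dirfd, name, target, PATH_MAX);
    if (n < 0)
        return -1; /* EINVAL if name is not a symlink */
    target[n] = '\0';

    /* A validation callback would inspect `target` here. */

    /* openat ignores dirfd when target is absolute. O_NOFOLLOW makes a
     * second-level link fail instead of being followed unchecked. */
    return openat(dirfd, target, flags | O_NOFOLLOW);
}
```

A complete solution would repeat this step for each level, opening a new directory descriptor whenever the target leads into another directory.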

Problem 5: Intermediate Symlinks 

As in the previous section, all our validation occurs only on the final path component. If there are illegal symlinks in the middle of the path, we’ll never know. The solution here is to incorporate an algorithm like we used in the previous never-follow strategy, where we break the path into components and validate at each step. At this point, after fixing so many issues, we find that we’ve re-invented much of the operating system’s path lookup mechanism!

A Solution: Openat2 Again

Conditionally following a symlink is nontrivial, as we’ve seen. Fortunately, the openat2 system call we mentioned before can handle many of the most likely validations we would want to do. In addition to the RESOLVE_NO_SYMLINKS behavior we discussed earlier, we have: 

 

  • RESOLVE_NO_MAGICLINKS – Magic symlinks (/proc/self/exe and so on) are rejected. 

  • RESOLVE_NO_XDEV – This is not necessarily symlink related, but it’s still useful. When this is in effect, the path cannot cross a mount point, via symlink or otherwise. This could be used to prevent symlinks from pointing to /dev or /proc, for example. 

  • RESOLVE_BENEATH – When this is in effect, path resolution must stay within the directory tree rooted at dirfd: every symlink must resolve to a location in that directory or one of its subdirectories (recursively). This will reject attempts to link to, e.g., “../somefile” or an absolute path. 

  • RESOLVE_IN_ROOT – When this is used, symlinks act like dirfd is the root directory (similar to a chroot). For example, symlinks to “../../etc/passwd” and “/etc/passwd” will both behave exactly like a symlink to “etc/passwd”. 

 

Here is an example of a privileged program that writes logs to a configurable directory. The user may use symlinks to rotate logs but may not use symlinks to cause arbitrary writes to other parts of the system.

#define _GNU_SOURCE 
#include <fcntl.h> 
#include <linux/openat2.h> 
#include <string.h> 
#include <sys/stat.h> 
#include <sys/syscall.h> 
#include <sys/types.h> 
#include <unistd.h> 
 
int append_log(const char* logdir, const char* message) 
{ 
    int dirfd; 
    int logfile; 
    ssize_t n; 
    struct open_how how; 
 
    dirfd = open(logdir, O_PATH | O_DIRECTORY); 
    if (dirfd < 0) 
        return -1; 
 
    /* Doing this without openat2 would be vulnerable: What if someone 
     * replaced 'logfile-latest' with a symlink to somewhere else on the 
     * system? However symlinks are useful here because they let us easily 
     * rotate logs. */ 
    how = (struct open_how) { 
        .flags = O_WRONLY | O_APPEND | O_CREAT, 
        .mode = 0644, 
        .resolve = RESOLVE_BENEATH, /* don't escape logdir */ 
    }; 
    logfile = syscall(SYS_openat2, dirfd, "logfile-latest", &how, sizeof(how)); 
    close(dirfd); 
    if (logfile < 0) 
        return -1; 
 
    n = write(logfile, message, strlen(message)); 
    close(logfile); 
    if (n < 0) 
        return -1; 
    return 0; 
} 
 

Writing secure applications is an extremely detail-oriented endeavor. Even something like symlinks, which seem like a minor convenience feature, can end up having huge security implications if you aren’t handling them with care. At Star Lab, we are in the business of poring over such details to produce mitigations at different levels of the system, so that even when a customer misses something small, there will still be additional layers of protection. You can read more about our approach in the following blogs and whitepapers, and we’d encourage you to do so as you embark on secure-by-design development.


Ben Fogle