The alias Attribute - Some things I learned about the alias attribute

From emmtrix Wiki


Recently, I had to read and modify some code that was responsible for handling alias attributes in C sources. Although I knew the alias attribute existed, I had never taken a closer look and thus needed to do some research. This escalated quite a bit, and since documentation on attributes in general is rather scarce and the alias attribute is no exception, I’d like to share my findings with you.

These notes begin with some probably rather dull remarks on the syntax of alias attributes and a subsequent explanation of how the compiler handles these. After having the basics up our sleeve, we have a closer look at some syntactic and semantic peculiarities. The remainder of these notes is then concerned with some cool or fun examples of what one might actually do with aliases. Readers who are easily bored or prefer to learn by example are well-advised to jump directly to these examples and return to the earlier paragraphs only when needed.

So, who’s the intended target audience? To be honest, I don’t exactly know. Obviously, these notes apply to C only and involve some rather low-level hackery. Moreover, if you’re not using GCC or clang on some more or less common platform like x86_64, ARM or RISC-V, chances are high that your toolchain does not support the alias attribute at all. Now, if that didn’t put you off already, I can think of two possible ways of reading these notes:

  • If you’ve got some familiarity with C and are simply looking to use the alias attribute, it’s advisable to skim through the syntax and semantics sections before diving into some more complicated examples.
  • If you enjoy torturing your compiler, you’ll probably find some engaging content in the section on testing the limits and some of the later examples.

Before delving in, it’s crucial to acknowledge (yet again) the portability issues associated with the alias attribute. While it offers some quite powerful capabilities, its usage should be approached with caution, especially in cross-platform projects.

Syntax

Owing to the fact that attributes like the alias attribute have their origin in vendor-specific extensions to the C language, there are several ways to actually define an alias. We’ll only deal with two variants but the interested reader may find a third variant by looking here. Moreover, not every syntax is compatible with every compiler or even compiler settings, and different compilers do not necessarily agree upon the way in which certain attributes interoperate with standard C features. We’ll see examples below, so stay tuned.

GNU C

The GCC compiler originally introduced the alias attribute in GNU C using its __attribute__ syntax. According to GCC’s documentation, an attribute specifier is of the form

__attribute__(( attribute specifier list ))

where the attribute specifier list is going to consist of the single attribute alias("alias_tgt") for the greater part of these notes. In principle, such attributes could be attached to anything in your sources. However, the alias attribute applies only to variable and function declarations. In the example

__attribute__((alias("alias_tgt1")))
extern int alias_var;

__attribute__((alias("alias_tgt2")))
extern void alias_fun();

the attribute alias("alias_tgt1") applies to the declaration of alias_var whereas the attribute alias("alias_tgt2") applies to the declaration of alias_fun().

We’ll look into what exactly these alias attributes mean, but for the moment it suffices to think of these declarations as introducing an additional name alias_var for the previously defined variable alias_tgt1 and an additional name alias_fun for the previously defined function alias_tgt2.

C2x

Now, the reader might know very well that the attribute syntax

[[ namespace::attribute ]]

introduced in C++11 is to be included in the upcoming C2x standard. If you want to have a look, the draft on attributes can be found here.

As the alias attribute originated in GNU C, the namespace is going to be gnu in our case of interest. Hence, in C2x our first example of aliases from above could also be written as

[[ gnu::alias("alias_tgt1") ]]
extern int alias_var;

[[ gnu::alias("alias_tgt2") ]]
extern void alias_fun();

and both clang and GCC happily accept this syntax, as witnessed by compiler explorer. Just make sure that you don't omit the flag -std=c23 when invoking clang.

What is more, the proposal comes with clear and precise rules, where attributes can appear. It tells us that C2x will

[...] allow an attribute specifier to appear to the left of a
declaration so that the attributes appertain to all of the declarators in
the declaration list, or to appear to the right of all declaration
specifiers so that the attributes appertain to the type determined by the
specifier sequence.

[...]

Similarly, an attribute specifier can appear to the right of a type in a
declarator to appertain to the type, or to the right of an identifier in a
declarator to appertain to the identifier declared.

So, our above example might have also been written as

extern int alias_var [[ gnu::alias("alias_tgt1") ]];

extern void alias_fun [[ gnu::alias("alias_tgt2") ]] ();

and, again, both clang and GCC happily accept this syntax. This can also be checked on compiler explorer.

Semantics

In this section, we explain what the compiler makes of an alias attribute and give some first working examples. However, in order to explain how the alias attribute really works, we first need to recall some simple facts about symbol tables in object files.

Symbol tables

Disclaimer: While preparing these notes, the author was working on a GNU/Linux machine and while the specifics given below do not necessarily apply verbatim to your toolchain, the underlying mechanisms are almost certainly the same.

Whenever you declare a global variable or a function in one of your C sources, it ends up as an entry in the corresponding object file’s symbol table, so that other parts of your program may use it. Each defined symbol lives in a section of the object file (these notes also loosely call them segments): executable code, i.e. functions, goes into .text, while initialized and uninitialized data go into .data and .bss, respectively.

An entry in the symbol table simply tells the linker what a specific name in your program means. That is, an entry in the symbol table either maps an identifier to a specific address in one of the sections of your object file, or is marked as UNDEFINED to tell the linker that this symbol must be defined in some other object file or library.

If you have one module a.c defining a global variable global and another module b.c using that global variable like in the example

// a.c
int global = 23;

// b.c
extern int global;

then the symbol table of a.o will contain an entry mapping global to some address in the .data segment and the symbol table of b.o will contain an UNDEFINED entry for global. The linker is then responsible for merging the symbol tables and making the code in b.c actually use the address of global in the .data segment of a.o.

The main point I want to make here is that names of functions and global variables are merely entries in some symbol tables that eventually map to specific addresses in memory. This means in particular that two names mapping to the same address will be indistinguishable after compilation and linking because a processor does not know of any names but works with memory addresses only.

Actually, this last paragraph fully explains how the alias attribute works.

A simple example with variables

Let’s get our hands dirty and try to understand the alias attribute by investigating the following simple example:

// simple.c
int alias_tgt;
extern int alias_var [[gnu::alias("alias_tgt")]];

Compiling these two lines and looking at the symbol table of the resulting object file with objdump or nm reveals what happens inside your compiler:

> objdump --syms simple.o

simple.o:     file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 simple.c
0000000000000000 g     O .bss   0000000000000004 alias_tgt
0000000000000000 g     O .bss   0000000000000004 alias_var

Don’t worry if you don’t know how to read the output of objdump. For our purposes, you only need to know that

  • the first column contains the address of a symbol in its segment,
  • the fourth column contains the segment that contains the symbol,
  • the last column contains the name of the symbol.

The columns are also explained in objdump's manpage and reading through nm 's manpage might give some further information if you're really curious.

In our simple example, we thus find two symbols alias_tgt and alias_var at address 0 of the segment .bss for uninitialized data. Once loaded into memory by your operating system or a boot loader, the variables alias_tgt and alias_var from simple.c will hence refer to the same actual address in memory and are indistinguishable from your computer’s point of view.

A simple example with functions

The alias attribute does not only apply to variable symbols but also to function symbols. Compiling the example

// fun.c

int fn(int a) {
  return a  + a;
}

[[gnu::alias("fn")]] int twice(int);

and investigating the resulting object file with objdump, we get the result

> objdump --syms fun.o
fun.o:     file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 fun.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 g     F .text  000000000000000e fn
0000000000000000 g     F .text  000000000000000e twice

Again, we have two symbols fn and twice in the .text section of our object file that share the single address 0.

Testing out the limits

What exactly is an alias?

We now know more or less what the compiler does when it encounters an alias attribute. However, it is not so clear what an alias attribute means in terms of the C language. Essentially the only official documentation on the alias attribute is GCC’s documentation of which I’ll quote the relevant first half:

The alias variable attribute causes the declaration to be emitted as an alias for another symbol known as an alias target. Except for top-level qualifiers the alias target must have the same type as the alias. For instance, the following

int var_target;
extern int __attribute__ ((alias ("var_target"))) var_alias;

defines var_alias to be an alias for the var_target variable. 

It is an error if the alias target is not defined in the same translation unit as the alias. 

One question that remains unanswered by this explanation is the question whether an alias is a definition in the standard committee’s sense. In fact, this is the question that originally initiated my investigations into the alias attribute.

Without further ado, let’s see what GCC and clang have to say. The snippet

int target;

[[ gnu::alias("target")]]
extern int alias_var;

int alias_var;

is happily accepted by both GCC and clang. However, if we turn the second declaration of alias_var from a tentative definition into a certain definition by changing the second declaration to

int alias_var = 0;

the two compilers start to disagree. In fact, clang complains about a redefinition of symbol alias_var while GCC tells us nothing. See for yourself. When you’re at it, you may check that GCC also accepts the alias after a definition of alias_var.

Now, it seems that clang treats the alias as a symbol definition and thus rightly complains about a symbol redefinition. But what does GCC do? Well, a quick inspection of the symbol table tells us that the alias seems to override any other variable definition that exists. There’s also this old bug report for GCC that seems related but never got any activity.

By the way, when it comes to functions, both clang and GCC seem to treat an alias attribute as a function definition and therefore complain about redefinitions.

Returning to variables, there’s one last surprise. Above, we noted that clang seems to treat the alias as a symbol definition. However, that’s not completely true. Whereas an ordinary definition may appear after any number of tentative definitions, clang rejects an alias that follows a tentative definition of the same variable, such as

int alias_var;

[[gnu::alias("alias_tgt")]]
extern int alias_var;

as can also be seen on godbolt. Funnily enough, turning the tentative definition int alias_var into a mere declaration with external linkage (an extern declaration without initializer is not a definition in the standard’s sense, see §6.9.2 in the latest working draft) as in

extern int alias_var;

[[gnu::alias("alias_tgt")]]
extern int alias_var;

reconciles clang with the code snippet. As already mentioned above, GCC is more forgiving and accepts both variants.

So, what’s the upshot? I guess there are two points:

  • GCC and clang don’t necessarily agree on what an alias actually is.
  • If you’re using clang, an alias attribute on some declaration is very close to a usual definition. Just make sure that either the alias is the first declaration or all other declarations are explicitly extern.

Syntactic limits

The reader may have observed that while the clear rules for the attribute syntax in C2x were highlighted above, there was no mention of any rules in the case of GCC’s original syntax. There’s a reason for that and in order to set the stage, let me quote from GCC’s documentation on its attribute syntax:

For compatibility with existing code written for compiler versions that did not implement attributes on nested declarators, some laxity is allowed in the placing of attributes.

Even though we’re not dealing with any nested declarators here, let’s see what GCC and clang are able to swallow. In fact, all four declarations in the example

__attribute__((alias("alias_tgt"))) extern int alias_var1;
extern __attribute__((alias("alias_tgt"))) int alias_var2;
extern int __attribute__((alias("alias_tgt"))) alias_var3;
extern int alias_var4 __attribute__((alias("alias_tgt")));

are happily accepted by both GCC and clang and turn out to be semantically equivalent. For function aliases there are even more positions where one could place the __attribute__. Of these, the correct C2x variant

extern void alias_fun __attribute__((alias("alias_tgt"))) ();

is rejected by both GCC and clang, and the variant

extern void alias_fun (__attribute__((alias("alias_tgt"))));

is of course interpreted as an attribute for parameter declarations. Thus, just as in the variable case, we are left with four valid and semantically equivalent variants to define a function alias:

__attribute__((alias("alias_tgt"))) extern void alias_fun1();
extern __attribute__((alias("alias_tgt"))) void alias_fun2();
extern void __attribute__((alias("alias_tgt"))) alias_fun3();
extern void alias_fun4() __attribute__((alias("alias_tgt")));

Note that for functions like int* fn() returning a pointer, there are even more variations of a function alias definition. Rest assured that in this case, too, GCC and clang swallow whatever you may come up with except for the two cases already excluded above.

However, one step further, GCC and clang start to disagree on what monstrosities are still acceptable. The alias inbetween in the example

int** ptr = 0;
int** target() {
  return ptr;
}

extern int * __attribute__((alias("target"))) * inbetween();

is still accepted by clang while GCC has problems parsing the corresponding line and consequently does not export a function called inbetween. You can see the warning in compiler explorer and either investigate the symbol tables on your own or have a look at the output on my machine:

> clang -c -std=c2x between.c
> objdump --syms between.o
between.o:     file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 between.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 g     F .text  000000000000000e target
0000000000000000 g     O .bss   0000000000000008 ptr
0000000000000000 g     F .text  000000000000000e inbetween

and

> gcc -c -std=c2x between.c
between.c:6:1: warning: ‘alias’ attribute does not apply to types [-Wattributes]
6 | extern int * __attribute__((alias("target"))) * inbetween();
  | ^~~~~~
> objdump --syms between.o
between.o:     file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 between.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 g     O .bss   0000000000000008 ptr
0000000000000000 g     F .text  000000000000000d target

A Word of Warning

Even though the beginning of the preceding section might have left you with the impression that we may place __attribute__((alias("target"))) essentially wherever we want, the placement does matter for a declaration list. In

extern int a, b __attribute__((alias("target"))) ;

only b is an alias for target while in

__attribute__((alias("target")))
extern int a, b;

both a and b are aliases for target. If you don’t believe me, just have a look at the symbol tables on your own and keep in mind that this remark applies to the new C2x syntax, too.

So, if I were pressed to write guidelines for alias attributes, I’d suggest the following simple rules:

  • An alias declaration should consist of a single declarator to avoid any confusion.
  • If you can, stick to the syntax introduced in C2x.
  • Place the attribute in front of the declaration, as in our very first examples for the GNU and C2x syntax, since this placement seems to work reliably with both GCC and clang.

Examples

We’ll give several slightly more involved examples of the alias attribute. The first two of these examples are real-world examples. The other examples again more or less classify as testing the limits.

Aliases as portability and linking hacks

One problem where aliases might come in handy is offering a specific API without renaming all your functions, or offering different implementations of an API and switching between these implementations by changing aliases, e.g. with guarding #ifdefs. Among other examples for this use case, searching on GitHub for __attribute__((alias yields for instance Espressif’s implementation of VFS.

Software engineering with aliases

Another nice trick that crucially employs aliases is more related to software engineering than to linking problems: Imagine a situation where one module of your software owns a global variable essentially_const that is computed once during startup but constant afterwards. If essentially_const is exposed as a global variable, then, sooner rather than later, some rogue developer on your team will introduce a function that modifies essentially_const and thereby introduce a hard-to-find bug[1]. After all, essentially_const is a non-constant global variable and modifying it should be OK, right?

Let us indicate a neat solution for this problem that makes good use of aliases:

// a.c
static int essentially_const;

[[gnu::alias("essentially_const")]]
extern const int const_view;

void init() {
  essentially_const = 42;
}

// a.h
extern const int const_view;

void init();

Let’s see what’s happening here:

  • The variable essentially_const in a.c is declared non-constant and the function init() at the bottom of a.c may consequently modify it as its author pleases. Moreover, essentially_const is declared static and it therefore has internal linkage. That means in particular that our rogue developer cannot modify essentially_const as the symbol is not exposed to her[2].
  • In addition to essentially_const, the module a.c also defines a variable const_view of type const int as an alias of essentially_const. This global symbol is then exposed in a.h to all other modules of the project. What have we gained? The interface a.h of the module a.c now clearly states that const_view is constant and any compiler will complain if said rogue developer tries to modify its value.

In fact, trying to compile

// rogue.c
#include "a.h"

void fn() {
  init();
  const_view = 13; 
}

results in an error message such as

rogue.c: In function ‘fn’:
rogue.c:6:14: error: assignment of read-only variable ‘const_view’
6 |   const_view = 13;
  |              ^      

That clearly tells our rogue developer that modifying const_view is an evil thing to do.

Actually, this trick is not something the author came up with by himself. Rather, while doing some preliminary research for these notes, the author learned this technique from a source file in the qemu repositories, where the reader may also find some complementary explanations.

Doing weird stuff with aliases

As mentioned earlier, aliases are essentially just a way to have distinct names for the same memory address. As already exploited in the previous example, this means in particular that the aliasing mechanism bypasses C’s already quite weak type system and we might very well have symbols of different types sharing a single memory address.

In the example

// double1.c
#include <stdio.h>
#include <stdint.h>

double a;

[[gnu::alias("a")]]
extern uint64_t b;

#define SIGN_MASK     UINT64_C(0x8000000000000000)

int main(int argc, char **argv) {
  a = 0.5;
  b |= SIGN_MASK;

  printf("%f\n", a);

  return 0;
}

we have a symbol a of type double and a symbol b of type uint64_t sharing the same memory address. Running this example on a computer where a double occupies 64 bits and where the sign of doubles is stored in the most significant bit will result in an output like

> ./double1
-0.500000 

The reason is that the line

  b |= SIGN_MASK;

sets the most significant bit of the value stored at the address referred to by b, i.e. the very same address that a refers to. This most significant bit of b thus happens to be the sign bit of our double a.

As the reader might guess and as witnessed by the following example, one is of course not confined to messing around with the sign bit of floating point numbers. If you know the format of your floating point types, you might freely modify the exponent or the mantissa:

// double2.c
#include <stdio.h>
#include <stdint.h>

double a;

[[gnu::alias("a")]]
extern uint64_t b;

#define EXPONENT_MASK  UINT64_C(0x7FF0000000000000)
#define EXPONENT_SHIFT 52

int main() {
  // messing with the exponent: clear the exponent field,
  // then write it back increased by 2
  a = 1;
  b = (b & ~EXPONENT_MASK) | ((b & EXPONENT_MASK) + (UINT64_C(2) << EXPONENT_SHIFT));
  printf("%f\n", a);

  // messing with the mantissa: set the bit worth 2^-2
  a = 1;
  b = b | (UINT64_C(1) << (EXPONENT_SHIFT - 2));
  printf("%f\n", a);

  return 0;
}

On common consumer hardware, the output of the program looks as follows:

> ./double2
4.000000
1.250000

If you don’t know why, the Wikipedia article on double precision floating point numbers might be a good starting point for finding an explanation on your own.

Doing more weird stuff with aliases

As you might have guessed by now, there are very few safeguards in place to prevent us from defining function aliases with different signatures. In combination with other attributes like gnu::packed or gnu::aligned this may be used for tricks like the following:

// signatures.c
#include <stdio.h>
#include <stdint.h>

typedef struct [[gnu::packed]] {
  uint32_t a;
  uint32_t b;
} S;

void f(S s) {
  printf("a = %u, b = %u\n", s.a, s.b);
}

[[gnu::alias("f")]]
extern void g(uint64_t ab);

int main(int argc, char **argv) {
  S s = {2,3};

  printf("Calling f: ");
  f(s);

  printf("Calling g: ");
  g(0x0000000200000001UL);

  return 0;
}

Before showing you what this program does, let us examine it a bit more closely:

  • The file begins with the declaration of a struct type S with two members a and b of type uint32_t. The attribute gnu::packed tells the compiler to not insert any padding bytes, so that the layout of any instance of the struct S in memory is exactly as we as C programmers see it: 4 bytes of memory occupied by the member a followed by 4 bytes of memory occupied by the member b.
  • The function f takes an instance s of the struct type S and simply prints its members to the standard output.
  • What follows is an alias g of f with a different signature. The function g takes a single argument ab of type uint64_t. Being an alias of f, however, any call of g will execute the exact same code that f compiles to. Luckily, values of type uint64_t occupy 8 bytes of memory, which is the exact same amount of memory occupied by instances of the struct type S. The machine code that f compiles to will thus interpret the lower and higher 4 bytes of g’s argument ab as members a and b of f’s argument s, respectively.

Now, let’s see what this program does:

> ./signatures
Calling f: a = 2, b = 3
Calling g: a = 1, b = 2

As explained above, the function f indeed interprets the 8 bytes 0x0000000200000001UL given as argument to g as an instance of type S with member a being the lower four bytes 0x00000001 and member b being the upper four bytes 0x00000002.

Let me finish this example with a word of warning: Although being great fun, tricks like this one are highly non-portable as they depend among other things on memory alignment, padding, calling conventions and the concrete hardware that your binary is going to be deployed on. So, if you don’t know exactly what you’re doing, they should never be used in any production code.

An invitation

If you’ve come this far, there’s not much more I want to tell you for now. Get a cup of coffee, fire up your favourite editor and have some fun with aliases. If you don’t know where to start, let me give you one last hint: One may use the alias attribute for variables of struct type, too. This can be seen in the following program that is merely a slight variation of our previous example of function aliases:

#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

typedef struct [[gnu::packed]] {
  uint32_t a;
  uint32_t b;
} S;

S s;

[[gnu::alias("s")]]
extern uint64_t c;

int main(int argc, char **argv) {
  s.b = 0x42;
  // PRIx64 is the portable conversion specifier for uint64_t
  printf("%" PRIx64 "\n", c);

  return 0;
}

Links



  1. With some non-negligible probability that rogue developer is you!
  2. Actually, having an alias on a static symbol leads to that symbol being exposed in the corresponding object file’s symbol table, see ARM’s documentation for instance. However, I did not find a way to modify the value through that entry in the symbol table as it is a non-global object.