emmtrix C to Rust Compiler
The emmtrix C-to-Rust Compiler is a source-to-source transpiler designed to modernize existing C codebases by translating them into safe and maintainable Rust. While C remains the dominant language for embedded and systems programming, its unrestricted pointer arithmetic and manual memory management are frequent sources of bugs and vulnerabilities. Rust, by contrast, enforces strict ownership and lifetime rules that prevent many of these issues at compile time.
The challenge, however, lies in bridging the gap between C and Rust in a way that preserves program semantics without producing unreadable or overly unsafe Rust code.
Naïve one-to-one translations often result in verbose output with large unsafe regions, offering little benefit beyond syntactic portability.
The emmtrix approach addresses this problem by combining automated translation with static analyses, pragma-controlled customization, and a focus on readable output.
This enables developers to incrementally migrate legacy code while keeping the original C source as a reference.
Motivation and Challenges
Automatic translation from C to Rust faces several inherent challenges. Pointers in C allow arbitrary arithmetic and aliasing, whereas Rust requires explicit lifetimes and prohibits unchecked aliasing. Straightforward translations that map C pointers directly to raw Rust pointers (*const T / *mut T) preserve semantics but produce large unsafe regions. For example, C2Rust translates pointer arithmetic using .offset() calls or integer address calculations, and its output may be described as “painful Rust” because it closely replicates C pointer manipulations. Research surveys observe that rule‑based translators often fail to provide idiomatic translations and rely heavily on unsafe constructs.
Features
Correctness
emmtrix has more than 10 years of experience in translating C code with its source-to-source compiler technology. The primary design goal is to preserve the exact semantics of the C code. For instance, the transpiler automatically inserts explicit casts in Rust to mimic the implicit (and sometimes non-obvious) type conversions performed by C.
As an example, the C statement s1 = s2 + s3; operating on short variables is translated to s1 = ((s2 as i32) + (s3 as i32)) as i16; in Rust. This ensures that the intermediate promotion to int and the final narrowing conversion back to short follow the same rules as in C.
In a later (optional) optimization step, the transpiler may detect that it is safe to replace the intermediate i32 addition by an i16 addition, thereby eliminating the unnecessary casts.
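The promotion rule above can be tried out in isolation. The following is a minimal sketch, assuming a target where C `short` is 16 bits and `int` is 32 bits. The function names are hypothetical, and `wrapping_add` is only one assumed form the optimized output could take; the article does not show it.

```rust
/// Direct translation of the C statement `s1 = s2 + s3;` on `short` operands:
/// both operands are widened to i32 (C `int`), added, and the sum is
/// narrowed back to i16 by truncation.
fn add_promoted(s2: i16, s3: i16) -> i16 {
    ((s2 as i32) + (s3 as i32)) as i16
}

/// Assumed optimized form: a wrapping 16-bit addition produces the same
/// low 16 bits as the widened addition followed by truncation.
fn add_narrow(s2: i16, s3: i16) -> i16 {
    s2.wrapping_add(s3)
}
```

The 32-bit sum of two 16-bit values cannot overflow, so the widened form never panics. A plain `s2 + s3` on `i16` would panic on overflow in debug builds, which is why the narrow variant here uses `wrapping_add`.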
Readable Rust output
The transpiler aims to produce Rust code that is easy to read, review, and refactor. It preserves the original program structure where this improves traceability, while applying targeted rewrites that make types and semantics explicit.
Key aspects include:
- Use of native Rust data types (e.g. i32, u16) instead of libc::c_int.
  - Please note that the exact mapping depends on the target architecture settings. Since most modern C toolchains use standardized word sizes, the advantages of using native Rust types generally outweigh the potential portability concerns of relying on libc types.
- Preservation of original C identifiers wherever possible
- Minimization of type casts (planned)
- Retention of control flow constructs where feasible
  - Statements are kept in their original order (except for switch default blocks, which must appear at the end of a match)
  - C for loops are translated to while loops
  - C do...while loops are translated to while loops using an additional enter boolean variable to preserve the “execute at least once” semantics
  - switch/case is translated to match with explicit fall-through handling where required
- Preservation of C comments in the generated Rust code
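The loop rules in the list above can be sketched as follows. This is an illustration rather than actual transpiler output: the function names and the C originals in the comments are made up, and only the `enter` flag and the `for`-to-`while` mapping come from the text.

```rust
/// C original (hypothetical):
///     int count = 0, i = 0;
///     do { count++; i++; } while (i < n);
fn count_do_while(n: i32) -> i32 {
    let mut count: i32 = 0;
    let mut i: i32 = 0;
    let mut enter = true; // forces the first pass through the body
    while enter || i < n {
        enter = false;
        count += 1;
        i += 1;
    }
    count
}

/// C original (hypothetical):
///     int count = 0;
///     for (int i = 0; i < n; i++) { count++; }
fn count_for(n: i32) -> i32 {
    let mut count: i32 = 0;
    let mut i: i32 = 0; // init expression
    while i < n {       // loop condition
        count += 1;     // loop body
        i += 1;         // increment expression
    }
    count
}
```

With `n = 0` the `do...while` version still runs its body once, while the `for` version does not run at all. A loop body containing `continue` would need extra care so that the increment still executes; how the transpiler handles that case is not described here.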
Unsafe minimization
The transpiler limits the use of unsafe to the smallest code fragment that actually requires unchecked behavior (e.g., pointer dereference, union field access, volatile memory operations). This does not mean that the resulting code is automatically free of risks, but that the boundaries of unsafe are made explicit and narrow.
Instead of marking entire functions or large code regions as unsafe, only the exact operation is wrapped in unsafe { ... }. This makes it easier for developers to review and reason about the critical parts of the code and provides clear targets for future refactoring to eliminate unsafe entirely.
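As an illustration of this narrow scoping, the sketch below wraps only the two raw-pointer accesses in `unsafe`. The function and its C counterpart are hypothetical and not taken from transpiler output.

```rust
/// C original (hypothetical): int scale(int *p, int f) { *p = *p * f; return *p; }
fn scale_in_place(p: *mut i32, factor: i32) -> i32 {
    let old: i32 = unsafe { *p }; // unchecked read through the raw pointer
    let new_val: i32 = old * factor; // ordinary safe arithmetic
    unsafe { *p = new_val }; // unchecked write through the raw pointer
    new_val
}
```

The caller still has to pass a valid, properly aligned pointer, so the risk has not gone away. What the reviewer gains is that the two lines that depend on that guarantee are immediately visible.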
Static pointer resolution and reduction of unsafe constructs
The pointer resolve transformation in emmtrix Studio can optionally be applied to the C code before translation. This transformation includes a dedicated pointer analysis that propagates pointer targets across functions and replaces pointer arithmetic with offset variables.
By doing so, it eliminates many raw pointer accesses and simplifies the Rust output. It also handles interprocedural local variables and duplicates functions when a pointer can refer to different variables.
While this approach significantly reduces the number of unsafe blocks and improves readability, it comes at the cost of reduced traceability between the original C code and the generated Rust code. Developers can therefore choose whether to prioritize readability or one-to-one correspondence depending on their migration strategy.
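The effect of replacing pointer arithmetic with an offset variable can be pictured with the following before/after sketch. Both functions are hand-written assumptions about what the Rust side might look like. The names, the fixed array size, and the C loop in the comment are illustrative only.

```rust
/// Without pointer resolution: the C loop
///     while (p < end) sum += *p++;
/// carried over with raw pointers.
fn sum_raw(data: &[i32; 4]) -> i32 {
    let mut sum = 0;
    let mut p: *const i32 = data.as_ptr();
    // A one-past-the-end pointer may be formed but never dereferenced.
    let end: *const i32 = unsafe { p.add(data.len()) };
    while p < end {
        sum += unsafe { *p };
        p = unsafe { p.add(1) };
    }
    sum
}

/// With pointer resolution (sketch): the analysis is assumed to have proven
/// that `p` always points into `data`, so `p` becomes an offset variable.
fn sum_resolved(data: &[i32; 4]) -> i32 {
    let mut sum = 0;
    let mut p_offset: usize = 0;
    while p_offset < data.len() {
        sum += data[p_offset]; // bounds-checked access, no unsafe needed
        p_offset += 1;
    }
    sum
}
```

The second version needs no `unsafe` at all, but the variable `p_offset` no longer corresponds one-to-one to the C pointer `p`, which is the traceability trade-off mentioned above.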
Continuous translation and incremental migration
The transpiler is not limited to a one-time conversion of an existing C codebase. Instead, it can be used continuously throughout the development process. This allows developers to keep the original C code as the primary reference while generating updated Rust translations whenever the C code changes.
By enriching the C source with pragmas, developers can guide the translation process according to their needs—for example, influencing data type choices or the treatment of global variables. This enables a workflow where C and Rust versions of the program evolve side by side: the C code remains compilable and maintainable, while the Rust code reflects the current state and progressively incorporates more idiomatic constructs.
A key advantage of this approach is that developers can maintain testability and portability by keeping the C code as a reference implementation. Rather than converting everything at once and performing extensive manual refactoring afterwards, teams can iteratively adjust the C source and its translation hints until the generated Rust code matches their requirements.
Automatic translation of standard library constructs
Functions from the C standard library (e.g. printf) are mapped to equivalent Rust constructs. In most cases, this is achieved by using Rust’s std::fmt formatting macros (print!, println!) or other appropriate standard library facilities.
This automatic mapping reduces the amount of manual work after translation and ensures that common patterns such as formatted output or constant macros from headers like <limits.h> are directly available in the Rust code. By handling these standard constructs transparently, the transpiler improves readability and lowers the barrier for integrating the translated code into existing Rust projects.
Support for libc constructs is being expanded step by step, so that over time more functions and macros will be translated automatically without requiring manual intervention.
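A small sketch of such a mapping is shown below. It uses `format!` instead of `print!` so that the result can be inspected; both macros accept the same format string. The C call in the comment and the function name are hypothetical, and `INT_MAX` is mapped to `i32::MAX` under the assumption of a 32-bit C `int`.

```rust
/// C original (hypothetical):
///     printf("%d items, max %d, ratio %5.2f\n", n, INT_MAX, r);
fn report(n: i32, r: f64) -> String {
    // %d -> {}, %5.2f -> {:5.2} (minimum width 5, two decimals)
    format!("{} items, max {}, ratio {:5.2}\n", n, i32::MAX, r)
}

/// Same text written to standard output, as a printf call would do.
fn print_report(n: i32, r: f64) {
    print!("{}", report(n, r));
}
```

Not every `printf` conversion has a direct counterpart in Rust's formatting syntax, so a real mapping presumably has to handle some specifiers case by case.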
Pragma-controlled translation (planned)
In addition to existing transformations (such as the optional pointer resolve), future versions of the transpiler will allow fine-grained control of the C-to-Rust translation process through pragmas. Planned features include:
- Controlling struct layout attributes (e.g. #[repr(C)], #[repr(packed)], #[repr(align(N))])
- Controlling enum representation (e.g. C-like #[repr(C)] vs. idiomatic Rust enum with variants)
- Controlling union translation (e.g. raw union vs. safe enum wrapper with MaybeUninit)
- Controlling the data type of dynamic arrays (e.g. raw pointers, slices, Vec, with options for ownership and deallocation responsibility)
- Controlling the representation of character arrays as strings (e.g. raw pointers, CStr/CString, String, or byte slices)
- Controlling the translation of do...while loops (e.g. while (enter || ...) vs. loop { ... break })
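Since the pragma syntax is not yet published, the sketch below shows only the Rust side of the first item: the same C struct emitted with three different layout attributes. The struct and function names are hypothetical, and the sizes in the comments assume a typical target where `u32` has 4-byte alignment.

```rust
use std::mem::{align_of, size_of};

/// C-compatible layout: 1 byte tag, 3 bytes padding, 4 bytes value.
#[repr(C)]
struct Plain {
    tag: u8,
    value: u32,
}

/// Packed layout: no padding between the fields.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

/// Over-aligned layout: the whole struct is aligned to 16 bytes.
#[repr(C, align(16))]
struct Aligned {
    tag: u8,
    value: u32,
}

/// Returns (size, alignment) for each of the three variants.
fn layout_summary() -> [(usize, usize); 3] {
    [
        (size_of::<Plain>(), align_of::<Plain>()),
        (size_of::<Packed>(), align_of::<Packed>()),
        (size_of::<Aligned>(), align_of::<Aligned>()),
    ]
}
```

Which attribute is appropriate depends on whether the struct crosses an FFI boundary or mirrors a hardware or wire format, which is why a per-struct control is useful.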
Testing through dual-language instrumentation (planned)
To build confidence in the translation, emmtrix plans to automatically instrument the original C program with additional debug code before the translation step. This instrumentation may include printing intermediate values, changes to global program state or control flow at defined program points. The C-to-Rust transpiler is then applied to this modified C program, so that the inserted debug statements are carried over into the generated Rust code.
As a result, both the instrumented C program and the translated Rust program produce the same debug output. When both are executed with the same inputs, the two debug logs can be compared directly. Any difference indicates a potential translation issue or undefined behavior in the original C code.
The use of macros allows developers to enable or disable the debug outputs flexibly, both in the C and in the Rust program. This makes it possible to generate clean production builds without instrumentation while still supporting detailed step-by-step verification when needed.
Planned extensions of this concept include:
- Selective instrumentation controlled via pragmas, so developers can decide which functions or variables should be logged.
- Automatic generation of test harnesses that run the instrumented C and Rust code side-by-side.
- Integration of comparison tools that highlight mismatches in variable values or execution traces.
This approach provides a structured method to ensure semantic equivalence between the original C source and the translated Rust program during incremental migration.
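On the Rust side, the idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `trace!` macro, the `TRACE_ENABLED` switch, the log format, and the comparison helper are not part of the emmtrix tooling described here. The log is collected in a vector instead of being printed so that it can be compared programmatically.

```rust
/// Hypothetical build-time switch; a real setup might use a cargo feature.
const TRACE_ENABLED: bool = true;

/// Hypothetical trace macro: appends one formatted line to the log when
/// tracing is enabled. The C program would use a preprocessor macro that
/// emits the same text.
macro_rules! trace {
    ($log:expr, $($arg:tt)*) => {
        if TRACE_ENABLED {
            $log.push(format!($($arg)*));
        }
    };
}

/// Instrumented example function: logs the state at the top of each iteration.
fn gcd_traced(mut a: u32, mut b: u32, log: &mut Vec<String>) -> u32 {
    while b != 0 {
        trace!(log, "gcd: a={} b={}", a, b);
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

/// Returns the index of the first differing log line, or None if the
/// two logs are identical.
fn first_mismatch(c_log: &[String], rust_log: &[String]) -> Option<usize> {
    let common = c_log.len().min(rust_log.len());
    for i in 0..common {
        if c_log[i] != rust_log[i] {
            return Some(i);
        }
    }
    if c_log.len() != rust_log.len() {
        Some(common)
    } else {
        None
    }
}
```

In a side-by-side run, the C log would be read from the C program's output and passed to `first_mismatch` together with the Rust log.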
Bidirectional traceability (on request)
Internally, the transpiler already tracks the original C source location of every expression and propagates this information throughout the translation process. As a result, the origin of each Rust expression can be traced back to the corresponding C code fragment.
On request, emmtrix can generate a traceability report that documents these mappings explicitly. Such a report enables developers to verify the provenance of translated constructs, supporting audits, certification processes, and systematic reviews in safety-critical domains.
Supported C Features
The emmtrix C-to-Rust Transpiler is validated against a comprehensive suite of test cases. These cover a broad range of C language constructs to ensure correctness and readability of the translated Rust code. Currently supported features include:
Variables
- Global, static and local variables
- Initialization and uninitialized declarations
- Different storage classes (e.g. static inside functions)
- volatile variables with correct translation to std::ptr::read_volatile and write_volatile
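For the volatile case, a minimal sketch of the two standard-library calls is given below. The register-style pointer parameter, the function name, and the C fragment in the comment are assumptions made for illustration.

```rust
use std::ptr;

/// C original (hypothetical):
///     volatile unsigned *status = ...;
///     *status = *status | 1u;
///     return *status;
fn set_ready_bit(status: *mut u32) -> u32 {
    // Each volatile access is a separate operation the compiler must not
    // merge, reorder with other volatile accesses, or optimize away.
    let old: u32 = unsafe { ptr::read_volatile(status) };
    unsafe { ptr::write_volatile(status, old | 1) };
    unsafe { ptr::read_volatile(status) }
}
```

As with any raw pointer access, the caller is responsible for passing a valid, aligned address.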
Operators
- Arithmetic: +, -, *, /, %
- Bitwise: &, |, ^, <<, >>
- Logical: &&, ||, !
- Comparison: ==, !=, <=, >=, <, >
Control Flow
- if/else constructs
- while and do...while loops
- for loops, including break and continue
- switch/case with fall-through handling
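Rust's `match` has no implicit fall-through, so a C `case` that falls into the next one needs an explicit encoding. The sketch below shows one possible encoding, written by hand. The form the transpiler actually emits is not shown in this article, and the function and its C original are hypothetical.

```rust
/// C original (hypothetical):
///     int r = 0;
///     switch (x) {
///         case 1: r += 1;   /* falls through */
///         case 2: r += 2; break;
///         default: r = -1;
///     }
///     return r;
fn classify(x: i32) -> i32 {
    let mut r: i32 = 0;
    match x {
        1 | 2 => {
            // Body of `case 1`, skipped when control enters at `case 2`.
            if x == 1 {
                r += 1;
            }
            // Body of `case 2`, reached from both entry points.
            r += 2;
        }
        // The default arm has to come last in a match.
        _ => {
            r = -1;
        }
    }
    r
}
```

This keeps the statements in their original order, at the price of an extra condition inside the merged arm.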
Functions
- Regular function definitions and calls
- void and non-void return types
- Function pointers translated to Option<fn(...) -> ...>
- Proper propagation of return values
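The `Option` wrapper models the fact that a C function pointer may be NULL, whereas a Rust `fn` pointer is never null. A minimal sketch, with hypothetical names and a hypothetical C original:

```rust
/// C original (hypothetical): typedef int (*bin_op)(int, int);
type BinOp = Option<fn(i32, i32) -> i32>;

fn add(a: i32, b: i32) -> i32 {
    a + b
}

/// C original (hypothetical):
///     int apply(bin_op op, int a, int b) { return op ? op(a, b) : 0; }
fn apply(op: BinOp, a: i32, b: i32) -> i32 {
    match op {
        Some(f) => f(a, b), // non-NULL pointer: call through it
        None => 0,          // NULL pointer
    }
}
```

A call site passes `Some(add as fn(i32, i32) -> i32)` for a valid pointer and `None` where the C code would pass NULL.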
Data Types
- Scalar integer and floating-point types
- Typedefs (including chains of typedefs and pointer typedefs)
- Structures (named, nested, anonymous, typedef-based)
- Unions (with correct handling of field access)
- Arrays (1D and multi-dimensional, initialized and uninitialized)
Literals
- Integer literals (decimal, octal, hex, binary)
- Character and multicharacter constants
- String literals, concatenated strings, escape sequences
- Wide strings (L""), UTF-8 (u8""), UTF-16 (u""), and UTF-32 (U"") string literals
Pointers
- Basic pointer usage and dereferencing
- Pointer arithmetic with .offset() and .offset_from() (only used when the Pointer Resolve Transformation is not applied)
- Array-to-pointer decay
- void* pointers and conversions
- Address-of operator and field access through pointers
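The two pointer methods named in the list can be seen together in the following sketch. The function and the C fragment in the comment are hypothetical; only the use of `.offset()` and `.offset_from()` comes from the text.

```rust
/// C original (hypothetical):
///     int *p = arr;            /* array-to-pointer decay */
///     p = p + 2;
///     int v = *p;
///     ptrdiff_t d = p - arr;
fn third_and_distance(arr: &[i32; 4]) -> (i32, isize) {
    let base: *const i32 = arr.as_ptr(); // array decays to pointer to first element
    let p: *const i32 = unsafe { base.offset(2) }; // p + 2, counted in elements
    let v: i32 = unsafe { *p }; // dereference
    let d: isize = unsafe { p.offset_from(base) }; // p - arr, counted in elements
    (v, d)
}
```

Both methods count in elements rather than bytes, which matches C pointer arithmetic. Both also require the pointers to stay within the same allocation, or one past its end.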
Standard Library Constructs
- Translation of printf calls to Rust print!/println! macros with correct formatting
- Handling of <limits.h> and constant macros (e.g. INT_MAX, UINT_MAX)
Limitations
- Endianness dependency – The generated code currently depends on the endianness of the target architecture. This means that certain translations (e.g. involving unions or bit-level operations) may behave differently on little-endian vs. big-endian systems. Developers must therefore ensure that the target architecture matches the assumptions of the translated code, or apply additional transformations to make the code endian-independent.
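The union case can be illustrated with a short sketch. The union type and both functions are hypothetical. The first function depends on the byte order of the machine it runs on; the second shows one way to make the same access endian-independent.

```rust
/// C original (hypothetical): union word { uint32_t value; uint8_t bytes[4]; };
#[repr(C)]
union Word {
    value: u32,
    bytes: [u8; 4],
}

/// Endianness-dependent: returns the byte stored at the lowest address,
/// which is the least significant byte on little-endian targets and the
/// most significant byte on big-endian targets.
fn first_byte_via_union(v: u32) -> u8 {
    let w = Word { value: v };
    unsafe { w.bytes[0] } // reading a union field requires unsafe
}

/// Endian-independent alternative: always the least significant byte.
fn low_byte(v: u32) -> u8 {
    v.to_le_bytes()[0]
}
```

On a little-endian machine both functions return the same value. On a big-endian machine only `low_byte` keeps returning the least significant byte.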
Comparison with C2Rust and Other Translators
- Granularity of translation – C2Rust translates each C function into a Rust function that mirrors its structure and uses raw pointers and unsafe blocks extensively. emmtrix aims to use static analyses and pointer resolution to reduce the use of raw pointers and to confine unsafety to small regions, producing code that is closer to idiomatic Rust.
- Readability and idiomaticity – The emmtrix transpiler prioritizes readability by converting pointer arithmetic to index variables and by using Rust abstractions where possible. Research notes that rule‑based approaches often fail to provide idiomatic translations and overuse unsafe constructs. By contrast, emmtrix strives to deliver a starting point that resembles hand‑written Rust and invites further refactoring.
- Pointer analysis and transformation – Both C2Rust and emmtrix perform static analyses; however, emmtrix’s pointer resolve transformation explicitly propagates pointer information across functions, introduces offset variables for pointer arithmetic, and duplicates functions when necessary. This reduces reliance on raw pointers and enables safer code.
Conclusion
The emmtrix C-to-Rust Transpiler is an emerging tool aimed at generating readable, maintainable Rust code from legacy C sources.
By combining fine-grained unsafe blocks, pointer analysis, pragma-controlled transformations, and preservation of comments, it seeks to overcome the shortcomings of existing C-to-Rust translators that produce unidiomatic and unsafe code.
Integration with continuous workflows and planned support for dual-language testing make it a practical choice for gradual migration of large codebases.
As memory safety becomes ever more critical, tools like emmtrix’s transpiler will play an important role in modernizing existing C projects while maintaining transparency and traceability.
In the long run, the goal is not only to assist migration but also to make the use of hybrid C/Rust projects in embedded and safety-critical domains more practical, by enabling continuous translation and preserving C code as a reference implementation.
See also
- Automatic C to Rust Translation – discussion of challenges and existing tools, including C2Rust.
- Pointer Resolve Transformation – describes how pointer analysis eliminates pointers and pointer arithmetic.