Automatic C to Rust Translation
The systems programming community has long grappled with security flaws from memory-unsafe languages like C. Rust offers memory and thread safety guarantees by enforcing strict ownership and lifetime rules at compile time. Migrating legacy C codebases to Rust could eliminate entire classes of vulnerabilities. However, manual rewrites of large C codebases in Rust are labor-intensive and error-prone. This has spurred development of automatic C-to-Rust translation tools. The goal is to automate conversion of C into Rust code that preserves the original program’s behavior while leveraging Rust’s safety features. In fact, the importance of this challenge is highlighted by DARPA’s Translating All C to Rust (TRACTOR) program, which explicitly aims to fully automate conversion of C into high-quality, idiomatic Rust code with memory safety guarantees. Achieving this is difficult due to fundamental differences between C and Rust (e.g. manual memory management vs. Rust’s ownership model, unrestricted pointers vs. Rust’s borrow checker). Nonetheless, several tools and research projects are actively tackling automatic C-to-Rust translation using a mix of compiler techniques and, more recently, AI. This article surveys the state-of-the-art tools – focusing on those actively maintained – and compares their approaches, features, fidelity, safety guarantees, and limitations.[1]
Challenges in Translating C to Rust
Automatically translating C into Rust is non-trivial because the languages have different paradigms and safety models. Pointers and memory management are central: C allows arbitrary pointer arithmetic, unchecked array accesses, and manual malloc/free control, whereas safe Rust requires structured borrowing, rules out data races, and bounds-checks slice and array indexing. A direct transliteration of C pointers into Rust will typically use Rust’s raw pointers and unsafe blocks, which forgo Rust’s safety checks. This preserves semantics but yields Rust code that is as unsafe as the original C. The challenge is to infer higher-level safe abstractions (like Rust slices, references, or smart pointers) from low-level C code. Other tricky features include C’s unions, which have no direct safe equivalent in Rust (Rust’s unions are untagged, and reading their fields requires unsafe), and setjmp/longjmp or goto-based control flow, which do not map cleanly to Rust’s structured control constructs. Because C allows patterns that violate Rust’s safety rules (like aliasing a mutable buffer in multiple places), any automated tool must either (a) leave such code in unsafe blocks, or (b) apply complex analyses or transformations to restructure the code for safety.
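To make the raw-pointer versus slice distinction concrete, here is a minimal sketch for a hypothetical C function long sum(const int *a, size_t n). The first version is the kind of direct transliteration described above, the second is the safe form a safety-oriented tool would like to infer, and the third shows how the remaining unsafe code can be confined to one boundary adapter. The names are illustrative and not the output of any particular tool.

```rust
// Hypothetical C original:
//   long sum(const int *a, size_t n) {
//       long s = 0;
//       for (size_t i = 0; i < n; i++) s += a[i];
//       return s;
//   }

/// Direct transliteration: the raw pointer carries no length, so the
/// compiler cannot check any of the accesses.
pub unsafe fn sum_raw(a: *const i32, n: usize) -> i64 {
    let mut s: i64 = 0;
    for i in 0..n {
        s += i64::from(unsafe { *a.add(i) });
    }
    s
}

/// Inferred safe form: the (pointer, length) pair becomes a slice, so every
/// access is bounds-checked and the body needs no unsafe code.
pub fn sum(a: &[i32]) -> i64 {
    a.iter().map(|&x| i64::from(x)).sum()
}

/// Boundary adapter for callers that still hold raw C data.
/// Safety: `a` must be non-null, aligned, and point to `n` initialized i32s.
pub unsafe fn sum_from_c(a: *const i32, n: usize) -> i64 {
    sum(unsafe { std::slice::from_raw_parts(a, n) })
}
```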
Another concern is handling undefined behavior (UB) in C. C code that invokes UB (like overflowing a signed integer, dereferencing an invalid pointer, or reading uninitialized memory) has no well-defined semantics, and a straightforward translation may “inherit” these issues. For example, translating a potentially overflowing C arithmetic operation into Rust’s ordinary + operator yields code that panics on overflow in debug builds (a detectable failure) and, by default, wraps silently in release builds, rather than exhibiting arbitrary behavior. In general, tools either assume the input C is free of UB, or they insert checks or conversions for certain UB cases, accepting that the Rust output may abort safely or diverge from C’s unpredictable behavior (https://www.ndss-symposium.org/wp-content/uploads/2025-1407-paper.pdf). These challenges require a combination of parsing technology, AST analysis, dataflow and alias analysis, and sometimes dynamic or heuristic methods to produce correct and safe Rust code.[2]
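Where C leaves signed overflow undefined, Rust makes the overflow policy an explicit choice. The following sketch shows three possible translations of a hypothetical C function int add(int a, int b); the function names are made up for this example, and which variant a tool emits is a design decision.

```rust
// Hypothetical C original: int add(int a, int b) { return a + b; }
// (undefined behavior in C when the sum overflows).

/// Plain `+`: panics on overflow in debug builds; wraps in release builds
/// unless `overflow-checks` is enabled in the Cargo profile.
pub fn add_plain(a: i32, b: i32) -> i32 {
    a + b
}

/// Two's-complement wrapping in every build profile.
pub fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Overflow reported as a value the caller must handle.
pub fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}
```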
Semantics-Preserving Transpilers
Early and foundational work on C-to-Rust translation has focused on semantic preservation – generating Rust code that is as close as possible to the original C in behavior and structure. These tools prioritize correctness and completeness of translation, emitting Rust that may not be idiomatic but can be compiled and run to mirror the C program’s output (assuming no UB). Because Rust is a safer language, such translations typically mark the unsafe portions explicitly, resulting in a Rust program that compiles and runs, but largely bypasses Rust’s safety checks by using unsafe constructs.
C2Rust
C2Rust is one of the most prominent C-to-Rust transpilers and is actively maintained (primarily by Immunant and Galois). It parses C source using Clang (supporting C99) and then programmatically generates corresponding Rust code. The design goal is to translate most C code into semantically equivalent Rust without manual intervention. C2Rust leverages Clang’s type information to produce Rust with matching types and function signatures, ensuring that the translated modules can link with code compiled from the original C (useful for incremental migration). Arbitrary control flow (including goto and complex loops) is handled with Emscripten’s Relooper algorithm, which rebuilds the flow out of Rust if/while/loop constructs. Because Rust has no goto, tangled goto-based or irreducible control flow is turned into state-machine-like Rust code: a loop that dispatches on a block variable, with explicit break/continue logic. [3][4]
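The sketch below illustrates the general shape of such state-machine code for a small, hypothetical goto-based C function. It is hand-written to show the idea (labels become states, and each goto becomes an assignment followed by another trip around the loop). It is not literal C2Rust output, which keeps raw pointers and typically uses a numeric current_block variable rather than an enum; a slice is used here only to keep the focus on control flow.

```rust
// Hypothetical C original:
//   int find(const int *a, int n, int x) {
//       int i = 0;
//   again:
//       if (i >= n) goto not_found;
//       if (a[i] == x) return i;
//       i++;
//       goto again;
//   not_found:
//       return -1;
//   }

/// One state per C label.
#[derive(Clone, Copy)]
enum Block {
    Again,
    NotFound,
}

pub fn find(a: &[i32], x: i32) -> i32 {
    let n = a.len() as i32;
    let mut i: i32 = 0;
    let mut current_block = Block::Again;
    loop {
        match current_block {
            Block::Again => {
                if i >= n {
                    current_block = Block::NotFound; // goto not_found
                    continue;
                }
                if a[i as usize] == x {
                    return i;
                }
                i += 1;
                // goto again: the state is unchanged, so the loop re-enters this arm
            }
            Block::NotFound => return -1,
        }
    }
}
```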
C2Rust’s output is intentionally close to the C source. Pointers in C become raw pointers in Rust (*mut T or *const T), pointer arithmetic is translated using .offset() or integer address calculations, and manual memory management calls (e.g. malloc, free) remain as calls to their libc equivalents in Rust. The emphasis is on fidelity: the Rust code will behave the same as the C code, down to low-level details, provided the C program had defined behavior. As a result, the initial translated Rust is often littered with unsafe blocks and can look “unidiomatic” or overly verbose. For instance, array indexing in C might be translated to pointer arithmetic in Rust (*p.offset(i as isize) rather than a safe slice index) if the translator cannot prove it’s safe to use Rust slices. Indeed, early users noted that C2Rust’s output “just converts C to rather painful Rust” that mirrors pointer manipulations. This is a conscious design choice to avoid changing program semantics. C2Rust does not automatically convert C’s structs or pointer-heavy data structures into richer Rust types; they remain as struct definitions and pointer fields in Rust, preserving memory layout so that the binary interface (ABI) with external code remains compatible.[2][5]
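A minimal sketch of layout-preserving output, for a hypothetical C declaration struct buf { int *data; size_t len; } and a function that sums it. The names are illustrative. Real C2Rust output typically also marks exported functions with #[no_mangle] so the remaining C code can link against them; that attribute is omitted here to keep the sketch independent of the Rust edition.

```rust
/// Hypothetical C original: struct buf { int *data; size_t len; };
/// #[repr(C)] keeps C's field order and layout, so C and Rust can share values.
#[repr(C)]
pub struct Buf {
    pub data: *mut i32,
    pub len: usize,
}

/// C-ABI function in the style of transpiled code: raw pointers, .offset()
/// arithmetic, and no bounds checks. With an export attribute (omitted here),
/// C code could link against it.
pub unsafe extern "C" fn buf_sum(b: *const Buf) -> i64 {
    let mut total: i64 = 0;
    let mut i: usize = 0;
    unsafe {
        while i < (*b).len {
            total += i64::from(*(*b).data.offset(i as isize));
            i += 1;
        }
    }
    total
}
```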
From an implementation standpoint, C2Rust originally built Rust ASTs using librustc internals, but it has since been refactored to use the stable syn library for Rust code generation (Immunant, “C2Rust is Back”: https://immunant.com/blog/2022/06/back/). This decoupling from Rust compiler internals has made C2Rust easier to maintain and install. The tool is driven from the command line: it takes a compilation database (compile_commands.json, e.g. generated by CMake or Bear) and processes each C file into a corresponding Rust file, so large projects can be transpiled module by module. Not every C construct is handled (for example, C99 variable-length arrays and certain GCC extensions may not be fully supported), but known unsupported features are reported with warnings or skipped. C2Rust expects portable C99 input; because it runs the C preprocessor first, the translation is platform-specific: it embeds definitions as seen on the host platform, so the Rust output may be tied to the host OS and architecture. [4]
Undefined behavior handling: C2Rust largely carries over the original logic into unsafe Rust without adding safety checks, so memory errors possible in the C would still be possible in the Rust output (now confined to unsafe blocks). For certain UB cases that Rust cannot represent directly, C2Rust has to make a choice. For instance, C allows declaring a variable without initializing it (reading it before assignment is UB), whereas Rust rejects any use of a possibly uninitialized variable at compile time. C2Rust’s strategy in such cases is typically to initialize variables to a default when necessary or to use constructs like std::mem::MaybeUninit, so that the Rust compiles. For signed integer overflow it inserts no special handling and relies on Rust’s standard overflow semantics (panic in debug builds, two’s-complement wrapping in release builds by default), while unsigned arithmetic, whose wraparound C does define, is emitted with explicit wrapping operations such as wrapping_add. If the C code relies on signed overflow, this can introduce a divergence, but since that is UB in C, C2Rust chooses a reasonable Rust default rather than trying to mimic undefined behavior exactly. Overall, C2Rust’s priority is that the Rust program compiles and reproduces the C program’s functionality for all well-defined cases. Memory safety is not improved in this first-stage translation: the output is essentially an “unsafe Rust” codebase.
To help evolve this into idiomatic safe Rust, C2Rust was designed with subsequent refactoring in mind. The project provides, or plans to provide, additional tools that gradually refactor the transpiled code into more idiomatic and safe Rust by applying automated transformations. For example, the c2rust-refactor tool supported operations such as renaming or converting certain pointer patterns to references, and Immunant has described an “exciting new approach” to automating deeper unsafe-to-safe rewrites as under development (Immunant, “C2Rust is Back”: https://immunant.com/blog/2022/06/back/).[2][3]
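 
The two strategies for uninitialized variables can be sketched for a common C idiom, int x; init(&x); return x;, where a hypothetical init function writes through an out-pointer. This illustrates the techniques and is not literal C2Rust output.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for the C function `void init(int *out) { *out = 42; }`.
pub unsafe fn init(out: *mut i32) {
    unsafe { out.write(42) };
}

/// Strategy 1: give the variable a default value so it is never uninitialized.
pub fn with_default() -> i32 {
    let mut x: i32 = 0;
    unsafe { init(&mut x) };
    x
}

/// Strategy 2: model "declared but not yet initialized" explicitly.
pub fn with_maybe_uninit() -> i32 {
    let mut x = MaybeUninit::<i32>::uninit();
    unsafe {
        init(x.as_mut_ptr());
        x.assume_init() // sound only because `init` has written the value
    }
}
```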
Corrode
Corrode is another early C-to-Rust translator, developed by Jamey Sharp in 2016-2017 and written in Haskell. It uses the language-c Haskell library as its C parser and, similarly, attempts to emit equivalent Rust code. Corrode was an inspiration for C2Rust; in fact, C2Rust’s authors acknowledge borrowing ideas from Corrode’s design. The overarching philosophy of Corrode is to preserve the original program’s behavior, API, and structure as much as possible. Like C2Rust, it outputs Rust code intended to act as a drop-in replacement for the C, without changing data representations or logic. For example, Corrode will translate a C global array or pointer arithmetic in a way that keeps the same memory layout and computations in Rust. Its maintainers explicitly note that if the input program is free of undefined and implementation-defined behavior, the Rust output should behave exactly the same. In cases of C undefined behavior, Corrode makes a best-effort guess at a sensible Rust translation. A concrete example given in Corrode’s documentation is overflowing arithmetic: signed overflow is UB in C, and Corrode translates it to Rust’s + operator on integers, which panics in debug builds (and wraps in two’s complement in release builds). This means the Rust program might crash in debug builds where the C would have overflowed silently, but in exchange the overflow surfaces as a predictable failure during testing instead of as undefined behavior.[4][2]
Corrode’s output, much like C2Rust’s, is initially unsafe Rust. It uses raw pointers and unsafe blocks for operations that Rust cannot verify. However, Corrode did attempt slightly more idiomatic constructs in simple cases. For instance, Corrode can recognize certain C loop patterns. A common pattern is a for loop incrementing an index; Corrode will translate a C for(int i=0; i<i_max; ++i) into a Rust for i in 0..i_max range loop. This is possible when the loop fits a clear Rust iterator pattern, i.e. when the body modifies neither i nor i_max, because a Rust range evaluates its bounds only once. Such transformations improve readability without altering semantics. Another example is converting some uses of C’s NULL to Rust’s Option type when context allows (though this is limited). Despite these niceties, Corrode’s output still contains many unsafe pointers. Like C2Rust, it doesn’t magically introduce Rust’s ownership tracking where none existed in C. Its focus is a correct translation that compiles, leaving deeper safety improvements to be done later (likely by a human). Corrode also strives to maintain C ABI compatibility, meaning one could replace one C file of a project with Corrode’s Rust output and link it with the remaining C. This constrained Corrode to avoid changing data layouts or function signatures.[5]
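The loop rewrite above can be illustrated with a hand-written pair of functions (not Corrode output; the names are illustrative). The literal form re-evaluates the condition on every iteration exactly as C does, while the range form fixes its bounds up front, which is why the two agree only when the body leaves i and i_max alone.

```rust
/// Literal translation of `for (int i = 0; i < i_max; ++i) total += i;`:
/// the condition is re-evaluated on every iteration, exactly as in C.
pub fn sum_literal(i_max: i32) -> i64 {
    let mut total: i64 = 0;
    let mut i: i32 = 0;
    while i < i_max {
        total += i64::from(i);
        i += 1;
    }
    total
}

/// Range-loop form: `0..i_max` is evaluated once, so this rewrite is only
/// equivalent when the loop body modifies neither `i` nor `i_max`.
pub fn sum_range(i_max: i32) -> i64 {
    let mut total: i64 = 0;
    for i in 0..i_max {
        total += i64::from(i);
    }
    total
}
```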
Corrode has not been actively developed since the late 2010s. Its author announced in 2018 that he would deprecate Corrode in favor of C2Rust, acknowledging that C2Rust’s more extensive engineering effort made it a better foundation going forward. Corrode’s last updates were around 2017, and while the repository is still available, it may not be compatible with the latest Rust compilers or handle newer C features. Nonetheless, Corrode is historically important as a pioneering tool, and its design principles (preserve behavior, keep code maintainable, allow gradual adoption) have influenced subsequent projects.[6]
Safety-Oriented Translation Tools
While direct transpilers like C2Rust produce a Rust version of the C code, they do not immediately yield the safe, idiomatic Rust that human developers would write. Recognizing this, a number of tools and research efforts have focused on translating C to safer Rust, going beyond rote syntax conversion to attempt enforcement of Rust’s safety guarantees. These tools typically start from code that is equivalent to the C (often leveraging output from C2Rust or a similar baseline), and then apply analyses or transformations to refactor unsafe constructs into safe ones (like replacing raw pointers with references, or introducing Rust types that eliminate unchecked operations). This process is challenging because it requires reasoning about aliasing and lifetimes in the C code. Below, we discuss some representative safety-oriented tools that are under active development or study.
Laertes (Ownership Inference Post-Translation)
Laertes is a tool introduced by Emre et al. (OOPSLA 2021) that aims to lift C2Rust-generated code into safer Rust by inferring ownership and lifetimes for pointers. The input to Laertes is essentially the output of C2Rust (or an equivalent unsafe Rust translation). Laertes then tries to automatically convert as many raw pointers as possible into Rust references (& or &mut) which the Rust borrow checker can verify. The core algorithm is an iterative, compiler-guided inference: Laertes optimistically assumes all pointers can be safely transformed into references and assigns them provisional lifetime parameters. It then attempts to compile the program. Where the Rust compiler reports errors (due to violations of borrowing rules or type mismatches), Laertes analyzes the error and “backs off” that assumption for the offending pointers. For example, if two pointers actually alias in a way that Rust’s rules disallow for references, the compiler will complain, and Laertes will revert those pointers to raw (unsafe) pointers in the next iteration. Under the hood, Laertes uses a combination of pointer analyses (Steensgaard’s unification-based analysis and Andersen’s inclusion-based, or subset-based, points-to analysis) to propagate constraints and determine which pointers must remain unsafe. It effectively treats the Rust compiler’s borrow checker as an oracle: any pointer that can be converted to a reference without compiler errors is assumed to be safe to convert.[7]
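The optimistic, compiler-guided loop can be summarized in a small sketch. It abstracts away everything Laertes actually does to rewrite code and interpret rustc errors: the check closure is a hypothetical stand-in for "rewrite these pointers as references, compile, and report which pointers the errors implicate", and pointers are identified by hypothetical integer IDs. Only the structure (start optimistic, back off until a fixed point) is meant to match the description above.

```rust
use std::collections::BTreeSet;

/// Hypothetical identifier for one raw-pointer declaration in the translated code.
pub type PtrId = u32;

/// Optimistic-then-back-off inference in the spirit of Laertes.
/// `check` is a hypothetical oracle: given the pointers currently proposed as
/// references, it returns those implicated in compiler errors (empty = success).
pub fn infer_safe_pointers<F>(all: &BTreeSet<PtrId>, mut check: F) -> BTreeSet<PtrId>
where
    F: FnMut(&BTreeSet<PtrId>) -> BTreeSet<PtrId>,
{
    // Optimistic start: assume every raw pointer can become a reference.
    let mut safe = all.clone();
    loop {
        let offending = check(&safe);
        if offending.is_empty() {
            // Fixed point: the oracle accepts every remaining conversion.
            return safe;
        }
        let before = safe.len();
        // Back off: offending pointers revert to raw pointers for the next round.
        safe.retain(|p| !offending.contains(p));
        if safe.len() == before {
            // The oracle only blamed pointers that are already raw; stop rather
            // than loop forever on errors the candidates cannot fix.
            return safe;
        }
    }
}
```

In a real tool each call to the oracle is a full compilation, which is why an approach like this is bounded by how many rounds it needs.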
Laertes can automatically introduce lifetime annotations in function signatures and struct definitions so that converted references have proper lifetimes. It also distinguishes owning from borrowing pointers as part of its inference (e.g., a pointer that is only ever freed in one place might be turned into a Rust Box or an owned value). However, Laertes is conservative in what it attempts. It completely ignores any pointer involved in patterns that inherently require unsafe code, such as pointer arithmetic, union use, or casting to/from integers. Only pointers that are “unsafe solely because C lacks explicit lifetime/ownership” are candidates. In their study, this turned out to be a relatively small subset: on average, only about 11% of raw pointers could be handled by Laertes’s approach. The rest had to stay as raw pointers due to other complicating factors (like global variables, complex aliasing, etc.). In other words, Laertes could make a portion of the C2Rust output safe, but most of the code remained untouched. Subsequent research (Emre et al., OOPSLA 2023) examined the limits of this approach, finding that even if one pre-processes the code to remove other causes of unsafety, inferring correct lifetimes and ownership for all pointers is extremely difficult. Nonetheless, Laertes demonstrated that an automated tool can safely convert a non-trivial fraction of raw pointers to references, eliminating those memory-unsafe pieces from the code. The advantage of Laertes’s method is that it preserves behavior: it only changes pointers where the Rust compiler confirms no borrowing rule is violated, so any pointer it converts to a reference is, by construction, not involved in unsafe aliasing. The translated program therefore remains functionally equivalent to the original C (assuming the original had no UB) but now enjoys some enforced safety. The limitation is simply that many pointers cannot be transformed with this approach, so manual effort or more powerful analyses are needed for those.[7]
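To show the kind of rewrite involved, here is a hand-written before-and-after for a hypothetical struct holding two pointers; it is not Laertes output. The raw version compiles but leaves aliasing and validity unchecked, while the reference version carries an explicit lifetime parameter so the borrow checker can verify every use (including that a and b never point to the same integer).

```rust
/// Before: C2Rust-style struct with raw pointer fields (hypothetical example).
#[repr(C)]
pub struct PairRaw {
    pub a: *mut i32,
    pub b: *mut i32,
}

/// Swaps the two pointees; the caller must guarantee both pointers are valid.
pub unsafe fn swap_raw(p: *mut PairRaw) {
    unsafe { std::ptr::swap((*p).a, (*p).b) };
}

/// After: the same data with references and an explicit lifetime parameter.
/// Two `&mut` fields cannot alias, so the compiler enforces what C only assumed.
pub struct Pair<'a> {
    pub a: &'a mut i32,
    pub b: &'a mut i32,
}

pub fn swap_ref(p: &mut Pair<'_>) {
    std::mem::swap(&mut *p.a, &mut *p.b);
}
```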
Crown (Static Ownership Analysis)
Crown (Zhang et al., 2023) is another tool that automates conversion of C pointers to Rust references, but it takes a different approach from Laertes. Instead of relying on trial and error with the compiler, Crown uses a more static, ownership-oriented analysis to determine which pointers can be safely replaced by references. According to its description, Crown performs a whole-program ownership inference that can handle a larger set of pointers than Laertes, which was restricted by needing a clean compile for each guess. Essentially, Crown analyzes the C code (or the unsafe Rust) to identify data that have a single owner or no concurrent aliases, allowing those to become Box (owned heap allocations) or &mut (unique mutable references) in Rust. By tracking the flow of pointers through assignments and function calls, it can sometimes determine, for example, that a pointer is only ever used in one context and is not aliased elsewhere, qualifying it for safe transformation. The Crown paper notes that it “employs ownership analysis to facilitate the replacement of a larger number of pointers compared to Laertes”. In contrast to Laertes’s iterative feedback loop, Crown’s approach is more direct: it computes a model of which pointers are exclusive, which are shared read-only, etc., and then rewrites the code accordingly in one go. [8]
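The target representation of an ownership-based rewrite can be illustrated with a linked list, a classic case where each node has exactly one owner. This hand-written sketch (not Crown output; all names are illustrative) contrasts a C2Rust-style node, whose ownership discipline exists only as a convention, with one where Option<Box<Node>> encodes unique ownership in the type.

```rust
/// C2Rust-style node: `next` is a raw pointer, and who frees each node is a
/// convention the compiler knows nothing about.
pub struct RawNode {
    pub val: i32,
    pub next: *mut RawNode,
}

/// Allocates a node in front of `head` and returns the new head.
pub fn push_raw(head: *mut RawNode, val: i32) -> *mut RawNode {
    Box::into_raw(Box::new(RawNode { val, next: head }))
}

/// Sums and frees the list.
/// Safety: `head` must come from `push_raw` and must not be used afterwards.
pub unsafe fn sum_and_free_raw(mut head: *mut RawNode) -> i32 {
    let mut total = 0;
    while !head.is_null() {
        // Reclaim ownership so the node is freed at the end of this iteration.
        let node = unsafe { Box::from_raw(head) };
        total += node.val;
        head = node.next;
    }
    total
}

/// Ownership-typed node: each node uniquely owns its successor, and dropping
/// the head frees the whole list.
pub struct Node {
    pub val: i32,
    pub next: Option<Box<Node>>,
}

pub fn push(head: Option<Box<Node>>, val: i32) -> Option<Box<Node>> {
    Some(Box::new(Node { val, next: head }))
}

pub fn sum_list(mut head: &Option<Box<Node>>) -> i32 {
    let mut total = 0;
    while let Some(node) = head {
        total += node.val;
        head = &node.next;
    }
    total
}
```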
Crown is able to convert not only simple function-local pointers but also more complex cases, thanks to its analysis of how pointers are passed and returned (ensuring, for instance, that the callee doesn’t stash a reference in a global or anywhere else that escapes the caller’s scope). Any conversion Crown performs must still pass Rust’s borrow checker, so its static analysis has to be sound: if Crown were too optimistic, the rewritten code would not compile. Presumably the implementation either avoids unsound replacements or detects the resulting errors and adjusts; the published results indicate that Crown succeeded on its benchmarks. Related work also targets unsafe patterns outside the scope of Laertes. For example, Concrat (Hong and Ryu, 2023), a complementary tool cited alongside Crown, specifically replaces C lock APIs with safer Rust equivalents (such as replacing C’s pthread_mutex_t and its lock/unlock functions with Rust’s std::sync::Mutex API). This indicates a broader effort to transform not only pointers but also other unsafe patterns. Crown and related techniques are currently research prototypes and are not as plug-and-play as C2Rust. They often require the code to be translated by C2Rust first, followed by a separate phase that applies the safe refactoring. Nevertheless, Crown represents the direction of using deeper program analyses to push the boundary of what can be automatically made safe, moving closer to the TRACTOR program’s goal of human-quality Rust output. By using static ownership reasoning, Crown can handle cases that would stump a purely local or trial-based approach, for instance, inferring that a struct and all its pointers are encapsulated in one module and can be given lifetimes that make the whole module safe.[8]
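The essence of a lock-API translation is that Rust’s Mutex<T> owns the data it protects, whereas C pairs a separate pthread_mutex_t with data that can still be touched without locking. The following is a hand-written sketch of the Rust side of such a translation for a hypothetical shared counter, not Concrat’s actual output.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical C original: struct counter { pthread_mutex_t lock; unsigned long count; };
/// Here the count lives inside the Mutex, so it cannot be read or written unlocked.
pub struct Counter {
    pub count: Mutex<u64>,
}

/// Spawns `threads` workers that each increment the shared counter `per_thread` times.
pub fn increment_from_threads(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Counter { count: Mutex::new(0) });
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // lock() plays the role of pthread_mutex_lock; dropping the
                    // guard at the end of the statement plays pthread_mutex_unlock.
                    *c.count.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Bind first so the guard is dropped before `counter` goes out of scope.
    let total = *counter.count.lock().unwrap();
    total
}
```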
CRustS (Semantics-Relaxed Source Rewriting)
CRustS (Ling et al., 2022) takes yet another approach: it applies a series of source-to-source transformation rules to the C2Rust output (unsafe Rust) with the aim of significantly increasing the proportion of code that is safe, even if it means slightly relaxing the requirement of full semantic preservation. The philosophy here is that if we allow minor changes to program behavior (preferably in cases that don’t affect intended functionality), we can achieve much safer Rust code automatically. CRustS is implemented using TXL (a source transformation language) and defines 220 rewrite rules that pattern-match on Rust code and transform it. Of these, 198 rules are strictly semantics-preserving (similar to what Corrode or Laertes would do: only reorganizing code without changing its meaning), and 22 are semantics-approximating. The approximating transformations deliberately make trade-offs that might alter corner-case behavior but remove unsafe constructs. For example, one approximating rule might convert a raw pointer used for array access into a safe slice, filling in a default value for out-of-bounds indices rather than exactly mimicking a buffer overrun (which is UB anyway). Another might replace an unsafe union with an enum that covers common cases but might not preserve exact bit-level behavior for unsupported union variants. The idea is to eliminate as many unsafety causes as possible, then let the Rust compiler’s checks and tests ensure the program still behaves correctly for expected inputs.[9]
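A hypothetical semantics-approximating rule of the kind just described might turn the first function below into the second. The rule and names are illustrative and are not taken from CRustS’s rule set.

```rust
/// Before: C2Rust-style read through a raw pointer. An out-of-range `i`
/// is undefined behavior, exactly as in C.
pub unsafe fn get_raw(p: *const i32, i: usize) -> i32 {
    unsafe { *p.add(i) }
}

/// After a hypothetical approximating rewrite: the pointer becomes a slice and
/// an out-of-range index yields a default (0) instead of UB. In-range behavior
/// is unchanged; out-of-range behavior deliberately differs.
pub fn get_or_default(a: &[i32], i: usize) -> i32 {
    a.get(i).copied().unwrap_or(0)
}
```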
CRustS reports impressive improvements in safety: applying these rules to C2Rust-translated code yielded a significantly higher ratio of safe code, even reaching function-level safe code ratios comparable to idiomatic Rust projects. In comparison to Laertes, which could only handle a small subset of pointers, CRustS claims a much higher conversion rate (their paper notes that on Laertes’s own benchmarks, CRustS achieved a higher safe-pointer ratio). The cost, of course, is that some transformations are not guaranteed semantics-preserving in all cases. For instance, CRustS might assume that certain global variables can be made immutable or that certain pointer casts are never meant to alias incompatible types – assumptions that a human would validate but the tool can only guess. Because of this, CRustS is presented as a “demo” or research prototype – it’s a proof that by bending the rules, one can get far more safe code automatically. In practice, one would likely use CRustS’s output as a starting point and then run test suites or formal verification to ensure nothing critical broke. The CRustS pipeline is essentially: run C2Rust to get an unsafe Rust version, then run CRustS (which is a separate tool) to refactor that code. Installation instructions show it builds on top of C2Rust’s output. This two-phase approach underscores that CRustS doesn’t parse C directly; it relies on Rust code input. This is reasonable since TXL rules are applied to the Rust abstract syntax tree or source.[9]
Examples of CRustS transformations (deduced from its description and related work) likely include: converting pointer arithmetic into indexed slice accesses when possible (with bounds checks), turning C-style string manipulation with char * into usage of Rust String or &str where lengths are known, and simplifying macro-expanded code. It may also introduce safe wrappers: e.g., if a C function returns a pointer that must be freed by the caller, CRustS could change it to return a Box<T> (owned pointer) in Rust, thus transferring ownership and making deallocation deterministic (a semantics-preserving change if done carefully, or a semantics-relaxing one if the C code sometimes leaked or reused that pointer in unusual ways). The end result of CRustS is a Rust codebase with far fewer unsafe blocks. Its authors measured that a large majority of functions became completely safe Rust after transformation (95% of functions in their sample contained no unsafe code, up from a much smaller percentage originally). This is a huge step towards the goal of automatic full migration. However, a caveat noted by other researchers is that CRustS’s safety might be shallow: making each function body safe doesn’t guarantee the entire program is free of logical memory errors if some unsafe operations were hidden behind foreign calls or relaxed rules. In any case, CRustS is a valuable approach exploring the frontier between strict semantic equivalence and practical safety gains.[7]
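A sketch of the char * to &str direction, assuming a hypothetical C function size_t count_spaces(const char *s). The raw version walks the buffer until the NUL terminator, the safe version relies on the length carried by &str, and the adapter converts once at the boundary using std::ffi::CStr. This illustrates the idea only and is not CRustS output.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// C2Rust-style: walk a NUL-terminated buffer through a raw pointer.
pub unsafe fn count_spaces_raw(mut s: *const c_char) -> usize {
    let mut n = 0;
    unsafe {
        while *s != 0 {
            if *s == b' ' as c_char {
                n += 1;
            }
            s = s.add(1);
        }
    }
    n
}

/// Safe form: the length is part of the &str, so no terminator scan is needed.
pub fn count_spaces(s: &str) -> usize {
    s.bytes().filter(|&b| b == b' ').count()
}

/// Boundary adapter: convert once at the edge, then stay in safe code.
/// Returns None if the bytes are not valid UTF-8.
pub unsafe fn count_spaces_from_c(s: *const c_char) -> Option<usize> {
    let text = unsafe { CStr::from_ptr(s) }.to_str().ok()?;
    Some(count_spaces(text))
}
```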
Other Notable Efforts
Beyond the tools above, there are other specialized efforts. Concrat (2023) was mentioned earlier: it focuses on replacing C concurrency primitives with Rust’s standard library equivalents using dataflow analysis. This addresses a niche (but important) aspect: translating C’s pthreads usage into Rust’s thread and mutex types safely. Another area of focus has been C’s union types. One research work titled “To Tag or Not to Tag” (2023) examines how to automatically translate C unions into Rust in a safer way by introducing tagged enums or on-demand initialization of union fields. Unions are tricky because a C union allows reinterpretation of memory in incompatible ways. Tools like C2Rust leave unions as Rust unions (whose fields can only be read in unsafe code), but experimental transforms can wrap union accesses in safe enums or generate accessors that maintain a tag to track the active union variant. These kinds of targeted solutions could eventually be composed into larger translation pipelines.[8]
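A common C pattern pairs a tag field with a union. The sketch below, with illustrative names, contrasts the direct translation (a Rust union, whose field reads require unsafe and rely on the tag being correct) with a tagged enum in which the discriminant itself records the active variant. It is not output from the work cited above.

```rust
/// Direct translation of a hypothetical
///   struct num { int is_float; union { int i; float f; } v; };
#[repr(C)]
pub union NumBits {
    pub i: i32,
    pub f: f32,
}

#[repr(C)]
pub struct NumRaw {
    pub is_float: i32,
    pub v: NumBits,
}

pub fn describe_raw(t: &NumRaw) -> String {
    // Correct only if `is_float` matches the field last written; otherwise the
    // bits are silently reinterpreted as the other type.
    unsafe {
        if t.is_float != 0 {
            format!("float {}", t.v.f)
        } else {
            format!("int {}", t.v.i)
        }
    }
}

/// Tagged alternative: the enum discriminant is the tag, so the wrong
/// variant cannot be read.
pub enum Num {
    Int(i32),
    Float(f32),
}

pub fn describe(n: &Num) -> String {
    match n {
        Num::Int(i) => format!("int {i}"),
        Num::Float(f) => format!("float {f}"),
    }
}
```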
LLM-Assisted Translation Approaches
In recent years, the emergence of powerful Large Language Models (LLMs) for code (such as OpenAI’s Codex/GPT or similar) has opened a new frontier for automatic code translation, including C to Rust. LLMs can be prompted with C code and asked to produce Rust code, and often they will generate more idiomatic Rust by drawing on high-level patterns rather than doing a one-to-one translation. For example, an AI might translate a C buffer + length pair into a Rust slice, or use high-level library functions in Rust to replace low-level C loops. This is promising in terms of producing human-like Rust code, but LLM-based translation has its own challenges. Chief among them: the LLM might produce code that looks plausible but is semantically incorrect or fails to compile. Unlike a deterministic compiler-based tool, an LLM might miss subtle aspects of the C semantics, especially for larger functions or when complex pointer manipulation is involved. Additionally, LLMs have context length limits, making it hard to directly feed large codebases.
To harness the strengths of both worlds, some projects combine traditional analysis with LLMs in a neuro-symbolic approach. C2SaferRust (Nitin et al., 2025) is one such approach: it runs C2Rust and then uses an LLM to improve the safety of the code. In C2SaferRust’s pipeline, C2Rust first generates the baseline unsafe Rust. The code is then automatically sliced into small, independent chunks that an LLM (GPT-4 in their case) can handle. Each chunk (for example, a single function or a group of related functions) is given to the LLM with instructions to produce a safer Rust version of that code. The LLM can thereby suggest, say, using a Rust Vec<T> instead of a raw array pointer plus length, or using Option for possibly-null returns, effectively performing a context-aware refactoring. After getting the LLM’s suggestions for all parts, C2SaferRust reassembles the program and runs its test suite to verify correctness. This last step is crucial because it compensates for mistakes the LLM might introduce. If a test fails, the process could be iterated or the problematic section flagged for manual review. On a benchmark of real C projects, C2SaferRust automatically reduced the number of raw pointer usages by up to 38% and the amount of unsafe code by up to 28%, while all tests still passed. This demonstrates measurable progress: the resulting code is closer to idiomatic Rust (significantly less unsafe code), and it still behaves correctly on the tested functionality. C2SaferRust also outperformed some prior techniques like Laertes on those benchmarks, showing the benefit of letting an LLM propose creative refactorings that pure static tools might not attempt.[10]
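The validate-with-tests idea can be sketched as a generate-and-validate loop. This is a simplified, hypothetical variant: chunks are plain strings, rewrite stands in for the LLM call, and tests_pass stands in for building and running the test suite. Unlike the pipeline described above, which runs the tests after reassembling all chunks, this sketch validates after each chunk so a failure can be attributed to a single rewrite and reverted; that is a design choice of the sketch, not necessarily C2SaferRust’s.

```rust
/// Hypothetical unit of translated code (e.g., one function's source text).
#[derive(Clone, Debug, PartialEq)]
pub struct Chunk {
    pub name: String,
    pub code: String,
}

/// Applies `rewrite` (stand-in for the LLM) chunk by chunk, keeping a rewrite
/// only if `tests_pass` (stand-in for the test suite) accepts the whole program.
/// Returns the final program and the names of chunks left unchanged for review.
pub fn refine<R, T>(chunks: &[Chunk], mut rewrite: R, mut tests_pass: T) -> (Vec<Chunk>, Vec<String>)
where
    R: FnMut(&Chunk) -> Option<Chunk>,
    T: FnMut(&[Chunk]) -> bool,
{
    let mut program = chunks.to_vec();
    let mut flagged = Vec::new();
    for i in 0..program.len() {
        let candidate = match rewrite(&program[i]) {
            Some(c) => c,
            None => {
                flagged.push(program[i].name.clone()); // no suggestion produced
                continue;
            }
        };
        // Swap the candidate in, keeping the original for a possible revert.
        let original = std::mem::replace(&mut program[i], candidate);
        if !tests_pass(program.as_slice()) {
            flagged.push(original.name.clone());
            program[i] = original; // revert the failing rewrite
        }
    }
    (program, flagged)
}
```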
Other LLM-based tools include Flourine and VERT (both from 2024). Flourine uses differential fuzzing: it runs the original C program and the LLM-generated Rust on the same fuzzer-generated inputs, treats any difference in observable behavior as a bug, and feeds counterexamples back to the LLM for repair, which removes the need for a pre-existing test suite. VERT produces two versions of the Rust code for a given input: a readable candidate from an LLM, and a literal oracle obtained by compiling the source to WebAssembly and lifting it to Rust. It then checks the candidate against the oracle and regenerates the candidate when the check fails. The idea is to combine the reliability of a mechanical translation with the readability of an LLM translation. Despite these sophisticated strategies, the results so far show that LLM-based translation still struggles with large or complex code. One report noted that less than 20% of C programs over 150 lines could be satisfactorily translated by an LLM-based method without manual intervention[11]. Indeed, experiments with Flourine and VERT found that translating functions independently led to inconsistencies that had to be reconciled, while translating whole programs in one prompt often exceeded LLM capacity or produced non-compiling output.[12][13][14]
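The core of differential testing fits in a few lines. In this minimal sketch, reference stands for the original program’s behavior and candidate for an LLM translation; both are ordinary closures, and a tiny xorshift generator replaces a real fuzzer so no external crates are needed. Real tools use coverage-guided fuzzers and richer input types; none of the names here come from Flourine or VERT.

```rust
/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Runs `reference` and `candidate` on the same pseudo-random byte strings and
/// returns the first input on which their outputs differ, or None if no
/// difference was found within `iters` attempts (which proves nothing).
pub fn diff_fuzz<A, B>(reference: A, candidate: B, seed: u64, iters: usize) -> Option<Vec<u8>>
where
    A: Fn(&[u8]) -> usize,
    B: Fn(&[u8]) -> usize,
{
    let mut rng = XorShift(seed.max(1)); // xorshift would stay stuck at 0
    for _ in 0..iters {
        let len = (rng.next_u64() % 16) as usize;
        let input: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        if reference(&input[..]) != candidate(&input[..]) {
            return Some(input);
        }
    }
    None
}
```

A counterexample found this way is exactly the kind of concrete feedback a repair step can hand back to the LLM.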
That said, LLMs have shown they can handle many syntactic and even some semantic conversions gracefully on smaller snippets. They excel at producing idiomatic constructs: for example, using Rust’s Iterator APIs instead of C-style loops, or leveraging pattern matching and Result types for error handling. These high-level refactorings are something that pure static tools won’t do unless explicitly programmed for each pattern. As LLMs improve and as techniques like guided prompting, automated verification, and iterative refinement advance, we can expect LLM-assisted translation to become more reliable. It’s likely that the future of automatic C to Rust translation will involve a hybrid: using static analysis to break down and formally understand the source, and using LLMs or other AI to suggest the more complex transformations that require semantic understanding or creative restructuring, all under the validation of tests or formal checks.[1]
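As an example of the error-handling refactoring mentioned above, here is a hand-written sketch contrasting a C-style status code plus out-parameter with a Result-returning function, for a hypothetical port parser. The names are illustrative.

```rust
use std::num::ParseIntError;

/// C-style: mirrors `int parse_port(const char *s, unsigned short *out)`,
/// returning 0 on success and -1 on failure, with the result written to `out`.
pub fn parse_port_c_style(s: &str, out: &mut u16) -> i32 {
    match s.trim().parse::<u16>() {
        Ok(p) if p != 0 => {
            *out = p;
            0
        }
        _ => -1,
    }
}

#[derive(Debug, PartialEq)]
pub enum PortError {
    Invalid(ParseIntError),
    Zero,
}

/// Idiomatic form: success and each kind of failure are part of the return type.
pub fn parse_port(s: &str) -> Result<u16, PortError> {
    let p = s.trim().parse::<u16>().map_err(PortError::Invalid)?;
    if p == 0 {
        return Err(PortError::Zero);
    }
    Ok(p)
}
```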
Comparative Analysis of Tools
The landscape of C-to-Rust translators can be compared along several axes: fidelity vs. idiomaticity, safety guarantees, automation scope, and maturity.
- Translation Fidelity: Tools like C2Rust and Corrode prioritize fidelity. They retain the exact behavior and even the structure of the C code in the generated Rust (making only the minimal adjustments needed for Rust syntax). This means they have very high success in translating arbitrary C code (including low-level tricks) into compilable Rust. However, the Rust they produce is often verbose and non-idiomatic: essentially a C program expressed in Rust syntax. At the other end, LLM-based approaches and aggressive refactoring tools sacrifice some fidelity for readability and safety. For instance, an AI might refactor a C loop into a Rust iterator, which is more idiomatic but could conceivably alter performance characteristics or behavior in edge cases. Tools like Laertes and Crown try to walk the line: they keep the overall structure from C2Rust, but on the subset of code they can handle, they introduce Rust references (which is a semantic change: e.g., if in C two aliased pointers were written through, translating both to &mut would be incorrect and is avoided). These tools ensure fidelity by only making changes that Rust’s compiler or analyses can guarantee are safe (thus presumed semantically valid).[5][2]
- Safety and Memory Guarantees: The primary benefit of translating C to Rust is potential memory safety, but not all tools achieve this equally. C2Rust’s direct output provides no additional memory safety beyond the original C: it compiles only as unsafe code, and the programmer must manually inspect or refactor it to get safety. Corrode is similar in this regard. In contrast, Laertes and Crown explicitly aim to increase memory safety by converting pointers to safe references wherever possible. The result is a partially safe program: some parts are protected by Rust’s compile-time checks (no data races or use-after-frees in those portions), while other parts remain unsafe. CRustS pushes further, attempting to maximize the safe portion even at the cost of altering some semantics. Its output can be predominantly safe Rust code (with runtime checks like bounds checking in place of what were raw pointer operations), a big step in guarantees, although a developer may need to verify that the semantic deviations are acceptable. LLM-based translations often target completely safe Rust (no unsafe at all), because they try to use high-level Rust features. When they succeed, the code has all of Rust’s guarantees (e.g., any out-of-bounds access would be caught as a panic, not a memory violation). The caveat is that if an LLM is unsure how to handle some low-level operation, it might omit necessary functionality or mishandle it, leading to an incomplete translation that might not even compile. In a controlled setting with tests, the hope is to get the best of both: safe Rust that passes all the same tests as the C. Notably, when Rust translations enforce safety, they inherently change how formerly undefined behaviors manifest: rather than silent memory corruption, you might get a Rust panic or error for things like a buffer overflow or null dereference. This is usually considered a desirable outcome (failing fast instead of continuing with corruption).[2][8][7][9]
- Idiomatic Quality and Maintainability: From a maintainability standpoint, the closer the output is to how a human would have written it, the better. On this front, basic transpilers score low – they tend to produce code that, while correct, is hard for Rust programmers to work with or modify. Variables might have awkward names or extra casts, and everything is wrapped in unsafe. Tools that do even modest refactoring help a lot: e.g., Corrode’s loop translation or C2Rust’s upcoming refactoring successor can turn clunky patterns into cleaner Rust forms. The most idiomatic results currently come from AI-based translations or very high-level refactor rules. For example, an LLM might recognize a C idiom for error handling (set an integer status and jump to cleanup) and translate it into Rust using the ? operator and the Result type – a very idiomatic construct. Such transformations are beyond the current static tools’ pattern matching (which tends to be local). RustMap (2025) is a research prototype that attempts project-scale migration by analyzing the C code for patterns that can be globally replaced by Rust equivalents, such as turning C macros into Rust const items or functions, or identifying groups of functions that operate on a data structure and converting that whole data structure into a Rust struct with methods. This yields more idiomatic designs than a line-by-line translation. Across the board, there is a trade-off: the more idiomatic the Rust, the more the tool must understand the code’s intent, which is hard to do automatically. Thus, the most idiomatic translations may require AI and carry a risk of misinterpretation.[5][15]
- Tool Maturity and Ecosystem: C2Rust is fairly mature and has been used on real-world projects; it is open-source (BSD-licensed), with an active community (over 4k stars on GitHub) and recent releases (as of 2025). It supports integration into build systems via compile command databases, making it practical for large codebases. Corrode, conversely, is discontinued, and while one can still use it on small code, it may not handle newer C or Rust versions well. The safety-oriented tools like Laertes and CRustS are research prototypes – they have published artifacts or code (Laertes’s artifact and dataset were released for evaluation, and CRustS is available as a Cargo package for demo purposes), but they are not one-click solutions and may require expertise to apply. We can consider them as early explorations likely to be incorporated into future pipelines. LLM-based tools currently are mostly in academic or experimental stages; they are not broadly available as off-the-shelf products, partly because of the cost and complexity (LLM APIs, managing prompts, etc.). However, one can already experiment with LLMs via services (for instance, there are online converters using GPT-4 that attempt C to Rust conversion, though their reliability varies). The DARPA TRACTOR program is likely to spawn more integrated tools in the next few years that combine these techniques.[4][9][16]
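The goto-cleanup error-handling idiom mentioned under idiomatic quality can be illustrated with a hypothetical C function (shown as a comment) and a Result-based Rust counterpart of the kind an LLM or a high-level refactoring rule might produce. The names parse_pair and ParseError are invented for this sketch and do not come from any tool.

```rust
// Hypothetical C original using a status code and goto-based cleanup:
//
//   int parse_pair(const char *s, int *out) {
//       int status = -1;
//       char *copy = strdup(s);
//       if (!copy) goto out;
//       char *eq = strchr(copy, '=');
//       if (!eq) goto cleanup;
//       *out = atoi(eq + 1);
//       status = 0;
//   cleanup:
//       free(copy);
//   out:
//       return status;
//   }

/// Distinct error cases replace the single `-1` status code.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingEquals,
    BadNumber,
}

/// Each `?` returns early on failure. No cleanup label is needed because
/// nothing is allocated and freed by hand; ownership handles it.
fn parse_pair(s: &str) -> Result<i32, ParseError> {
    let (_key, value) = s.split_once('=').ok_or(ParseError::MissingEquals)?;
    let n = value
        .trim()
        .parse::<i32>()
        .map_err(|_| ParseError::BadNumber)?;
    Ok(n)
}
```

This is not a strictly faithful translation: atoi returns 0 for input such as "x=abc", whereas the Rust version reports BadNumber. That gap is a small instance of the fidelity-versus-idiomaticity trade-off discussed above.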
In summary, no single tool yet completely automates a perfect translation of C to fully safe and idiomatic Rust for large codebases. C2Rust provides a strong foundation with complete and correct (but unsafe) translations. On top of that foundation, tools like Laertes, Crown, and CRustS layer increasingly powerful analyses to convert chunks to safe code. Meanwhile, AI-driven approaches offer glimpses of near-human-like translations, excelling in small scopes but struggling to scale. The current best practice for migrating C to Rust might be to use a transpiler (like C2Rust) to do the heavy lifting of rote translation, and then incrementally apply automated refactors (where possible) and manual improvements for critical sections – essentially a human-in-the-loop approach. The ongoing research and development aim to reduce the amount of human effort needed by making the automated steps smarter and more comprehensive.
Handling of Undefined Behavior and Edge Cases
A critical aspect of translation is how tools deal with undefined behavior (UB) and low-level C quirks. As mentioned, most transpilers assume that the C code is reasonably well-behaved; they don’t attempt to make an incorrect C program safer or correct. If the C code has latent bugs (like buffer overflows), the translated Rust (with unsafe code) will have the same bug. However, when translation tools attempt to produce safe Rust, they must address UB, because safe Rust cannot express certain dangerous operations without a check. One straightforward example is buffer overflows: C will compile code that advances a pointer past the end of an array and dereferences it, which is UB. If a tool converts that array and pointer into a Rust slice, any out-of-bounds access will cause a runtime panic in Rust. Thus, the behavior diverges: the C might have continued (possibly corrupting memory or misbehaving later), whereas the Rust will panic immediately on a failed bounds check. Researchers categorize this as an expected difference: the Rust translation “eliminates potentially unsafe behavior” and replaces it with a controlled failure [14]. In a sense, the Rust version is safer by design: it won’t blindly continue after an out-of-bounds access, thereby avoiding exploitation, though it might not match an ill-defined C outcome (there is no meaningful “equivalent” to a wild memory write in safe Rust except termination).
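A minimal sketch of the divergence just described, assuming a translator has already turned the C array-plus-pointer pair into a Rust slice. The names read_at and read_at_checked are hypothetical. Plain indexing panics instead of reading adjacent memory, while the standard slice method get offers a non-panicking alternative.

```rust
/// Plain indexing: the counterpart of C's `buf[idx]`, except that an
/// out-of-range `idx` triggers a bounds-check panic instead of UB.
fn read_at(buf: &[u8], idx: usize) -> u8 {
    buf[idx]
}

/// Explicit variant: the caller must handle the out-of-range case.
fn read_at_checked(buf: &[u8], idx: usize) -> Option<u8> {
    buf.get(idx).copied()
}
```

With buf = [1, 2, 3], read_at(&buf, 3) panics with an index-out-of-bounds message, while read_at_checked(&buf, 3) returns None. Which form a tool emits decides whether the former UB becomes a crash or an ordinary error value.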
Similarly, for null pointers: C can have a null pointer that gets dereferenced (UB). A safe Rust translation might use Option<&T> and explicitly handle the null case (e.g., return an error or panic). This changes the control flow (the Rust might error out where the C would crash unpredictably). Tools like Laertes and Crown, when converting pointers to references, must ensure that those pointers were never null; otherwise the translation would create a null reference, which is undefined behavior in Rust as well. They likely conservatively leave potentially-null pointers as raw *mut T pointers (still unsafe) unless they can prove null cannot happen. CRustS’s approach of semantics relaxation might, for instance, initialize pointers that are observed to be null to a dummy object to avoid crashes, but that could mask a bug. The trade-off is complex: do we preserve a bug or guard against it? Most automatic tools lean toward guarding, on the principle that avoiding UB is the goal (especially since the exact UB behavior can’t be preserved anyway).
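The contrast between keeping a raw pointer and making nullability explicit can be sketched as follows. Node and the three functions are hypothetical. The adapter uses the standard raw-pointer method as_ref, which turns a possibly-null *const T into an Option<&T>; it is itself unsafe because it cannot check that a non-null pointer is valid.

```rust
/// Hypothetical C struct: `struct Node { int value; };`
struct Node {
    value: i32,
}

/// Literal translation of `int node_value(const struct Node *n) { return n->value; }`.
/// Dereferencing a null or dangling `n` is undefined behavior, exactly as in C.
unsafe fn node_value_raw(n: *const Node) -> i32 {
    unsafe { (*n).value }
}

/// Safe translation: "might be null" is part of the type, so the caller
/// must decide what a missing node means.
fn node_value(n: Option<&Node>) -> Option<i32> {
    n.map(|node| node.value)
}

/// Boundary adapter for callers that still hold raw pointers.
/// Safety contract (assumed): `n` is null or points to a live, aligned Node.
unsafe fn node_value_from_ptr(n: *const Node) -> Option<i32> {
    node_value(unsafe { n.as_ref() })
}
```

Only the adapter keeps an unsafe obligation; code that already works with Option<&Node> stays entirely in safe Rust.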
Integer behaviors are another subtle area. In C, unsigned arithmetic wraps modulo 2^n, while signed integer overflow is UB, and it passes silently unless compiler sanitizers are enabled. Rust’s plain arithmetic operators panic on overflow in debug builds and wrap in release builds (with default settings). Neither is “the same” as C’s UB, which permits any outcome, even though on typical two’s complement hardware it often effectively wraps. Tools generally choose Rust’s native arithmetic, which means a translation effectively gives defined semantics to what was undefined. Empirical studies using fuzzing have shown that Rust translations often exhibit different behavior from C when it comes to things like I/O or arithmetic edge cases, precisely because Rust checks things C doesn’t. For example, if a C program reads an invalid UTF-8 byte sequence into a char buffer and later prints it, the C program simply passes the bytes through (possibly producing garbled text). A direct Rust translation using std::string::String might reject those bytes (a Rust String must hold valid UTF-8), causing a runtime error or loss of data. One study noted differences in how tools handle I/O encoding: some Rust translations assume UTF-8 and thus diverge when given non-UTF-8 input. These are not memory safety issues per se, but they show how higher-level assumptions can creep in.[17]
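The integer and I/O points above can be made concrete with the standard library's explicit arithmetic methods and UTF-8 handling. This is a sketch of choices a translator (or a human) might make, not the behavior of any specific tool. wrapping_add reproduces C's defined wraparound for unsigned types, checked_add surfaces what would be signed-overflow UB in C, and keeping input as bytes avoids the UTF-8 divergence. The function names are hypothetical.

```rust
/// C `unsigned int` addition wraps modulo 2^32; `wrapping_add` states that explicitly.
fn add_unsigned_like_c(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// C signed overflow is UB, so a translation must pick some defined behavior.
/// `checked_add` makes the overflow visible instead of silently wrapping.
fn add_signed_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Treating input as text: fails on invalid UTF-8, a check C never made.
fn decode_as_string(bytes: &[u8]) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(bytes.to_vec())
}

/// Byte-preserving alternative, closer to C's fread/fwrite pass-through.
fn echo_bytes(input: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(input);
}
```

For the input bytes "hi" followed by 0xFF, decode_as_string returns an error, echo_bytes copies all three bytes unchanged, and String::from_utf8_lossy would substitute U+FFFD for the invalid byte. Each choice is defensible, but only the byte-preserving one matches what the C program did.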
Setjmp/longjmp and other non-local jumps have no direct counterpart in Rust. C2Rust and other tools usually cannot translate a longjmp in any straightforward way; they might either reject such code or transform it into a Rust panic!/catch_unwind pair. That pair is not equivalent: unwinding runs destructors that longjmp would skip, and unwinding a panic through intervening C frames is not safe in general. Similarly, inline assembly in C is either carried over as inline assembly in Rust (the asm! macro, stable since Rust 1.59 but usable only inside unsafe code), or the tool bails out if it can’t handle it.
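To make the longjmp discussion concrete, here is a hypothetical error-recovery pattern modeled two ways: with a panic plus std::panic::catch_unwind standing in for longjmp and setjmp, and with Option propagation, which is usually the better target. The sketch assumes the default panic=unwind strategy; under panic=abort, catch_unwind cannot intercept anything. All function names are invented for illustration.

```rust
use std::panic;

// Hypothetical C pattern:
//   static jmp_buf env;
//   static unsigned digit(char c) { if (c < '0' || c > '9') longjmp(env, 1); return c - '0'; }
//   int parse(const char *s) { if (setjmp(env)) return -1; /* ... calls digit() ... */ }

/// Plays the role of the C function that calls `longjmp` on bad input.
fn digit_or_panic(c: char) -> u32 {
    c.to_digit(10).expect("not a decimal digit")
}

/// `catch_unwind` plays the role of `setjmp`: it stops the unwind and turns
/// it into an error value. Unlike `longjmp`, unwinding runs destructors.
fn parse_with_unwind(s: &str) -> Option<u32> {
    panic::catch_unwind(|| s.chars().fold(0u32, |acc, c| acc * 10 + digit_or_panic(c))).ok()
}

/// Structured alternative: failure is an ordinary return value, no unwinding.
fn parse_with_option(s: &str) -> Option<u32> {
    s.chars()
        .try_fold(0u32, |acc, c| Some(acc * 10 + c.to_digit(10)?))
}
```

Both versions return None for "1x3", but the panic-based one also prints a panic message and depends on the unwind strategy, which is why structured error propagation is generally preferred when a translator can restructure the control flow.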
In summary, automatic translators try to stay correct on defined behavior and usually either skip or embed as unsafe anything truly undefined. When aiming for safety, they effectively turn some UB into Rust runtime errors or compiler errors. This is generally acceptable since one goal is to use Rust’s safety to catch those issues. The key is that when evaluating these tools, one should not expect bit-for-bit identical behavior in scenarios that were undefined in C; rather, we expect that if the C program was intended to be correct, the Rust will be correct on the same inputs. When UB is unavoidable, the Rust will likely either refuse to compile (requiring human attention) or will contain safety abstractions that prevent the UB (possibly changing the program’s failure mode). Tools under active development are increasingly looking to integrate dynamic analysis (like running test cases or fuzzing) as part of the translation workflow to detect behavioral deviations caused by these issues and correct them.[18]
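The dynamic-analysis idea above amounts to differential testing: run the original behavior and the translation on the same inputs and flag any disagreement. Here is a minimal sketch, assuming a simplified atoi-like model of the C behavior (no whitespace skipping) and a candidate translation built on str::parse. All names are hypothetical. A real pipeline would call the compiled C code through FFI and generate inputs with a fuzzer rather than a fixed list.

```rust
/// Simplified model of the original C behavior: optional sign, then digits,
/// stopping at the first non-digit; garbage yields 0 (like `atoi`).
/// Wrapping arithmetic stands in for the overflow that `atoi` leaves undefined.
fn reference_atoi(s: &str) -> i32 {
    let bytes = s.as_bytes();
    let mut i = 0;
    let mut negative = false;
    if i < bytes.len() && (bytes[i] == b'-' || bytes[i] == b'+') {
        negative = bytes[i] == b'-';
        i += 1;
    }
    let mut n: i32 = 0;
    while i < bytes.len() && bytes[i].is_ascii_digit() {
        n = n.wrapping_mul(10).wrapping_add(i32::from(bytes[i] - b'0'));
        i += 1;
    }
    if negative { n.wrapping_neg() } else { n }
}

/// Candidate translation: idiomatic, but rejects trailing garbage entirely.
fn candidate_parse(s: &str) -> i32 {
    s.parse().unwrap_or(0)
}

/// Returns the first input on which the two implementations disagree,
/// together with both results.
fn first_divergence<'a>(
    inputs: impl IntoIterator<Item = &'a str>,
    reference: impl Fn(&str) -> i32,
    candidate: impl Fn(&str) -> i32,
) -> Option<(&'a str, i32, i32)> {
    inputs.into_iter().find_map(|s| {
        let (r, c) = (reference(s), candidate(s));
        (r != c).then_some((s, r, c))
    })
}
```

On the inputs "42", "-7", "", and "12abc", first_divergence reports ("12abc", 12, 0): the C-style function stops at the first non-digit, while str::parse rejects the whole string. A harness like this flags such a deviation so that a tool or a human can decide which behavior is intended.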
Conclusion and Future Outlook
Automatic C-to-Rust translation has evolved from simple transpilers that emit line-by-line unsafe Rust into a multi-faceted field combining compiler theory, program analysis, and machine learning. Currently, if one needs to port a C project to Rust, a typical pipeline might involve using C2Rust to get an initial Rust codebase that is functionally correct but rife with unsafe code, then employing a mix of automated refactoring tools (like the experimental Laertes or CRustS) and manual rewrites to improve safety and idiomatic style. This incremental approach is necessary because fully automated translation at the level of a skilled human programmer is still an open problem. The active research (funded by initiatives like DARPA’s TRACTOR) indicates optimism that the gap can be closed. Future translators will likely integrate the static and dynamic techniques: for example, a tool could perform sophisticated alias analysis to convert most pointers to safe references (like Crown), use an LLM to rewrite low-level C patterns into high-level Rust (like using iterators or Rust libraries), and then validate the result with exhaustive testing or model checking. Each of those components is being prototyped today.[1]
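The incremental pipeline described above can be pictured with a hand-written sketch of a transpiler-style function and its refactored, safe counterpart. The raw version only imitates the general shape of transpiler output (raw pointers, C integer types, an unsafe body); it is not actual C2Rust output, and all names are hypothetical.

```rust
use std::os::raw::c_int;

/// Shape of a literal translation of `int sum(const int *p, int n)`:
/// raw pointer arithmetic with the same preconditions the C code had.
pub unsafe extern "C" fn sum_raw(p: *const c_int, n: c_int) -> c_int {
    let mut total: c_int = 0;
    let mut i: c_int = 0;
    while i < n {
        total += unsafe { *p.offset(i as isize) };
        i += 1;
    }
    total
}

/// Refactored core: the slice carries its own length, so every access is
/// bounds-checked and no `unsafe` is needed.
pub fn sum_safe(xs: &[c_int]) -> c_int {
    xs.iter().sum()
}

/// Thin unsafe shim kept at the C boundary; all logic now lives in safe code.
pub unsafe extern "C" fn sum_shim(p: *const c_int, n: c_int) -> c_int {
    if p.is_null() || n <= 0 {
        return 0;
    }
    // Safety contract (assumed): `p` points to at least `n` initialized ints.
    let xs = unsafe { std::slice::from_raw_parts(p, n as usize) };
    sum_safe(xs)
}
```

Keeping a thin unsafe shim at the language boundary while moving logic into safe functions is one common way to shrink the unsafe surface step by step, which is the kind of work the refactoring tools above try to automate.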
One can imagine a near-future tool that, given a C codebase, produces a Rust crate where memory safety issues are largely fixed, perhaps with annotations or reports for the developer for any remaining unsafe bits that could not be resolved automatically. The developer can then focus their effort only on the truly tricky parts (for example, a complex bit-casting union or an exotic macro). In essence, the goal is to handle the “mundane 90%” of the translation automatically and safely, and leave the 10% of corner cases for human expertise – or further learning by the tool as these corner patterns repeat across projects.
It’s also worth noting that the presence of a correct automatic translation tool could influence how people write C (knowing it will be machine-converted) or how they write Rust (perhaps in a way that is easier to generate from C). Such co-evolution has precedent (as seen with automatic refactoring tools influencing coding standards).
In conclusion, automatic C to Rust translation has made significant strides: C2Rust provides a reliable base for transpiling C99 code to Rust, Corrode demonstrated the feasibility and influenced later work, and newer tools like Laertes, Crown, and CRustS show that a sizable portion of unsafe code can be eliminated through clever analysis. Meanwhile, Flourine, Vert, and C2SaferRust illustrate the power of AI and fuzzing to make translated code more idiomatic and safe. The field is moving quickly, and collaborations between academia and industry (such as the open-source efforts on C2Rust and the research from multiple universities) are bringing us closer to the day where legacy C code can be automatically “uplifted” into safe Rust. That promises not only to reduce bugs but also to extend the life of critical software by modernizing it for today’s secure development standards. Each tool today contributes a piece of the puzzle – from preserving exact semantics to inferring lifetimes or suggesting Rust-specific idioms – and combined, they are paving the way toward fully automated, trustworthy translation from C to Rust. [3][4][7][9][19]
References: The content above cites information from official tool documentation, academic papers, and reputable sources, including the C2Rust project site, the Corrode GitHub README, research papers on Laertes and related techniques, the CRustS demonstration summary, and studies on LLM-based translation effectiveness. These references provide further details for readers interested in the technical specifics of each tool and approach.[3][2][7][8][9][20]
- [1] Translating All C to Rust. https://www.darpa.mil/program/translating-all-c-to-rust
- [2] GitHub - jameysharp/corrode: C to Rust translator. https://github.com/jameysharp/corrode
- [3] C2Rust Demonstration. https://c2rust.com/
- [4] GitHub - immunant/c2rust: Migrate C code to Rust. https://github.com/immunant/c2rust
- [5] C to Rust translator - community - The Rust Programming Language Forum. https://users.rust-lang.org/t/c-to-rust-translator/55381
- [6] c2rust vs Corrode. https://jamey.thesharps.us/2018/06/30/c2rust-vs-corrode/
- [7] Aliasing Limits on Translating C to Safe Rust. https://dl.acm.org/doi/pdf/10.1145/3586046
- [8] To Tag, or Not to Tag: Translating C’s Unions to Rust’s Tagged Unions. https://arxiv.org/html/2408.11418v2
- [9] CRustS – Rust application // Lib.rs. https://lib.rs/crates/crusts
- [10] C2SaferRust: Transforming C Projects into Safer Rust with NeuroSymbolic Techniques. https://arxiv.org/html/2501.14257v1
- [11] NDSS Symposium paper. https://www.ndss-symposium.org/wp-content/uploads/2025-1407-paper.pdf. The necessity to split programs into small chunks is a major hurdle: if a function f calls g, the LLM ideally needs to see both to produce consistent Rust; otherwise it might generate conflicting versions of g when given different contexts.
- [12] NDSS Symposium paper. https://www.ndss-symposium.org/wp-content/uploads/2025-1407-paper.pdf
- [13] NDSS Symposium paper. https://www.ndss-symposium.org/wp-content/uploads/2025-1407-paper.pdf
- [14] NDSS Symposium paper. https://www.ndss-symposium.org/wp-content/uploads/2025-1407-paper.pdf
- [15] RustMap: Towards Project-Scale C-to-Rust Migration via Program Analysis and LLM. https://arxiv.org/html/2503.17741v1
- [16] Online C to Rust Converter - CodeConvert AI. https://www.codeconvert.ai/c-to-rust-converter
- [17] NDSS Symposium paper. https://www.ndss-symposium.org/wp-content/uploads/2025-1407-paper.pdf
- [18] NDSS Symposium paper. https://www.ndss-symposium.org/wp-content/uploads/2025-1407-paper.pdf
- [19] C2SaferRust: Transforming C Projects into Safer Rust with NeuroSymbolic Techniques. https://arxiv.org/html/2501.14257v1
- [20] NDSS Symposium paper. https://www.ndss-symposium.org/wp-content/uploads/2025-1407-paper.pdf