C++ to C Test Strategy

From emmtrix Wiki

Translating code from one programming language to another is a challenging task, especially between languages as complex as C++ and C. To ensure accuracy and reliability in this translation, a robust testing strategy is essential. In this article, we describe the testing strategy of our C++ to C compiler, which focuses on correctness to pave the way for future qualifications under standards such as ISO 26262 or DO-178C.


The test strategy consists of the following four steps:

  1. Code Translation: The first step translates the C++ code to C. It lays the foundation for all subsequent steps. The compiler is designed to handle a wide range of C++ features and intricacies, producing a translation that preserves the original program's logic and functionality.
  2. Compilation to LLVM IR: After translation, both the original C++ code and the translated C code are compiled to LLVM Intermediate Representation (IR). LLVM IR is a low-level, assembly-like language that provides a common ground for comparing the two code bases. Since both code bases are compiled with the same LLVM-based toolchain, any differences observed in later stages can be attributed to the translation rather than to the compilation.
  3. Optimizations and Transformations: Once in LLVM IR form, both code bases undergo a series of optimizations and transformations. Their purpose is to make the IRs comparable by smoothing out discrepancies that do not affect program correctness, such as temporary variables introduced by the translation. These optimizations serve equivalence checking, not performance.
  4. Comparison Using llvm-diff: The final and most crucial step compares the two LLVM IRs with the llvm-diff utility. llvm-diff reports even small differences between the two IRs, including differences in the order of memory accesses and function calls. This meticulous comparison ensures that discrepancies are caught and addressed, establishing the correctness of the translated C code with respect to the original C++ code.

The test strategy outlined above is designed to ensure the correctness of our C++ to C compiler. By combining a thorough translation, compilation to a common IR, normalizing optimizations, and a rigorous comparison, we ensure that every aspect of the C++ code is accurately reflected in the C code. This strategy helps catch and fix errors in the translation process and also prepares for future qualifications under standards such as ISO 26262 or DO-178C, supporting the reliability and safety of the translated code in critical applications.

Example

The table below shows a typical example of how our testing strategy works. For a given C++ input, it shows an incorrect and a correct translation to C, along with the corresponding LLVM IR for each code snippet. The test case was reduced from a real-world application. It illustrates that C and C++ have different sequencing requirements for the = operator. Since C++17, C++ has the following rule, which does not exist in C:

In every simple assignment expression E1 = E2 and every compound assignment expression E1 @= E2, every value computation and side effect of E2 is sequenced before every value computation and side effect of E1.

In this example, the rule requires the function call on the right-hand side (b2) to be executed before the function call on the left-hand side (b1). In C, the operands of = are unsequenced, so the order of the two calls is unspecified, and Clang evaluates them in the opposite order.

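The table compares three variants in its columns (C++, incorrect C code, correct C code) across four rows (input, LLVM IR, optimized LLVM IR, comparison result). The cells are listed below row by row.

Input (C++):

struct a {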
  int x;
};

a *b1();
a *b2();

void b4() {
  *b1() = *b2();
}

Input (incorrect C code):

struct /*a*/_ZTS1a {
    int x;
};

struct /*a*/_ZTS1a */*b1*/_Z2b1v(void);
struct /*a*/_ZTS1a */*b2*/_Z2b2v(void);

void /*b4*/_Z2b4v(void) {
    *_Z2b1v() = *_Z2b2v();
}

Input (correct C code):

struct /*a*/_ZTS1a {
    int x;
};

struct /*a*/_ZTS1a */*b1*/_Z2b1v(void);
struct /*a*/_ZTS1a */*b2*/_Z2b2v(void);

void /*b4*/_Z2b4v(void) {
    const struct /*a*/_ZTS1a *tmpEvalOrder1 = &*/*b2*/_Z2b2v();
    struct /*a*/_ZTS1a *tmpEvalOrder2 = &*/*b1*/_Z2b1v();
    (*tmpEvalOrder2 = *tmpEvalOrder1);
}

LLVM IR (C++):

define dso_local void @_Z2b4v() #0 {
entry:
  %call = call noundef %struct.a* @_Z2b2v() #3
  %call1 = call noundef %struct.a* @_Z2b1v() #3
  %0 = bitcast %struct.a* %call1 to i8*
  %1 = bitcast %struct.a* %call to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %0, i8* align 4 %1, i64 4, i1 false), !tbaa.struct !8
  ret void
}

LLVM IR (incorrect C code):

define dso_local void @_Z2b4v() #0 {
entry:
  %call = call %struct._ZTS1a* @_Z2b1v() #3
  %call1 = call %struct._ZTS1a* @_Z2b2v() #3
  %0 = bitcast %struct._ZTS1a* %call to i8*
  %1 = bitcast %struct._ZTS1a* %call1 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %0, i8* align 4 %1, i64 4, i1 false), !tbaa.struct !8
  ret void
}

LLVM IR (correct C code):

define dso_local void @_Z2b4v() #0 {
entry:
  %tmpEvalOrder1 = alloca %struct._ZTS1a*, align 8
  %tmpEvalOrder2 = alloca %struct._ZTS1a*, align 8
  %0 = bitcast %struct._ZTS1a** %tmpEvalOrder1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %0) #4
  %call = call %struct._ZTS1a* @_Z2b2v() #5
  store %struct._ZTS1a* %call, %struct._ZTS1a** %tmpEvalOrder1, align 8, !tbaa !8
  %1 = bitcast %struct._ZTS1a** %tmpEvalOrder2 to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %1) #4
  %call1 = call %struct._ZTS1a* @_Z2b1v() #5
  store %struct._ZTS1a* %call1, %struct._ZTS1a** %tmpEvalOrder2, align 8, !tbaa !8
  %2 = load %struct._ZTS1a*, %struct._ZTS1a** %tmpEvalOrder2, align 8, !tbaa !8
  %3 = load %struct._ZTS1a*, %struct._ZTS1a** %tmpEvalOrder1, align 8, !tbaa !8
  %4 = bitcast %struct._ZTS1a* %2 to i8*
  %5 = bitcast %struct._ZTS1a* %3 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %4, i8* align 4 %5, i64 4, i1 false), !tbaa.struct !12
  %6 = bitcast %struct._ZTS1a** %tmpEvalOrder2 to i8*
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %6) #4
  %7 = bitcast %struct._ZTS1a** %tmpEvalOrder1 to i8*
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %7) #4
  ret void
}

Optimized LLVM IR (C++):

define dso_local void @_Z2b4v() local_unnamed_addr #1 {
entry:
  %call = call %structclass.a* @_Z2b2v()
  %call1 = call %structclass.a* @_Z2b1v()
  %0 = bitcast %structclass.a* %call1 to i32*
  %1 = bitcast %structclass.a* %call to i32*
  %2 = load i32, i32* %1, align 4
  store i32 %2, i32* %0, align 4
  ret void
}

Optimized LLVM IR (incorrect C code):

define dso_local void @_Z2b4v() local_unnamed_addr #1 {
entry:
  %call = call %structclass._ZTS1a* @_Z2b1v()
  %call1 = call %structclass._ZTS1a* @_Z2b2v()
  %0 = bitcast %structclass._ZTS1a* %call to i32*
  %1 = bitcast %structclass._ZTS1a* %call1 to i32*
  %2 = load i32, i32* %1, align 4
  store i32 %2, i32* %0, align 4
  ret void
}

Optimized LLVM IR (correct C code):

define dso_local void @_Z2b4v() local_unnamed_addr #1 {
entry:
  %call = call %structclass._ZTS1a* @_Z2b2v()
  %call1 = call %structclass._ZTS1a* @_Z2b1v()
  %0 = bitcast %structclass._ZTS1a* %call1 to i32*
  %1 = bitcast %structclass._ZTS1a* %call to i32*
  %2 = load i32, i32* %1, align 4
  store i32 %2, i32* %0, align 4
  ret void
}

Comparison with the C++ IR (llvm-diff):

  Incorrect C code: FAILED, because the function calls _Z2b1v and _Z2b2v are in a different order.
  Correct C code: SUCCEEDED