Loop Unrolling

From emmtrix Wiki

Loop unrolling is an optimization technique that reduces the number of iterations of a loop by expanding its body so that each iteration processes multiple elements. This lowers loop overhead, because fewer counter updates, comparisons, and branches are executed, and it places more independent instructions next to each other, which increases instruction-level parallelism and can create further opportunities for parallelization. The cost is larger code size; in performance-critical applications the trade-off often results in significant runtime improvements.
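
As a rough illustration of the idea, the following hand-written sketch compares a rolled loop with a version unrolled by a factor of 4. The function names scale_rolled and scale_unrolled4 are hypothetical and chosen only for this example; the code is not output of any particular tool, and it assumes the element count is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled version: one element per iteration, so the loop test and
 * the increment are executed n times. */
void scale_rolled(int *data, size_t n, int factor) {
    for (size_t i = 0; i < n; i++) {
        data[i] *= factor;
    }
}

/* Unrolled by 4: four elements per iteration, so the loop test and
 * the increment are executed only n / 4 times.
 * Assumption for this sketch: n is a multiple of 4. */
void scale_unrolled4(int *data, size_t n, int factor) {
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        data[i]     *= factor;
        data[i + 1] *= factor;
        data[i + 2] *= factor;
        data[i + 3] *= factor;
    }
}
```

Both functions produce the same array contents; the unrolled one trades a larger loop body for fewer executed branches.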

Loop Unrolling Transformation in emmtrix Studio

emmtrix Studio applies loop unrolling through #pragma directives or via the GUI. Unrolling reduces the iteration count and enlarges the loop body, so that statements from several original iterations are processed in a single iteration.

Typical Usage and Benefits

Loop unrolling is used to reduce loop overhead and to form larger loop bodies whose statements can be parallelized at a coarser granularity.
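
One way to picture the parallelization benefit is a reduction that is unrolled by a factor of 2 and split into two partial sums. This is a hand-written sketch under stated assumptions, not emmtrix Studio output: the function name sum_unrolled2 is hypothetical, and the element count is assumed to be even.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of n integers, unrolled by 2 with two partial sums.
 * The two statements in the loop body do not depend on each other,
 * so they can be scheduled or executed in parallel.
 * Assumption for this sketch: n is even. */
long sum_unrolled2(const int *data, size_t n) {
    assert(n % 2 == 0);
    long even_sum = 0; /* accumulates data[0], data[2], ... */
    long odd_sum = 0;  /* accumulates data[1], data[3], ... */
    for (size_t i = 0; i < n; i += 2) {
        even_sum += data[i];
        odd_sum  += data[i + 1];
    }
    /* Combine the partial results once, after the loop. */
    return even_sum + odd_sum;
}
```

In the rolled form every addition depends on the previous one through a single accumulator; after unrolling, the body holds two independent chains of work.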

Example

/* The following code tests the loop unroll transformation applied to a for loop: */

int main(void) {
    int i;
    int a[4];
    #pragma EMX_TRANSFORMATION LoopUnroll { "unrollfactor": 4 }
    for (i = 0; i < 4; i++) {
        a[i] = i;
    }
    return 0;
}
/* The generated code includes all four iterations of the loop transformed into four separate statements.
 * The loop unrolling is full and the loop is removed.
 */

int main(void) {
    int i;
    int a[4];
    i = 0;
    {
        a[i] = i;
    } {
        a[i + 1 * 1] = i + 1 * 1;
    } {
        a[i + 1 * 2] = i + 1 * 2;
    } {
        a[i + 1 * 3] = i + 1 * 3;
    }
    return 0;
}

Parameters

The following parameters can be set (each entry lists the keyword used in the pragma syntax, its default value, and a description):

Id | Default Value | Description
unrollfactor | max_unrollfactor | Unroll factor: the iteration count is divided by this value and the step of the iteration variable is multiplied by it. If it equals the total number of iterations, the loop construct is removed from the code. If it is not an integer divisor of the total number of iterations, an additional loop that processes the last iterations is added.
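
The case where the unroll factor does not divide the iteration count can be sketched by hand as follows. This is only an illustration of the general technique, not the code that emmtrix Studio generates; the function name fill_indices_unrolled4 is hypothetical. With 10 iterations and a factor of 4, the main loop runs twice (covering indices 0 to 7) and the additional loop handles indices 8 and 9.

```c
#include <assert.h>
#include <stddef.h>

/* Stores dst[i] = i for 0 <= i < n, unrolled by a factor of 4. */
void fill_indices_unrolled4(int *dst, int n) {
    assert(dst != NULL && n >= 0);
    int i;
    /* Main loop: runs while a full group of 4 iterations remains. */
    for (i = 0; i + 3 < n; i += 4) {
        dst[i]     = i;
        dst[i + 1] = i + 1;
        dst[i + 2] = i + 2;
        dst[i + 3] = i + 3;
    }
    /* Additional loop: processes the last n % 4 iterations. */
    for (; i < n; i++) {
        dst[i] = i;
    }
}
```

When n is a multiple of 4, the additional loop executes zero times, which matches the divisor case described in the table.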