Compile and launch simple OpenCL C kernels with Clover

Published on ven 15 juillet 2011 in Clover, (Comments)

Hello,

Another Clover milestone was hit today ! During my holidays (from July 4 to July 15), I worked on the OpenCL C compiler based on Clang that is used in Clover to compile OpenCL programs. I also implemented all the infrastructure to get and launch the kernels of a program.

All is in place and can be tested (but there are still some limitations). Clover is now able to fully execute functions like clSetKernelArg, clBuildProgram and the best one : clEnqueueNDRangeKernel.

This means that from now, my OpenCL implementation is very close from being complete API-wise (I've still a lot of work to do regarding the OpenCL C language and its standard set of functions).

For information, Clover is able to launch this program (found in tests/test_kernel.cpp) :

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
float simple_function(float a)
{
    return a * 2.5f;
}

__kernel void kernel1(__global float *a, __global float *b, float f)
{
    size_t i = get_global_id(0);
    a[i] = simple_function(f) * b[i];
}

__kernel void kernel2(__global int *buf)
{
    size_t i = get_global_id(0);
    buf[i] = 2 * i;
}

It recognizes that kernel1 and kernel2 are two different kernels (and that simple_function cannot be called as a kernel). The testcase launches kernel2 as it has only one argument (it's faster to test a simpler kernel, but I'll also add testcases for the other kernel), and checks that its result is ok.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
int buffer[64];

// Create a cl_mem buffer to give to the kernel
buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                     sizeof(buffer), buffer, &result);

// Fetch the list of the kernels in the program
result = clCreateKernelsInProgram(program, 2, kernels, 0);

// Set the argument of kernel2
result = clSetKernelArg(kernels[1], 0, sizeof(cl_mem), &buf);

// Execute the kernel
size_t global_size = sizeof(buffer) / sizeof(buffer[0]);
result = clEnqueueNDRangeKernel(queue, kernels[1], 1, 0, &global_size, 0, 0, 0, &event);
result = clWaitForEvents(1, &event);

The buffer contains after the kernel is run 64 ints : (0, 2, 4, 6, 8, ..., 128).

The implementation of all of this is a bit tricky. I invite you to take a look at the CGit, but the commits are fairly difficult to understand. I hope the code isn't too ugly, it works, I even find it somewhat nice at some places, but it's too early to be sure it's good.

To end this blog post, a summary of what I did during my holidays :

1
55 files changed, 5031 insertions(+), 1941 deletions(-)

The line numbers are not truth, but it shows I was able to work many hours a day during my holidays (and I also very enjoyed them). I'm happy :) . I promised to be able to work during my holidays, I kept my promise. I have now a month and a half to work on the builtin functions (only get_global_id is implemented, and there are plenty of other functions) and to finish the API (samplers and images aren't yet completely implemented).

Then I will have to write tests, tests and more tests.

« When LLVM is slow   A fix and an optimization »