Hello,
Another Clover milestone was hit today ! During my holidays (from July 4 to July 15), I worked on the OpenCL C compiler based on Clang that is used in Clover to compile OpenCL programs. I also implemented all the infrastructure to get and launch the kernels of a program.
All is in place and can be tested (but there are still some limitations). Clover is now able to fully execute functions like clSetKernelArg, clBuildProgram and the best one : clEnqueueNDRangeKernel.
This means that from now, my OpenCL implementation is very close from being complete API-wise (I've still a lot of work to do regarding the OpenCL C language and its standard set of functions).
For information, Clover is able to launch this program (found in tests/test_kernel.cpp) :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | float simple_function(float a)
{
return a * 2.5f;
}
__kernel void kernel1(__global float *a, __global float *b, float f)
{
size_t i = get_global_id(0);
a[i] = simple_function(f) * b[i];
}
__kernel void kernel2(__global int *buf)
{
size_t i = get_global_id(0);
buf[i] = 2 * i;
}
|
It recognizes that kernel1 and kernel2 are two different kernels (and that simple_function cannot be called as a kernel). The testcase launches kernel2 as it has only one argument (it's faster to test a simpler kernel, but I'll also add testcases for the other kernel), and checks that its result is ok.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | int buffer[64];
// Create a cl_mem buffer to give to the kernel
buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
sizeof(buffer), buffer, &result);
// Fetch the list of the kernels in the program
result = clCreateKernelsInProgram(program, 2, kernels, 0);
// Set the argument of kernel2
result = clSetKernelArg(kernels[1], 0, sizeof(cl_mem), &buf);
// Execute the kernel
size_t global_size = sizeof(buffer) / sizeof(buffer[0]);
result = clEnqueueNDRangeKernel(queue, kernels[1], 1, 0, &global_size, 0, 0, 0, &event);
result = clWaitForEvents(1, &event);
|
The buffer contains after the kernel is run 64 ints : (0, 2, 4, 6, 8, ..., 128).
The implementation of all of this is a bit tricky. I invite you to take a look at the CGit, but the commits are fairly difficult to understand. I hope the code isn't too ugly, it works, I even find it somewhat nice at some places, but it's too early to be sure it's good.
To end this blog post, a summary of what I did during my holidays :
1 | 55 files changed, 5031 insertions(+), 1941 deletions(-)
|
The line numbers are not truth, but it shows I was able to work many hours a day during my holidays (and I also very enjoyed them). I'm happy :) . I promised to be able to work during my holidays, I kept my promise. I have now a month and a half to work on the builtin functions (only get_global_id is implemented, and there are plenty of other functions) and to finish the API (samplers and images aren't yet completely implemented).
Then I will have to write tests, tests and more tests.