Hello,
In this title, when I write “programs”, I mean “OpenCL programs”. Since my last blog post, I took some days to enjoy the end of the school year, but I also thought about how to implement a Clang-based OpenCL C compiler.
After some days of coding, I have finally something that works : we can now compile an OpenCL C kernel and get LLVM IR. We cannot do anything with this IR except printing it on stderr, but it shows the infrastructure is already working.
The next days will see the implementation of kernels (exploring a program looking for __kernel functions), and maybe already a JIT.
I will be on holiday from July 3, but I will have two or three hours a day to work ,so work will continue to advance, even at a faster pace that during my exams.
To end this post, here is the testcase used to check that all goes right
1 2 3 4 5 6 7 8 9 10 11 12 | const char program_source[] =
"#define __global __attribute__((address_space(1)))\n"
"\n"
"__kernel void test(__global float *a, __global float *b, int n) {\n"
" int i;\n"
"\n"
" for (i=0; i<n; i++) {\n"
" a[i] = 3.1415926f * b[i] * b[i];\n"
" }\n"
"}\n";
program = clCreateProgramWithSource(ctx, 1, &src, 0, &result); result = clBuildProgram(program, 1, &device, "", 0, 0);
|
A “module->dump()” in the Clover source code sends to stderr what Clang produces (currently unoptimized) :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | ; ModuleID = 'program.cl'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"
define void @test(float addrspace(1)* %a, float addrspace(1)* %b, i32 %n) nounwind
{
entry:
%a.addr = alloca float addrspace(1)*, align 8
%b.addr = alloca float addrspace(1)*, align 8
%n.addr = alloca i32, align 4
%i = alloca i32, align 4
store float addrspace(1)* %a, float addrspace(1)** %a.addr, align 8
store float addrspace(1)* %b, float addrspace(1)** %b.addr, align 8
store i32 %n, i32* %n.addr, align 4
store i32 0, i32* %i, align 4
br label %for.cond
for.cond:
; preds = %for.inc, %entry
%tmp = load i32* %i, align 4
%tmp1 = load i32* %n.addr, align 4
%cmp = icmp slt i32 %tmp, %tmp1
br i1 %cmp, label %for.body, label %for.end
for.body:
; preds = %for.cond
%tmp2 = load i32* %i, align 4
%idxprom = sext i32 %tmp2 to i64
%tmp3 = load float addrspace(1)** %b.addr, align 8
%arrayidx = getelementptr inbounds float addrspace(1)* %tmp3, i64 %idxprom
%tmp4 = load float addrspace(1)* %arrayidx
%mul = fmul float 0x400921FB40000000, %tmp4
%tmp5 = load i32* %i, align 4
%idxprom6 = sext i32 %tmp5 to i64
%tmp7 = load float addrspace(1)** %b.addr, align 8
%arrayidx8 = getelementptr inbounds float addrspace(1)* %tmp7, i64 %idxprom6
%tmp9 = load float addrspace(1)* %arrayidx8
%mul10 = fmul float %mul, %tmp9
%tmp11 = load i32* %i, align 4
%idxprom12 = sext i32 %tmp11 to i64
%tmp13 = load float addrspace(1)** %a.addr, align 8
%arrayidx14 = getelementptr inbounds float addrspace(1)* %tmp13, i64 %idxprom12
store float %mul10, float addrspace(1)* %arrayidx14
br label %for.inc
for.inc:
; preds = %for.body
%tmp15 = load i32* %i, align 4
%inc = add nsw i32 %tmp15, 1
store i32 %inc, i32* %i, align 4
br label %for.cond
for.end:
; preds = %for.cond
ret void
}
!opencl.kernels = !{!0}
!0 = metadata !{void (float addrspace(1)*, float addrspace(1)*, i32)* @test}
|
You can see that Clang even lists OpenCL kernels in a special metadata entry (very useful). Another nicety is that I got rid of the hack Zack Rusin had to do : my source code is read directly from memory, not put in a on-disk file that is read back by Clang. It is easily done this way
1 2 3 4 5 6 7 | // Fake source name for debugging and reporting purpose
frontend_opts.Inputs.push_back(std::make_pair(clang::IK_OpenCL, "program.cl"));
// Say to Clang this file is not on disk but in memory (source : llvm::MemoryBuffer)
clang::PreprocessorOptions &prep_opts = p_compiler.getPreprocessorOpts();
prep_opts.addRemappedFile("program.cl", source);
|
It's easy and works well. Clang is also surprisingly fast at compiling kernels (without any standard lib or header, I admit). It takes 0.06 second on my single-core 1,66 Ghz Atom computer to compile the kernel I give in this blog post.
The code is already in Git, ready to be tested.