OpenCL Test Tool
I’m blogging a little less frequently these days, mainly due to work keeping me busy, but also because my spare time is taken up by a side project. As any keen followers of my blog / social media activities may already know, I’ve been working hard on a machine learning tool-kit for Delphi.
The machine learning tool-kit that I am developing has a ‘pluggable’ implementation, allowing machine learning applications to be trained on local CPU or GPU hardware, or on remote devices across the network. For the GPU implementation (for desktop at least) I’m using OpenCL. This is another Khronos API (Khronos will be familiar to anyone who has used the OpenGL or Vulkan APIs for graphics programming).
In modern graphics programming almost everything revolves around shaders. Shaders are small programs written in a C/C++ variant, which are compiled by the driver for the graphics hardware, and then perform computation on the graphics hardware itself. OpenCL works in a similar way, allowing you to provide small programs, called ‘kernels’ as opposed to ‘shaders’, which use the graphics hardware to perform computationally intensive operations in parallel. My AMD-based hardware, for example, has 36 ‘compute units’; loosely speaking, that’s a 36-core processor for floating-point math operations. By compiling a kernel and uploading it to the graphics card, I can take advantage of this parallelism to accelerate the machine learning tool-kit.
Unfortunately, the compiler for OpenCL is provided as part of the graphics card vendor’s driver, which means that each vendor provides their own implementation of the compiler for building kernels. I recently had a problem arise when the Intel compiler behaved differently for one of my kernels than for the others. In short, a single line of code in my kernel source was failing to compile. The same line had been used in many other kernels, for both AMD and Intel targets, and compiled fine, but for this particular kernel it was failing. OpenCL feeds back warning and error information from the compiler through its API, but unfortunately in this case no error log was being returned either. Instead, all I got was the dreaded “floating point overflow” exception. I could have understood this exception being raised when my kernel was executed on the graphics hardware, but that was not the case; instead, I was getting this exception at compile time.
I’m not going to paste my kernel code here for now, but the line in question was attempting to add the product of two values from source buffers to a value in a target buffer. Effectively: Target[idx] += SourceA[idx] * SourceB[idx];
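To make the shape of that line concrete, here is a minimal sketch in plain C of an element-wise multiply-accumulate. This is not the actual kernel from the tool-kit: the function names (mul_acc_item, mul_acc) and the serial loop are assumptions made purely for illustration. In a real OpenCL kernel the loop would disappear and idx would typically come from get_global_id(0), with one work-item per element.

```c
#include <stddef.h>

/* Body of one hypothetical work-item. In OpenCL C, idx would be
   obtained from get_global_id(0) rather than passed in. */
static void mul_acc_item(float *target, const float *source_a,
                         const float *source_b, size_t idx)
{
    target[idx] += source_a[idx] * source_b[idx];
}

/* Serial stand-in for launching n work-items. On a GPU these
   iterations would be spread across the compute units. */
void mul_acc(float *target, const float *source_a,
             const float *source_b, size_t n)
{
    for (size_t idx = 0; idx < n; ++idx) {
        mul_acc_item(target, source_a, source_b, idx);
    }
}
```

Because every iteration touches only its own idx, the iterations are independent of one another, which is what makes this kind of operation a natural fit for a kernel.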
Nothing about this line of code seemed to be an obvious problem, but as it turns out, the way that my indices for the buffers were calculated was a problem. You see, the Intel compiler was attempting to optimize the kernel code to compute the indices more efficiently. Why would the optimization be a problem? Well, I am using an OpenCL version 1.x context to compile my kernels for backwards compatibility. Intel’s driver had the optimization enabled by default; however, the optimization is only available for OpenCL version 2.x contexts! Essentially this is a bug in the compiler: it should have gracefully returned an error message detailing the broken optimization, or silently skipped the optimization.
What matters here is the process that I went through to identify the cause of the issue that I was having. I extracted my kernel code from my main application and pasted it into a separate application in order to remove any other code that might impact the results of testing. I then went line-by-line through my code, ensuring that it was correctly handling error checks (which in many cases it was not, so I repaired it!). Still I was having issues, but the very last thing I came across was the parameter for providing options to the OpenCL compiler. I wondered what options might be available, which led me to the documentation for the optimizations, and finally I discovered the cause of the issue.
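For readers who have not met that parameter: it is a plain string of space-separated flags, passed as the fourth argument (options) of clBuildProgram. The sketch below composes such a string in C. The helper name make_build_options is hypothetical, and the post does not say which option resolved the problem; -cl-std= and -cl-opt-disable are used here only because they are standard OpenCL build options for selecting the language version and turning off all optimizations.

```c
#include <stdio.h>
#include <stddef.h>

/* Compose a string for the `options` argument of clBuildProgram.
   cl_std is a version such as "1.2"; disable_opt != 0 appends
   -cl-opt-disable. Returns 0 on success, -1 if buf is too small.
   Which flags the author actually changed is not stated, so the
   choice of flags here is an assumption for illustration. */
int make_build_options(char *buf, size_t size,
                       const char *cl_std, int disable_opt)
{
    int n = snprintf(buf, size, "-cl-std=CL%s%s", cl_std,
                     disable_opt ? " -cl-opt-disable" : "");
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With cl_std set to "1.2" and disable_opt set to 1, the buffer holds "-cl-std=CL1.2 -cl-opt-disable". After a failed build, the compiler output can normally be retrieved by calling clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG, although as described above, a driver may return nothing useful.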
Wait wait, that separate application that I was using for debugging, well it looked like quite a handy tool. I’m certain I’ll use it again, and I figured it may also be useful to others. So I gave it a UI, slapped on an icon, packaged and uploaded it here:
CL Test is a simple UI application which gives you some information about the available hardware for OpenCL, and which can build your kernel code for you, with a dialog for passing options to the OpenCL compiler. It traps most exceptions and returns as much error information as I’m able to get from the API. So if you need a simple tool to check whether your OpenCL kernels compile, it may be of use to you.
What CLTest does not currently do is generate a binary file of the kernel, nor does it attempt to run the kernel. The latter of these two features would require significant additional effort to add dialogs for multiple data-type parameters etc. So for now, it’s just a dumb compiler interface, but nonetheless, I hope you find it useful. If there is sufficient interest, I’ll consider enhancing it with execution and binary output features. Let me know in the comments!
For now though, enjoy a couple of screenshots.