Update: LibJacket has been renamed to ArrayFire.
In response to a comment on a previous post about integrating LibJacket into an OpenCV project, here is a quick performance comparison of OpenCV's GPU Sobel filter against LibJacket's conv2 convolution filter (using a Sobel kernel)…
This post has evolved over time, so be sure to scroll all the way down to see the later comparisons…
Update (10/24/2011): Round 2
Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
[via LibJacket’s ginfo()]:
Libjacket v1.0.1 (build dd66add) by AccelerEyes
CUDA Driver: 270.41.19
CUDA Toolkit: v4.0
CUDA capable devices detected:
GPU0 GeForce GTX 295, 896 MB, Compute 1.3 (single,double) (in use)
GPU1 GeForce GTX 295, 896 MB, Compute 1.3 (single,double)
Random matrices were generated and used for testing. For every size, the same matrix (image) was used for each call. A warm-up call was made first, then the average over 100 runs is reported. The latest versions of both libraries (as of this writing) were used for the comparison.
One note: I'm disabling LibJacket's 'dynamic caching' by calling gsync() each loop; without this call (i.e., in normal code) the functions run even faster than shown above.
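For readers who want to reproduce the setup, the timing loop can be sketched as below. This is a generic host-side harness, not the actual benchmark code, and `avg_ms` is a name chosen here for illustration:

```cpp
#include <chrono>
#include <functional>

// A minimal sketch of the timing methodology described above: one
// untimed warm-up call, then the wall-clock average over `runs`
// iterations. For a GPU library, a device synchronization (e.g.
// LibJacket's gsync()) would also be needed each iteration so that
// asynchronously queued kernels are actually included in the timing.
double avg_ms(const std::function<void()>& f, int runs = 100) {
    f();  // warm-up: absorbs one-time costs (JIT, caches, allocation)
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count() / runs;
}
```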
Get the source code and see for yourself!
(put the folder in your /libjacket/examples/ directory)
The test is by no means an extensive benchmark, but it does offer some hints about the general performance of the various platforms. I may benchmark other functions when I get more time, but right now it looks like LibJacket has a few tricks up its sleeve!
To address the question in the comments about convolve(), I ran another quick test, comparing OpenCV's GPU convolve() against LibJacket's conv2() with Sobel kernels of varying size. Note: OpenCV's filter2D() currently doesn't support floating-point images, so it is not considered here.
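For reference, the operation being benchmarked is a plain 2D filtering pass over a float image. A minimal CPU sketch with zero-padded "same"-size output is below; `conv2_same` is an illustrative name, and the GPU libraries differ in their padding and kernel-flip conventions:

```cpp
#include <vector>

// Reference "same"-size 2D filtering on a row-major float image.
// (This computes correlation; true convolution flips the kernel, and
// LibJacket and OpenCV differ in flip/padding conventions. Pixels
// outside the image are treated as zero.)
std::vector<float> conv2_same(const std::vector<float>& img, int rows, int cols,
                              const std::vector<float>& k, int krows, int kcols) {
    std::vector<float> out(rows * cols, 0.0f);
    const int cy = krows / 2, cx = kcols / 2;  // kernel center
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x) {
            float acc = 0.0f;
            for (int i = 0; i < krows; ++i)
                for (int j = 0; j < kcols; ++j) {
                    const int yy = y + i - cy, xx = x + j - cx;
                    if (yy >= 0 && yy < rows && xx >= 0 && xx < cols)
                        acc += img[yy * cols + xx] * k[i * kcols + j];
                }
            out[y * cols + x] = acc;
        }
    return out;
}
```

The inner double loop over the kernel is why a direct implementation costs O(k²) multiplies per pixel, which matters in the kernel-size comparisons below.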
It's interesting to see that OpenCV seems unaffected by kernel size (likely because its GPU convolve() is FFT-based, so its cost is largely independent of kernel size), while LibJacket's performance is quite dependent on it, heavily favoring smaller kernels.
(LibJacket is not using separable kernels here either).
This seems to be an evolving post, thanks to all the comments!
Up until now, the benchmarks of LibJacket have used 2D kernels. Since OpenCV's Sobel filter uses separable kernels, I re-ran the above benchmark using the separable-kernel version of LibJacket's conv2() function. The dotted line is the same LibJacket 3×3 kernel as in the first chart, for reference on the improvement from separable kernels.
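The separable trick works because the Sobel kernel is rank-1: it factors into a 1D smoothing column times a 1D derivative row, so two 1D passes (roughly 2k multiplies per pixel) replace one 2D pass (roughly k² multiplies). A small sketch reconstructing the 3×3 Sobel-x kernel from its factors (sign conventions vary between libraries; `sobel_x_from_factors` is a name chosen here):

```cpp
#include <array>

// Rebuild the 3x3 Sobel-x kernel as the outer product of a 1-D
// smoothing column [1 2 1]^T and a 1-D derivative row [-1 0 1].
// A separable filter applies these two vectors in sequence instead
// of the full 3x3 kernel -- the source of the speedup in the chart.
std::array<float, 9> sobel_x_from_factors() {
    const std::array<float, 3> col = {1.0f, 2.0f, 1.0f};   // smoothing
    const std::array<float, 3> row = {-1.0f, 0.0f, 1.0f};  // derivative
    std::array<float, 9> k{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            k[i * 3 + j] = col[i] * row[j];  // outer product
    return k;
}
```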
I wish that I had more time to do a full feature comparison for all overlapping LibJacket/OpenCV functions, but alas, maybe another day… The source above is enough for anyone else out there to get started though!
As requested in the comments, here are the Fermi Tesla benchmarks.
Note: If the goal here were solely Sobel filtering, then one would compare jkt::conv2 vs cv::gpu::Sobel (first figure below). To generalize to any convolution (second/third figures below), in OpenCV-GPU one must use either cv::gpu::filter2D or cv::gpu::convolve. Unfortunately, filter2D only works on uchar images, while convolve works on any type; the common data type between LibJacket and OpenCV is float. According to the comments, convolve was designed for larger kernels, while their Sobel stops at 16×16 kernels (which I discovered experimentally). I would say that for general floating-point convolutions, jkt::conv2 vs cv::convolve is a fair comparison.
See also: OpenCV+ArrayFire