OpenCV vs. LibJacket: GPU Sobel Filtering

September 24, 2011 (updated January 22, 2022) by mcclanahoochie

Update: LibJacket has been renamed to ArrayFire.

In response to a comment on a previous post about integrating LibJacket into an OpenCV project, below is a simple FYI performance comparison of OpenCV’s GPU Sobel filter versus LibJacket’s conv2 convolution function (with a Sobel kernel)…

This is an evolving post, so be sure to scroll all the way down to see more comparisons…

Update (10/24/2011): Round 2

[Figure: Sobel filter, OpenCV GPU Sobel vs. LibJacket conv2 (2D kernel)]

Test system:
[via /proc/cpuinfo]:
Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
[via LibJacket’s ginfo()]:
Libjacket v1.0.1 (build dd66add) by AccelerEyes
CUDA Driver: 270.41.19
CUDA Toolkit: v4.0
CUDA capable devices detected:
GPU0 GeForce GTX 295, 896 MB, Compute 1.3 (single,double) (in use)
GPU1 GeForce GTX 295, 896 MB, Compute 1.3 (single,double)

Test procedure:
Random matrices were generated and used for testing. For every size, the same matrix (image) was used for each call. A “warm-up” call was made first, and then the average time over 100 runs is reported. The latest versions of both libraries at the time of writing were used for the comparison.

One note: I’m disabling LibJacket’s ‘dynamic caching’ by calling gsync() in every loop iteration; without this call (i.e., in normal code), the functions run even faster than shown above.

Get the source code and see for yourself!
(put the folder in your /libjacket/examples/ directory)

The test is by no means an extensive benchmark, but it does give some hints about the relative performance of the two libraries. I may bench other functions when I get more time, but right now it looks like LibJacket has a few tricks up its sleeve!

Update 1:

To address the question in the comments about “convolve()”, I ran another quick test. This one compares OpenCV’s GPU “convolve()” method against LibJacket’s “conv2()” with varying Sobel kernel sizes. Note: OpenCV’s filter2D() doesn’t currently support floating-point images, so it is not considered here.

[Figure: Conv2, OpenCV GPU convolve vs. LibJacket conv2]

It’s interesting that OpenCV’s timings seem unaffected by kernel size, while LibJacket’s performance depends strongly on it and heavily favors smaller kernels.
(LibJacket is not using separable kernels here either.)
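As Anatoly explains in the comments, OpenCV’s GPU convolve() uses an FFT internally, which would account for its flat timings. A rough operation count for an N×N image and a k×k kernel makes the contrast concrete. This is a simplified model (constants, memory traffic, and buffer allocation ignored; M is the padded FFT size), not a profile of either library:

```latex
\begin{aligned}
C_{\text{direct}}    &\approx N^2 k^2 && \text{multiply-adds (full 2D kernel)} \\
C_{\text{separable}} &\approx 2 N^2 k  && \text{multiply-adds (two 1D passes)} \\
C_{\text{FFT}}       &= O\!\left(M^2 \log M\right), && M \ge N + k - 1
\end{aligned}
```

For N = 2048, going from k = 3 to k = 15 raises C_direct from about 3.8 × 10^7 to about 9.4 × 10^8 multiply-adds (25 times the work), while M, and therefore C_FFT, barely changes as long as k is much smaller than N.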

Update 2:
This seems to be an evolving post, thanks to all the comments!

Up until now, the LibJacket benchmarks have used full 2D kernels. Since OpenCV’s Sobel filter uses separable kernels, I re-ran the above benchmark using the separable-kernel version of LibJacket’s conv2() function. The dotted line is the same LibJacket 3×3 kernel as in the first chart, included for reference to show the improvement from separable kernels.

I wish that I had more time to do a full feature comparison for all overlapping LibJacket/OpenCV functions, but alas, maybe another day… The source above is enough for anyone else out there to get started though!

Update 3:
As requested in the comments, here are the Fermi Tesla benchmarks.

Note: If the goal here were solely Sobel filtering, then one would compare jkt::conv2 vs. cv::gpu::Sobel (first figure below). To generalize to any convolution, though (second/third figures below), in OpenCV GPU one must use either cv::gpu::filter2D or cv::gpu::convolve. Unfortunately, filter2D only works on uchar images, while convolve works on any type; the common data type between LibJacket and OpenCV is float. According to the comments, convolve was designed for larger kernels, while their Sobel tops out at 16×16 (I discovered this experimentally). I would say that for general floating-point convolutions, jkt::conv2 vs. cv::gpu::convolve is a fair comparison.

[Figure: OpenCV GPU Sobel vs. LibJacket conv2 (separable)]

[Figure: OpenCV GPU convolve vs. LibJacket conv2, small kernels]

[Figure: OpenCV GPU convolve vs. LibJacket conv2, larger kernels]

See also: OpenCV+ArrayFire

18 thoughts on “OpenCV vs. LibJacket: GPU Sobel Filtering”

Pavan

September 25, 2011 at 12:25 pm

If OpenCV also has a conv2 equivalent, maybe you need to add that to the comparison. Maybe they are using different algorithms for Sobel and convolution?
Pavan

September 25, 2011 at 12:27 pm

Also, for a 3×3 kernel, LibJacket is faster if you send in the host-side data.
mcclanahoochie

September 25, 2011 at 1:32 pm

Pavan:
Good point: OpenCV does have other filtering methods. Looking into this, I find that their “filter2D()” doesn’t support floating point images, and their “convolve()” function is basically LibJacket’s “conv2()”. A quick benchmark of OpenCV’s GPU convolve() looks like…

size: 512x512    cv-gpu: 2.27729  jacket: 0.17907
size: 1024x1024  cv-gpu: 3.64561  jacket: 0.61625
size: 1536x1536  cv-gpu: 13.6968  jacket: 1.05545
size: 2048x2048  cv-gpu: 13.954   jacket: 2.43504
size: 2560x2560  cv-gpu: 27.7197  jacket: 3.28057
size: 3072x3072  cv-gpu: 28.0698  jacket: 4.26894
size: 3584x3584  cv-gpu: 47.5459  jacket: 5.65912
^and the above Jacket results do include the faster “convn()” timings as well.
Alexey

September 26, 2011 at 7:47 am

mcclanahoochie:
Is the situation with convolve() the same for bigger kernel sizes, say 5% or 10% of the source image width?
anatoly

September 26, 2011 at 9:18 am

Dear mcclanahoochie,

Thank you for benchmarking the OpenCV library. We will add specializations for small Sobel kernels. We look forward to your tests of other functions. Any help is welcome!!!

OpenCV’s filter engine consists of 3 layers. At the simplest (and slowest) level, the user can call cv::gpu::Sobel. For better performance, they should use the Filter Engine API. It’s our fault that we haven’t documented this very well. But I guess even in that case LibJacket will be a bit faster, because of the generality of OpenCV’s code. Anyway, we will add specializations. Many thanks.

Also, a little note: cv::convolve is a utility function used in template matching. It is optimized for large pattern sizes like 100×100 or 250×250. It uses an FFT internally and allocates GPU buffers. It’s quite funny to use this function for 3×3 Sobel filtering. Sounds like driving a nail with an excavator rather than a hammer :)))
mcclanahoochie

September 26, 2011 at 9:50 am

Alexey:
I’ve updated the post with “convolve()” benchmarks for various kernel sizes, and as Anatoly points out, it is indeed designed for larger kernels.

Anatoly:
Thanks for the clarification. I also noticed that OpenCV uses separable kernels for Sobel. When I next get time, I’ll try to redo the LibJacket benchmarks using separable kernels as well, for a fairer comparison (meaning Jacket will probably be faster…).
pavan

September 26, 2011 at 4:15 pm

Anatoly, we use FFT-based convolutions at large sizes too 🙂
pavan

September 26, 2011 at 4:18 pm

Chris,

Just to piss you off a little more, can you try using separable kernels (http://en.wikipedia.org/wiki/Sobel_operator#Technical_details)? This is much faster inside LibJacket for small kernels (< 20×20).
pavan

September 26, 2011 at 4:19 pm

And then just for the kicks, convolve with random kernels 🙂
mcclanahoochie

September 26, 2011 at 4:58 pm

Pavan:
Updated! (for separable kernels)
…and I probably won’t have time for many more benchmarks until the weekend comes again.
anatoly

September 27, 2011 at 5:23 am

@pavan:
I don’t doubt that LibJacket uses FFT techniques for large kernel convolutions.

@mcclanahoochie:
Thanks for the new chart. That’s very helpful. BTW, do you plan to run the benchmark on Fermi? Also, does LibJacket support some kind of extrapolation?
mcclanahoochie

September 27, 2011 at 9:39 am

Anatoly:
Glad to hear that! I’ll see if I can get hold of a Fermi card before the weekend, so check back then! As for extrapolation, I assume you’re referring to how the border is handled… in that case, LibJacket (currently) supports filtering with or without zero-padded edges. Thanks for the interest!
Alexey

October 14, 2011 at 5:38 am

Hi mcclanahoochie. I’m trying to do one more comparison of the OpenCV and LibJacket libraries. Here is a modified version of your benchmark: http://pastebin.com/W41RwPnu. I got strange results for ksz = 32, while the results for ksz = 64 seem to be OK. I wonder whether I’m using LibJacket correctly.

Results for ksz = 32
====================
Libjacket v1.0.1 (build dd66add) by AccelerEyes
CUDA Driver: 270.81
CUDA Toolkit: v4.0

CUDA capable devices detected:
GPU0 Tesla C2050 / C2070, 2652 MB, Compute 2.0 (single,double) (in use)
size: 512×512
cv-gpu: 0.00202607
jacket: 0.00790995
size: 1024×1024
cv-gpu: 0.00767946
jacket: 0.0321756
size: 1536×1536
cv-gpu: 0.00893694
jacket: 0.0735304
size: 2048×2048
cv-gpu: 0.0171747
jacket: 0.131005
size: 2560×2560
cv-gpu: 0.0177356
jacket: 0.205862
size: 3072×3072
cv-gpu: 0.0278928
jacket: 0.297097
size: 3584×3584
cv-gpu: 0.0268914
jacket: 0.404806

Results for ksz = 64
====================
Libjacket v1.0.1 (build dd66add) by AccelerEyes
CUDA Driver: 270.81
CUDA Toolkit: v4.0

CUDA capable devices detected:
GPU0 Tesla C2050 / C2070, 2652 MB, Compute 2.0 (single,double) (in use)
size: 512×512
cv-gpu: 0.0021476
jacket: 0.00167873
size: 1024×1024
cv-gpu: 0.00802897
jacket: 0.00602738
size: 1536×1536
cv-gpu: 0.00899348
jacket: 0.00626257
size: 2048×2048
cv-gpu: 0.0173659
jacket: 0.0232162
size: 2560×2560
cv-gpu: 0.0177426
jacket: 0.0236838
size: 3072×3072
cv-gpu: 0.0264366
jacket: 0.0242362
size: 3584×3584
cv-gpu: 0.0270701
jacket: 0.0249083
Anatoly

October 18, 2011 at 5:18 pm

We updated our code and rebenchmarked it.
http://opencv-gpu.blogspot.com/2011/10/opencv-vs-libjacket.html
mcclanahoochie

October 18, 2011 at 6:03 pm

Anatoly:
Very interesting results, and good work on your performance improvement! I hope to dive deeper into your other question soon and have relayed the message to developers at AccelerEyes… Cheers.
~Chris
Pingback: Filtering Benchmarks – OpenCV GPU vs LibJacket — GPU Software Blog
Pingback: Filtering Benchmarks – OpenCV GPU vs LibJacket | ArrayFire
Pingback: GPU Convolutions: OpenCV GPU and LibJacket – Part 2 | mcclanahoochie's blog
