OpenCV vs. LibJacket: GPU Sobel Filtering

Update: LibJacket has been renamed to ArrayFire.

In response to a comment on a previous post about integrating LibJacket into an OpenCV project, here is a quick performance comparison of OpenCV’s GPU Sobel filter versus LibJacket’s conv2 convolution filter (with a Sobel kernel).

This is an evolving post, so be sure to scroll all the way down to see more comparisons.

Update (10/24/2011): Round 2

 

[Figure: OpenCV GPU Sobel vs. LibJacket conv2 (2D kernel)]

 

Test sys­tem:
[via /proc/cpuinfo]:
Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
[via LibJacket’s ginfo()]:
Libjacket v1.0.1 (build dd66add) by AccelerEyes
CUDA Driver: 270.41.19
CUDA Toolkit: v4.0
CUDA capable devices detected:
GPU0 GeForce GTX 295, 896 MB, Compute 1.3 (single,double) (in use)
GPU1 GeForce GTX 295, 896 MB, Compute 1.3 (single,double)

 

Test pro­ce­dure:
Random matrices were generated and used for testing. For every size, the same matrix (image) was used for each call. One “warm-up” function call was made first, and then the average over 100 runs was recorded. The latest versions of both libraries at the time of writing were used.

One note: I’m disabling LibJacket’s ‘dynamic caching’ by calling gsync() in each loop iteration. Without this call (i.e. in normal code), the functions run even faster than shown above.
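
The procedure described here (one warm-up call, then an average over 100 runs with a sync on every iteration) can be sketched roughly as follows. This is a generic Python illustration of the timing pattern, not the benchmark source: `average_runtime`, `fn` and `sync` are hypothetical names, with `sync` standing in for a device synchronization call such as gsync().

```python
import time

def average_runtime(fn, sync=lambda: None, runs=100):
    """Return the mean wall-clock time of fn() in seconds.

    fn   -- the operation under test (hypothetical stand-in for a filter call)
    sync -- called after every fn() so that queued work finishes inside the
            timed region (hypothetical stand-in for a call such as gsync())
    """
    # Warm-up call: absorbs one-time setup costs so they are not averaged in.
    fn()
    sync()

    start = time.perf_counter()
    for _ in range(runs):
        fn()
        sync()
    elapsed = time.perf_counter() - start
    return elapsed / runs
```

Calling `sync` inside the loop makes every iteration pay the full cost of the call, which is the conservative choice when comparing two libraries.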

Get the source code and see for your­self!
(put the folder in your /libjacket/examples/ directory)

 

This test is by no means an extensive benchmark, but it does give some hints about the general performance of the two libraries. I may benchmark other functions when I get more time, but right now it looks like LibJacket has a few tricks up its sleeve!

 

Update 1:

To address the question in the comments about “convolve()”, I ran another quick test. This one compares OpenCV’s GPU “convolve()” method against LibJacket’s “conv2()” using Sobel filter kernels of varying sizes. Note: OpenCV’s filter2D() doesn’t currently support floating point images, so it is not considered here.

[Figure: OpenCV GPU convolve vs. LibJacket conv2]

It’s interesting to see that OpenCV seems unaffected by kernel size, while LibJacket’s performance depends heavily on it and strongly favors smaller kernels.
(LibJacket is not using separable kernels here either.)
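
One plausible reading of this behavior, offered as an assumption rather than a measured fact: a commenter from the OpenCV side notes further down that convolve uses FFT internally, and a direct (non-FFT) convolution is assumed here for the small-kernel LibJacket case. Under those assumptions, a rough operation count shows why one curve would be nearly flat in kernel size and the other would not. The function names and the cost formulas below are illustrative only, and constant factors are ignored.

```python
import math

def direct_conv_ops(n, k):
    """Multiply-adds for direct 2D convolution of an n x n image with a k x k kernel."""
    return n * n * k * k

def fft_conv_ops(n, k):
    """Rough operation estimate for FFT-based convolution.

    Assumes both inputs are zero-padded to p x p with p = n + k - 1, and that
    one 2D FFT costs about p*p*log2(p*p) operations. Three transforms are
    counted (two forward, one inverse) plus one pointwise multiply.
    Only the trend is meaningful, not the absolute numbers.
    """
    p = n + k - 1
    return 3 * p * p * math.log2(p * p) + p * p

# Compare how each estimate grows with kernel size for a 1024 x 1024 image.
for k in (3, 5, 9, 17):
    print(k, direct_conv_ops(1024, k), round(fft_conv_ops(1024, k)))
```

In this model the direct cost grows with the square of the kernel width, while the FFT cost changes only through the padded size, so it barely moves for small kernels.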

 

Update 2:
This seems to be an evolv­ing post, thanks to all the comments!

Up until now, the LibJacket benchmarks have used 2D kernels. Since OpenCV’s Sobel filter uses separable kernels, I re-ran the above benchmark using the separable kernel version of LibJacket’s conv2() function. The dotted line is the same LibJacket 3x3 kernel as in the first chart, included as a reference to show the improvement from separable kernels.
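
For readers unfamiliar with separable kernels, here is a minimal CPU sketch in plain Python (not LibJacket or OpenCV code) showing that the 3x3 horizontal Sobel kernel factors into a 3x1 column and a 1x3 row, and that two 1D passes give the same result as one 2D pass. All names are hypothetical, and the filter is applied as a sliding window without flipping the kernel.

```python
def filter2d_valid(img, kernel):
    """Sliding-window 2D filter (no kernel flip), 'valid' region only.

    img and kernel are lists of lists; the output shrinks by kernel size - 1
    in each dimension.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [
        [
            sum(img[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Horizontal-gradient Sobel kernel as a full 3x3 (9 multiplies per pixel).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# The same kernel factored into a column and a row (3 + 3 multiplies per pixel).
SMOOTH_COL = [[1], [2], [1]]   # 3x1 vertical smoothing
DIFF_ROW = [[-1, 0, 1]]        # 1x3 horizontal difference

def sobel_x_2d(img):
    """Apply the full 3x3 kernel in a single pass."""
    return filter2d_valid(img, SOBEL_X)

def sobel_x_separable(img):
    """Apply the row kernel first, then the column kernel."""
    return filter2d_valid(filter2d_valid(img, DIFF_ROW), SMOOTH_COL)
```

For a separable k x k kernel, the per-pixel work drops from k*k multiplies to 2*k, which is why the separable version is expected to be faster.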

 

[Figure: OpenCV GPU Sobel vs. LibJacket conv2 (separable)]

 

I wish that I had more time to do a full fea­ture com­par­i­son for all over­lap­ping LibJacket/OpenCV func­tions, but alas, maybe another day… The source above is enough for any­one else out there to get started though!

 

Update 3:
As requested in the com­ments, here are the Fermi Tesla benchmarks.

Note: If the goal here were solely Sobel filtering, then one would compare jkt::conv2 vs cv::gpu::Sobel (first figure below). To generalize to any convolution though (second/third figures below), in OpenCV’s GPU module one must use either cv::gpu::filter2D or cv::gpu::convolve. Unfortunately, filter2D only works on uchar images, while convolve works on any type; the common data type between LibJacket and OpenCV is float. According to the comments, convolve was designed for larger kernels, while OpenCV’s Sobel stops at 16x16 (I discovered this experimentally). I would say that for general floating point convolutions, jkt::conv2 vs cv::gpu::convolve is a fair comparison.

[Figure: OpenCV GPU Sobel vs. LibJacket conv2 (separable)]

[Figure: OpenCV GPU convolve vs. LibJacket conv2 (small kernels)]

[Figure: OpenCV GPU convolve vs. LibJacket conv2 (larger kernels)]

 

 

16 thoughts on “OpenCV vs. LibJacket: GPU Sobel Filtering”

  1. If OpenCV also has conv2, maybe you should add that to the comparison. Maybe they are using different algorithms for Sobel and convolution?

  2. Pavan:
    Good point: OpenCV does have other fil­ter­ing meth­ods. Look­ing into this, I find that their “filter2D()” doesn’t sup­port float­ing point images, and their “con­volve()” func­tion is basi­cally LibJacket’s “conv2()”. A quick bench­mark of OpenCV’s GPU con­volve() looks like…

    size: 512x512
    cv-gpu: 2.27729
    jacket: 0.17907
    size: 1024x1024
    cv-gpu: 3.64561
    jacket: 0.61625
    size: 1536x1536
    cv-gpu: 13.6968
    jacket: 1.05545
    size: 2048x2048
    cv-gpu: 13.954
    jacket: 2.43504
    size: 2560x2560
    cv-gpu: 27.7197
    jacket: 3.28057
    size: 3072x3072
    cv-gpu: 28.0698
    jacket: 4.26894
    size: 3584x3584
    cv-gpu: 47.5459
    jacket: 5.65912

    ^and the above Jacket results do include the faster “convn()” tim­ings as well.

  3. mcclana­hoochie:
    Is sit­u­a­tion with con­volve() the same for big­ger ker­nel sizes? Say 5% or 10% of source image width.

  4. Dear mcclana­hoochie,

    Thank you for benchmarking the OpenCV library. We will add specializations for small Sobel kernels. We look forward to your tests of other functions. Any help is welcome!

    OpenCV’s filter engine consists of 3 layers. At the simplest and slowest level, the user can call cv::gpu::Sobel. For better performance, one should use the Filter Engine API. It is our fault that we haven’t documented this very well. But I guess even in this case LibJacket will be a bit faster, because of the generality of OpenCV’s code. Anyway, we will add specializations. Many thanks.

    Also a little note: cv::convolve is a utility function used in template matching. It is optimized for big pattern sizes like 100x100 or 250x250. It uses FFT internally and performs GPU buffer allocations. It’s quite funny to use this function for 3x3 Sobel filtering. Sounds like driving a nail with an excavator rather than a hammer :)

  5. Alexey:
    I’ve updated var­i­ous size “con­volve()” bench­marks, and as Ana­toly points out, it is indeed designed for larger ker­nel sizes.

    Ana­toly:
    Thanks for the clar­i­fi­ca­tion. I also noticed that OpenCV uses sep­a­ra­ble ker­nels for Sobel. When I get time next, I’ll try and re-do the Lib­Jacket bench­marks using sep­a­ra­ble ker­nels as well, for a more fair com­par­i­son (mean­ing Jacket will prob­a­bly be faster…).

  6. @pavan:
    I don’t doubt that Lib­Jacket uses FFT tech­niques for large ker­nel convolutions.

    @mcclanahoochie:
    Thanks for the new chart. That’s very helpful. BTW, do you plan to run the benchmark on Fermi? Also, I wonder whether LibJacket supports some kind of extrapolation?

  7. Hi mcclana­hoochie. I’m try­ing to do one more com­par­i­son of OpenCV and Lib­Jacket libs. Here is the mod­i­fied ver­sion of your bench­mark: http://pastebin.com/W41RwPnu. I got strange results for ksz = 32, while results for ksz = 64 seem to be OK. I won­der if I use Lib­Jacket correctly.

    Results for ksz = 32
    ====================
    Lib­jacket v1.0.1 (build dd66add) by Accel­erEyes
    CUDA Dri­ver: 270.81
    CUDA Toolkit: v4.0

    CUDA capa­ble devices detected:
    GPU0 Tesla C2050 / C2070, 2652 MB, Com­pute 2.0 (single,double) (in use)
    size: 512x512
    cv-gpu: 0.00202607
    jacket: 0.00790995
    size: 1024x1024
    cv-gpu: 0.00767946
    jacket: 0.0321756
    size: 1536x1536
    cv-gpu: 0.00893694
    jacket: 0.0735304
    size: 2048x2048
    cv-gpu: 0.0171747
    jacket: 0.131005
    size: 2560x2560
    cv-gpu: 0.0177356
    jacket: 0.205862
    size: 3072x3072
    cv-gpu: 0.0278928
    jacket: 0.297097
    size: 3584x3584
    cv-gpu: 0.0268914
    jacket: 0.404806

    Results for ksz = 64
    ====================
    Lib­jacket v1.0.1 (build dd66add) by Accel­erEyes
    CUDA Dri­ver: 270.81
    CUDA Toolkit: v4.0

    CUDA capa­ble devices detected:
    GPU0 Tesla C2050 / C2070, 2652 MB, Com­pute 2.0 (single,double) (in use)
    size: 512x512
    cv-gpu: 0.0021476
    jacket: 0.00167873
    size: 1024x1024
    cv-gpu: 0.00802897
    jacket: 0.00602738
    size: 1536x1536
    cv-gpu: 0.00899348
    jacket: 0.00626257
    size: 2048x2048
    cv-gpu: 0.0173659
    jacket: 0.0232162
    size: 2560x2560
    cv-gpu: 0.0177426
    jacket: 0.0236838
    size: 3072x3072
    cv-gpu: 0.0264366
    jacket: 0.0242362
    size: 3584x3584
    cv-gpu: 0.0270701
    jacket: 0.0249083

  8. Ana­toly:
    Very inter­est­ing results, and good work on your per­for­mance improve­ment! I hope to dive deeper into your other ques­tion soon and have relayed the mes­sage to devel­op­ers at Accel­erEyes… Cheers.
    ~Chris

