{"id":1876,"date":"2011-10-24T23:42:00","date_gmt":"2011-10-25T06:42:00","guid":{"rendered":"http:\/\/mcclanahoochie.com\/blog\/?p=1876"},"modified":"2015-04-26T20:43:37","modified_gmt":"2015-04-27T03:43:37","slug":"gpu-convolution-opencv-gpu-and-libjacket-part-2","status":"publish","type":"post","link":"https:\/\/mcclanahoochie.com\/blog\/2011\/10\/gpu-convolution-opencv-gpu-and-libjacket-part-2\/","title":{"rendered":"GPU Convolutions: OpenCV GPU and LibJacket &#8211; Part 2"},"content":{"rendered":"<p>This is a response to my earlier post <a href=\"http:\/\/mcclanahoochie.com\/blog\/2011\/09\/opencv-vs-libjacket-gpu-sobel-filtering\/\" target=\"_blank\">comparing OpenCV&#8217;s gpu::convolve() and LibJacket&#8217;s jkt::conv2()<\/a> convolution functions, at various image and kernel sizes.<\/p>\n<p>That post generated a lot of traffic, most notably from the <a href=\"http:\/\/mcclanahoochie.com\/blog\/2011\/10\/gpu-convolution-opencv-gpu-and-libjacket-part-2\/\" target=\"_blank\">OpenCV developer<\/a> community. 
In response, the folks at Willow Garage appear to have revamped their GPU convolutions and posted their own set of <a href=\"http:\/\/opencv-gpu.blogspot.com\/2011\/10\/opencv-vs-libjacket.html\" target=\"_blank\">benchmarks<\/a> using their updated routines.<\/p>\n<p>The benchmarks I ran highlighted some performance issues in OpenCV, which the maintainers have now fixed; <em>their<\/em> benchmarks, in turn, exposed a weak spot in LibJacket&#8217;s convolutions, which AccelerEyes have now addressed.<\/p>\n<p>Now, I bring yet another set of benchmarks (along with <a href=\"https:\/\/code.google.com\/p\/mcclanahoochie\/source\/browse\/cuda\/versus\/versus.cpp\" target=\"_blank\">updated code<\/a>) to show <em>the current state of mutual improvements<\/em> for 2D image convolutions in both libraries.<\/p>\n<figure id=\"attachment_1878\" aria-describedby=\"caption-attachment-1878\" style=\"width: 635px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-2075.png\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1878\" data-permalink=\"https:\/\/mcclanahoochie.com\/blog\/2011\/10\/gpu-convolution-opencv-gpu-and-libjacket-part-2\/screenshot-f2-2075\/#main\" data-orig-file=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-2075.png?fit=645%2C841&amp;ssl=1\" data-orig-size=\"645,841\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"Screenshot-f2-2075\" data-image-description=\"\" data-image-caption=\"\" 
data-large-file=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-2075.png?fit=645%2C841&amp;ssl=1\" class=\"size-full wp-image-1878\" title=\"Screenshot-f2-2075\" src=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-2075.png?resize=645%2C841\" alt=\".\" width=\"645\" height=\"841\" srcset=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-2075.png?w=645&amp;ssl=1 645w, https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-2075.png?resize=230%2C300&amp;ssl=1 230w\" sizes=\"(max-width: 645px) 100vw, 645px\" \/><\/a><figcaption id=\"caption-attachment-1878\" class=\"wp-caption-text\">.<\/figcaption><\/figure>\n<p>Test system (same as before):<br \/>\n<code><br \/>\nIntel(R) Core(TM) i7 CPU 920 @ 2.67GHz<br \/>\nOpenCV svn: r6902<br \/>\nLibJacket 1.1 (build 767c147)<br \/>\nGPU0 GeForce GTX 295, 896 MB, Compute 1.3 (single,double)<br \/>\nGPU0 Tesla C2075, 5376 MB, Compute 2.0 (single,double)<\/code><\/p>\n<figure id=\"attachment_1877\" aria-describedby=\"caption-attachment-1877\" style=\"width: 632px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-295.png\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1877\" data-permalink=\"https:\/\/mcclanahoochie.com\/blog\/2011\/10\/gpu-convolution-opencv-gpu-and-libjacket-part-2\/screenshot-f2-295\/#main\" data-orig-file=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-295.png?fit=642%2C841&amp;ssl=1\" data-orig-size=\"642,841\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"Screenshot-f2-295\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-295.png?fit=642%2C841&amp;ssl=1\" class=\"size-full wp-image-1877\" title=\"Screenshot-f2-295\" src=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-295.png?resize=642%2C841\" alt=\".\" width=\"642\" height=\"841\" srcset=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-295.png?w=642&amp;ssl=1 642w, https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/10\/Screenshot-f2-295.png?resize=229%2C300&amp;ssl=1 229w\" sizes=\"(max-width: 642px) 100vw, 642px\" \/><\/a><figcaption id=\"caption-attachment-1877\" class=\"wp-caption-text\">.<\/figcaption><\/figure>\n<p>The bottom part of each figure shows LibJacket&#8217;s speedup over OpenCV, where<br \/>\n<code><br \/>\nspeedup = (time_opencv \/ time_jacket)<\/code><\/p>\n<p>A speedup above 1 means LibJacket was faster; below 1 means OpenCV was faster.<\/p>\n<p>Indeed, both libraries have improved since last time, and both look set to keep getting faster!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is a response to my earlier post comparing OpenCV&#8217;s gpu::convolve() and LibJacket&#8217;s jkt::conv2() convolution functions, at various image and kernel sizes. That post generated a lot of traffic, most notably from the OpenCV developer community. 
In response, the folks at Willow Garage appear to have revamped their GPU convolutions and posted &#8230; <a title=\"GPU Convolutions: OpenCV GPU and LibJacket &#8211; Part 2\" class=\"read-more\" href=\"https:\/\/mcclanahoochie.com\/blog\/2011\/10\/gpu-convolution-opencv-gpu-and-libjacket-part-2\/\" aria-label=\"Read more about GPU Convolutions: OpenCV GPU and LibJacket &#8211; Part 2\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[91,193,179,192,103,100,54,92],"class_list":["post-1876","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-arrayfire","tag-benchmarking","tag-comparison","tag-convolution","tag-gpgpu","tag-gpu","tag-image-processing","tag-opencv"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pZdXI-ug","jetpack-related-posts":[{"id":1810,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/09\/opencv-vs-libjacket-gpu-sobel-filtering\/","url_meta":{"origin":1876,"position":0},"title":"OpenCV vs. LibJacket: GPU Sobel Filtering","author":"mcclanahoochie","date":"September 24, 2011","format":false,"excerpt":"Update: LibJacket has been renamed to\u00a0\u00a0ArrayFire. 
In response to a comment on a previous post about integrating LibJacket into an OpenCV project, below is just a simple FYI performance comparison of OpenCV's GPU Sobel filter versus LibJacket's conv2\u00a0convolution\u00a0filter (with a sobel kernel)... This is an evolutionary post, so be sure\u2026","rel":"","context":"In \"arrayfire\"","block_context":{"text":"arrayfire","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/arrayfire\/"},"img":{"alt_text":"Sobel filter: OpenCV GPU vs. LibJacket","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/09\/cv-versus-jkt.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1731,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/08\/image-processing-with-libjacket-opencv\/","url_meta":{"origin":1876,"position":1},"title":"Image processing with LibJacket + OpenCV","author":"mcclanahoochie","date":"August 24, 2011","format":false,"excerpt":"Update: one year later:\u00a0ArrayFire+OpenCV The OpenCV library is the de-facto standard for doing computer vision and image processing research projects. OpenCV includes several hundreds of computer vision algorithms, aimed for use in real-time vision applications. LibJacket is a matrix library built on CUDA. 
LibJacket offers hundreds of general matrix and\u2026","rel":"","context":"In \"arrayfire\"","block_context":{"text":"arrayfire","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/arrayfire\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/08\/Screen-shot-2011-08-24-at-2.42.52-PM-1024x640.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/08\/Screen-shot-2011-08-24-at-2.42.52-PM-1024x640.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/08\/Screen-shot-2011-08-24-at-2.42.52-PM-1024x640.png?resize=525%2C300 1.5x"},"classes":[]},{"id":1896,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/11\/gpu-tv-l1-optical-flow-with-libjacket\/","url_meta":{"origin":1876,"position":2},"title":"GPU TV-L1 Optical Flow with ArrayFire","author":"mcclanahoochie","date":"November 6, 2011","format":false,"excerpt":"Update 1: LibJacket has been renamed to\u00a0\u00a0ArrayFire. Update 2: Huang Chao-Hui was nice enough to port the LibJacket code mentioned here to ArrayFire - see his work here. 
As one of my\u00a0Computer Vision\u00a0class\u00a0projects, I decided to implement optical flow, because I wanted to learn more about optical flow, and also\u2026","rel":"","context":"In \"arrayfire\"","block_context":{"text":"arrayfire","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/arrayfire\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/11\/jkt-oflow-tvl1-1024x626.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/11\/jkt-oflow-tvl1-1024x626.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/11\/jkt-oflow-tvl1-1024x626.png?resize=525%2C300 1.5x"},"classes":[]},{"id":881,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/01\/cuda-opencv-sobel\/","url_meta":{"origin":1876,"position":3},"title":"Misc CUDA + OpenCV Fun : Sobel Filter","author":"mcclanahoochie","date":"January 1, 2011","format":false,"excerpt":"March 2010 I recently started playing with the\u00a0Nvidia CUDA SDK Samples involving image processing. I extracted the\u00a0SobelFilter kernel and made\u00a0it run off my USB webcam using\u00a0OpenCV - the two live modes\u00a0are\u00a0single-channel-gray and\u00a0three-channel-rgb. 
I wrote two different basic kernels on my own that did\u00a0binary thresholding and found the most\u00a0dominant RGB value.\u2026","rel":"","context":"In \"cuda\"","block_context":{"text":"cuda","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/cuda\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/01\/cuda-fun.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":611,"url":"https:\/\/mcclanahoochie.com\/blog\/2010\/09\/gtc-2010-trip\/","url_meta":{"origin":1876,"position":4},"title":"GTC 2010 Trip","author":"mcclanahoochie","date":"September 26, 2010","format":false,"excerpt":"I just got back from Nvidia's 2010 GPU Technology Conference in San Jose California. I had an amazing trip, and am thankful that I got to go, as it was my first visit to California as well as my first trade show attendance. [Side Note: The afternoon before the conference,\u2026","rel":"","context":"In \"arrayfire\"","block_context":{"text":"arrayfire","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/arrayfire\/"},"img":{"alt_text":"Arriving at the 2010 GPU Tech Conference","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2010\/09\/2010-09-20_09-39-29_568-1024x577.jpg?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2010\/09\/2010-09-20_09-39-29_568-1024x577.jpg?resize=350%2C200 1x, https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2010\/09\/2010-09-20_09-39-29_568-1024x577.jpg?resize=525%2C300 1.5x"},"classes":[]},{"id":950,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/01\/computer-vision-on-android\/","url_meta":{"origin":1876,"position":5},"title":"Computer Vision on Android in Java","author":"mcclanahoochie","date":"January 4, 2011","format":false,"excerpt":"January 2010 \u00a0 Over the holiday break, I finally created an Android\u00a0app that allows image processing on the camera's raw data, and 
displays it back on the screen. It only uses\u00a0Java on the CPU for now, but in my free time I'll be porting the code to OpenGL ES to\u2026","rel":"","context":"In \"android\"","block_context":{"text":"android","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/android\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/01\/device-sobel-2-small.png?resize=350%2C200","width":350,"height":200},"classes":[]}],"jetpack_likes_enabled":false,"_links":{"self":[{"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/posts\/1876","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/comments?post=1876"}],"version-history":[{"count":0,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/posts\/1876\/revisions"}],"wp:attachment":[{"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/media?parent=1876"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/categories?post=1876"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/tags?post=1876"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}