{"id":886,"date":"2011-01-01T11:15:13","date_gmt":"2011-01-01T18:15:13","guid":{"rendered":"http:\/\/mcclanahoochie.com\/blog\/?post_type=portfolio&#038;p=886"},"modified":"2023-06-10T10:32:22","modified_gmt":"2023-06-10T17:32:22","slug":"mtimes-gpu-matrix-multiplication","status":"publish","type":"post","link":"https:\/\/mcclanahoochie.com\/blog\/2011\/01\/mtimes-gpu-matrix-multiplication\/","title":{"rendered":"MTIMES &#8211; GPU Matrix Multiplication"},"content":{"rendered":"<h3>July 2010<\/h3>\n<p><a href=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/01\/fermi_gflops_single.jpg\"><img data-recalc-dims=\"1\" decoding=\"async\" data-attachment-id=\"1255\" data-permalink=\"https:\/\/mcclanahoochie.com\/blog\/2011\/01\/mtimes-gpu-matrix-multiplication\/fermi_gflops_single\/#main\" data-orig-file=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/01\/fermi_gflops_single.jpg?fit=200%2C124&amp;ssl=1\" data-orig-size=\"200,124\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"fermi_gflops_single\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/01\/fermi_gflops_single.jpg?fit=200%2C124&amp;ssl=1\" class=\"alignnone size-full wp-image-1255\" title=\"fermi_gflops_single\" src=\"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/01\/fermi_gflops_single.jpg?resize=200%2C124\" alt=\"\" width=\"200\" height=\"124\" \/><\/a><\/p>\n<p>OK, it&#8217;s not really a project, but I did learn a lot about GPU matrix multiplication over the summer, 
working\u00a0at <a href=\"http:\/\/www.accelereyes.com\/\">AccelerEyes<\/a>.<\/p>\n<p>I\u00a0re-worked the back-end CUDA code for\u00a0the <a href=\"http:\/\/blog.accelereyes.com\/blog\/2010\/06\/24\/sgemm-mtimes\/\" target=\"_blank\" rel=\"noopener\">MTIMES<\/a> Jacket function. I also modified it to accelerate SUM, MIN, and\u00a0MAX.<\/p>\n<p><em><strong>Check out my MTIMES <a href=\"http:\/\/arrayfire.com\/sgemm-mtimes\/\" target=\"_blank\" rel=\"noopener\">wiki<\/a> page!<\/strong><\/em><\/p>\n<p><em><strong><br \/>\n<\/strong><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>July 2010 OK, it&#8217;s not really a project, but I did learn a lot about GPU matrix multiplication over the summer, working\u00a0at AccelerEyes. I\u00a0re-worked the back-end CUDA code for\u00a0the MTIMES Jacket function. I also modified it to accelerate SUM, MIN, and\u00a0MAX. Check out my MTIMES wiki page!<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[91,110,103,100,126,101,29],"class_list":["post-886","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-arrayfire","tag-cuda","tag-gpgpu","tag-gpu","tag-matrix-multiplication","tag-programming","tag-projects"],"jetpack_publicize_connections":[],"jetpack_feature
d_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pZdXI-ei","jetpack-related-posts":[{"id":887,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/01\/gtc-2010-presentation\/","url_meta":{"origin":886,"position":0},"title":"GTC 2010 Presentation","author":"mcclanahoochie","date":"January 1, 2011","format":false,"excerpt":"September 2010 I got an incredible opportunity (at the last minute!) to give a short talk\u00a0at Nvidia's GTC 2010 on what I learned about GPU matrix multiplication, while at\u00a0AccelerEyes over the summer! Watch\u00a0presentation here (I'm 19 minutes in)! \u00a0 \u00a0","rel":"","context":"In \"arrayfire\"","block_context":{"text":"arrayfire","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/arrayfire\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/01\/gtc-2010-front-door.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":611,"url":"https:\/\/mcclanahoochie.com\/blog\/2010\/09\/gtc-2010-trip\/","url_meta":{"origin":886,"position":1},"title":"GTC 2010 Trip","author":"mcclanahoochie","date":"September 26, 2010","format":false,"excerpt":"I just got back from Nvidia's 2010 GPU Technology Conference in San Jose California. I had an amazing trip, and am thankful that I got to go, as it was my first visit to California as well as my first trade show attendance. 
[Side Note: The afternoon before the conference,\u2026","rel":"","context":"In \"arrayfire\"","block_context":{"text":"arrayfire","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/arrayfire\/"},"img":{"alt_text":"Arriving at the 2010 GPU Tech Conference","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2010\/09\/2010-09-20_09-39-29_568-1024x577.jpg?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2010\/09\/2010-09-20_09-39-29_568-1024x577.jpg?resize=350%2C200 1x, https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2010\/09\/2010-09-20_09-39-29_568-1024x577.jpg?resize=525%2C300 1.5x"},"classes":[]},{"id":929,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/01\/p3dfft-cuda-gpu-3d-fft\/","url_meta":{"origin":886,"position":2},"title":"P3DFFT + CUDA (GPU 3D FFT)","author":"mcclanahoochie","date":"January 1, 2011","format":false,"excerpt":"December 2010 My first project as a GRA under Rich Vuduc involved accelerating 3D Fast Fourier Transforms (3D FFT) with GPUs. The project was basically porting the open-source P3DFFT code\u00a0(written in FORTRAN) to run on GPU(instead of CPU)\u00a0clusters using CUFFT. \u00a0 Update: 04\/16\/2011 - This project has morphed into a\u2026","rel":"","context":"In \"cuda\"","block_context":{"text":"cuda","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/cuda\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/01\/pencil-decomp.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1810,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/09\/opencv-vs-libjacket-gpu-sobel-filtering\/","url_meta":{"origin":886,"position":3},"title":"OpenCV vs. LibJacket: GPU Sobel Filtering","author":"mcclanahoochie","date":"September 24, 2011","format":false,"excerpt":"Update: LibJacket has been renamed to\u00a0\u00a0ArrayFire. 
In response to a comment on a previous post about integrating LibJacket into an OpenCV project, below is just a simple FYI performance comparison of OpenCV's GPU Sobel filter versus LibJacket's conv2\u00a0convolution\u00a0filter (with a sobel kernel)... This is an evolutionary post, so be sure\u2026","rel":"","context":"In \"arrayfire\"","block_context":{"text":"arrayfire","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/arrayfire\/"},"img":{"alt_text":"Sobel filter: OpenCV GPU vs. LibJacket","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/09\/cv-versus-jkt.png?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1731,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/08\/image-processing-with-libjacket-opencv\/","url_meta":{"origin":886,"position":4},"title":"Image processing with LibJacket + OpenCV","author":"mcclanahoochie","date":"August 24, 2011","format":false,"excerpt":"Update: one year later:\u00a0ArrayFire+OpenCV The OpenCV library is the de-facto standard for doing computer vision and image processing research projects. OpenCV includes several hundreds of computer vision algorithms, aimed for use in real-time vision applications. LibJacket is a matrix library built on CUDA. 
LibJacket offers hundreds of general matrix and\u2026","rel":"","context":"In \"arrayfire\"","block_context":{"text":"arrayfire","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/arrayfire\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/08\/Screen-shot-2011-08-24-at-2.42.52-PM-1024x640.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/08\/Screen-shot-2011-08-24-at-2.42.52-PM-1024x640.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/08\/Screen-shot-2011-08-24-at-2.42.52-PM-1024x640.png?resize=525%2C300 1.5x"},"classes":[]},{"id":1663,"url":"https:\/\/mcclanahoochie.com\/blog\/2011\/08\/cuda-connected-component-labeling\/","url_meta":{"origin":886,"position":5},"title":"GPU Connected Component Labeling","author":"mcclanahoochie","date":"August 6, 2011","format":false,"excerpt":"Connected Component Labeling (CCL): \"is used in computer vision to detect connected regions in binary digital images\", and sometimes referred to as blob coloring. 
Motivation: To keep AccelerEyes'\u00a0ever expanding GPU library growing, over a few weeks of this summer\u00a0I took on the project of writing a CUDA version of connected\u2026","rel":"","context":"In \"arrayfire\"","block_context":{"text":"arrayfire","link":"https:\/\/mcclanahoochie.com\/blog\/tag\/arrayfire\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/mcclanahoochie.com\/blog\/wp-content\/uploads\/2011\/08\/coins-bwlabel-300x122.png?resize=350%2C200","width":350,"height":200},"classes":[]}],"jetpack_likes_enabled":false,"_links":{"self":[{"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/posts\/886","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/comments?post=886"}],"version-history":[{"count":0,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/posts\/886\/revisions"}],"wp:attachment":[{"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/media?parent=886"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/categories?post=886"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mcclanahoochie.com\/blog\/wp-json\/wp\/v2\/tags?post=886"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}