High-performance GPU techniques for 2D convolution operators in deep neural networks
Research Output: Contribution to journal Article Peer-review
Abstract
This paper presents an optimized implementation of Winograd non-fused convolution. Our optimizations include both application-independent and Winograd-specific software techniques: a specialized interface-kernel data format (tile-united CNHW layout) to improve memory access efficiency; warp specialization and double-buffered prefetching to exploit computational resources and memory bandwidth effectively; and shuffle instructions to conserve hardware resources. We propose a GPU-based Multi-Modal Parallelism Method (MMPM) for 2D Winograd non-fused convolution and provide a supplementary explanation of Winograd’s tile extraction, which reduces memory usage and computation. The proposed techniques were evaluated at the kernel level in two environments (ENV1 – GTX 980 GPU, CUDA 9.2, cuDNN 7.6.4; ENV2 – GTX 1650 Ti GPU, CUDA 10.2, cuDNN 8.2.0) using a wide range of CNN layer benchmark-compliant parameters. Compared with the state-of-the-art Winograd non-fused convolution in cuDNN, our implementation achieves speedups of 1.64× and 1.28× in ENV1 and ENV2, respectively.
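For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal CPU sketch of textbook Winograd F(2×2, 3×3) convolution with separate (non-fused) filter-transform, input-transform, and output-transform steps. It is not the paper's GPU implementation: the function names, the single-channel setting, and the requirement that output dimensions be even are assumptions made for illustration, and none of the paper's optimizations (tile-united CNHW layout, warp specialization, prefetching, shuffle instructions, MMPM) appear here. The sketch only shows how overlapping 4×4 input tiles are extracted with stride 2 and how each tile yields a 2×2 output block.

```python
# Illustrative sketch only: textbook Winograd F(2x2, 3x3), single channel, pure Python.
# All names below are hypothetical and are not taken from the paper.

def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

# Standard transform matrices for F(2x2, 3x3).
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]

def transform_filter(g):
    """Filter transform U = G g G^T, computed once per 3x3 filter."""
    return matmul(matmul(G, g), transpose(G))

def winograd_tile(d, U):
    """Map one 4x4 input tile d and a transformed filter U to a 2x2 output tile."""
    V = matmul(matmul(BT, d), transpose(BT))          # input transform
    M = [[U[i][j] * V[i][j] for j in range(4)]        # element-wise product
         for i in range(4)]
    return matmul(matmul(AT, M), transpose(AT))       # output transform

def winograd_conv2d(x, g):
    """Valid (no padding) correlation of x with a 3x3 filter g.

    Assumes the output height and width are even, so the output is
    covered exactly by 2x2 blocks.
    """
    oh, ow = len(x) - 2, len(x[0]) - 2
    assert oh % 2 == 0 and ow % 2 == 0, "sketch assumes even output dims"
    U = transform_filter(g)
    out = [[0.0] * ow for _ in range(oh)]
    for ti in range(0, oh, 2):
        for tj in range(0, ow, 2):
            # Tile extraction: 4x4 tiles taken with stride 2, so neighbours overlap by 2.
            d = [row[tj:tj + 4] for row in x[ti:ti + 4]]
            y = winograd_tile(d, U)
            for i in range(2):
                for j in range(2):
                    out[ti + i][tj + j] = y[i][j]
    return out

def direct_conv2d(x, g):
    """Reference direct correlation used to check the Winograd result."""
    oh, ow = len(x) - 2, len(x[0]) - 2
    return [[sum(x[i + u][j + v] * g[u][v] for u in range(3) for v in range(3))
             for j in range(ow)] for i in range(oh)]
```

Each tile uses 16 multiplications in the element-wise stage where the direct method needs 36 for the same 2×2 output block, which is the arithmetic saving that motivates Winograd convolution in general.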
Publication Information
Output type
Research Output: Contribution to journal Article Peer-review
Original language
English
Article number
CPE-25-1771
Journal (Volume, Issue Number)
Concurrency and Computation: Practice and Experience
Publication milestones
- Accepted/In press - 18/06/2026
Publication status
Accepted/In press - 18/06/2026
ISSN
1532-0626
Access to documents
High-Performance GPU Techniques for 2D Convolution Operators in Deep Neural Networks
Accepted author manuscript, 2.9 MB
License: CC BY-NC-ND
Access to file: Embargo ends 29/06/2027
