
Unlocking AI’s Full Potential: Tool from UNICAMP Researchers Set to Improve Artificial Intelligence Speed and Efficiency

In a world increasingly powered by Artificial Intelligence, from the smartphones in our pockets to complex medical diagnostic tools, the speed and efficiency of AI systems are paramount. Now, a team of researchers from the Institute of Computing at the University of Campinas (UNICAMP) has unveiled a tool that promises to accelerate AI development by providing a new level of insight into one of its most fundamental operations. The new benchmark suite, named “ConvBench,” is set to become an asset for AI developers worldwide, helping them build faster, more robust, and more efficient AI models.

The research, detailed in a paper titled “ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation,” addresses a critical, yet often overlooked, challenge in the world of AI: how to fairly and comprehensively test the performance of convolution algorithms. While the term “convolution” might sound technical, it is the computational heart of many modern AI systems, particularly the Convolutional Neural Networks (CNNs) that are essential for image recognition, autonomous driving, and countless other applications. In fact, this single operation can be responsible for up to 90% of the time it takes for a CNN to run.
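
Convolution itself is easy to state in code, even though making it fast is hard. As a purely illustrative sketch (using PyTorch, which appears later in this story as a source of models for ConvBench, though this snippet is not part of the benchmark), a single 2D convolution looks like this; the tensor shapes are hypothetical:

```python
# A minimal, illustrative 2D convolution in PyTorch. The tensor shapes are
# hypothetical and not taken from ConvBench.
import torch
import torch.nn.functional as F

# A batch of 8 RGB images, 224x224 pixels, in (N, C, H, W) layout.
images = torch.randn(8, 3, 224, 224)

# 64 filters of size 3x3 spanning the 3 input channels:
# (out_channels, in_channels, kernel_height, kernel_width).
filters = torch.randn(64, 3, 3, 3)

# The convolution primitive: calls like this one are what can account for
# up to 90% of a CNN's total running time.
features = F.conv2d(images, filters, stride=1, padding=1)
print(features.shape)  # torch.Size([8, 64, 224, 224])
```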

Imagine trying to find the fastest car in the world, but you can only test it on a handful of short, straight roads, and each car has to follow a different set of rules. You wouldn’t get a true picture of which car is best. That’s the situation AI developers have faced until now. Existing testing methods were often limited, looking at only a small number of scenarios, making it difficult to get a fair and complete comparison of different algorithms.

This is the core problem that ConvBench aims to solve. The UNICAMP team identified two major issues with previous benchmarking approaches: the “completeness problem” and the “fairness problem.” The completeness problem refers to the fact that most benchmarks used a very small, often handmade, set of tests—sometimes as few as a dozen or so operations. This is like judging a world-class marathon runner based on a 100-meter dash. The fairness problem arises because different convolution algorithms require data to be prepared in specific ways before they can even begin their calculations, and these preparation steps take time. Failing to account for these “pre-processing penalties” can lead to skewed and unfair performance comparisons.

Introducing ConvBench: A New Gold Standard for AI Testing

ConvBench tackles these challenges head-on with a comprehensive and meticulously designed solution. To solve the completeness problem, the researchers have assembled a large and diverse dataset of convolution operations, which they call “convSet.” This isn’t just a handful of tests; the convSet contains a staggering 9,243 unique convolution operations derived from 1,097 real-world AI models used in the industry today. These models are sourced from widely used collections like Hugging Face TIMM and PyTorch Torchvision, ensuring that ConvBench tests algorithms against the kinds of tasks they will actually face in real-world applications. The sheer scale and diversity of the convSet, which spans inputs of up to 1024×1024 pixels and a vast range of other parameters, ensures that algorithms are pushed to their limits across a wide spectrum of computational challenges.
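
How such a collection can be harvested is easy to picture. The sketch below is a simplified illustration rather than ConvBench’s actual extraction code: it walks a Torchvision model and records the parameters of every 2D convolution layer it contains, and the dictionary keys are hypothetical rather than the real convSet format:

```python
# Illustrative sketch: collecting 2D convolution configurations from a
# Torchvision model, in the spirit of how convSet gathers operations from
# real networks. The output format here is hypothetical.
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=None)

conv_configs = []
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        conv_configs.append({
            "in_channels": module.in_channels,
            "out_channels": module.out_channels,
            "kernel_size": module.kernel_size,
            "stride": module.stride,
            "padding": module.padding,
            "groups": module.groups,
        })

print(f"Collected {len(conv_configs)} convolution operations from ResNet-50")
```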

To address the fairness problem, the team developed a “Timing Measurement Tool” or “TM-Tool.” This tool acts like a high-precision stopwatch for every single step involved in a convolution operation. It doesn’t just measure the final result; it breaks down the entire process into distinct phases: “pre-convolution,” “in-convolution,” and “post-convolution.” Within these phases, it meticulously times routines like data preparation (packing), the core computation (microkernel), and data reordering. This detailed breakdown allows developers to see exactly where an algorithm is excelling and, more importantly, where it’s struggling.
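
The idea behind this phase-by-phase timing can be sketched in a few lines of Python. The snippet below is not the TM-Tool itself: the phase labels follow the paper’s terminology, while the routines being timed are hypothetical stand-ins for an algorithm’s real packing, microkernel, and reordering code:

```python
# Illustrative sketch of per-phase timing in the spirit of the TM-Tool.
import time
from collections import defaultdict

timings = defaultdict(float)

def timed(phase, fn, *args):
    """Run fn and accumulate its wall-clock time under the given phase label."""
    start = time.perf_counter()
    result = fn(*args)
    timings[phase] += time.perf_counter() - start
    return result

# Hypothetical stand-ins for a convolution algorithm's internal routines.
def pack_input(data):
    return sorted(data)                      # pretend data-layout transformation

def microkernel_compute(packed):
    return sum(x * x for x in packed)        # pretend core computation

def reorder_output(result):
    return result                            # pretend output reordering

data = list(range(100_000))
packed = timed("pre-convolution (packing)", pack_input, data)
result = timed("in-convolution (microkernel)", microkernel_compute, packed)
final = timed("post-convolution (reordering)", reorder_output, result)

for phase, seconds in timings.items():
    print(f"{phase}: {seconds * 1e3:.3f} ms")
```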

The goal was to create a level playing field. With the TM-Tool, inefficiencies can no longer be tucked away in unmeasured steps: ConvBench shines a light on the entire process, from start to finish, allowing for truly fair and insightful comparisons.

Putting ConvBench to the Test: A Real-World Success Story

To demonstrate the utility of their new tool, the researchers used ConvBench to evaluate a recent, high-performance convolution algorithm called “Sliced Convolution” (SConv). SConv had previously shown great promise, but its evaluation was based on a much smaller set of 134 operations.

The results of the ConvBench analysis were multi-faceted. On the one hand, ConvBench confirmed that SConv is indeed a very fast algorithm, outperforming the standard method (known as Im2col-GEMM) in an impressive 93.6% of regular convolution tests. This validated the algorithm’s excellent design for a vast majority of common cases. 
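
For readers unfamiliar with that baseline, Im2col-GEMM works by unfolding the input into a matrix of patches (“im2col”) so that the whole convolution collapses into a single matrix multiplication (GEMM). The sketch below, using NumPy with small illustrative shapes, shows the idea; it is a simplified picture, not the optimized implementation SConv was compared against:

```python
# Minimal sketch of the Im2col-GEMM approach with small, illustrative shapes.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

C_in, H, W = 3, 8, 8            # input channels and spatial size
C_out, K = 4, 3                 # output channels and kernel size

x = np.random.rand(C_in, H, W)
weights = np.random.rand(C_out, C_in, K, K)

# im2col: every KxK patch (across all channels) becomes one row of a matrix.
patches = sliding_window_view(x, (K, K), axis=(1, 2))   # (C_in, H-K+1, W-K+1, K, K)
H_out, W_out = patches.shape[1], patches.shape[2]
cols = patches.transpose(1, 2, 0, 3, 4).reshape(H_out * W_out, C_in * K * K)

# GEMM: one matrix multiplication produces every output channel at once.
out = cols @ weights.reshape(C_out, -1).T                # (H_out*W_out, C_out)
out = out.T.reshape(C_out, H_out, W_out)

print(out.shape)  # (4, 6, 6)
```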

However, ConvBench also shed light on the remaining 6.4% of cases where SConv was slower. Thanks to the detailed breakdown provided by the TM-Tool, the UNICAMP team was able to pinpoint the exact source of the problem. They discovered a critical bottleneck in SConv’s “packing” step, the routine that prepares the data for the main calculation. In the underperforming cases, this packing step was, on average, a staggering 79.5% slower than the baseline method.

The analysis went even deeper, revealing that this slowdown primarily occurred in specific scenarios, such as when the convolution involved strides and a small number of channels. In these situations, SConv was falling back on a less optimized, scalar packing routine. This is the kind of critical, actionable insight that was previously hard to obtain. Armed with this precise knowledge, the designers of SConv can now focus their efforts on fixing this specific weakness, potentially making their already fast algorithm even better. This discovery perfectly underscores the value of ConvBench: it doesn’t just tell you if something is fast or slow; it tells you why.

The Future of AI is Faster and More Efficient

The implications of this work extend far beyond a single algorithm. ConvBench provides the entire AI community with a powerful, open-source tool to rigorously evaluate and compare new and existing convolution algorithms. As AI models continue to grow in complexity and are deployed in ever more critical applications, the need for efficiency becomes paramount. Faster algorithms mean less energy consumption, lower operational costs, and the ability to run powerful AI on smaller devices, from drones to portable medical scanners.

The UNICAMP team is already looking ahead to future enhancements for ConvBench, including the ability to measure system-level metrics like cache performance and memory access, and to evaluate algorithms on a wider range of hardware architectures.

The development of ConvBench is a testament to the cutting-edge research being conducted at the University of Campinas and a significant contribution to the global advancement of artificial intelligence. By providing a clearer, fairer, and more comprehensive view of AI’s core computational engine, these researchers have not only identified pathways for immediate optimization but have also laid a foundation for a future of more powerful and efficient artificial intelligence for all. This work, supported by the São Paulo Research Foundation (FAPESP) and UNICAMP, is a shining example of how foundational computer science research can unlock immense practical benefits.
