China has made notable progress in working around the limitations of NVIDIA’s “cut-down” AI accelerators, thanks to DeepSeek’s software-driven approach. The company’s latest project claims an eightfold TFLOPS increase on NVIDIA’s Hopper H800 AI accelerators.
## Maximizing AI Power: DeepSeek’s FlashMLA Optimizes NVIDIA’s Hopper GPUs
China is taking matters into its own hands when it comes to advancing hardware capabilities. Companies like DeepSeek are spearheading this effort by harnessing the power of software to push the limits of existing equipment. Recent strides by DeepSeek are particularly noteworthy, as they’ve achieved remarkable performance improvements with NVIDIA’s “cut-down” Hopper H800 GPUs. They’ve done this by optimizing memory usage and resource allocation across inference requests.
At the start of its “OpenSource Week,” DeepSeek made a splash by unveiling FlashMLA, a decoding kernel designed specifically for NVIDIA’s Hopper GPUs. The project is freely available on GitHub. Let’s delve into the enhancements it brings to the table.
DeepSeek asserts that FlashMLA reaches 580 TFLOPS for BF16 matrix multiplication on the Hopper H800, which the company says is roughly an eightfold improvement over standard off-the-shelf kernels. Moreover, through careful memory access patterns, FlashMLA delivers effective memory bandwidth of up to 3000 GB/s, approaching the H800’s theoretical peak. Remarkably, these gains come purely from software, without any hardware changes.
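As a rough sanity check on how those two ceilings relate, a roofline-style calculation gives the arithmetic intensity at which a kernel under these limits would shift from memory-bound to compute-bound. The figures below are the article’s reported numbers, not independent measurements:

```python
# Roofline "ridge point" from the reported FlashMLA figures.
# Both peaks are the article's claimed values, not measured here.

reported_tflops = 580.0    # BF16 matmul throughput (compute-bound regime)
reported_bw_gbs = 3000.0   # effective memory bandwidth (memory-bound regime)

# Arithmetic intensity (FLOPs per byte moved) at which a kernel stops being
# limited by memory bandwidth and starts being limited by compute.
ridge_point = (reported_tflops * 1e12) / (reported_bw_gbs * 1e9)
print(f"ridge point: {ridge_point:.1f} FLOPs/byte")  # ~193 FLOPs/byte
```

Below roughly 193 FLOPs per byte, throughput is dictated by memory bandwidth, which is why decoding kernels like FlashMLA focus so heavily on memory efficiency.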
FlashMLA employs low-rank key-value compression, which condenses the key-value cache into a smaller latent representation, improving processing speed and reducing memory consumption by 40% to 60%. It also features a block-based paging system that allocates KV-cache memory in fixed-size blocks on demand, enhancing the handling of variable-length sequences and boosting performance.
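The block-based paging idea can be sketched as a simple allocator that hands out fixed-size physical blocks only as a sequence grows, so short and long sequences share one memory pool without per-sequence over-allocation. This is a minimal illustration of the concept; the class, block size, and layout here are hypothetical, not DeepSeek’s actual implementation:

```python
class PagedKVCache:
    """Toy block-based (paged) KV-cache allocator: physical blocks are
    pooled and assigned to sequences on demand, one block at a time."""

    def __init__(self, num_blocks: int, block_tokens: int = 64):
        self.block_tokens = block_tokens          # tokens stored per block
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}                    # seq_id -> list of block ids
        self.seq_lens = {}                        # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> None:
        """Reserve space for one more token; grab a fresh block when the
        sequence's current block is full (or on its first token)."""
        n = self.seq_lens.get(seq_id, 0)
        if n % self.block_tokens == 0:
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4)
for _ in range(130):
    cache.append_token(seq_id=0)
print(len(cache.block_tables[0]))  # 3 blocks hold 130 tokens at 64 tokens/block
```

Because memory is granted in small blocks rather than reserved up front for a worst-case sequence length, variable-length requests waste far less memory, which is exactly the property the paging scheme exploits.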
The progress made by DeepSeek is a testament to the multifaceted nature of AI computing, illustrating that performance isn’t determined by hardware alone. FlashMLA is currently tailored to the H800, and it will be exciting to see what it delivers on the full-strength Hopper H100.