GroqAI Blog

Nvidia’s rival is here: Groq’s AI chip accelerates new artificial intelligence models

Recently, a brilliant new star has appeared in the field of artificial intelligence hardware: Groq, whose innovative high-performance processor has sparked widespread discussion in the industry. This artificial intelligence solutions company, founded in 2016, has achieved a major breakthrough against the traditional GPU giant Nvidia with its revolutionary language processing unit (LPU) technology.

It is reported that Groq’s LPU architecture shows remarkable efficiency on deep learning tasks: its inference performance is claimed to be ten times that of current mainstream Nvidia GPUs, at roughly one-tenth of the cost. This disruptive result has made Groq a focus of attention worldwide. Particularly noteworthy is the Llama 2 7B large pre-trained model running on the Groq platform, which generates up to 750 tokens per second, compared to about 40 tokens per second for OpenAI’s GPT-3.5, an increase of roughly 18-fold.
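The "roughly 18-fold" figure follows directly from the two throughput numbers quoted above; a quick sanity check in Python:

```python
# Sanity-check the reported throughput gap, using the figures quoted
# in the article (750 tok/s on Groq vs. 40 tok/s for GPT-3.5).
groq_tps = 750    # reported Llama 2 7B throughput on the Groq platform
gpt35_tps = 40    # reported OpenAI GPT-3.5 throughput

speedup = groq_tps / gpt35_tps
print(f"Speedup: {speedup:.2f}x")  # prints "Speedup: 18.75x"
```

So the ratio is 18.75, which the article rounds down to an 18-fold increase.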

Groq’s success is no accident; behind it lies a strong technical foundation and a talent advantage. Eight of the company’s core team members came from the core design team of Google’s early TPU. They abandoned the traditional TPU, GPU, and CPU routes and developed LPU technology tailored for efficient processing of natural language and other complex AI tasks. Powered by the Groq LPU™ inference engine, Meta AI’s Llama 2 70B model showed unprecedented performance, with throughput up to 18 times higher than other cloud-based inference providers.

However, despite Groq’s significant achievements in speed, the industry still doubts whether it can fully replace Nvidia. On the one hand, the LPU card launched by Groq sells for more than $20,000 each yet carries relatively little memory, only 230MB, which may be limiting in application scenarios that require caching large amounts of data. Some analysts have pointed out that, in terms of cost-effectiveness, Nvidia’s H100 series may still offer a higher price-performance ratio, roughly 11 times that of Groq’s products.

On the other hand, a key difference of the Groq LPU lies in its memory configuration. Unlike Nvidia, which uses high-bandwidth memory (HBM), Groq chose ultra-fast static random-access memory (SRAM), which is roughly 20 times faster than HBM3 but limited in capacity. This means that to run a single large AI model, Groq may need to deploy multiple LPU cards in a cluster to reach a throughput level comparable to an Nvidia H200. According to Groq insiders, their large language models do indeed run coordinated across hundreds of chips.
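To see why "hundreds of chips" is plausible, here is a back-of-envelope estimate in Python. The 230MB-per-card figure comes from the article; the FP16 weight size for Llama 2 70B is the author's assumption, and only model weights (no KV cache or activations) are counted:

```python
# Rough estimate (author's assumptions, not official Groq figures) of how
# many 230 MB LPU cards are needed just to hold Llama 2 70B's weights
# entirely in on-chip SRAM.
params = 70e9          # Llama 2 70B parameter count
bytes_per_param = 2    # assumption: FP16 (2 bytes) per weight
sram_per_card = 230e6  # 230 MB of on-chip SRAM per LPU card (per article)

weight_bytes = params * bytes_per_param
cards_needed = weight_bytes / sram_per_card
print(f"Weights: {weight_bytes / 1e9:.0f} GB -> ~{cards_needed:.0f} cards")
# prints "Weights: 140 GB -> ~609 cards"
```

Even this lower bound lands in the hundreds of cards, consistent with the insiders' description, whereas a single H200 with 141GB of HBM can hold the same weights on one device.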

Regarding Groq’s technical characteristics, Yao Jinxin, a chip expert at Tencent Technology, believes that although Groq’s chip performs well on speed, it is not enough to completely replace Nvidia’s products in the market. He pointed out that Groq’s architecture pairs small memory with high-intensity compute, which yields extremely high speed when processing limited content but also means it must rely on the scaling capability of distributed systems when handling large data sets. In other words, to match the data-processing capacity of an Nvidia H100, users may need to purchase many more Groq LPU cards.

However, Yao Jinxin also emphasized Groq’s unique advantages and potential application scenarios. He said that Groq’s chip architecture is especially suited to applications that move data frequently and demand extremely fast response, such as real-time speech recognition and autonomous-driving decision-making, where its excellent computing efficiency will bring a competitive advantage that cannot be ignored.

In summary, Groq stands out in the AI chip field with its innovative, speed-barrier-breaking LPU technology. Although it cannot immediately replace the existing leaders in every respect, it has shown strong competitiveness in specific application scenarios. As Groq continues to optimize its products and broaden their applications, it is expected to write a new chapter in the global AI hardware market.