Can DeepSeek’s Efficient Training Offset Its High Inference Energy Use?

February 3, 2025

The Chinese AI model DeepSeek has sparked extensive debate, ranging from privacy concerns to the possibility of a groundbreaking transformation in artificial intelligence. Yet a crucial aspect that merits rigorous scrutiny is whether DeepSeek's efficient training methods genuinely lead to overall energy savings in AI operations. This discussion explores the energy implications of DeepSeek, focusing on its training and inference phases.

The Promise of Efficient Training

DeepSeek has attracted considerable attention for its innovative “mixture of experts” technique coupled with enhanced reinforcement learning during the training phase. This methodology activates only a portion of the model’s parameters at any given moment, which, in theory, should make it significantly more energy-efficient. Moreover, DeepSeek has streamlined the training process by automating the scoring of model outputs, a task traditionally performed by human annotators, further bolstering its efficiency.
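The sparse-activation idea behind a mixture-of-experts layer can be sketched in a few lines. The snippet below is a minimal illustration with made-up dimensions and random weights, not DeepSeek's actual architecture: a router scores the experts for each input, and only the top-k experts are actually evaluated, so most parameters stay idle on any given token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not DeepSeek's real config).
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route a token to its top-k experts; the remaining experts stay inactive."""
    scores = x @ router                    # one router logit per expert
    top = np.argsort(scores)[-top_k:]      # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only top_k of n_experts weight matrices are multiplied here, so the
    # active parameter count is a fraction of the model's total.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(d_model)
y = moe_forward(x)
print(y.shape)                             # (16,)
print(f"active experts per token: {top_k}/{n_experts}")
```

The efficiency argument rests on that last comment: compute per token scales with the experts actually evaluated, not with the model's full parameter count.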

Despite these promising advancements, the critical question is whether these efficiencies genuinely translate into substantial energy savings. The training phase has indeed become more streamlined, but to gauge the overall energy impact, one must consider the training and inference phases together. This scrutiny is essential to understand the real-world implications of these advancements and to determine whether the initial promise holds up under practical conditions.

The Energy-Intensive Inference Phase

While the training phase of DeepSeek is lauded for its efficiency, the inference phase—the stage where a trained model generates responses—turns out to be more energy-intensive. DeepSeek uses a sophisticated reasoning-based approach known as “chain of thought,” which breaks down tasks into logical steps to derive conclusions. This approach enables DeepSeek to excel in tasks requiring logic and pattern recognition, but it also necessitates considerably more energy during processing.

An analysis revealed some striking findings: DeepSeek's medium-sized R1 model consumed more energy than a similarly sized Meta model when generating responses. For example, in a test involving a 1,000-word response to a query about lying, addressed using both utilitarian and Kantian ethics, DeepSeek used approximately 17,800 joules, 41% more than Meta's equivalent model. These figures raise serious concerns about DeepSeek's overall energy efficiency, suggesting that the gains in training might be counterbalanced by higher energy usage during inference.
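The two figures reported above imply a rough per-response gap, which a few lines of arithmetic make concrete. The numbers below are the article's own (17,800 joules, 41% higher); the derived Meta figure is simply back-calculated from that ratio, not an independent measurement.

```python
# Reported: DeepSeek's R1 used ~17,800 J for the 1,000-word response,
# said to be 41% more than a comparable Meta model on the same prompt.
deepseek_joules = 17_800
ratio = 1.41

meta_joules = deepseek_joules / ratio          # implied Meta figure
extra = deepseek_joules - meta_joules          # extra energy per response

print(round(meta_joules))   # ~12,624 J implied for the Meta model
print(round(extra))         # ~5,176 J of additional energy per response
```

Roughly five kilojoules of extra energy per long response is small in isolation, but it compounds quickly at the scale of millions of daily queries.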

The Jevons Paradox and AI Energy Consumption

The phenomenon known as the Jevons paradox could be particularly relevant when evaluating DeepSeek’s energy consumption. The paradox suggests that improvements in efficiency often lead to an increase in overall energy consumption, as they make certain tasks cheaper and thereby encourage more extensive use. This paradox provides a compelling framework for understanding the potential impact of DeepSeek. The gains in training efficiency might indeed lead to increased usage, ultimately offsetting any anticipated energy savings.
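The mechanics of the paradox are easy to state numerically. The figures below are purely hypothetical, chosen to illustrate the shape of the effect rather than to describe DeepSeek: a 10x per-run efficiency gain is swamped by a 16x rise in usage.

```python
# Toy Jevons-paradox illustration (hypothetical numbers, not measurements):
# each run becomes 10x cheaper, but cheap access triggers far more runs.
energy_per_run_before = 100.0   # arbitrary energy units per run
energy_per_run_after = 10.0     # after a 10x efficiency gain
runs_before = 50
runs_after = 800                # assumed demand surge once cost drops

total_before = energy_per_run_before * runs_before   # 5,000 units
total_after = energy_per_run_after * runs_after      # 8,000 units

print(total_after > total_before)   # True: total consumption rose anyway
```

Whether the real demand response is large enough to produce this outcome is exactly the empirical question the article raises.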

Experts speculate that DeepSeek’s methods could spur other tech companies to develop similar low-cost reasoning models, potentially leading to an uptick in energy consumption across the AI industry. Evidence from testing supports this concern: DeepSeek’s models demand significantly more energy than comparable models, despite having similar parameter counts. This trend could become more pronounced if other companies adopt similar strategies, driving up industry-wide energy consumption.

Broader Impacts on the AI Industry

A broader adoption of reasoning models akin to DeepSeek's could have sweeping implications for the AI industry at large. AI researcher Sasha Luccioni has voiced concern that the enthusiasm around DeepSeek could prompt a surge in the use of this approach, even in contexts where it is unnecessary. Such unwarranted use could lead to excessive energy consumption, negating the efficiencies touted for the training phase through higher energy needs during inference.

The AI industry has experienced paradigm shifts before, such as the transition from extractive to generative AI, which inherently demands more energy because of its predictive nature. Luccioni’s research suggests that this shift has caused an exponential increase in energy use for equivalent tasks. If the market embraces DeepSeek similarly, energy costs could spike once again, affecting not only larger models but also medium-sized and smaller models destined for various applications.

Balancing Economic Incentives and Environmental Impact

The discussion also delves into the delicate balance between economic incentives and environmental impact. Nathan Benaich of Air Street Capital underscores that companies are likely to weigh the economic benefits of deploying more advanced AI models against the energy costs. Only if these costs reach prohibitively high levels will they significantly influence decision-making. This dynamic introduces an economic dimension where the drive for higher intelligence and improved performance can often outweigh energy considerations, unless the energy costs become extreme.

The trend toward greater market availability and use of chain-of-thought models is underscored by OpenAI's recent announcement expanding access to its reasoning model, o3. However, concrete data on the energy costs of these models remains limited, calling for further study to provide a clearer picture. As this trend continues, understanding the full energy implications will be critical for making informed decisions.

The Need for Rigorous Empirical Study

Much of the public debate around DeepSeek, from privacy concerns to talk of a revolutionary shift in artificial intelligence, has so far outpaced the evidence. The claim most in need of thorough examination is that DeepSeek's efficient training methods result in substantial net energy savings for AI operations, and testing it requires careful measurement across both the training and inference stages.

In the realm of AI, training models often consume enormous amounts of energy, raising concerns about the environmental footprint of advancing technology. However, proponents of DeepSeek assert that this model introduces novel techniques that could significantly reduce energy consumption without compromising performance.

As we evaluate these claims, it is crucial to compare DeepSeek's energy efficiency against other prominent AI models to determine whether it truly offers a more sustainable alternative. Only by measuring both the training and inference phases can we reach a comprehensive verdict on whether DeepSeek delivers overall energy savings.
