1. Overview
In this tutorial, we’ll explore the environmental impact of transformers, with a focus on their carbon footprint.
We’ll first discuss the basics of what contributes to these models’ carbon footprint. Then, we’ll review the methods for calculating it and strategies for reducing it.
2. Transformers
A transformer is a neural network architecture proposed by Vaswani et al. in the paper “Attention Is All You Need.” Unlike recurrent neural networks (RNNs), transformers don’t rely on recurrence but on self-attention.
The self-attention mechanism allows the model to weigh the importance of different input tokens when making predictions, enabling it to capture long-range dependencies without sequential processing. Transformers consist of encoder and decoder layers, employing multi-head self-attention mechanisms and feed-forward neural networks.
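To make this concrete, here’s a minimal sketch of single-head scaled dot-product self-attention in plain NumPy. The matrix sizes and random weights are illustrative assumptions, not values from any real model:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Minimal scaled dot-product self-attention (single head)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v          # project tokens into queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                           # weighted combination of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings and an 8-dimensional head
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)    # (4, 8)
```

Real transformers stack many such heads in every encoder and decoder layer, which is one reason their compute requirements grow so quickly.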
For a detailed explanation of the Transformer model’s functions and architecture, please refer to our articles on the ChatGPT model and transformer text embeddings.
3. What Is a Carbon Footprint?
The carbon footprint of an entity represents the total amount of greenhouse gases (GHGs) it emits, directly or indirectly. These gases, primarily carbon dioxide (CO2), contribute to global warming and climate change and affect our urban environments. A transformer’s carbon footprint includes emissions from development, training, and deployment:
3.1. Key Contributors to the Carbon Footprint of Transformers
To understand the origin of transformers’ carbon footprint, we should remember that developing models such as BERT and the GPT family requires substantial computational resources. The initial phase of development involves designing the model architecture, conducting numerous experiments to test the designs at a smaller scale, and optimizing hyperparameters. Each of these activities involves running multiple training cycles, which consumes a significant amount of electricity. For example, training a large model like GPT-3 or GPT-4 can take weeks to months on large clusters of high-performance GPUs or TPUs, resulting in considerable energy consumption.
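As a rough illustration of how quickly this adds up, we can multiply the number of accelerators by their average power draw and the training time. The figures below (512 GPUs drawing about 400 W each for two weeks) are assumptions chosen for the example, not measurements from any real training run:

```python
# Back-of-the-envelope training-energy estimate (all numbers are illustrative assumptions)
num_gpus = 512            # accelerators running in parallel
power_draw_kw = 0.4       # average power draw per GPU in kW (~400 W)
training_hours = 14 * 24  # two weeks of continuous training

energy_kwh = num_gpus * power_draw_kw * training_hours
print(f"Estimated energy: {energy_kwh:,.0f} kWh")  # ~68,813 kWh
```

Even this simplified estimate is on the order of what a typical household consumes over several years, and it ignores data-center overhead, failed runs, and hyperparameter sweeps.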
The energy required for these computational tasks translates directly into GHG emissions, especially if the electricity comes from non-renewable sources. A study by Strubell, Ganesh, and McCallum estimated that training a single large transformer model with neural architecture search can emit as much CO2 as five cars over their entire lifetimes.
Additionally, the infrastructure supporting training and deployment, such as cooling systems and power supply in data centers, contributes significantly to the overall carbon footprint. According to a report by OpenAI, the computational power used for training large AI models has been doubling approximately every 3.4 months since 2012, leading to increased energy demands.
4. Calculating the Carbon Footprint
To calculate the carbon footprint of a transformer, we should consider the following:
- energy used in developing and training the model
- operational energy consumption during deployment
- emissions associated with the hardware and infrastructure used
We can use freely available online calculators, entering the hardware, runtime, and region we used to train or test our models, to get an estimate:
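Alternatively, we can do the same arithmetic ourselves. A common back-of-the-envelope approach multiplies the energy the accelerators consumed by the data center’s power usage effectiveness (PUE) and the grid’s carbon intensity. The PUE and carbon-intensity defaults below are illustrative assumptions and vary widely by provider and region:

```python
def estimate_co2e(energy_kwh, pue=1.5, grid_kgco2_per_kwh=0.4):
    """Back-of-the-envelope carbon estimate for a training run.

    energy_kwh          -- energy drawn by the accelerators during training
    pue                 -- power usage effectiveness (data-center overhead factor)
    grid_kgco2_per_kwh  -- carbon intensity of the local electricity grid
    The default values are illustrative assumptions, not measured figures.
    """
    return energy_kwh * pue * grid_kgco2_per_kwh

# Using the ~68,800 kWh figure from the earlier sketch:
print(f"{estimate_co2e(68_813):,.0f} kg CO2e")  # ~41,288 kg CO2e
```

Swapping in a cleaner grid (a lower carbon-intensity value) or a more efficient data center (a lower PUE) immediately shrinks the estimate, which is why the choice of region and provider matters as much as the model itself.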