What Makes Large Language Models So Expensive?

1. Introduction

In this article, we’ll describe the factors that make large language models expensive.

2. Large Language Models and Big Tech

In the field of large language models (LLMs), big tech companies such as Google, Microsoft, and OpenAI are the key players and lead the development of state-of-the-art LLM systems. What most clearly sets them apart from everyone else is their substantial funding.

Developing LLMs is a costly process. Creating and maintaining them is expensive for several reasons. For example, according to Sam Altman, CEO of OpenAI, training GPT-4 cost more than $100 million. On top of that, there are expenses for data preparation, human resources, operations, and similar items.

Because of that, all top LLM systems are backed by big tech. Some of them offer pre-trained models for fine-tuning, which is a significantly cheaper process. Additionally, there are API integrations that allow smaller firms and developers to integrate LLMs into their applications and services.

3. Why Are LLMs Expensive?

Several factors influence the cost of developing and using LLMs. To structure the discussion, we’ll group these factors by three use cases:

Building LLMs from scratch
Fine-tuning existing LLMs
Using existing LLM APIs

In the following sections, we’ll look at these factors and discuss the pros and cons of every use case.

4. What Makes Building LLMs From Scratch Expensive?

Building an LLM from scratch is probably the most expensive and difficult option. Considering the competition from big tech companies and the availability of open-source LLMs and proprietary LLM APIs, it’s very uncommon for someone to build an LLM from scratch. Still, if we need to develop one, the factors below will influence the development cost.

Collecting and preparing a massive corpus of text data is an extensive and demanding task. This process involves gathering, cleaning, and organizing the data, which can be both time-consuming and resource-intensive. We might need to purchase data or invest in tools and labor for data preprocessing.

The research and development phase involves an iterative process of experimenting with various model architectures, hyperparameters, and training strategies. Achieving the desired performance is time-consuming and costly. For that, we’ll need a team of experts in AI, machine learning, and natural language processing.

Training a new language model from scratch demands significant computational resources. We’ll need access to specialized hardware, like GPUs or TPUs, which are expensive to purchase or rent. In addition, experimentation and training can take weeks or even months.
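
As a rough illustration of how compute drives this cost, the sketch below uses the commonly cited approximation that training a dense transformer takes about 6 × parameters × tokens floating-point operations. The function name and every number in the example (model size, token count, GPU throughput, utilization, and hourly price) are hypothetical assumptions, not figures from any real training run.

```python
def estimate_training_cost_usd(
    num_params: float,
    num_tokens: float,
    peak_flops_per_gpu: float,
    utilization: float,
    usd_per_gpu_hour: float,
) -> float:
    """Back-of-the-envelope training cost for a dense transformer.

    Assumes total compute ~= 6 * parameters * tokens (a rule of thumb,
    not an exact figure) and a constant fraction of peak GPU throughput.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_flops = 6 * num_params * num_tokens
    effective_flops_per_second = peak_flops_per_gpu * utilization
    gpu_hours = total_flops / effective_flops_per_second / 3600
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    # All inputs are made-up illustration values.
    cost = estimate_training_cost_usd(
        num_params=7e9,            # hypothetical 7B-parameter model
        num_tokens=1e12,           # hypothetical 1T training tokens
        peak_flops_per_gpu=3e14,   # assumed peak throughput per GPU
        utilization=0.4,           # assumed fraction of peak actually achieved
        usd_per_gpu_hour=2.0,      # assumed rental price
    )
    print(f"Estimated compute cost: ${cost:,.0f}")
```

This covers only a single successful training run. Failed runs, hyperparameter searches, storage, networking, and salaries come on top, which is why real budgets tend to be considerably higher than such an estimate.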

4.1. Pros of Building LLMs From Scratch

Building LLMs from scratch offers full control over every development aspect, allowing for a highly specialized model tailored to specific needs. It ensures data privacy by eliminating the need to share information with third parties, which is crucial for industries handling confidential data. Additionally, with this approach, we can experiment with new designs, training methods, and data sources, which can lead to unique features and improvements in performance and efficiency.

4.2. Cons of Building LLMs From Scratch

On the other hand, building LLMs from scratch involves high costs, requiring substantial financial and computational resources for development, training, and maintenance. It’s also time-consuming, often taking weeks or months, which delays deployment and any return on the investment. Moreover, there is a significant risk that the final model won’t meet expectations for accuracy, efficiency, or robustness.

5. What Makes Fine-Tuning Existing LLMs Expensive?

Fine-tuning an existing LLM is significantly less expensive than building one from scratch. Several factors influence the fine-tuning cost.

The process of data collection and preparation is the same as for building LLMs from scratch but on a smaller scale. We’ll more likely need specialized, domain-specific data, so the cost depends heavily on our application. For instance, collecting and preparing data for specific medical diagnoses might be significantly more expensive than using existing in-house data sets.

Accessing pre-trained models involves costs related to adaptation or licensing. Leveraging a pre-trained model requires financial investment, but it’s generally more cost-effective than training a new model from the ground up.

While less demanding than training a model from scratch, fine-tuning and inference require substantial computing power and time. The resources needed for fine-tuning and inference, although reduced, remain a significant cost factor.

The process of fine-tuning requires skilled researchers and engineers who can effectively adapt the pre-trained model to the desired application. The cost of expertise includes the salaries and associated expenses of these professionals.

5.1. Pros of Fine-Tuning Existing LLMs

Fine-tuning existing LLMs is faster and cheaper than building from scratch, as it leverages pre-trained models, reducing development time and costs. It allows for domain specialization, tailoring the model to better handle specific queries. Also, we can benefit from the pre-trained model’s existing knowledge and abilities, enhancing overall performance.

5.2. Cons of Fine-Tuning Existing LLMs

Fine-tuning existing LLMs also comes with some limitations. First, it offers limited customization, as significant changes to the base model’s architecture or capabilities are often not feasible. It also requires substantial datasets for fine-tuning, which can be challenging and resource-intensive to obtain and prepare, especially for specialized domains.

6. What Makes Using Existing LLM APIs Expensive?

Using an existing LLM API is the most cost-effective option for the majority of projects. API pricing is typically based on usage metrics such as the number of tokens processed. Usage needs to be monitored and controlled, as the cost can add up quickly for applications with high traffic.
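
To see how token-based pricing adds up, here is a minimal sketch of a monthly cost estimate. It assumes separate per-1,000-token prices for input and output tokens, which is a common pricing shape but not a given for every provider. The function name, traffic figures, and prices are hypothetical and should be replaced with values from the provider’s actual price list.

```python
def estimate_monthly_api_cost_usd(
    requests_per_day: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    usd_per_1k_input_tokens: float,
    usd_per_1k_output_tokens: float,
    days_per_month: int = 30,
) -> float:
    """Estimate monthly spend for a token-priced LLM API (illustrative only)."""
    cost_per_request = (
        input_tokens_per_request / 1000 * usd_per_1k_input_tokens
        + output_tokens_per_request / 1000 * usd_per_1k_output_tokens
    )
    return cost_per_request * requests_per_day * days_per_month


if __name__ == "__main__":
    # Hypothetical traffic and prices, chosen only to show the arithmetic.
    for requests_per_day in (1_000, 10_000, 100_000):
        cost = estimate_monthly_api_cost_usd(
            requests_per_day=requests_per_day,
            input_tokens_per_request=500,
            output_tokens_per_request=200,
            usd_per_1k_input_tokens=0.001,
            usd_per_1k_output_tokens=0.002,
        )
        print(f"{requests_per_day:>7} requests/day: ${cost:,.2f} per month")
```

Because the cost grows linearly with traffic and with prompt length, trimming prompts and capping output length are the most direct levers for keeping the bill under control.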

Although using an API reduces the need for extensive infrastructure, some infrastructure and technical expertise are still required. Engineers are needed to integrate the API into our applications and to make any necessary adjustments to tailor the LLMs to our specific needs.

Ongoing maintenance is necessary to ensure that our applications remain compatible with updates to the API. This involves handling any issues that arise and adapting to changes made by the API provider, which can incur additional costs.

6.1. Pros of Using Existing LLM APIs

Pros of using existing LLM APIs include cost-effectiveness due to pay-as-you-go pricing, making them accessible for individuals and smaller businesses. Integration is typically straightforward, requiring minimal technical expertise. Also, utilizing an API means less infrastructure management as the backend is handled by the API provider.

6.2. Cons of Using Existing LLM APIs

On the other hand, the drawbacks of using existing LLM APIs appear when we need to scale our systems, as the pay-as-you-go model can lead to significant expenses for high-traffic applications. Moreover, data privacy becomes a concern, since using third-party APIs can involve sharing sensitive information with the provider. Additionally, dependence on a third-party provider introduces risks related to service availability, as any issues or downtime on their end directly affect the applications relying on the API.
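
One way to reason about the scaling drawback is a simple break-even calculation: at what monthly request volume does the pay-as-you-go bill exceed the fixed cost of running a model ourselves? The sketch below is a simplification under stated assumptions. It treats self-hosting as one fixed monthly amount plus an optional per-request cost, and all names and numbers are hypothetical.

```python
from typing import Optional


def break_even_requests_per_month(
    fixed_monthly_usd: float,
    api_usd_per_request: float,
    self_hosted_usd_per_request: float = 0.0,
) -> Optional[float]:
    """Monthly request volume above which the API costs more than self-hosting.

    Returns None when the API is never more expensive per request,
    in which case there is no break-even point.
    """
    saving_per_request = api_usd_per_request - self_hosted_usd_per_request
    if saving_per_request <= 0:
        return None
    return fixed_monthly_usd / saving_per_request


if __name__ == "__main__":
    # Hypothetical figures: $5,000/month for hardware and operations,
    # and $0.0009 per request through a third-party API.
    volume = break_even_requests_per_month(
        fixed_monthly_usd=5_000,
        api_usd_per_request=0.0009,
    )
    print(f"Break-even at about {volume:,.0f} requests per month")
```

Below the break-even volume, the API remains the cheaper choice. Above it, the comparison starts to favor a fixed-cost setup, although engineering effort and model quality also need to be weighed.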

7. Conclusion

In this article, we’ve described why LLMs are expensive to develop and use. While building an LLM from scratch offers full control and potential innovation, it is the most costly and time-consuming option and carries high risks. Fine-tuning existing LLMs is a more economical alternative that allows for domain specialization but still requires considerable resources. Lastly, using existing LLM APIs is the most cost-effective option for many projects, but it can become expensive at scale and raises concerns about data privacy and third-party dependence.
