Virtually everyone believes artificial intelligence use cases are going to drive important changes in employment, work processes, applications, processing operations, power consumption and data center requirements, though precisely how much change will occur, and when, remains unclear.
More practically, firms and entities are having to estimate how much it will cost to create generative AI models and then draw inferences from those models.
The answer, inevitably, is that "it depends": on what one wishes to accomplish; which engines and compute platforms are used; how much data is ingested; how much customization of a generic model is required; the number of users of the model; the complexity of the tasks the model supports; the amount of data needed to train the model; and the cost of computing resources.
Context length, which determines how much information the LLM can consider when formulating an output, also affects pricing. If a generative AI model has a context length of 10 tokens, it will consider the 10 previous tokens when generating the next one.
The context length of GPT-4 is 8,192 tokens for the 8K variant and 32,768 tokens for the 32K variant. This means that GPT-4 can consider up to 8,192 or 32,768 previous tokens when generating the next one, depending on the variant.
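The sliding window this implies can be sketched in a few lines of Python. This is an illustration only: splitting on whitespace is a stand-in for the subword tokenizers real models use, and the sample text is invented.

```python
# Minimal sketch of a context window: the model only "sees" the most
# recent `context_length` tokens of the input. Whitespace splitting is
# a stand-in for a real subword tokenizer (e.g., BPE).

def visible_context(text: str, context_length: int) -> list[str]:
    tokens = text.split()            # naive tokenization, for illustration
    return tokens[-context_length:]  # keep only the last N tokens

history = "the quick brown fox jumps over the lazy dog again and again"
print(visible_context(history, 10))
```

Everything earlier than the last `context_length` tokens is simply invisible to the model when it predicts the next token.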
The cost for using the GPT-4 8K context model API is about $0.03 per 1,000 tokens for input and $0.06 per 1,000 tokens for output.
Using the 32K context model, the cost is $0.06 per 1,000 tokens for input and $0.12 per 1,000 tokens for output.
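Those per-token rates make the cost of a single API call easy to estimate. The sketch below uses the rates quoted above; the workload figures (a 1,500-token prompt, a 500-token completion) are hypothetical, and actual pricing changes over time.

```python
# Estimating a GPT-4 API call's cost from the quoted per-token rates.
# Rates are dollars per 1,000 tokens.

RATES = {
    "gpt-4-8k":  {"input": 0.03, "output": 0.06},
    "gpt-4-32k": {"input": 0.06, "output": 0.12},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    r = RATES[model]
    return (input_tokens / 1000) * r["input"] + (output_tokens / 1000) * r["output"]

# Hypothetical call: a 1,500-token prompt with a 500-token completion.
print(f"${call_cost('gpt-4-8k', 1500, 500):.3f}")   # $0.075
print(f"${call_cost('gpt-4-32k', 1500, 500):.3f}")  # $0.150
```

Fractions of a cent per call add up quickly: the same 2,000-token interaction repeated a million times a month would cost roughly $75,000 on the 8K model at these rates.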
And the cost of building a model, offered as a platform, is not the same as the cost for an entity to use that model, whether offered as a subscription, a pay-per-use model or bundled as a feature.
Certainly, everyone expects model building, training and customization costs to come down over time. But the costs appear to be significant, whether enterprises choose to build using their in-house resources or use a cloud computing “as a service” provider.
The costs of building generic models will likely, over time, mostly be the province of LLM platform suppliers, as few entities will have the financial resources to build and train proprietary models.
For most entities, the relevant cost question will be “how much will it cost to use an existing platform,” including the cost of adapting (customizing) a generic model for a particular enterprise or entity.
For example, the cost of generating inferences when using "as a service" providers is based on the number of tokens processed. A generative AI token is a unit of text or code that a generative AI model uses to generate new text or code. Tokens can be as small as a single character or as large as a word or phrase.
As a simplified rule, the number of tokens can be likened to the number of words in a prompt or generated response, though for English text a token is typically a bit shorter than a word.
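A common rule of thumb for English text is that one token corresponds to roughly four characters, or about three-quarters of a word. The sketch below applies that heuristic; it is a rough estimator, not a real tokenizer, and actual counts depend on the model's tokenizer.

```python
# Rough token estimate from word count, using the common heuristic that
# one token is about 3/4 of an English word. For exact counts, use the
# provider's own tokenizer; this is only a back-of-the-envelope guide.

def estimate_tokens(text: str) -> int:
    words = len(text.split())
    return round(words / 0.75)  # ~4/3 tokens per word

print(estimate_tokens("Tokens can be as small as a single character"))
```

For billing estimates this is usually close enough to size a budget, but invoices are computed from the tokenizer's exact count.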
OpenAI offers a variety of generative AI models as a service through its API. Licensing costs range from $0.00025 to $0.006 per 1,000 tokens for inference.
Google AI Platform offers a variety of generative AI models as a service through its Vertex AI platform. Licensing costs range from $0.005 to $0.02 per 1,000 tokens for inference.
Microsoft Azure offers a variety of generative AI models as a service through its Azure Cognitive Services platform. Licensing costs range from $0.005 to $0.02 per 1,000 tokens for inference.
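To see how wide that variability is in practice, the sketch below applies the price ranges quoted above to a hypothetical workload of 10 million inference tokens per month. The workload figure is invented for illustration, and the quoted ranges span many different models.

```python
# Comparing the quoted per-1,000-token inference price ranges for a
# hypothetical monthly workload. Ranges are those cited in the text;
# actual pricing varies by model and changes frequently.

PRICE_RANGES = {  # dollars per 1,000 tokens: (low, high)
    "OpenAI API":               (0.00025, 0.006),
    "Google Vertex AI":         (0.005,   0.02),
    "Azure Cognitive Services": (0.005,   0.02),
}

MONTHLY_TOKENS = 10_000_000  # hypothetical workload

for provider, (low, high) in PRICE_RANGES.items():
    units = MONTHLY_TOKENS / 1000  # billing is per 1,000 tokens
    print(f"{provider}: ${low * units:,.2f} to ${high * units:,.2f} per month")
```

At these rates the same 10-million-token workload could cost anywhere from a few dollars to a few hundred dollars a month, which is precisely why model choice dominates the cost question.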
The point is that the cost of deploying generative AI for any particular business function is highly variable at the moment.