Wednesday, August 9, 2023

Data Warehouses and Generative AI Model Training

Snowflake, Databricks, Teradata, Amazon Redshift, Google BigQuery and Microsoft Azure Synapse Analytics, to name the obvious contenders, are data warehouses whose value for building and running AI models is foundational. After all, AI models are applications that have to be housed someplace and must be queried to produce inferences.


But some might note that the differences among those platforms are relatively inconsequential compared with the alternative of trying to build models and make inferences on a private enterprise data warehouse. Many would argue that building big generative AI models on a private data warehouse is less reasonable than doing so with a cloud-based approach.


The ability to customize might be among the few areas where a private data warehouse offers some advantage.


| Feature | Private Enterprise Data Warehouse | Snowflake | Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
|---|---|---|---|---|---|---|
| Processing speed | Depends on the hardware and software used | Very fast | Fast | Fast | Very fast | Very fast |
| Cost effectiveness | Can be expensive to set up and maintain | Cost-effective | Expensive | Expensive | Cost-effective | Cost-effective |
| Ease of use | Can be difficult to use for non-technical users | Easy to use | Difficult to use | Difficult to use | Easy to use | Easy to use |
| Security | Can be complex to implement and manage | Very secure | Secure | Secure | Very secure | Secure |
| Scalability | Can be difficult to scale up or down | Highly scalable | Highly scalable | Scalable | Highly scalable | Highly scalable |
| Other key attributes | Can be customized to meet specific needs | Columnar storage | Lakehouse architecture | Shared-disk architecture | Columnar storage | Hybrid architecture |


Different observers might evaluate performance and other aspects of each platform differently. Still, any of these data warehouses provides the basic capabilities required to support AI.


Some might argue that, in some cases, a platform's relative strengths could be an advantage for artificial intelligence processing tasks. But, as always, platform choices can turn on subtleties, including other choices a buyer has already made.


| Feature | Snowflake | Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
|---|---|---|---|---|---|
| Processing speed | Fast | Fast | Good | Good | Good |
| Cost effectiveness | Good | Variable | Variable | Excellent | Variable |
| Ease of use | Good | Challenging | Good | Excellent | Challenging |
| Security | Excellent | Excellent | Excellent | Excellent | Excellent |
| Scalability | Excellent | Excellent | Excellent | Excellent | Excellent |
| Other key attributes | Columnar storage | Unified analytics platform | Fully managed | Serverless | Hybrid |


Such warehouses are crucial during initial model training. Afterward, experts say, only some of the training data has to remain in the warehouse. But new data is also expected to be added over time to update the model.


And of course the data warehouses must be used to house the model, once built. Data warehouses are essential for inference queries and for the addition of new data over time.


| Platform | Queries per second |
|---|---|
| Snowflake | 12,000 |
| Amazon Redshift | 9,000 |
| Google BigQuery | 8,000 |
| Microsoft Azure Synapse Analytics | 7,000 |


As a rule, some would say, building models for large global enterprises, which have vastly larger amounts of data to use in training, will be more costly than building models for mid-market firms with less voluminous training data. Small businesses with relatively limited amounts of data to parse will face smaller charges.


Most observers might agree that training will cost more for an entity of any size when conducted using private data resources rather than by engaging a cloud computing partner.


Building a model and training it are precisely the sorts of “one off” activities information technology professionals are advised to outsource, rather than doing themselves.


| Business size | Cost of building generative AI model on-premises | Cost of building generative AI model on the cloud |
|---|---|---|
| Fortune 500 | $10 million - $100 million | $5 million - $50 million |
| Mid-market | $1 million - $10 million | $500,000 - $5 million |
| Small business | $100,000 - $1 million | $50,000 - $100,000 |
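
One way to read the table above is to compare the cloud range with the on-premises range for each business size. The Python sketch below uses only the dollar figures from the table. The helper name cloud_to_on_prem_ratio is hypothetical and chosen for illustration. It shows the arithmetic only and is not a cost model.

```python
# Cost ranges in US dollars, copied from the table above as (low, high).
COSTS = {
    "Fortune 500": {
        "on_prem": (10_000_000, 100_000_000),
        "cloud": (5_000_000, 50_000_000),
    },
    "Mid-market": {
        "on_prem": (1_000_000, 10_000_000),
        "cloud": (500_000, 5_000_000),
    },
    "Small business": {
        "on_prem": (100_000, 1_000_000),
        "cloud": (50_000, 100_000),
    },
}


def cloud_to_on_prem_ratio(size):
    """Return (low-end ratio, high-end ratio) of cloud cost to on-premises cost.

    Hypothetical helper for illustration; it only divides the table's figures.
    """
    on_low, on_high = COSTS[size]["on_prem"]
    cloud_low, cloud_high = COSTS[size]["cloud"]
    return cloud_low / on_low, cloud_high / on_high


for size in COSTS:
    low, high = cloud_to_on_prem_ratio(size)
    print(f"{size}: cloud is {low:.0%} of on-premises at the low end, "
          f"{high:.0%} at the high end")
```

On these figures, the cloud estimate is half the on-premises estimate in every case except the high end of the small-business range, where it is one tenth.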


Small entity costs likely will fall over time as suppliers increasingly offer generic, already-trained models suited to the requirements of smaller entities. As with any software, computing or application product, versions intended for small entities will not have the same robust features as those provided to the largest enterprises, but will be far more affordable.

