IP Carrier: Data Warehouses and Generative AI Model Training

Wednesday, August 9, 2023

Data Warehouses and Generative AI Model Training

Snowflake, Databricks, Teradata, Amazon Redshift, Google BigQuery and Microsoft Azure Synapse Analytics, to name the obvious contenders, are data warehouses whose value for building and running AI models is foundational. After all, AI models are applications that have to be housed somewhere and must be queried to produce inferences.

These platforms differ, but some might note that those differences are relatively inconsequential compared with the alternative of trying to build models and run inferences on a private enterprise data warehouse platform. Many would argue that building large generative AI models on a private data warehouse is less reasonable than using a cloud-based approach.

Customization might be one of the few areas where a private data warehouse offers an advantage.

Feature	Private Enterprise Data Warehouse	Snowflake	Databricks	Amazon Redshift	Google BigQuery	Azure Synapse Analytics
Processing speed	Depends on the hardware and software used	Very fast	Fast	Fast	Very fast	Very fast
Cost effectiveness	Can be expensive to set up and maintain	Cost-effective	Expensive	Expensive	Cost-effective	Cost-effective
Ease of use	Can be difficult to use for non-technical users	Easy to use	Difficult to use	Difficult to use	Easy to use	Easy to use
Security	Can be complex to implement and manage	Very secure	Secure	Secure	Very secure	Secure
Scalability	Can be difficult to scale up or down	Highly scalable	Highly scalable	Scalable	Highly scalable	Highly scalable
Other key attributes	Can be customized to meet specific needs	Columnar storage	Lakehouse architecture	Massively parallel processing (MPP) architecture	Columnar storage	Hybrid architecture

Different observers might evaluate the performance and other aspects of each platform differently. Still, each of these data warehouses offers, functionally, the same basic capabilities required to support AI.

Some might argue that, in certain cases, a platform's relative strengths could be an advantage for particular AI processing tasks. But, as always, platform choices can turn on subtleties, including other choices a buyer has already made.

Feature	Snowflake	Databricks	Amazon Redshift	Google BigQuery	Azure Synapse Analytics
Processing speed	Fast	Fast	Good	Good	Good
Cost effectiveness	Good	Variable	Variable	Excellent	Variable
Ease of use	Good	Challenging	Good	Excellent	Challenging
Security	Excellent	Excellent	Excellent	Excellent	Excellent
Scalability	Excellent	Excellent	Excellent	Excellent	Excellent
Other key attributes	Columnar storage	Unified analytics platform	Fully managed	Serverless	Hybrid
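One way to see how a buyer's earlier choices narrow the field is a small shortlist filter. The Python sketch below is purely illustrative: the ease-of-use labels come from the table above, the numeric scale used to sort them is an assumption, and the cloud list reflects where each service is generally offered (Snowflake and Databricks on AWS, Azure and Google Cloud; the others on their own vendor's cloud). The `Platform` type and `shortlist` function are hypothetical names, not part of any vendor API.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale (an assumption) for the table's "Ease of use" labels.
EASE_SCORE = {"Challenging": 0, "Good": 1, "Excellent": 2}


@dataclass(frozen=True)
class Platform:
    name: str
    clouds: frozenset[str]  # public clouds the service is generally offered on
    ease_of_use: str        # label taken from the comparison table


PLATFORMS = [
    Platform("Snowflake", frozenset({"aws", "azure", "gcp"}), "Good"),
    Platform("Databricks", frozenset({"aws", "azure", "gcp"}), "Challenging"),
    Platform("Amazon Redshift", frozenset({"aws"}), "Good"),
    Platform("Google BigQuery", frozenset({"gcp"}), "Excellent"),
    Platform("Azure Synapse Analytics", frozenset({"azure"}), "Challenging"),
]


def shortlist(existing_cloud: str) -> list[str]:
    """Keep platforms that run on the buyer's existing cloud, easiest first."""
    candidates = [p for p in PLATFORMS if existing_cloud in p.clouds]
    # sorted() is stable even with reverse=True, so ties keep table order.
    ranked = sorted(candidates, key=lambda p: EASE_SCORE[p.ease_of_use], reverse=True)
    return [p.name for p in ranked]


if __name__ == "__main__":
    print(shortlist("aws"))  # ['Snowflake', 'Amazon Redshift', 'Databricks']
```

A real evaluation would weigh far more than one column of a table, but the sketch shows how a prior cloud commitment alone can remove most options before performance is even considered.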

Such warehouses are crucial during initial model training. Afterward, experts say, only some of the training data has to remain in the warehouse, though new data is also expected to be added over time to update the model.

And, of course, a data warehouse must house the model once it is built. Data warehouses are also essential for serving inference queries and for adding new data over time.
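The lifecycle described above (train once, then keep adding newer data) is often handled with a "watermark": remember the latest timestamp already used for training and pull only rows added after it. The sketch below assumes that pattern and uses Python's built-in `sqlite3` module as a stand-in for a cloud warehouse; the `training_events` table, its columns and the function names are hypothetical. A real deployment would use the warehouse vendor's own connector and SQL dialect.

```python
import sqlite3


def make_demo_warehouse() -> sqlite3.Connection:
    """Build an in-memory stand-in for a warehouse table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE training_events ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, created_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO training_events (text, created_at) VALUES (?, ?)",
        [
            ("initial corpus row A", "2023-07-01"),
            ("initial corpus row B", "2023-07-15"),
            ("row added after training", "2023-08-05"),
        ],
    )
    return conn


def fetch_new_training_rows(
    conn: sqlite3.Connection, watermark: str
) -> tuple[list[tuple[int, str]], str]:
    """Return (id, text) rows created after `watermark`, plus the new watermark.

    ISO-8601 date strings sort lexicographically, so a plain string
    comparison in SQL is enough for this sketch.
    """
    rows = conn.execute(
        "SELECT id, text, created_at FROM training_events "
        "WHERE created_at > ? ORDER BY created_at",
        (watermark,),
    ).fetchall()
    # Advance the watermark only if something new arrived.
    new_watermark = rows[-1][2] if rows else watermark
    return [(row_id, text) for row_id, text, _ in rows], new_watermark


if __name__ == "__main__":
    warehouse = make_demo_warehouse()
    # Assume the model was last trained on data through 2023-07-31.
    batch, watermark = fetch_new_training_rows(warehouse, "2023-07-31")
    print(batch, watermark)  # [(3, 'row added after training')] 2023-08-05
```

Which older rows to retain after training is a separate policy decision that this sketch does not attempt to model.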

Platform	Queries per second
Snowflake	12,000
Amazon Redshift	9,000
Google BigQuery	8,000
Microsoft Azure Synapse Analytics	7,000
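The throughput figures above translate directly into serving time. The Python sketch below assumes each platform sustains its listed rate for the whole workload, a simplification that ignores concurrency limits, caching and query complexity; `seconds_to_serve` and `relative_throughput` are illustrative helpers, not vendor APIs.

```python
# Queries per second for each platform, copied from the table above.
QPS = {
    "Snowflake": 12_000,
    "Amazon Redshift": 9_000,
    "Google BigQuery": 8_000,
    "Microsoft Azure Synapse Analytics": 7_000,
}


def seconds_to_serve(queries: int, qps: int) -> float:
    """Seconds needed to serve `queries` at a constant rate of `qps` (assumed steady)."""
    return queries / qps


def relative_throughput(qps_table: dict[str, int]) -> dict[str, float]:
    """Each platform's rate as a fraction of the fastest platform in the table."""
    best = max(qps_table.values())
    return {name: rate / best for name, rate in qps_table.items()}


if __name__ == "__main__":
    shares = relative_throughput(QPS)
    for name, rate in QPS.items():
        secs = seconds_to_serve(1_000_000, rate)
        print(f"{name}: {secs:.1f} s per million queries, {shares[name]:.0%} of the fastest")
```

Under the constant-rate assumption, one million inference queries would take roughly 83 seconds on Snowflake and about 143 seconds on Azure Synapse Analytics.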

As a rule, some would say, building models for large global enterprises, with vastly larger amounts of training data, will cost more than building models for mid-market firms with smaller training data sets. Small businesses with relatively limited amounts of data to parse will face smaller charges still.

Most observers would likely agree that training will cost more for an entity of any size when it is conducted on private data resources rather than with a cloud computing partner.

Building and training a model are precisely the sorts of “one-off” activities information technology professionals are advised to outsource rather than perform themselves.

Business size	Cost of building a generative AI model on-premises	Cost of building a generative AI model in the cloud
Fortune 500	$10 million - $100 million	$5 million - $50 million
Mid-market	$1 million - $10 million	$500,000 - $5 million
Small business	$100,000 - $1 million	$50,000 - $100,000
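The cost table can be reduced to a single ratio per business size. The Python sketch below takes the midpoint of each range (an assumption, since the table gives only ranges) and expresses the cloud cost as a share of the on-premises cost; `COSTS` and `cloud_share_of_on_prem` are illustrative names.

```python
# (low, high) cost ranges in US dollars, copied from the table above.
COSTS = {
    "Fortune 500": {"on_prem": (10_000_000, 100_000_000), "cloud": (5_000_000, 50_000_000)},
    "Mid-market": {"on_prem": (1_000_000, 10_000_000), "cloud": (500_000, 5_000_000)},
    "Small business": {"on_prem": (100_000, 1_000_000), "cloud": (50_000, 100_000)},
}


def midpoint(cost_range: tuple[int, int]) -> float:
    """Midpoint of a (low, high) range, used as one representative figure."""
    low, high = cost_range
    return (low + high) / 2


def cloud_share_of_on_prem(segment: str) -> float:
    """Cloud midpoint cost divided by the on-premises midpoint cost."""
    ranges = COSTS[segment]
    return midpoint(ranges["cloud"]) / midpoint(ranges["on_prem"])


if __name__ == "__main__":
    for segment in COSTS:
        print(f"{segment}: cloud is about {cloud_share_of_on_prem(segment):.0%} of on-premises")
```

By this measure the cloud option costs about half as much for Fortune 500 and mid-market firms and roughly 14% as much for small businesses, consistent with the argument that cloud training is cheaper at every size. Comparing the low ends or the high ends instead of midpoints would give different ratios for the small-business row, because its two ranges do not scale the same way.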

Costs for small entities will likely fall over time as suppliers increasingly offer generic, already-trained models suited to the requirements of smaller entities. As always with any software, computing or application product, versions intended for small entities will not have the same robust features as those provided to the largest enterprises, but they will be far more affordable.
