Inference might already represent up to 90 percent of all machine learning costs.
As AI adoption scales, cloud and data center operators will increasingly prioritize inference workloads. That shift creates a growing need for specialized hardware optimized for inference, which is arguably why large end users (Amazon Web Services, Google Cloud, Meta and others) have been developing homegrown solutions.
AWS and Google Cloud, for example, have invested heavily in developing their own AI accelerators. AWS Inferentia is purpose-built for inference workloads, while Google Cloud's Tensor Processing Units are designed for AI workloads broadly, including inference. Meta, meanwhile, is developing its own custom chips for model training, and substantial capital is flowing into startups aiming to improve processing efficiency.