Wednesday, December 27, 2023

LLM Costs Should Drop Over Time: They Almost Have To Do So

One reason bigger firms are likely to have advantages as suppliers and operators of large language models is that LLMs are quite expensive, at the moment, compared to search operations. That cost gap matters for LLM business models.


Though costs should change over time, the current cost delta between a single search query and a single inference operation is quite substantial. It is estimated, for example, that a search engine query costs between $0.0001 and $0.001.


In comparison, a single LLM inference operation might cost between $0.01 and $0.10, depending on model size, prompt complexity, and cloud provider pricing. 


Costs also vary substantially between a general-purpose LLM and a specialized, smaller LLM adapted for a single firm or industry. It is not unheard of for a single inference operation using a general-purpose model to cost a few dollars, though costs of a few cents per operation are likely more common. 


In other words, an LLM inference operation might cost 10 to 100 times what a search query costs. 
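That ratio follows directly from the ranges above; a quick arithmetic check:

```python
# Per-operation cost ranges cited above (USD).
search_cost = (0.0001, 0.001)   # low, high estimate per search query
llm_cost = (0.01, 0.10)         # low, high estimate per LLM inference

# Comparing like for like: low end vs. low end, high end vs. high end.
low_end_ratio = llm_cost[0] / search_cost[0]    # 100x
high_end_ratio = llm_cost[1] / search_cost[1]   # 100x
# The most favorable comparison: cheapest inference vs. priciest search.
best_case_ratio = llm_cost[0] / search_cost[1]  # 10x
```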


Here, for example, are recent price quotes for Google Cloud’s Vertex AI service. 


Model pricing (per 1,000 characters; tuning is billed per node hour):

PaLM 2 for Text (Text Bison)
  • Input (Global): online requests $0.00025; batch requests $0.00020
  • Output (Global): online requests $0.0005; batch requests $0.0004
  • Supervised Tuning (us-central1, europe-west4): Vertex AI custom training pricing, per node hour
  • Reinforcement Learning from Human Feedback (us-central1, europe-west4): Vertex AI custom training pricing, per node hour

PaLM 2 for Text 32k (Text Bison 32k)
  • Input (Global): online requests $0.00025; batch requests $0.00020
  • Output (Global): online requests $0.0005; batch requests $0.0004
  • Supervised Tuning (us-central1, europe-west4): Vertex AI custom training pricing, per node hour

PaLM 2 for Text (Text Unicorn)
  • Input (Global): online requests $0.0025; batch requests $0.0020
  • Output (Global): online requests $0.007; batch requests $0.0060

PaLM 2 for Chat (Chat Bison)
  • Input (Global): online requests $0.00025
  • Output (Global): online requests $0.0005
  • Supervised Tuning (us-central1, europe-west4): Vertex AI custom training pricing, per node hour
  • Reinforcement Learning from Human Feedback (us-central1, europe-west4): Vertex AI custom training pricing, per node hour

PaLM 2 for Chat 32k (Chat Bison 32k)
  • Input (Global): online requests $0.00025*
  • Output (Global): online requests $0.0005*
  • Supervised Tuning (us-central1, europe-west4): Vertex AI custom training pricing, per node hour

Embeddings for Text
  • Input (Global): online requests $0.000025; batch requests $0.00002
  • Output (Global): no charge (online and batch)

Codey for Code Generation
  • Input (Global): online requests $0.00025; batch requests $0.00020
  • Output (Global): online requests $0.0005; batch requests $0.0004
  • Supervised Tuning (us-central1, europe-west4): Vertex AI custom training pricing, per node hour

Codey for Code Generation 32k
  • Input (Global): online requests $0.00025
  • Output (Global): online requests $0.0005
  • Supervised Tuning (us-central1, europe-west4): Vertex AI custom training pricing, per node hour

Codey for Code Chat
  • Input (Global): online requests $0.00025
  • Output (Global): online requests $0.0005
  • Supervised Tuning (us-central1, europe-west4): Vertex AI custom training pricing, per node hour

Codey for Code Chat 32k
  • Input (Global): online requests $0.00025
  • Output (Global): online requests $0.0005
  • Supervised Tuning (us-central1, europe-west4): Vertex AI custom training pricing, per node hour

Codey for Code Completion
  • Input (Global): online requests $0.00025
  • Output (Global): online requests $0.0005
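To make the character-based pricing concrete, here is a hypothetical cost calculation for a single Text Bison online request, using the input and output rates quoted above; the request sizes are invented examples:

```python
# Hypothetical cost of one PaLM 2 Text Bison online request,
# priced per 1,000 characters (rates from the quoted pricing).
INPUT_PRICE = 0.00025   # USD per 1,000 input characters (online)
OUTPUT_PRICE = 0.0005   # USD per 1,000 output characters (online)

def request_cost(input_chars: int, output_chars: int) -> float:
    """Return the USD cost of one online request at the rates above."""
    return (input_chars / 1000) * INPUT_PRICE + (output_chars / 1000) * OUTPUT_PRICE

# Example: a 2,000-character prompt producing a 1,000-character answer.
cost = request_cost(2_000, 1_000)
print(f"${cost:.6f} per request")   # about a tenth of a cent
```

Even a fairly large request lands well below the cents-per-operation figures often cited for general-purpose models, which is one reason specialized, smaller models look attractive.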


But training and inference costs could well decline over time, experts argue. Smaller, more efficient models are likely to emerge, built using cost-reduction techniques such as parameter pruning, knowledge distillation, and low-rank factorization. 


Sparse training methods, which update only the parts of the model relevant to specific tasks, also will help. 


Use of existing pre-trained models that are fine-tuned for specific tasks also can reduce training costs. 


Dedicated hardware optimized for LLM workloads already is arriving. In similar fashion, optimized training algorithms; quantization and pruning (removing unnecessary parameters); automatic model optimization (tools and frameworks that automatically optimize models for specific hardware and inference requirements); and open source all will help lower costs. 
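One of those techniques, quantization, can be illustrated in a few lines of plain Python. This is a toy sketch of the core idea, not any framework’s actual API; real systems quantize per tensor or per channel:

```python
# Toy post-training quantization: store weights as small integers plus a
# scale factor, instead of full-precision floats. int8 storage is roughly
# 1/8 the size of float64, at the cost of a small reconstruction error.
def quantize(weights, num_bits=8):
    """Map floats onto signed integers in [-(2**(b-1)-1), 2**(b-1)-1]."""
    max_int = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    scale = max(abs(w) for w in weights) / max_int
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate float weights from integers and the scale."""
    return [q * scale for q in q_weights]

weights = [0.82, -0.41, 0.05, -1.27, 0.63]       # made-up example weights
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The reconstruction error is bounded by half a quantization step, which for many layers is small enough that model quality barely changes while memory and bandwidth costs drop sharply.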


Tuesday, December 26, 2023

Anthropic Makes Huge and First-Ever Move to Protect its Users from Copyright Lawsuits

In what appears to be a first for large language models, Anthropic says its new Commercial Terms of Service will “enable our customers to retain ownership rights over any outputs they generate through their use of our services and protect them from copyright infringement claims.” 


It is an important move to promote use of large language models without fear of such legal actions, given the nascent state of AI copyright law as it applies to use of LLMs. One might also note it shifts substantial business risk onto LLM providers themselves, as such litigation is virtually certain over time. 


“Under the updated terms, we will defend our customers from any copyright infringement claim made against them for their authorized use of our services or their outputs, and we will pay for any approved settlements or judgments that result,” Anthropic says. “These new terms will be live on January 1, 2024 for Claude API customers and January 2, 2024 for those using Claude through Amazon Bedrock.”


There are other steps LLM providers can take to limit the uncertainty associated with copyright risks, such as using robust copyright filters, which help identify and flag potentially infringing content before it is generated or shared with users.


Ensuring transparency and responsible sourcing of training data, with clear mechanisms for identifying and excluding copyrighted material, also can minimize the risk of incorporating infringing elements into LLM outputs.


Establishing partnerships and clear guidelines for collaboration with copyright holders, an obvious avenue, can lead to mutually beneficial licensing agreements and promote fair use of copyrighted material within LLMs. 


Beyond all that, copyright related to LLMs will develop over time based at least in part on prior rulings related to use of content. 


Several existing legal precedents offer potential legal avenues for addressing large language model (LLM) copyright issues, one might suggest. 


Fair use is an obvious issue, as large language models are trained on huge amounts of existing content. After all, all human knowledge is built on prior work, with only the unique expression of facts being protected by copyright, not the facts themselves. 


Campbell v. Acuff-Rose Music, Inc. (1994) established a four-factor test for fair use, considering the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market for or value of the copyrighted work. 


Sony Music Entertainment v. Diamond Way Recordings, Inc. (2003) clarified the definition of a derivative work, stating that it must "recapture the essential elements of the original" and create a new work with a different purpose or character. 


Also, the case of Sony Corp. of America v. Universal City Studios, Inc. (the Betamax VCR case) (1984) held that the sale of a copying device does not constitute contributory infringement if the device is capable of substantial non-infringing uses.


Some might argue no LLM can create copyrighted material. Alfred A. Knopf, Inc. v. Colby (1992) held that an expert system's creative output lacked the requisite human authorship for copyright protection.


Is accumulated human knowledge similar to a database? If so, then some precedents related to databases could apply. Feist Publications, Inc. v. Rural Telephone Service Co., Inc. (1991) ruled on the scope of copyright protection for databases, stating that only the selection and arrangement of facts, not the underlying data itself, is protected. 


The European Union's "Text and Data Mining" exception allows certain research institutions to mine copyrighted works for non-commercial purposes without the copyright holder's consent. 


Also, open-source licenses like the GNU General Public License (GPL) could be relevant if LLMs are trained on datasets containing open-source materials.


Long-established doctrines such as the “scènes à faire” doctrine and the “merger” doctrine, which limit copyright protection for elements dictated by the functionality or nature of a particular work, also could shape copyright law as applied to AI and machine learning.


Saturday, December 23, 2023

Who are the Big Internet Gatekeepers?

It sometimes strikes me as odd that regulators continue to believe network neutrality rules are necessary to restrain the “gatekeeper power” of internet service providers, since there now is relatively common understanding that gatekeeper power resides with content platforms and data aggregators. 


It is not the first time we have mistaken where “gatekeeper” power might exist. Compared to 1995, for example, regulators, industry and consumers no longer really believe browsers are a serious source of gatekeeper power. 


But that does not stop some from arguing that Google’s payments to Apple, which make Google the default search engine on iPhones, are similarly anti-competitive. 


It’s a complicated issue. On one hand, it is not hard to change a browser default. On the other hand, not many consumers seem to do so. 


The case of United States v. Microsoft Corporation, a major legal battle fought in the late 1990s and early 2000s between the United States government and Microsoft, centered on whether Microsoft, then the dominant player in the personal computer (PC) market, had abused its monopoly power by bundling its Internet Explorer web browser with the Windows operating system. 


Google’s business deal with Apple that makes Google the default search engine similarly raises issues for some. 


But today’s potential gatekeepers are quite clear, and are not directly related to search, browser, or ISP market share. 


Instead, we tend to see danger in:

  • Platforms: Social media platforms like Facebook, Twitter, and YouTube have become powerful gatekeepers, controlling access to vast audiences and shaping online discourse through algorithms and content moderation policies.

  • Data-driven personalization: Search engines like Google and advertising platforms like Amazon leverage vast amounts of user data to personalize experiences and influence user behavior, creating targeted echo chambers and potentially manipulating information access.

  • E-commerce dominance: Amazon and other major online retailers control a significant portion of online commerce, influencing consumer choices and shaping the online marketplace.

  • Government regulation: Increased government involvement in regulating online content and data privacy adds another layer of gatekeeping power, raising concerns about censorship and control of information.


Worrying about ISPs as “gatekeepers” seems about as big an issue as web browsers and search being sources of antitrust danger. 


Granted, the antitrust arguments about “no charge to use” services are complicated, as it is next to impossible to cite actual consumer harm for “free” products including search, email, browsers and social media. 


But even if an advocate uses some non-economic argument about harm, it is hard to see that browser, search or ISP choices and market share are the biggest dangers, in that regard.


Sunday, December 17, 2023

Will TinyML Displace Multi-Access Edge Computing?

New technologies have a way of redesigning older markets out of existence or reshaping older markets. TinyML, for example, is already a possible functional replacement for mobile edge computing. 


By enabling on-device intelligence, TinyML limits the size of the MEC revenue opportunity for mobile operators, as it supports AI and other processing right on the device. If on-device processing displaced enough edge workloads, for example, it could plausibly cut potential MEC revenue in half. 


Estimated MEC revenue, 2024-2028, by scenario:

  • Low (slow adoption of MEC, primarily driven by enterprise use cases with limited consumer uptake): $10 billion to $20 billion

  • Moderate (moderate adoption across both enterprise and consumer sectors, with MEC-enabled services like augmented reality and connected vehicles gaining traction): $30 billion to $50 billion

  • High (rapid adoption fueled by widespread deployment of 5G and increased integration of MEC into essential services like healthcare and smart cities): $50 billion to $100 billion


Tiny machine learning (TinyML) is machine learning (including hardware, algorithms and software) capable of performing on-device sensor data analytics at extremely low power, enabling a variety of always-on use-cases and targeting battery operated devices.


It therefore enables any number of AI inference operations on a device, eliminating the need to transmit data to an external processor located elsewhere. Though part of the broader “AI at the edge” possibility, it further decentralizes AI inference operations, reduces latency to the greatest degree possible, and likely will be used to support highly specialized inference operations running lightweight language models. 


Some obvious use cases include:

  • Wearables such as a fitness tracker that analyzes your movements in real-time, offering personalized coaching or detecting falls.

  • Smart homes devices that monitor temperature, humidity, and air quality, adjusting settings automatically.

  • Predictive maintenance on machinery to predict potential failures before they happen.

  • Environmental monitoring, such as air and water quality sensing.

  • Agricultural sensors to optimize irrigation, detect pests and diseases.
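The always-on, on-device pattern behind such use cases can be sketched in a few lines. This is a toy illustration of the idea, not an actual TinyML framework, and the window and threshold values are made up:

```python
# Toy sketch of an always-on sensor loop of the kind TinyML targets:
# a rolling-mean anomaly detector that keeps all raw data on-device
# and only ever needs to signal a single flag.
from collections import deque

WINDOW = 8          # samples retained on-device (made-up value)
THRESHOLD = 3.0     # flag readings this far from the recent mean (made-up)

def make_detector(window=WINDOW, threshold=THRESHOLD):
    recent = deque(maxlen=window)
    def step(reading: float) -> bool:
        """Return True if the reading looks anomalous vs. the recent window."""
        is_anomaly = bool(recent) and abs(reading - sum(recent) / len(recent)) > threshold
        recent.append(reading)
        return is_anomaly
    return step

detect = make_detector()
stream = [1.0, 1.1, 0.9, 1.0, 9.5, 1.0]   # a spike at the fifth sample
flags = [detect(x) for x in stream]        # only the spike is flagged
```

The point of the pattern is bandwidth and power: instead of streaming every sensor reading to a network edge node, the device transmits only the rare anomaly flag.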


But note that such use cases actually substitute for older categories such as “internet of things” sensors. In many cases, TinyML hardware and software will run on the same devices once touted as part of the “IoT at the edge” category.


It is not a new development. In the past, we saw tablets and smartphones displace PCs.  Smartphones displaced watches, cameras, GPS sensors, home phones and pagers. Now watches have in many cases become wearable computers. 


By some estimates, at least $48 billion worth of global device and product sales are lost every year because smartphones have displaced them. Other estimates put the lost legacy product and sales activity at $70 billion annually. 


Estimated annual revenue loss from smartphone substitution, by displaced device:

  • Home phone service: $25 billion to $35 billion. Includes landlines and traditional VoIP services; assumes partial displacement, with landlines remaining in niche markets.

  • Digital cameras (standalone): $15 billion to $20 billion. Considers point-and-shoot and high-end DSLR cameras; excludes mirrorless cameras still maintaining market share.

  • Wristwatches (traditional): $5 billion to $10 billion. Accounts for lost sales of non-smart watches, with smartwatches capturing a new market segment.

  • GPS devices (standalone): $2 billion to $3 billion. Considers dedicated car and portable GPS units; increased smartphone navigation usage contributes to revenue loss.

  • MP3 players and portable media players: $1 billion to $2 billion. Includes digital audio players and video playback devices; a niche market remains for high-fidelity audio equipment.

  • Pagers: negligible. Pagers are practically obsolete, with near-complete displacement by smartphones.


In the U.S. market, lost revenue likely was in the $8 billion range in 2023, for example. That arguably is a low estimate, as the loss of a single residential phone line is assumed to be only $30 a month worth of lost revenue. 


There also could be a loss of other revenue from cost recovery mechanisms and also customer churn when the phone line is part of a service bundle.


Estimated U.S. lost revenue, by displaced device:

  • Home phone service: $5.8 billion, based on the decline in residential landline subscriptions and an average monthly service fee of $30.

  • Camera sales: $1.9 billion, based on the decline in point-and-shoot camera sales and an estimated average camera price of about $200.

  • Watch sales: $0.5 billion, based on the decline in traditional watch sales and an estimated average watch price of about $100.
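As a back-of-envelope check on the home-phone figure, using the estimate's own assumptions (an illustrative calculation, not a sourced statistic):

```python
# Back-of-envelope: how many lost residential lines does a $5.8 billion
# annual figure imply, at the assumed $30 per line per month?
annual_loss = 5.8e9     # USD per year, from the estimate above
monthly_fee = 30        # USD per line per month, the stated assumption
lines_lost = annual_loss / (monthly_fee * 12)
print(f"≈ {lines_lost / 1e6:.1f} million lines")   # ≈ 16.1 million
```

That implied line count is one reason the $30-per-line assumption matters: a higher per-line revenue figure, including cost-recovery fees, would imply the same dollar loss from fewer lines, or a larger loss from the same lines.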


On the other hand, mobile phone service also creates new markets for mobile service providers. 


But you get the point: new technologies often can redefine older markets and in many cases can be substitutes for the legacy products and services. 


We have seen this process at work in estimates of revenue to be earned by mobile service providers using network slicing to create new types of virtual private networks. But traditional VPNs, private networks, traffic prioritization and edge computing are substitutes for network slicing, for example.


Will Generative AI Follow Development Path of the Internet?

In many ways, the development of the internet provides a model for understanding how artificial intelligence will develop and create value. ...