
Thursday, July 18, 2024

When Will AI Capex Payback Happen First?

Most of us would likely agree that the benefits of artificial intelligence will take a while to show up almost anywhere except in the financial results of infrastructure providers, who clearly will benefit. Nor would that be unusual when an important new technology--not to mention a possible new general-purpose technology--first emerges. 


Indeed, analysts at Goldman Sachs expect “leading tech giants, other companies, and utilities to spend an estimated $1 trillion on capex in coming years, including significant investments in data centers, chips, other AI infrastructure, and the power grid.” 


Still, “this spending has little to show for it so far.” Nor would one realistically expect to see quantifiable results so early. The pattern with general-purpose technologies is that the platforms and infrastructure must be built first, before use cases and apps can be developed. 


Also, some functions are more susceptible to generative AI impact than others. 


Most of us would be willing to concede that customer service is one area where generative AI, for example, should produce results. Functions with many repeatable elements are commonly thought to be susceptible to AI automation. 


In a survey conducted for Bain, enterprise executives reported better results in sales, software development, marketing, customer service and customer onboarding, for example. Between October 2023 and February 2024, though, most other use cases produced less favorable outcomes than expected. 


source: Bain 


Generative AI thrives on well-defined patterns and processes, so jobs involving repetitive tasks with clear rules and minimal ambiguity are likely candidates for early change. 


But many functions and tasks are not routine or well structured; they are complex rather than simple, so the range of use cases that can benefit near term is arguably limited. 


As the report notes, Daron Acemoglu, Institute Professor at MIT, estimates that only a quarter of AI-exposed tasks will be cost-effective to automate within the next 10 years, implying that AI will impact less than five percent of all tasks.




All that noted, the first quantifiable results will be seen among suppliers of infrastructure, as apps cannot be built until the infrastructure is in place.   


GPT/Possible GPT | Infrastructure Provider | Early Revenue Gains
AI/Large Language Models | NVIDIA | 171% year-over-year revenue increase in Q2 2023, driven by demand for AI chips
Internet | Cisco Systems | Revenue grew from $69 million in 1990 to $22.3 billion in 2001 as internet infrastructure expanded
Personal Computers | Intel | Revenue grew from $1.9 billion in 1985 to $33.7 billion in 2000 as PC adoption surged
Electricity | General Electric | Revenue increased from $19 million in 1892 to $1.5 billion in 1929 as electrical infrastructure spread
Railroads | Steel Companies (e.g. Carnegie Steel) | U.S. steel production grew from 68,000 tons in 1870 to 11.4 million tons in 1900


That noted, it also could be said that there has been over-investment--at some point--in infrastructure for past general-purpose and other new technologies. Application and device over-investment also occurs early in the adoption of a new technology. 


Technology | Time Period | Description of Over-Investment
Railroads | 1840s-1850s | Excessive railroad construction and speculation led to financial panics in 1857 and 1873 in the US and UK
Automobiles | 1910s-1920s | Hundreds of car companies were founded, with most failing as the industry consolidated
Radio | 1920s | Rapid proliferation of radio stations and manufacturers, followed by consolidation
Internet/Dot-com | Late 1990s | Massive speculation in internet-related companies led to the dot-com bubble and crash in 2000
Renewable Energy | 2000s-2010s | Over-investment in solar panel manufacturing led to industry shakeout
Cryptocurrencies | 2010s-2020s | Speculative frenzy around Bitcoin and other cryptocurrencies


But there is a difference between “over-investment” and the proliferation of would-be competitors in a new market. It always is normal to see more startups in any area of new information technology than there are surviving firms once the market is mature. 


The difference between over-investment and normal competition in a new market can be subtle. What might not be subtle is the lag time between capex investments and revenue realization, for firms not in the "picks and shovels" part of the ecosystem.


Infrastructure suppliers already have profited.


Sunday, March 17, 2024

"Tokens" are the New "FLOPS," "MIPS" or "Gbps"

Modern computing has some virtually-universal reference metrics. For Gemini 1.5 and other large language models, tokens are a basic measure of capability. 


In the context of LLMs, a token is the basic unit of content (text, for example) that the model processes and generates; throughput is usually measured in “tokens per second.”


For a text-based model, tokens can include individual words; subwords (prefixes, suffixes or characters); or special characters such as punctuation marks or spaces. 
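
As a minimal illustration of what tokenization looks like in practice (assuming the open-source tiktoken tokenizer, which this post does not reference, is installed; any subword tokenizer behaves similarly):

import tiktoken

# Load a common byte-pair-encoding vocabulary; token boundaries and counts
# vary from tokenizer to tokenizer, so treat the output as illustrative only.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens are the new FLOPS."
token_ids = enc.encode(text)                    # text -> list of integer token ids
tokens = [enc.decode([t]) for t in token_ids]   # the text fragment behind each id

print(len(token_ids), tokens)
# A short English sentence typically maps to roughly one token per word
# or word fragment, plus punctuation.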


For a multimodal LLM, where images and audio and video have to be processed, content is typically divided into smaller units like patches or regions, which are then processed by the LLM. Each patch or region can be considered a token.


Audio can be segmented into short time frames or frequency bands, with each segment serving as a token. Videos can be tokenized by dividing them into frames or sequences of frames, with each frame or sequence acting as a token.


Tokens are not the only metrics used by large- and small-language models, but tokens are among the few that are relatively easy to quantify. 


Metric | LLM | SLM
Tokens per second | Important for measuring processing speed | Might be less relevant for real-time applications
Perplexity | Indicates ability to predict next word | Less emphasized due to simpler architecture
Accuracy | Task-specific, measures correctness of outputs | Crucial for specific tasks like sentiment analysis
Fluency and coherence | Essential for generating human-readable text | Still relevant, but might be less complex
Factual correctness | Important to avoid misinformation | Less emphasized due to potentially smaller training data
Diversity | Encourages creativity and avoids repetitive outputs | Might be less crucial depending on the application
Bias and fairness | Critical to address potential biases in outputs | Less emphasized due to simpler models and training data
Efficiency | Resource consumption and processing time are important | Especially crucial for real-time applications on resource-constrained devices
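
As a rough sketch of how a “tokens per second” figure is typically produced (the generate() callable below is a hypothetical stand-in for whatever model or API is actually being benchmarked):

import time

def tokens_per_second(generate, prompt, n_runs=3):
    # "generate" is assumed to return the list of output token ids for a prompt;
    # substitute the real model call being measured.
    total_tokens, total_seconds = 0, 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        output_tokens = generate(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(output_tokens)
    return total_tokens / total_seconds

Averaging over several runs smooths out warm-up and caching effects, which is why published throughput numbers rarely come from a single generation.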

LLMs rely on various techniques to quantify their performance on attributes other than token processing rate. 


Perplexity is measured by calculating the inverse probability of the generated text sequence. Lower perplexity indicates better performance as it signifies the model's ability to accurately predict the next word in the sequence.
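
In standard form (the usual textbook definition, not something specific to any one model), perplexity over a sequence of N tokens is the exponentiated average negative log-probability the model assigns to each token, given the tokens before it:

\mathrm{PPL}(w_1, \dots, w_N) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \log p\left(w_i \mid w_{<i}\right)\right)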


Accuracy might compare the LLM-generated output with a reference answer. That might include precision (the percentage of predictions that are correct), recall (the proportion of actual correct answers the model identifies) or an F1 score, which combines precision and recall into a single metric.
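
For reference, the standard definitions in terms of true positives (TP), false positives (FP) and false negatives (FN):

\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}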


Fluency and coherence is substantially a matter of human review for readability, grammatical correctness, and logical flow. 


But automated metrics exist as well, such as the BLEU score (which compares the generated text with reference sentences, considering n-gram overlap); the ROUGE score (similar to BLEU, but focused on recall of n-grams from reference summaries); or METEOR (which considers synonyms and paraphrases alongside n-gram overlap). 
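
To make the n-gram overlap idea concrete, here is a toy sketch; it is not a faithful implementation of BLEU or ROUGE (both add clipping, brevity penalties, multiple n-gram orders and multi-reference handling), just the shared core intuition:

def ngrams(tokens, n):
    # All contiguous n-grams in a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_overlap_precision(candidate, reference, n=2):
    # Fraction of candidate n-grams that also appear in the reference:
    # a simplified cousin of BLEU's modified n-gram precision.
    cand = ngrams(candidate, n)
    ref = set(ngrams(reference, n))
    return sum(1 for g in cand if g in ref) / len(cand) if cand else 0.0

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(ngram_overlap_precision(candidate, reference))  # 0.6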


So get used to hearing about token rates, just as we hear about FLOPS, MIPS, Gbps, clock rates or bit error rates.


  • FLOPS (Floating-point operations per second): Measures the number of floating-point operations a processor can perform in one second.

  • MIPS (Millions of instructions per second): Similar to IPS, but expressed in millions.

  • Bits per second (bps): Measures data transfer rates; commonly expressed as megabits per second (Mbps) or gigabits per second (Gbps).

  • Bit error rate (BER): Measures the percentage of bits that are transmitted incorrectly.


Token rates are likely to remain a relatively easy-to-understand measure of model performance, compared to the others, much as clock speed (cycles the processor can execute per second) often is the simplest way to describe a processor’s performance, even when there are other metrics. 


Other metrics, such as the number of cores and threads; cache size; instructions per second (IPS) or floating-point operations per second also are relevant, but unlikely to be as relatable, for normal consumers, as token rates.


Tuesday, April 18, 2023

Non-Linear Development and Even Near-Zero Pricing are Normal for Chip-Based Products

It is clear enough that Moore’s Law played a foundational role in the founding of Netflix, indirectly led to Microsoft and underpins the development of all things related to use of the internet and its lead applications. 


All consumer electronics, including smartphones, automotive features, GPS and location services; all leading apps, including social media, search, shopping and video and audio entertainment; and cloud computing, artificial intelligence and the internet of things are built on the foundation of ever-more-capable and ever-cheaper computing, communications and storage. 


For connectivity service providers, the implications are similar to the questions others have asked. Reed Hastings asked whether enough home broadband speed would exist, and when, to allow Netflix to build a video streaming business. 


Microsoft essentially asked itself whether dramatically-lower hardware costs would create a new software business that did not formerly exist. 


In each case, the question is what business is possible if a key constraint is removed. For software, assume hardware is nearly free, or so affordable it poses no barrier to software use. For applications or computing instances, remove the cost of wide area network connections. For artificial intelligence, remove the cost of computing cycles.


In almost every case, Moore’s Law removes barriers to commercial use of technology and different business models. The fact that we now use millimeter wave radio spectrum to support 5G is precisely because cheap signal processing allows us to do so. We could not previously make use of radio signals that dropped to almost nothing after traveling less than a hundred feet. 


Reed Hastings, Netflix founder, based the viability of video streaming on Moore’s Law. At a time when dial-up modems were running at 56 kbps, Hastings extrapolated from Moore's Law to understand where bandwidth would be in the future, not where it was “right now.”


“We took out our spreadsheets and we figured we’d get 14 megabits per second to the home by 2012, which turns out is about what we will get,” says Reed Hastings, Netflix CEO. “If you drag it out to 2021, we will all have a gigabit to the home." So far, internet access speeds have increased at just about those rates.
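
The arithmetic behind that kind of extrapolation is simple compound growth. A back-of-the-envelope sketch, using the figures in the quote and an assumed late-1990s starting year (the exact year is my assumption, not something stated above):

def implied_cagr(start_value, end_value, years):
    # Compound annual growth rate implied by two data points.
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Roughly 56 kbps dial-up (assumed circa 1997) to 14 Mbps by 2012,
# then 14 Mbps to a gigabit by 2021, per the quote above.
rate_to_2012 = implied_cagr(56e3, 14e6, 2012 - 1997)   # ~45% per year
rate_to_2021 = implied_cagr(14e6, 1e9, 2021 - 2012)    # ~61% per year
print(f"{rate_to_2012:.1%} per year to 2012, {rate_to_2021:.1%} per year afterward")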


The point is that Moore’s Law enabled a product and a business model  that was not possible earlier, simply because computation and communications capabilities had not developed. 


Likewise, Microsoft was founded with an indirect reliance on what Moore’s Law meant for computing power. 


“As early as 1971, Paul (Allen) and I had talked about the microprocessor,” Bill Gates said in a 1993 interview for the Smithsonian Institution, in terms of what it would mean for the cost of computing. "Oh, exponential phenomena are pretty rare, pretty dramatic,” Gates recalls saying. 


“Are you serious about this? Because this means, in effect, we can think of computing as free," Gates recalled. 


That would have been an otherwise ludicrous assumption upon which to build a business. Back in 1970 a “computer” would have cost millions of dollars. 

source: AEI 


The original insight for Microsoft was essentially the answer to the question "What if computing were free?". Recall that Micro-Soft (later changed to MicroSoft before becoming today’s Microsoft) was founded in 1975, not long after Gates apparently began to ponder the question. 


Whether that was a formal acknowledgement about Moore’s Law or not is a question I’ve never been able to firmly pin down, but the salient point is that the microprocessor meant “personal” computing and computers were possible. 


A computer “in every house” meant appliances costing not millions of dollars but only thousands. So roughly three orders of magnitude of price improvement were required, within half a decade to a decade. 


“Paul had talked about the microprocessor and where that would go and so we had formulated this idea that everybody would have kind of a computer as a tool somehow,” said Gates.


Exponential change dramatically extends the possible pace of development of any technology trend. 


Each deployed use case, capability or function creates a greater surface for additional innovations. Futurist Ray Kurzweil called this the law of accelerating returns. Rates of change are not linear because positive feedback loops exist.


source: Ray Kurzweil  


Each innovation leads to further innovations and the cumulative effect is exponential. 


Think about ecosystems and network effects. Each new applied innovation becomes a new participant in an ecosystem. And as the number of participants grows, so do the possible interconnections between the discrete nodes.  

source: Linked Stars Blog 

 

So network effects underpin the difference in growth rates or cost reduction we tend to see in technology products over time, and make linear projections unreliable.
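
A small sketch of that interconnection math, using the familiar count of possible pairwise links among n nodes (an illustration of the principle, not a claim about any specific network):

def possible_links(n):
    # Number of distinct pairwise connections among n participants: n*(n-1)/2.
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, "participants ->", possible_links(n), "possible links")
# 10 -> 45, 100 -> 4950, 1000 -> 499500: participants grow linearly,
# while possible interconnections grow roughly with the square.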


Sunday, November 13, 2022

Expect 70% Failure Rates for Metaverse, Web3, AI, VR Efforts in Early Days

It long has been conventional wisdom that up to 70 percent of innovation efforts and major information technology projects fail in significant ways, either failing to produce the predicted gains or producing only very small results. If we assume applied artificial intelligence, virtual reality, the metaverse, web3 or the internet of things are “major IT projects,” we likewise should assume initial failure rates as high as 70 percent.


That does not mean ultimate success will not come, only that failure rates, early on, will be quite high. As a corollary, we should continue to expect high rates of failure for companies and projects in the early going. Venture capitalists will not be surprised; they expect such high rates of failure when investing in startups. 


But all of us need to remember that failure rates for innovation generally, and for major IT efforts specifically, will be high: up to 70 percent. So steel yourself for bad news as major innovations are attempted in areas ranging from the metaverse and web3 to cryptocurrency, AR and VR, or even less “risky” efforts such as the internet of things, network slicing, private networks or edge computing. 


Gartner estimated in 2018 that through 2022, 85 percent of AI projects would deliver erroneous outcomes due to bias in data, algorithms or the teams responsible for managing them.


That is analogous to arguing that most AI projects will fail in some part. Seven out of 10 companies surveyed in one study report minimal or no impact from AI so far. The caveat is that many such big IT projects can take as much as a decade to produce quantifiable results. 


Investing in more information technology has often and consistently failed to boost productivity, or has appeared to do so only after about a decade of tracking. Some would argue the gains are there, just hard to measure, but the point is that progress often is hard to discern. 


Still, the productivity paradox seems to exist. Before investment in IT became widespread, the expected return on investment in terms of productivity was three percent to four percent, in line with what was seen in mechanization and automation of the farm and factory sectors.


When IT was applied over two decades from 1970 to 1990, the normal return on investment was only one percent.


This productivity paradox is not new. Even when investment does eventually seem to produce improvements, it often takes a while to do so. So perhaps even near-term AI project failures might be seen as successes a decade or more later. 


Sometimes measurable change takes longer. Information technology investments did not measurably help improve white collar job productivity for decades, for example. In fact, it can be argued that researchers have failed to measure any improvement in productivity. So some might argue nearly all the investment has been wasted.


Most might simply agree there is a lag between the massive introduction of new information technology and measurable productivity results.


Most of us likely assume quality broadband “must” boost productivity. Except when it does not. The consensus view on broadband access for business is that it leads to higher productivity. 


But a study by Ireland’s Economic and Social Research Institute finds that while there are “small positive associations between broadband and firms’ productivity levels, none of these effects are statistically significant.”


Among the 90 percent of companies that have made some investment in AI, fewer than 40 percent report business gains from AI in the past three years, for example.


Directv-Dish Merger Fails

Directv’s termination of its deal to merge with EchoStar, apparently because EchoStar bondholders did not approve, means EchoStar continue...