
Sunday, March 17, 2024

"Tokens" are the New "FLOPS," "MIPS" or "Gbps"

Modern computing has some virtually-universal reference metrics. For Gemini 1.5 and other large language models, tokens are a basic measure of capability. 


In the context of LLMs, a token is the basic unit of content (text, for example) that the model processes and generates, with throughput usually measured in “tokens per second.”


For a text-based model, tokens can include whole words, subwords (prefixes, suffixes or individual characters) and special characters such as punctuation marks or spaces. 
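
To make that concrete, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer, chosen only because it is freely available (an assumption for illustration; Gemini and other models use their own vocabularies, and the library must be installed):

```python
# Minimal tokenization sketch. tiktoken is used only as a convenient,
# freely available tokenizer; each model family has its own vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common byte-pair-encoding vocabulary

text = "Tokenization splits text into subword units."
token_ids = enc.encode(text)                    # list of integer token IDs
pieces = [enc.decode([t]) for t in token_ids]   # the corresponding text fragments

print(len(token_ids), "tokens")
print(pieces)  # some pieces are whole words, others prefixes, suffixes or punctuation
```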


For a multimodal LLM, where images, audio and video have to be processed, content is typically divided into smaller units such as patches or regions, which are then processed by the LLM. Each patch or region can be considered a token.


Audio can be segmented into short time frames or frequency bands, with each segment serving as a token. Videos can be tokenized by dividing them into frames or sequences of frames, with each frame or sequence acting as a token.
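
As a rough illustration of how multimodal inputs translate into token counts, the sketch below assumes a ViT-style 16-by-16-pixel patch and a 25-millisecond audio frame; real models use their own, often different, segmentations:

```python
# Back-of-envelope token counts for multimodal inputs.
# Patch and frame sizes here are illustrative assumptions.

def image_tokens(width: int, height: int, patch: int = 16) -> int:
    """Each non-overlapping patch becomes one token."""
    return (width // patch) * (height // patch)

def video_tokens(frames: int, width: int, height: int, patch: int = 16) -> int:
    """Naive per-frame tokenization; real systems subsample or pool frames."""
    return frames * image_tokens(width, height, patch)

def audio_tokens(duration_seconds: float, frame_ms: float = 25.0) -> int:
    """One token per short time frame."""
    return int(duration_seconds * 1000 / frame_ms)

print(image_tokens(224, 224))       # 196 patch tokens for a 224x224 image
print(video_tokens(30, 224, 224))   # 5,880 tokens for 30 such frames
print(audio_tokens(10.0))           # 400 tokens for 10 seconds of audio
```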


Tokens are not the only metric used to evaluate large language models (LLMs) and small language models (SLMs), but they are among the few that are relatively easy to quantify. 


How key metrics apply to LLMs versus SLMs:

  • Tokens per second. LLM: important for measuring processing speed. SLM: might be less relevant for real-time applications.

  • Perplexity. LLM: indicates ability to predict the next word. SLM: less emphasized due to simpler architecture.

  • Accuracy. LLM: task-specific, measures correctness of outputs. SLM: crucial for specific tasks like sentiment analysis.

  • Fluency and coherence. LLM: essential for generating human-readable text. SLM: still relevant, but might be less complex.

  • Factual correctness. LLM: important to avoid misinformation. SLM: less emphasized due to potentially smaller training data.

  • Diversity. LLM: encourages creativity and avoids repetitive outputs. SLM: might be less crucial depending on the application.

  • Bias and fairness. LLM: critical to address potential biases in outputs. SLM: less emphasized due to simpler models and training data.

  • Efficiency. LLM: resource consumption and processing time are important. SLM: especially crucial for real-time applications on resource-constrained devices.

LLMs rely on various techniques to quantify their performance on attributes other than token processing rate. 


Perplexity is measured by calculating the inverse probability of the generated text sequence, normalized for sequence length. Lower perplexity indicates better performance, as it signifies the model's ability to accurately predict the next word in the sequence.
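
A minimal sketch, assuming we already have the probability the model assigned to each token in a sequence:

```python
import math

def perplexity(token_probs: list[float]) -> float:
    """Perplexity is the exponential of the average negative log-probability
    per token; lower means the model was less 'surprised' by the text."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical per-token probabilities for a short generated sequence
probs = [0.25, 0.10, 0.50, 0.05]
print(round(perplexity(probs), 2))  # about 6.3
```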


Accuracy might compare the LLM-generated output with a reference answer. Related measures include precision (the share of the model's predictions that are correct), recall (the share of actual correct answers the model identifies) and the F1 score, which combines precision and recall into a single metric.
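
A minimal sketch of those three measures, assuming the evaluation has already been reduced to counts of correct and incorrect predictions:

```python
def precision_recall_f1(true_positives: int, false_positives: int,
                        false_negatives: int) -> tuple[float, float, float]:
    """Precision: share of the model's positive predictions that were correct.
    Recall: share of the actual positives the model found.
    F1: harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts from comparing model outputs against reference answers
p, r, f1 = precision_recall_f1(true_positives=80, false_positives=20, false_negatives=40)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # 0.80, 0.67, 0.73
```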


Fluency and coherence are substantially a matter of human review for readability, grammatical correctness and logical flow. 


But automated metrics also exist, such as the BLEU score (which compares the generated text with reference sentences based on n-gram overlap), the ROUGE score (similar to BLEU, but focused on recall of n-grams from reference summaries) and METEOR (which considers synonyms and paraphrases alongside n-gram overlap). 
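
For example, a sentence-level BLEU score can be computed with the NLTK library (assumed to be installed); production evaluations more often use dedicated toolkits, but the n-gram-overlap idea is the same:

```python
# Sentence-level BLEU sketch using NLTK; smoothing avoids zero scores
# on short texts that lack higher-order n-gram matches.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the cat sat on the mat".split()]  # list of tokenized reference sentences
candidate = "the cat is on the mat".split()     # tokenized model output

smooth = SmoothingFunction().method1
score = sentence_bleu(reference, candidate, smoothing_function=smooth)
print(round(score, 3))  # closer to 1.0 means closer to the reference
```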


So get used to hearing about token rates, just as we hear about FLOPS, MIPS, Gbps, clock rates or bit error rates.


  • FLOPS (Floating-point operations per second): Measures the number of floating-point operations a processor can perform in one second.

  • MIPS (Millions of instructions per second): Measures instructions executed per second (IPS), expressed in millions.

  • Bits per second (bps): Measures data transmission rate, commonly expressed as megabits per second (Mbps) or gigabits per second (Gbps).

  • Bit error rate (BER): Measures the proportion of transmitted bits that are received incorrectly (see the sketch after this list).
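
To make the last two concrete, a small sketch with illustrative numbers:

```python
# Illustrative arithmetic for the connectivity metrics above (made-up numbers).

bits_sent = 10_000_000      # bits transmitted during a test window
bits_in_error = 25          # bits received incorrectly
ber = bits_in_error / bits_sent
print(f"BER = {ber:.1e}")   # 2.5e-06

link_bps = 1_000_000_000    # a 1 Gbps link
print(f"{link_bps / 1e6:,.0f} Mbps = {link_bps / 1e9:.0f} Gbps")  # 1,000 Mbps = 1 Gbps
```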


Token rates are likely to remain a relatively easy-to-understand measure of model performance, compared to the others, much as clock speed (cycles the processor can execute per second) often is the simplest way to describe a processor’s performance, even when there are other metrics. 


Other metrics, such as the number of cores and threads, cache size, instructions per second (IPS) or floating-point operations per second, also are relevant, but they are unlikely to be as relatable to normal consumers as token rates.


Tuesday, April 18, 2023

Non-Linear Development and Even Near-Zero Pricing are Normal for Chip-Based Products

It is clear enough that Moore’s Law played a foundational role in the creation of Netflix, indirectly led to Microsoft and underpins the development of everything related to use of the internet and its lead applications. 


All consumer electronics, including smartphones, automotive features, GPS and location services; all leading apps, including social media, search, shopping, and video and audio entertainment; and cloud computing, artificial intelligence and the internet of things are built on the foundation of ever-more-capable and ever-cheaper computing, communications and storage. 


For connectivity service providers, the implications are similar to the questions others have asked. Reed Hastings asked whether enough home broadband speed would exist, and when, to allow Netflix to build a video streaming business. 


Microsoft essentially asked itself whether dramatically-lower hardware costs would create a new software business that did not formerly exist. 


In each case, the question is what business is possible if a key constraint is removed. For software, assume hardware is nearly free, or so affordable it poses no barrier to software use. For applications or computing instances, remove the cost of wide area network connections. For artificial intelligence, remove the cost of computing cycles.


In almost every case, Moore’s Law removes barriers to commercial use of technology and different business models. We now use millimeter wave radio spectrum to support 5G precisely because cheap signal processing allows us to do so. We could not previously make use of radio signals that dropped to almost nothing after traveling less than a hundred feet. 


Reed Hastings, Netflix founder, based the viability of video streaming on Moore’s Law. At a time when dial-up modems were running at 56 kbps, Hastings extrapolated from Moore's Law to understand where bandwidth would be in the future, not where it was “right now.”


“We took out our spreadsheets and we figured we’d get 14 megabits per second to the home by 2012, which turns out is about what we will get,” Hastings said. “If you drag it out to 2021, we will all have a gigabit to the home.” So far, internet access speeds have increased at just about those rates.
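
A back-of-envelope reconstruction of that extrapolation, assuming a 56 kbps dial-up baseline around 1998 and roughly 50 percent annual growth in access speed (the exact inputs Hastings used are not public, so treat this as illustrative):

```python
# Compound-growth projection of home access speed from a dial-up baseline.
# The baseline year and growth rate are assumptions for illustration.

def projected_speed_mbps(base_mbps: float, base_year: int, target_year: int,
                         annual_growth: float = 0.50) -> float:
    return base_mbps * (1 + annual_growth) ** (target_year - base_year)

base_mbps = 0.056  # 56 kbps expressed in Mbps
for year in (2012, 2021):
    print(year, round(projected_speed_mbps(base_mbps, 1998, year), 1), "Mbps")
# Lands in the same ballpark as the quote: tens of Mbps by 2012 and
# hundreds of Mbps (approaching a gigabit) by 2021.
```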


The point is that Moore’s Law enabled a product and a business model that were not possible earlier, simply because computation and communications capabilities had not yet developed far enough. 


Likewise, Microsoft was founded with an indirect reliance on what Moore’s Law meant for computing power. 


“As early as 1971, Paul (Allen) and I had talked about the microprocessor,” Bill Gates said in a 1993 interview for the Smithsonian Institution, in terms of what it would mean for the cost of computing. "Oh, exponential phenomena are pretty rare, pretty dramatic,” Gates recalls saying. 


“Are you serious about this? Because this means, in effect, we can think of computing as free," Gates recalled. 


That would have been an otherwise ludicrous assumption upon which to build a business. Back in 1970 a “computer” would have cost millions of dollars. 

source: AEI 


The original insight for Microsoft was essentially the answer to the question “What if computing were free?” Recall that Micro-Soft (later simply Microsoft) was founded in 1975, not long after Gates apparently began to ponder the question. 


Whether that was a formal acknowledgement of Moore’s Law or not is a question I’ve never been able to firmly pin down, but the salient point is that the microprocessor made “personal” computing, and personal computers, possible. 


A computer “in every house” meant appliances costing not millions of dollars but only thousands. So roughly three orders of magnitude of price improvement were required, within half a decade to a decade. 


“Paul had talked about the microprocessor and where that would go and so we had formulated this idea that everybody would have kind of a computer as a tool somehow,” said Gates.


Exponential change dramatically extends the possible pace of development of any technology trend. 


Each deployed use case, capability or function creates a greater surface for additional innovations. Futurist Ray Kurzweil called this the law of accelerating returns. Rates of change are not linear because positive feedback loops exist.


source: Ray Kurzweil  


Each innovation leads to further innovations and the cumulative effect is exponential. 


Think about ecosystems and network effects. Each new applied innovation becomes a new participant in an ecosystem. And as the number of participants grows, so do the possible interconnections between the discrete nodes.  

source: Linked Stars Blog 

 

So network effects underpin the difference in growth rates or cost reduction we tend to see in technology products over time, and make linear projections unreliable.


Sunday, November 13, 2022

Expect 70% Failure Rates for Metaverse, Web3, AI, VR Efforts in Early Days

It long has been conventional wisdom that up to 70 percent of innovation efforts and major information technology projects fail in significant ways, either failing to produce the predicted gains or producing only very small gains. If we assume applied artificial intelligence, virtual reality, metaverse, web3 or internet of things are “major IT projects,” we likewise should assume initial failure rates as high as 70 percent.


That does not mean ultimate success will fail to happen, only that early failure rates will be quite high. As a corollary, we should continue to expect many individual companies and projects to fail. Venture capitalists will not be surprised, as they expect such high rates of failure when investing in startups. 


But all of us need to remember that innovation generally, and major IT efforts specifically, will have failure rates of up to 70 percent. So steel yourself for bad news as major innovations are attempted in areas ranging from metaverse and web3 to cryptocurrency to AR, VR or even less “risky” efforts such as internet of things, network slicing, private networks or edge computing. 


Gartner estimated in 2018 that through 2022, 85 percent of AI projects would deliver erroneous outcomes due to bias in data, algorithms or the teams responsible for managing them.


That is analogous to arguing that most AI projects will fail, at least in part. Seven out of 10 companies surveyed in one study report minimal or no impact from AI so far. The caveat is that many such big IT projects can take as much as a decade to produce quantifiable results. 


Investing in more information technology often has failed to boost productivity, or has appeared to do so only after about a decade of tracking. Some would argue the gains are there, just hard to measure, but the point is that progress often is hard to discern. 


Still, the productivity paradox seems to exist. Before investment in IT became widespread, the expected return on investment in terms of productivity was three percent to four percent, in line with what was seen in mechanization and automation of the farm and factory sectors.


When IT was applied over two decades from 1970 to 1990, the normal return on investment was only one percent.


This productivity paradox is not new. Even when investment does eventually seem to produce improvements, it often takes a while for those results to appear. So perhaps even near-term AI project failures might be seen as successes a decade or more later. 


Sometimes measurable change takes longer. Information technology investments did not measurably help improve white collar job productivity for decades, for example. In fact, it can be argued that researchers have failed to measure any improvement in productivity. So some might argue nearly all the investment has been wasted.


Most might simply agree that there is a lag between the massive introduction of new information technology and measurable productivity results.


Most of us likely assume quality broadband “must” boost productivity. Except when it does not. The consensus view on broadband access for business is that it leads to higher productivity. 


But a study by Ireland’s Economic and Social Research Institute finds “small positive associations between broadband and firms’ productivity levels, none of these effects are statistically significant.”


Among the 90 percent of companies that have made some investment in AI, fewer than 40 percent report business gains from AI in the past three years, for example.


Sunday, November 6, 2022

"Sending Party Pays" is a Classic Example of Channel Conflict

Whatever positions one takes on whether a few hyperscale app providers ought to pay fees to internet service providers, there is no question that the emergence of the internet as the next-generation “telco” platform raises tricky issues about business models, competitive dynamics and available supplier responses. 


Differences in regulation of “public telephone networks,” radio and TV broadcast, cable TV and data networks always have existed. Those differences are exacerbated now that the internet has effectively become a universal distribution system for all content, communications and media. 


“Sending party pays” is a new concept that would make a few hyperscalers pay ISPs for usage by ISP customers. Ignore for the moment whether that is just, fair or reasonable. The concept highlights new business model strains in the internet ecosystem between content owners and distributors. 


Sending party pays also illustrates ways regulators might--or could--change their thinking about how to regulate various communication networks. There also are major issues around how much value chain participants can, or should, work out business agreements among themselves. 


That also necessarily raises questions about where value lies in the ecosystem, and what policies best promote the growth and health of the ecosystem. Industrial policy also is inextricably interwoven in those choices. 


Value chains are different for the internet, compared to traditional “telecommunications.” Traditional voice is a vertically-integrated app created, controlled and sold by telcos over their own networks. Enterprise wide area data networks provide another example. 


The internet is different: it consists of loosely-coupled ecosystem partners operating on “open” rather than “closed” networks. No app or content or commerce provider needs an internet service provider’s permission to be used by any internet-connected user (government permission is another matter). 


In other words, an ISP’s customer buys internet access service. The ISP does not control access to any internet-available app, service or site, and does not participate in a direct way in monetization of those apps, services and sites. 

source: Kearney 


Like it or not, an ISP’s role in the ecosystem lies in supplying internet access to its own customers. Some ISPs might also participate in other roles, but in their role as access provider, their revenues are based on access customer payments, supplemented in some cases by universal service payments, government subsidies or, in a few cases, advertising. 


That does not mean ISPs are barred from other roles and revenue streams. It does mean that in their role as access providers, their customers are the revenue drivers. 


That has been the general pattern for home broadband and mobile internet access: customers pay based on consumption, or potential consumption, with mobile services having the clearest consumption-based pricing. 


Mobile usage buckets differentiated by potential consumption limits have been the norm, while for fixed networks “speed” has been the mechanism for pricing differentiation. 


The big principle is that the usage is paid for by the access customer. The proposed new taxes on content providers move beyond that framework, making a few content providers liable for usage, not just the access customers. 


At a high level, this is a somewhat normal sparring between buyers and sellers in a value chain, where one partner’s costs are another partner’s revenue. But there are issues. If an electrical utility requires more generation capacity, it has to build new power plants, encourage conservation or take other steps to match generation with consumption. 


If a water utility has to support more customers, homes and businesses, it has to increase supply, by building dams or acquiring new rights to tap aquifers or other bodies of water, or encourage consumption restraint, or both. 


There is an obvious remedy that ISPs have not taken, possibly because they feel they cannot do so: raise prices on customers (subscribers) to recover the costs of network capacity. Nor do ISPs generally take any measures to encourage conservation. They could do so; they simply do not. 


With the caveat that there are revenue or business reasons for such inaction, it nevertheless remains the case that ISPs could act themselves to better match capacity supply with customer demand.


Assuming network neutrality rules are not a fundamental issue, ISPs also could institute policies for trading partners that likewise discourage “wasteful” bandwidth consumption practices, such as enabling autoplay video. 


ISPs need the right to do so if such practices benefit their customers by reducing the need to keep investing in new capacity without compensation for doing so. 


To be sure, the problem results from the economics of delivery networks. Content delivery networks are most efficient when they can operate in multicasting mode (broadcasting). Those networks are least efficient when they must operate in unicast mode (traditional voice sessions or any form of on-demand access). 
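
A simple illustration of that gap, using assumed numbers:

```python
# Aggregate capacity needed to serve the same content to the same audience,
# unicast versus multicast. Audience size and bitrate are illustrative.

viewers = 100_000
stream_mbps = 8                               # assumed bitrate of one HD stream

unicast_gbps = viewers * stream_mbps / 1000   # one stream per viewer
multicast_gbps = stream_mbps / 1000           # one copy, regardless of audience

print(f"unicast:   {unicast_gbps:,.0f} Gbps")   # 800 Gbps
print(f"multicast: {multicast_gbps:.3f} Gbps")  # 0.008 Gbps
```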


In principle, edge-based content delivery networks help reduce wide area network capacity demand. It is far less clear that content delivery networks alleviate access network congestion, though. 


That leaves a few levers not yet pulled: raising subscriber prices to approach the full costs of actual usage, and creating incentives for conservation. Subscribers could be rewarded for downloading content overnight (when networks have spare capacity), storing it locally and consuming it later. 


Stripped to its essentials, channel conflict is what the telco-hyperscaler “sending party pays” proposals are about.


Thursday, October 20, 2022

Can VR/AR or Metaverse Wait 2 Decades for the Compute/Connectivity Platform to be Built?

The Telecom Infra Project has formed a group to look at metaverse-ready networks. Whether one accepts the notion of “metaverse” or not, virtually everyone agrees that future experiences will include use of extended, augmented or virtual reality on a wider scale. 


And that is certain to affect both computing and connectivity platforms, in the same way that entertainment video and gaming have shaped network performance demands, in terms of latency performance and capacity. 


The metaverse or just AR and VR will deliver immersive experiences that will require better network performance, for both fixed and mobile networks, TIP says. 


And therein lie many questions. If we assume both ultra-high data bandwidth and ultra-low latency for the most-stringent applications, both “computing” and “connectivity” platforms will be adjusted in some ways. 


Present thinking includes more use of edge computing and probably quality-assured bandwidth in some form. But it is not simply a matter of “what” will be required, but also “when” and “where” resources will be required.


As always, any set of performance requirements might be satisfied in a number of ways. What blend of local versus remote computing will work? And how “local” is good enough? What mix of local distribution (Wi-Fi, bluetooth, 5G and other) is feasible? When can--or should--remote resources be invoked? 


And can all that be done relying on Moore’s Law rates of improvement, Edholm’s Law of access bandwidth improvement or Nielsen’s Law of internet access speed? If we must create improvements at faster rates than simply relying on historic rates of improvement, where are the levers to pull?


The issue really is timing. Left to its own internal logic, the headline speed of access services in most countries will reach terabits per second by perhaps 2050. The problem for metaverse or VR experience providers is that they might not be able to wait that long. 


Along that trajectory, the top-end home broadband speed could be 85 Gbps to 100 Gbps by about 2030. 

source: NCTA  


But most consumers will not be buying service at such rates. Perhaps fewer than 10 percent will do so. So what could developers expect as a baseline? 10 Gbps? Or 40 Gbps? And is that sufficient, all other things considered? 


And is access bandwidth the real hurdle? Intel argues that metaverse will require computing resources 1,000 times better than today. Can Moore’s Law rates of improvement supply that degree of improvement? Sure, given enough time. 


As a rough estimate, vastly-improved platforms--beyond the Nielsen’s Law rates of improvement--might be needed within a decade to support widespread use of VR/AR or metaverse use cases, however one wishes to frame the matter. 


Though the average or typical consumer does not buy the “fastest possible” tier of service, the growth of headline tier speed since the time of dial-up access has been remarkably consistent. 


And the growth trend of 50 percent per year speed increases--known as Nielsen’s Law--has operated since the days of dial-up internet access.
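
A rough projection of that trend; the 2 Gbps, 2020 baseline is an assumption chosen for illustration, so the exact figures shift if the baseline does:

```python
# Nielsen's Law projection: headline (top-tier) access speed growing about
# 50 percent per year. The 2 Gbps / 2020 baseline is an assumption.

def headline_speed_gbps(base_gbps: float, base_year: int, year: int,
                        growth: float = 0.50) -> float:
    return base_gbps * (1 + growth) ** (year - base_year)

for year in (2030, 2040, 2050):
    print(year, f"{headline_speed_gbps(2.0, 2020, year):,.0f} Gbps")
# Roughly 115 Gbps by 2030 and terabit-class speeds by the 2040s
# under these assumptions.
```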


The simple question is: if the metaverse requires 1,000 times more computing power than we generally use at present, how do we get there within a decade? Given enough time, the normal increases in computational power and access bandwidth would get us there, of course.


But metaverse or extensive AR and VR might require that the digital infrastructure  foundation already be in place, before apps and environments can be created. 


What that will entail depends on how fast the new infrastructure has to be built. If we are able to upgrade infrastructure roughly on the past timetable, we would expect to see a 1,000-fold improvement in computation support perhaps every couple of decades. 


That assumes we have pulled a number of levers beyond expected advances in processor power, processor architectures and declines in cost per compute cycle. Network architectures and appliances also have to change. Quite often, so do applications and end user demand. 


The mobile business, for example, has taken about three decades to achieve a 1,000-times change in data speeds. We can assume raw compute changes faster, but even then, based strictly on Moore’s Law rates of improvement in computing power alone, it might still require two decades to achieve a 1,000-times change. 


source: Springer 
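
That rough "two decades" estimate can be sanity-checked with a quick compounding calculation (a sketch, with Moore's Law approximated as about 41 percent annual growth):

```python
import math

def years_to_factor(factor: float, annual_growth: float) -> float:
    """Years needed for a capability to grow by `factor` at a compounding rate."""
    return math.log(factor) / math.log(1 + annual_growth)

# About 41 percent per year corresponds to Moore's Law doubling every two years.
for label, growth in [("Moore's Law (2x every 2 years)", 0.41),
                      ("Nielsen's Law (50% per year)", 0.50)]:
    print(label, round(years_to_factor(1000, growth), 1), "years")
# Roughly 20 years at Moore's Law rates and about 17 years at Nielsen's Law
# rates -- on the order of two decades for a 1,000-fold improvement.
```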


And all of that assumes underlying demand drives the pace of innovation. 


For digital infrastructure, a 1,000-fold increase in supplied computing capability might well require any number of changes. Chip density probably has to change in different ways. More use of application-specific processors seems likely. 


A revamping of cloud computing architecture towards the edge, to minimize latency, is almost certainly required. 


Rack density likely must change as well, as it is hard to envision a 1,000-fold increase in rack real estate over the next couple of decades. Nor does it seem likely that cooling and power requirements can simply scale linearly by 1,000 times. 


So the timing of capital investment in excess of current requirements is really the issue. How soon? How much? What type?


The issue is how and when to accelerate rates of improvement. Can widespread use of AR/VR or metaverse happen if we must wait two decades for the platform to be built?

Saturday, June 4, 2022

Innovation Takes Time, Be Patient

Anybody who expected early 5G to yield massive upside in the form of innovative use cases and value has not been paying attention to history. Since 3G, promised futuristic applications and use cases have inevitably disappointed, in the short term. 


In part, that is because some observers mistakenly believe complicated new ecosystems can be developed rapidly to match the features enabled by the new next-generation mobile platform. That is never the case. 


Consider the analogy of information technology advances and the harnessing of such innovations by enterprises. There always has been a lag between technology availability and the retooling of business processes to take advantage of those advances. 


Many innovations expected during the 3G era did not happen until 4G. Some 4G innovations might not appear until 5G is near the end of its adoption cycle. The point is that it takes time to create the ubiquitous networks that allow application developers to incorporate the new capabilities into their products and for users to figure out how to take advantage of the changes. 


Non-manufacturing productivity, in particular, is hard to measure, and has shown relative insensitivity to IT adoption.






Construction of the new networks also takes time, especially in continent-sized countries. It easily can take three years to cover sufficient potential users so that app developers have a critical mass of users and customers. 


And that is just the start. Once a baseline of performance is created, the task of creating new use cases and revenue models can begin. Phone-based ride hailing did develop during the 4G era. 


But that was built on ubiquity of mapping and turn-by-turn directions, payment methods and other innovations such as social media and messaging.


Support for mobile entertainment video also flourished in 4G, built on the advent of ubiquitous streaming platforms. But that required new services to be built, content to be assembled and revenue models to be created. 


The lag between technology introduction and new use cases is likely just as clear for business use cases. 


The productivity paradox remains the clearest example of the lag time. Most of us assume that higher investment and use of technology improves productivity. That might not be true, or true only under some circumstances. 


Investing in more information technology often has failed to boost productivity. Others would argue the gains are there, just hard to measure. There is evidence to support either conclusion.


Most of us likely assume quality broadband “must” boost productivity. Except when it does not. The consensus view on broadband access for business is that it leads to higher productivity. 


But a study by Ireland’s Economic and Social Research Institute finds “small positive associations between broadband and firms’ productivity levels, none of these effects are statistically significant.”


“We also find no significant effect looking across all service sector firms taken together,” ESRI notes. “These results are consistent with those of other recent research that suggests the benefits of broadband for productivity depend heavily upon sectoral and firm characteristics rather than representing a generalised effect.”


“Overall, it seems that the benefits of broadband to particular local areas may vary substantially depending upon the sectoral mix of local firms and the availability of related inputs such as highly educated labour and appropriate management,” says ESRI.


Before investment in IT became widespread, the expected return on investment in terms of productivity was three percent to four percent, in line with what was seen in mechanization and automation of the farm and factory sectors.


When IT was applied over two decades from 1970 to 1990, the normal return on investment was only one percent.


This productivity paradox is not new. Information technology investments did not measurably help improve white collar job productivity for decades. In fact, it can be argued that researchers have failed to measure any improvement in productivity. So some might argue nearly all the investment has been wasted.


Some now argue there is a lag between the massive introduction of new information technology and measurable productivity results, and that this lag might conceivably take a decade or two decades to emerge.


Work from home trends were catalyzed by the pandemic, to be sure. Many underlying rates of change were accelerated. But the underlying remote work trends were there for decades, and always have been expected to grow sharply. 


Whether that is good, bad or indifferent for productivity remains to be seen. The Solow productivity paradox suggests that applied technology can boost--or lower--productivity. Though perhaps shocking, the productivity impact of technology adoption can be negative. 


All of that should always temper our expectations. 5G is nowhere near delivering change. It takes time.


Cloud Computing Keeps Growing, With or Without AI

source: Synergy Research Group. With or without added artificial intelligence demand, cloud computing will continue to grow, Omdia anal...