Sunday, March 17, 2024

"Tokens" are the New "FLOPS," "MIPS" or "Gbps"

Modern computing has some virtually universal reference metrics. For Gemini 1.5 and other large language models, tokens are a basic measure of capability.


In the context of LLMs, a token is the basic unit of content (text, for example) that the model processes and generates; throughput is usually measured in "tokens per second."


For a text-based model, tokens can include individual words; subwords (prefixes, suffixes, or individual characters); and special characters such as punctuation marks or spaces.
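A toy example makes this concrete. Real LLM tokenizers learn subword vocabularies from data (byte-pair encoding, for instance); the sketch below is only an illustration of splitting text into word, punctuation, and space tokens:

```python
import re

def simple_tokenize(text):
    """Toy tokenizer: splits text into words, punctuation marks,
    and spaces. Production tokenizers (e.g., byte-pair encoding)
    instead learn subword vocabularies; this is an illustration."""
    return re.findall(r"\w+|[^\w\s]|\s", text)

tokens = simple_tokenize("Tokens are the new FLOPS!")
print(tokens)
# ['Tokens', ' ', 'are', ' ', 'the', ' ', 'new', ' ', 'FLOPS', '!']
print(len(tokens), "tokens")
```

Note that even this short sentence becomes ten tokens; real tokenizers average roughly one token per three or four characters of English text, though the exact ratio depends on the vocabulary.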


For a multimodal LLM, where images, audio, and video must be processed, content is typically divided into smaller units such as patches or regions, which are then processed by the model. Each patch or region can be considered a token.


Audio can be segmented into short time frames or frequency bands, with each segment serving as a token. Videos can be tokenized by dividing them into frames or sequences of frames, with each frame or sequence acting as a token.
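The patch arithmetic is simple to sketch. The 16-pixel patch size below is an assumption for illustration (it is a common choice in ViT-style vision models, but any given multimodal model may differ):

```python
def image_token_count(height, width, patch=16):
    """Number of patch tokens when an image is split into
    non-overlapping patch x patch squares (ViT-style)."""
    return (height // patch) * (width // patch)

def video_token_count(frames, height, width, patch=16):
    """Tokenizing a video as a sequence of per-frame patches."""
    return frames * image_token_count(height, width, patch)

print(image_token_count(224, 224))     # 14 * 14 = 196 tokens
print(video_token_count(8, 224, 224))  # 8 frames -> 1568 tokens
```

This is why multimodal inputs consume token budgets quickly: a few seconds of video can cost thousands of tokens.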


Tokens are not the only metrics used by large and small language models, but tokens are among the few that are relatively easy to quantify.
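Quantifying a token rate is as simple as timing a generation call. The sketch below uses a stand-in `dummy_generate` function rather than a real model call, since model APIs vary:

```python
import time

def tokens_per_second(generate, prompt):
    """Time a generation call and report throughput.
    `generate` is any callable returning a list of output tokens."""
    start = time.perf_counter()
    output_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(output_tokens) / elapsed

# Stand-in for a real model call (assumption: no actual LLM here).
def dummy_generate(prompt):
    time.sleep(0.01)  # pretend inference latency
    return prompt.split() * 10

rate = tokens_per_second(dummy_generate, "measure my token rate")
print(f"{rate:.0f} tokens/sec")
```

In practice the same measurement is usually split into time-to-first-token and per-token decode rate, since the two phases stress hardware differently.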


Metric | LLM | SLM
------ | --- | ---
Tokens per second | Important for measuring processing speed | Might be less relevant for real-time applications
Perplexity | Indicates ability to predict next word | Less emphasized due to simpler architecture
Accuracy | Task-specific, measures correctness of outputs | Crucial for specific tasks like sentiment analysis
Fluency and coherence | Essential for generating human-readable text | Still relevant, but might be less complex
Factual correctness | Important to avoid misinformation | Less emphasized due to potentially smaller training data
Diversity | Encourages creativity and avoids repetitive outputs | Might be less crucial depending on the application
Bias and fairness | Critical to address potential biases in outputs | Less emphasized due to simpler models and training data
Efficiency | Resource consumption and processing time are important | Especially crucial for real-time applications on resource-constrained devices

LLMs rely on various techniques to quantify their performance on attributes other than token processing rate. 


Perplexity is measured by calculating the inverse probability of a text sequence, normalized by its length. Lower perplexity indicates better performance, as it signifies the model's ability to accurately predict the next word in the sequence.
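Concretely, perplexity is the exponential of the negative mean log-likelihood the model assigns to each token. A minimal sketch:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities:
    exp of the negative mean log-likelihood. Lower is better."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# A model that assigns every token probability 0.5:
coin_flip = [math.log(0.5)] * 4
print(perplexity(coin_flip))  # 2.0 -- as uncertain as a coin flip
```

The intuition: a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k options at each step.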


Accuracy might compare the LLM-generated output with a reference answer. That might include precision (the proportion of the model's predictions that are correct); recall (the proportion of actual correct answers the model identifies); or the F1-score, which combines precision and recall into a single metric.
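One simple way to compute these three numbers, treating outputs as sets of tokens (set overlap is an assumption here; task-specific evaluations define "correct" differently):

```python
def precision_recall_f1(predicted, reference):
    """Token-level precision, recall, and F1 against a reference
    answer, using set overlap (one simple scheme among many)."""
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = precision_recall_f1(
    ["paris", "is", "big"], ["paris", "is", "the", "capital"])
print(round(p, 3), round(r, 3), round(f, 3))
```

The F1-score is the harmonic mean of precision and recall, so a model cannot score well by inflating one at the expense of the other.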


Fluency and coherence are substantially a matter of human review for readability, grammatical correctness, and logical flow.


But automated metrics can help as well: BLEU score (compares the generated text with reference sentences, considering n-gram overlap); ROUGE score (similar to BLEU, but focused on recall of n-grams from reference summaries); and METEOR (which considers synonyms and paraphrases alongside n-gram overlap).
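The core ingredient shared by these metrics is n-gram overlap. The sketch below computes modified n-gram precision, the building block of BLEU (full BLEU also combines several n-gram orders and applies a brevity penalty, which are omitted here):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Modified n-gram precision: the fraction of candidate
    n-grams that also appear in the reference, with counts
    clipped so repeats cannot inflate the score."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return matched / total if total else 0.0

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(ngram_precision(cand, ref, n=2))  # 3 of 5 bigrams match: 0.6
```

ROUGE flips the denominator (reference n-grams rather than candidate n-grams), which is why it is described as recall-oriented.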


So get used to hearing about token rates, just as we hear about FLOPS, MIPS, Gbps, clock rates or bit error rates.


  • FLOPS (Floating-point operations per second): Measures the number of floating-point operations a processor can perform in one second.

  • MIPS (Millions of instructions per second): Measures the number of processor instructions executed per second, expressed in millions.

  • Bits per second (bps): Measures data transmission rate, commonly expressed as megabits per second (Mbps) or gigabits per second (Gbps).

  • Bit error rate (BER): Measures the proportion of transmitted bits that are received in error.


Token rates are likely to remain a relatively easy-to-understand measure of model performance, compared to the others, much as clock speed (cycles the processor can execute per second) often is the simplest way to describe a processor’s performance, even when there are other metrics. 


Other metrics, such as the number of cores and threads, cache size, instructions per second (IPS), or floating-point operations per second, also are relevant, but are unlikely to be as relatable, for ordinary consumers, as token rates.

