The principle of "garbage in, garbage out" (GIGO) is a fundamental concept in computing that also applies to artificial intelligence, including large language models and generative AI.
GIGO means that the output of a computer program is only as good as the quality of its input data. For generative AI, training means ingesting vast quantities of existing work, much of which can hardly be considered factual, truthful, unbiased or balanced.
Training algorithms are themselves created by authors who might not always be capable of fully, or even largely, avoiding their own biases.
And many of the difficult issues are ones we have encountered before. Search engine results, for example, must be ranked on a page if “quality” of results is desired. So humans must decide what appears first, what appears on the first page of a browser, and so forth. “Choice” therefore must necessarily happen.
And different people will disagree about what is “best” in that regard. To a great extent, that problem exists because designers themselves have different ideas about what the “truth” of a matter happens to be. In journalism, at least as I learned it, even an author’s choice of which adjectives to attach to nouns can express bias.
These days, many could argue the problem is not so much the choice of adjectives as the prior decision about what counts as the “news” of the day. Stories that some editors deem not to be “news,” or not worth publishing, are judged quite differently by others.
So some media outlets will decide that stories about “X” will not be published, while stories about “Y” must not merely be published but lead. It would not be incorrect to call those expressions of bias, keeping in mind the principle that humans, journalists and authors must always make choices, and choices can be examples of “bias.”
Judgments are choices, and they too can be biased. Is something a “fact” or an “opinion?” What is “objective” and what is “subjective?” What might be called “truth,” and what is “your truth” or “my truth?”
The point is that the established principle in computing that bad data leads to bad conclusions or output also applies to AI training. And we have to deal with the data used in AI training the same way we deal with the quality of any data to be analyzed or manipulated.
Which is to say, data must be curated. This includes ensuring that the data is representative of the real world, that it is free of bias, and that it is of high quality. But we will always run into trouble when different people do not agree on what is “representative of the real world,” what is “free of bias” and what is of “high quality.”
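To make that concrete, here is a minimal sketch, in Python, of the kind of curation pass a training pipeline might run before data ever reaches a model. The specific checks (deduplication, length bounds, a placeholder blocklist) and every name and threshold in them are illustrative assumptions, not any real pipeline.

```python
# Minimal sketch of a pre-training data curation pass.
# Every heuristic and threshold below is an illustrative assumption;
# real pipelines use far more elaborate, and more contested, filters.

def curate(documents, min_words=20, max_words=100_000, blocklist=()):
    """Return the documents that pass simple, human-chosen quality checks."""
    seen = set()
    kept = []
    for doc in documents:
        text = doc.strip()
        word_count = len(text.split())
        if not (min_words <= word_count <= max_words):
            continue  # too short or too long to trust
        if any(term in text.lower() for term in blocklist):
            continue  # someone decided this topic stays out
        fingerprint = hash(text)
        if fingerprint in seen:
            continue  # drop exact duplicates
        seen.add(fingerprint)
        kept.append(text)
    return kept


if __name__ == "__main__":
    corpus = ["The sky is blue. " * 10, "The sky is blue. " * 10, "spam"]
    print(len(curate(corpus)))  # 1: the duplicate and the too-short entry are dropped
```

Every cutoff, blocklist entry and notion of a “duplicate” in a pass like this is a human choice, which is exactly where the disagreements described above re-enter.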