Monday, October 28, 2024

Here Comes Large Language Model Control of PCs and Web Browsers

Large language models are starting to extend their capabilities from content creation to control of PC and web browser functions.

Anthropic's updated Claude LLM gives users the option of granting the tool some control over a PC, including looking at a screen, moving a cursor, clicking buttons and typing text. 

Examples of what Claude can do include filling out forms, planning an outing, and building a website.
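
The "look at the screen, then move, click or type" pattern described above can be pictured as a simple loop. What follows is only a minimal sketch under stated assumptions, not Anthropic's implementation or API: `FakeDesktop`, `Action`, `run_agent` and `scripted_policy` are all hypothetical names, and a fixed script stands in for the model that would normally choose each action.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One step the agent asks for (hypothetical format, not a real API)."""
    kind: str          # "move", "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen, mouse and keyboard (assumption for illustration)."""
    cursor: Tuple[int, int] = (0, 0)
    typed: str = ""
    clicks: List[Tuple[int, int]] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real system would return an image; a text summary is enough here.
        return f"cursor={self.cursor} typed={self.typed!r} clicks={len(self.clicks)}"

    def apply(self, action: Action) -> None:
        if action.kind == "move":
            self.cursor = (action.x, action.y)
        elif action.kind == "click":
            self.clicks.append(self.cursor)
        elif action.kind == "type":
            self.typed += action.text
        else:
            raise ValueError(f"unknown action: {action.kind}")


def run_agent(desktop: FakeDesktop,
              choose_action: Callable[[str, int], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act; stop on "done" or after max_steps. Returns steps taken."""
    for step in range(max_steps):
        observation = desktop.screenshot()
        action = choose_action(observation, step)
        if action.kind == "done":
            return step
        desktop.apply(action)
    return max_steps


def scripted_policy(observation: str, step: int) -> Action:
    """Placeholder for the model: replays a fixed script and ignores the observation."""
    script = [Action("move", 120, 80), Action("click"), Action("type", text="hello")]
    return script[step] if step < len(script) else Action("done")


desktop = FakeDesktop()
steps_taken = run_agent(desktop, scripted_policy)
```

The `max_steps` cap reflects the concern about errors: a loop that acts on a user's machine needs a hard limit even in a toy version.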

 

Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. Such efforts will inevitably raise user concerns about errors, as well as about privacy and security.

Microsoft had to retreat on the Copilot+ PC Recall feature, which stored screenshots of a user's activity and was meant to help people find and remember things they had previously seen on their computer. But users disliked the privacy and security risks, so the feature is now optional.

Google, for its part, is reportedly working on Jarvis, the next iteration of its Gemini generative AI model. Designed to work with web browsers, Jarvis is said to automate everyday web tasks by taking screenshots, clicking buttons and entering text. Perhaps more important, it is intended to help users make purchases, fill out forms, compile data into tables, open a series of webpages or book flights online. All of those are examples of how AI can be integrated into common, useful experiences for users.

