Use Case | Mode | Reason | Example Models/Apps |
--- | --- | --- | --- |
Live captions & transcription | ✅ Fully Local | Latency-critical; real-time audio can't tolerate cloud round-trips | Windows Live Captions, Whisper on-device |
Real-time translation | ✅ Fully Local | Low, consistent latency required; privacy-sensitive | Whisper, Seamless M4T |
Writing autocomplete / suggestions | ✅ Fully Local | Keystroke-level latency; personal content stays private | Copilot+ on Windows, Apple Intelligence |
Smart email summarization | ✅ Fully Local | Personal/sensitive data; short-form task suits SLMs | Apple Mail AI, Outlook Copilot (local tier) |
Voice assistant (personal queries) | ✅ Fully Local | Privacy; always-on cloud queries would be costly and slow | Siri (on-device), Google Gemini Nano |
Photo organization & tagging | ✅ Fully Local | Private media; classification is well within SLM range | Apple Photos, Google Photos on-device |
AI-assisted note-taking | ✅ Fully Local | Personal data; summarization is a strong SLM use case | Notion AI (local), Apple Notes |
Offline coding assistant (completions) | ✅ Fully Local | Works without internet; latency-sensitive | Copilot local mode, Continue.dev + Ollama |
Document Q&A (personal files) | ✅ Fully Local | Highly sensitive; RAG over local files suits 7B models | LlamaIndex + local model, Copilot+ |
Background noise removal (calls) | ✅ Fully Local | Real-time signal processing; NPU-optimized | NVIDIA RTX Voice, Windows Studio Effects |
Grammar/style checking | ✅ Fully Local | Short context, low complexity — SLMs excel | Grammarly on-device tier |
General-purpose chat (everyday Q&A) | 🔀 Hybrid | Local handles simple queries; cloud escalates complex ones | Copilot+, Apple Intelligence with ChatGPT fallback |
Coding assistant (complex tasks) | 🔀 Hybrid | Boilerplate → local; architecture/debugging → cloud | GitHub Copilot, Cursor |
Document creation & long-form writing | 🔀 Hybrid | Drafting → local SLM; refinement/research → cloud | Microsoft 365 Copilot |
Clinical note summarization | 🔀 Hybrid | On-prem inference keeps patient data local with low latency; cloud-based model monitoring preserves compliance oversight | Mistral 7B + cloud monitoring |
Personal finance analysis | 🔀 Hybrid | Sensitive data processed locally; market data fetched from cloud | Custom RAG setups |
Semantic file/photo search | 🔀 Hybrid | Indexing runs locally; fuzzy or cross-device search may use cloud | Windows Recall (when enabled), Spotlight AI |
AI agents (personal tasks) | 🔀 Hybrid | Small open-weight models increasingly handle local tool use; complex multi-step planning still benefits from frontier models | Qwen-Agent, local MCP setups |
Deep research & synthesis | ☁️ Fully Cloud | Requires broad world knowledge, long context (1M+ tokens), web access | Claude, GPT-5, Gemini |
Complex reasoning / math | ☁️ Fully Cloud | Frontier reasoning models (chain-of-thought at scale) far exceed on-device capability | Claude Opus, o3, Gemini Deep Think |
Multimodal generation (images/video) | ☁️ Fully Cloud | Diffusion models for quality image/video generation require massive VRAM | Midjourney, Sora, Gemini |
Real-time web knowledge | ☁️ Fully Cloud | By definition requires live internet access and large retrieval systems | All search-augmented LLMs |
Training / fine-tuning | ☁️ Fully Cloud | Requires sustained GPU clusters; not feasible on consumer hardware | AWS, Azure, GCP |
Enterprise-scale RAG (large corpora) | ☁️ Fully Cloud | Document stores of millions of files require server infrastructure | Azure AI Search, Pinecone |
High-stakes legal/medical decisions | ☁️ Fully Cloud | Requires frontier accuracy, audit trails, compliance guarantees | Enterprise Claude, GPT-5 |
Multi-user / collaborative AI | ☁️ Fully Cloud | Shared context across users/devices requires centralized state | Teams Copilot, Google Workspace AI |
Real-time fraud detection (banking) | ☁️ Fully Cloud | Needs global pattern recognition across millions of transactions | Cloud-hosted specialized models |
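The 🔀 Hybrid rows above share one pattern: a router keeps simple, privacy-sensitive queries on the local SLM and escalates long or complex ones to a frontier cloud model. A minimal sketch follows; the keyword heuristic, the `max_local_tokens` cutoff, and the stub backends are illustrative assumptions, not any product's actual routing logic (real systems often use a small classifier model instead).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical escalation signals -- an assumption for illustration,
# not a real product's routing policy.
ESCALATION_HINTS = {"prove", "architecture", "debug", "research", "analyze"}

@dataclass
class HybridRouter:
    local_llm: Callable[[str], str]   # on-device SLM backend (assumed interface)
    cloud_llm: Callable[[str], str]   # frontier cloud backend (assumed interface)
    max_local_tokens: int = 64        # illustrative length cutoff

    def route(self, query: str) -> tuple[str, str]:
        """Return (tier, response): short, simple queries stay local;
        long or complexity-flagged queries escalate to the cloud."""
        words = query.lower().split()
        is_complex = (
            len(words) > self.max_local_tokens
            or any(w.strip(".,?!") in ESCALATION_HINTS for w in words)
        )
        if is_complex:
            return "cloud", self.cloud_llm(query)
        return "local", self.local_llm(query)

# Usage with stub backends standing in for real model endpoints:
router = HybridRouter(
    local_llm=lambda q: f"[local SLM] {q}",
    cloud_llm=lambda q: f"[cloud LLM] {q}",
)
tier, _ = router.route("What time is it in Tokyo?")
print(tier)  # local
tier, _ = router.route("Debug this race condition in my scheduler.")
print(tier)  # cloud
```

In practice the design choice that matters is the fallback direction: start local by default (privacy, latency) and escalate only on a confidence or complexity signal, which is the pattern Apple Intelligence's ChatGPT fallback and Copilot+ follow.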