Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

10 items

MEDIUM · LARGE LANGUAGE MODELS · Jun 14, 2026

Context Window in LLMs: Working Memory Behind AI

This brief tutorial defines the context window in large language models. It explains that a context window is the amount of information an AI model can read and use at once before generating a response. The article serves as an introductory overview of a key LLM concept.
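As an illustration of the definition above, here is a minimal sketch of fitting a conversation into a fixed window by dropping the oldest messages. It is not taken from the article: `count_tokens` and `fit_to_window` are hypothetical helpers, and the whitespace word count is a crude stand-in for a real tokenizer, which would give different numbers.

```python
def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # Real tokenizers split text differently.
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits the window."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["hello there", "how are you today", "fine thanks", "tell me a joke"]
print(fit_to_window(history, max_tokens=6))
```

With a 6-token window, only the last two messages survive; anything older is no longer visible to the model.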

MEDIUM · LARGE LANGUAGE MODELS · Jun 13, 2026

Inside the LLM KV Cache: The Hidden System Behind Fast AI Inference

The provided article body contains only an introductory teaser sentence, with the full content inaccessible behind Medium's continue-reading wall. No concrete information about KV caching, specific models, or inference optimizations is present in the raw content.

MEDIUM · LARGE LANGUAGE MODELS · Jun 13, 2026

Agent Token Usage Balloons from 300 to 7,000 Over 20 Turns, a 20× Increase

A user measured input token costs for an AI agent browsing similar pages over 20 turns. Turn 1 consumed roughly 300 tokens, while turn 20 consumed 7,000 tokens, a more than 20× increase, because the agent re-reads all previous context on every turn. The observation highlights a hidden “context tax” that drives up inference costs in multi-turn agent workflows.
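The two reported figures can be turned into a rough estimate of the cumulative cost. The sketch below assumes the input grows linearly between turn 1 and turn 20. That is an assumption for illustration, since only the two endpoints are reported, and the function names are hypothetical.

```python
def per_turn_growth(first: int, last: int, turns: int) -> float:
    """Average number of tokens added per turn, assuming linear growth."""
    return (last - first) / (turns - 1)


def input_tokens_at(turn: int, first: int, growth: float) -> float:
    """Estimated input tokens at a given 1-based turn under linear growth."""
    return first + (turn - 1) * growth


first, last, turns = 300, 7000, 20  # endpoints reported in the item
growth = per_turn_growth(first, last, turns)

# Total input tokens billed across all turns when context is re-read each time.
total = sum(input_tokens_at(t, first, growth) for t in range(1, turns + 1))

# Hypothetical baseline: every turn costs the same as turn 1.
flat = first * turns

print(f"growth per turn: {growth:.1f} tokens")
print(f"turn-20 / turn-1 ratio: {last / first:.1f}x")
print(f"total with re-reading: {total:.0f} tokens")
print(f"total without growth: {flat} tokens")
```

Under this assumption the agent adds about 353 tokens per turn and consumes about 73,000 input tokens over the run, against 6,000 if every turn cost the same as the first.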

MEDIUM · LARGE LANGUAGE MODELS · Jun 12, 2026

Build a Local AI Coding Assistant with Ollama, Qwen, and VS Code on Mac (No GPU Required)

The article addresses common pain points of cloud-based AI coding tools, such as rate limits, privacy concerns, and connectivity dependence. It presents a tutorial for creating a local alternative by serving the Qwen model via Ollama and integrating it with VS Code on a Mac, without requiring a GPU. The guide walks through the setup process to enable offline, private code assistance.
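For orientation, a locally served model can be queried over Ollama's HTTP API on its default port. The sketch below is not from the article. The model tag `qwen2.5-coder` is an assumption, since the summary does not say which Qwen variant is used, and `build_payload` and `ask_local_model` are hypothetical helper names.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumed model tag; substitute whichever Qwen model has been pulled locally.
MODEL = "qwen2.5-coder"


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for a non-streaming generate request."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_local_model(prompt: str, url: str = OLLAMA_URL) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the Ollama server running and the model pulled, calling `ask_local_model("Write a function that reverses a string.")` returns the completion as a string. No request leaves the machine.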

MEDIUM · LARGE LANGUAGE MODELS · Jun 11, 2026

A Complete Beginner's Guide to Local LLM Inference

This Medium article by Khansa Khanam is billed as a beginner's guide to local LLM inference. The teaser content only asks 'What does Inference actually mean?' and prompts readers to continue reading on Medium. No specific facts, tools, models, or methods are described in the available snippet.

MEDIUM · LARGE LANGUAGE MODELS · Jun 10, 2026

Medium Post Links to Auriko AI Report on LLM Cost Arbitrage via Cache-Aware Routing

The Medium post by Michael Yang contains no detailed content; it merely points readers to an external report at auriko.ai/reports/llm-cost-arbitrage. No quantification of cost savings, technical methodology, or experimental results is included in the raw content. The only available information is the title's mention of cache-aware inference routing. Thus, the post itself does not convey any substantive findings.