Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

36 items

X · Jun 15, 2026

Cryptic social media post about LLM Arena provides no details

A post on X reacts to the LLM Arena with surprise but offers no specifics about rankings, model performance, or any change. It consists solely of an exclamation and a t.co link, with no indication of which models or events prompted the reaction.

X · Jun 15, 2026

Ethan Mollick criticizes headline on AI math study: Solving 7/10 novel very hard problems is significant progress

Ethan Mollick pushes back against a headline suggesting AI 'did not live up to the task' when a study found it solved 7 out of 10 novel, very hard math problems. He notes that 15 months ago LLMs could not do math at all, so he frames the result as substantial progress rather than a failure. The study itself illuminates both the flaws and successes of AI in mathematical reasoning. The tweet highlights the danger of misinterpreting AI benchmark results when progress is rapid.

X · Jun 15, 2026

Deleted Tweet: Many API users underestimate the power of frontier models in native harnesses

Ethan Mollick (emollick) deleted a tweet stating that API users often fail to understand how much more powerful frontier AI models are when used in their native harnesses compared to bare API access. He removed the post because the character limit prevented him from distinguishing between those who carefully evaluate models in different harnesses for tasks and those who simply use the naked API. The observation points to a common misperception about model performance tied to deployment context.

X · Jun 14, 2026

Methodological Thread Examines Debate Over Paper Claiming Generalist Models Outperform Specialized Medical AI

Ethan Mollick shares a methodological thread that dissects a debate over a recent paper. The paper reportedly finds that generalist AI models outperform specialized medical AI systems. The thread also outlines challenges in benchmarking AI in medicine. No specific details about the paper, models, or benchmarks are provided.

X · Jun 14, 2026

Zhengyao Jiang Benchmarks 7 Frontier Models on Autoresearch Tasks

Zhengyao Jiang benchmarked seven frontier models on two categories of autoresearch tasks: ML engineering and harness/prompt engineering. The tweet did not disclose which models were tested or how they performed.

X · Jun 14, 2026

DeepSeek V4 Pro on Together AI is now #1 on Artificial Analysis for both output speed and latency

Together AI has optimized its serving of DeepSeek V4 Pro to rank #1 on the Artificial Analysis benchmark for both output speed (tokens per second) and latency. The inference optimizations targeted KV cache efficiency, prefix reuse, custom kernel implementation, and endpoint profiling. That makes it the fastest DeepSeek V4 Pro API currently ranked there. The company shared a detailed breakdown of its systems work in a linked blog post.