Infogap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

Social | Source: X | Jun 15, 2026 | Importance: 3/5

Together AI has optimized its serving of DeepSeek V4 Pro and now ranks #1 on the Artificial Analysis benchmark for both output speed (tokens per second) and latency. The inference work covered KV cache efficiency, prefix reuse, custom kernels, and endpoint profiling. The result gives developers what is currently the fastest DeepSeek V4 Pro API on that benchmark. The company shared a detailed breakdown of its systems work in a linked blog post.

Social | Source: X | Jun 14, 2026 | Importance: 3/5

DeepSeek V4 Pro, when deployed on Together Compute's inference platform, has been ranked first in both latency and speed benchmarks. The announcement, originating from a tweet by Vipul Ved and retweeted by Together Compute, positions the model as the current leader in inference performance on the service. No specific metrics or comparative figures were disclosed in the social media post.

Social | Source: V2EX | Jun 14, 2026 | Importance: 2/5

Krill, an AI relay service, is running a 618 promotion from June 15–18, 2026, cutting base Codex model rates to as low as 0.15 and offering a 66% discount coupon on Codex plans. With a 10-person group buy, the effective rate drops to 0.1 Chinese yuan per US dollar. Users who already hold a Codex plan on June 15 will have their quotas adjusted to the 0.1 level. Claude model access is discounted only through balance top-ups, not plans. The service says it uses Pro accounts and emphasizes cost transparency.

Social | Source: V2EX | Jun 13, 2026 | Importance: 1/5

A V2EX user reported that a friend had bought a GLM annual subscription as a backup while relying mainly on OpenAI's Codex and ChatGPT. After recent policy-driven access restrictions (possibly a reference to “Fable” or similar incidents), the backup proved valuable. The user warns against depending solely on providers such as OpenAI or Anthropic, whose policies can cut off access without notice, and plans to buy a GLM annual plan as well. The post reflects growing community concern over API dependency and the value of fallback options.

MiniMax Sparse Attention

PaddleOCR (v3/v4/v5/v6) Implemented in C++ with ncnn for Lightweight Deployment