Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

19 items

MARKTECHPOSTJun 16, 2026Highlight

Meet Qwen-RobotSuite: Three Embodied AI Models for VLA Manipulation, Video World Modeling, and Navigation

The Qwen team released Qwen-RobotSuite, a suite of three independent embodied AI foundation models for robotics. Qwen-RobotManip is a Vision-Language-Action model based on Qwen3.5-4B that aligns heterogeneous manipulation data into a unified 80-dimensional action vector, achieving 1st place on RoboChallenge Table30-v1 and strong cross-embodiment transfer. Qwen-RobotWorld is a language-conditioned video world model using a 60-layer dual-stream MMDiT and a frozen Qwen2.5-VL encoder, ranking 1st overall on EWMBench and DreamGen Bench. Qwen-RobotNav is a scalable navigation model built on Qwen3-VL with a parameterized observation interface, reaching 76.5% success rate on VLN-CE RxR and enabling agentic planning. RobotManip and RobotNav have public GitHub repositories; RobotWorld is presented as a research paper.

MARKTECHPOSTJun 16, 2026

Hermes Agent Adds Asynchronous Subagents, Preventing Parent Chat Blocking

Nous Research’s open-source Hermes Agent now ships a non-blocking async_delegation toolset, tracked in GitHub issue #5586. The existing delegate_task, which froze the parent chat until all child subagents finished, is supplemented with asynchronous tools: delegate_task_async returns a task_id immediately, while check_task, steer_task, collect_task, cancel_task, and list_task manage the background run. Subagents remain strictly isolated—each gets a fresh conversation and only a final summary returns to the parent, preserving context windows. Background agents execute as in‑process threads using the same AIAgent machinery, model routing, and credential pool; users enable the update with `hermes update`.

MARKTECHPOSTJun 16, 2026Highlight

Google Cloud Introduces Open Knowledge Format (OKF): A Vendor-Neutral Markdown Spec for Giving AI Agents Curated Context

Google Cloud announced Open Knowledge Format (OKF) v0.1, an open, vendor-neutral specification that formalizes the LLM-wiki pattern as a portable directory of markdown files with YAML frontmatter. OKF is not a service or platform—it requires no SDK, runtime, or registry—and a bundle renders on GitHub, ships as a tarball, or mounts on any filesystem. Each concept is one markdown file identified by its path, with only one required field (type) in the frontmatter; cross-links between files form a knowledge graph that agents can navigate without translation. Google released reference tools including a BigQuery enrichment agent, a static HTML visualizer, and sample bundles. The format targets the scattered internal knowledge problem, letting agents consume curated, version-controlled context directly, unlike retrieval-augmented generation (RAG).

MARKTECHPOSTJun 16, 2026

Building a Complete Layout-Aware PDF Parsing Pipeline with Docling Parse

This tutorial demonstrates a full parsing pipeline using Docling Parse to extract text cells (words, characters, lines) with page-level coordinates from a multi-element test PDF. It covers environment setup, generation of a PDF with columns, tables, vector shapes, and an embedded image, and extraction of structured JSON/CSV outputs. The workflow includes reconstruction of layout-aware reading order from word coordinates, rendering of cell overlays for inspection, and benchmarking of threaded parallel parsing. The resulting pipeline is suitable for document AI tasks such as layout analysis, table extraction, and preparation for retrieval-augmented generation (RAG).

MARKTECHPOSTJun 14, 2026

A Coding Hands-On on FineWeb: Streaming, Filtering, Deduplication, Tokenization, and Large-Scale Web Corpus Analytics

A hands-on tutorial streams 3,000 documents from the FineWeb sample-10BT subset without downloading the full multi-terabyte corpus. It reproduces quality filters (Gopher, C4, custom), finding most already-passed due to pre-filtering. MinHash-based deduplication with 128 permutations and 0.7 threshold identifies few near-duplicate pairs, consistent with per-crawl deduplication. GPT-2 token counts are verified against the stored field, showing near-perfect match (mean absolute difference ~0). Analytics cover token distribution, language scores, characters per token, and top domains, providing practical insights for scaling corpus preprocessing pipelines.

MARKTECHPOSTJun 14, 2026Highlight

Databricks Open-Sources Omnigent: A Meta-Harness That Composes, Governs, and Shares AI Agents Across Claude Code, Codex, and Pi

Databricks released Omnigent, an Apache 2.0-licensed open-source meta-harness that standardizes the interface across terminal coding agents (Claude Code, Codex, Pi) and agent SDKs, turning them into interchangeable components. It adds a shared layer for composition (switching agents with one-line changes), contextual control (e.g., pausing at cost limits, requiring human approval for sensitive git pushes), and collaboration (sharing live agent sessions via URL). The architecture consists of a sandboxed runner with a uniform API and a policy server, and sessions sync across terminal, web UI, and mobile. An OS sandbox (Omnibox) secures credentials by injecting tokens only in approved proxy requests. Two example agents—Polly (a multi-agent coding orchestrator) and Debby (a two-headed brainstorming partner)—illustrate its patterns, and an interactive concept demo shows parallel agent delegation and policy enforcement.