Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

3 items

MARKTECHPOST · Jun 16, 2026

Building a Complete Layout-Aware PDF Parsing Pipeline with Docling Parse

This tutorial demonstrates a full parsing pipeline using Docling Parse to extract text cells (words, characters, lines) with page-level coordinates from a multi-element test PDF. It covers environment setup, generation of a PDF with columns, tables, vector shapes, and an embedded image, and extraction of structured JSON/CSV outputs. The workflow includes reconstruction of layout-aware reading order from word coordinates, rendering of cell overlays for inspection, and benchmarking of threaded parallel parsing. The resulting pipeline is suitable for document AI tasks such as layout analysis, table extraction, and preparation for retrieval-augmented generation (RAG).
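The reading-order step can be illustrated with a minimal sketch. It does not use the Docling Parse API. It assumes word boxes have already been extracted as (text, x0, y0, x1, y1) with a top-left origin, and the `Word` class, `line_tol` tolerance, and `column_split_x` boundary are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word cell: text plus a bounding box, top-left origin assumed.
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def group_lines(words, line_tol=3.0):
    """Group words into text lines by vertical center, then order each line left to right."""
    lines = []
    for w in sorted(words, key=lambda w: ((w.y0 + w.y1) / 2, w.x0)):
        cy = (w.y0 + w.y1) / 2
        # A word joins the current line if its center is close to the line's first word.
        if lines and abs(lines[-1]["cy"] - cy) <= line_tol:
            lines[-1]["words"].append(w)
        else:
            lines.append({"cy": cy, "words": [w]})
    return [
        " ".join(x.text for x in sorted(line["words"], key=lambda w: w.x0))
        for line in lines
    ]


def reading_order(words, column_split_x=None, line_tol=3.0):
    """Return lines in reading order; optionally read a left column before a right one."""
    if column_split_x is None:
        return group_lines(words, line_tol)
    left = [w for w in words if (w.x0 + w.x1) / 2 < column_split_x]
    right = [w for w in words if (w.x0 + w.x1) / 2 >= column_split_x]
    return group_lines(left, line_tol) + group_lines(right, line_tol)


words = [
    Word("Hello", 10, 10, 40, 20),
    Word("world", 45, 11, 80, 21),
    Word("Left", 10, 30, 40, 40),
    Word("Right", 310, 10, 350, 20),
    Word("col", 355, 10, 380, 20),
]
two_column = reading_order(words, column_split_x=300)
single_column = reading_order(words)
```

With the column boundary set, the left column is read in full before the right one. Without it, words at the same height from both columns merge into one line, which is the failure mode that layout-aware ordering is meant to avoid.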

MARKTECHPOST · Jun 8, 2026 · Highlight

Google Research Adds Agentic RAG to Gemini Enterprise Agent Platform with a Sufficient Context Agent for multi-hop queries

Google Research has introduced a new Agentic RAG framework integrated into the Gemini Enterprise Agent Platform. The framework features a Sufficient Context Agent that iteratively searches until it gathers complete context before generating a response. This multi-agent architecture breaks down complex queries into subtasks, improving accuracy by up to 34% on factuality datasets compared to standard RAG. Tested on the FramesQA benchmark, the system achieved 90.1% accuracy in cross-corpus retrieval while maintaining low latency. The feature, called Cross-Corpus Retrieval, is now in public preview.
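The "search until the context is sufficient" idea can be sketched as a simple loop. This is not Google's implementation. The `search`, `is_sufficient`, and `refine` callables and the `max_rounds` cap are all assumptions made for illustration, and the toy corpus stands in for a real retriever.

```python
def gather_context(query, search, is_sufficient, refine, max_rounds=5):
    """Retrieve iteratively until a sufficiency check passes or the round budget runs out.

    Returns (context, sufficient), where `sufficient` says whether the check passed.
    """
    context = []
    current = query
    for _ in range(max_rounds):
        # Add only passages that have not been collected yet.
        for passage in search(current):
            if passage not in context:
                context.append(passage)
        if is_sufficient(query, context):
            return context, True
        # Not enough yet: derive a follow-up query from what is known so far.
        current = refine(query, context)
    return context, False


# Toy stand-ins (hypothetical): a dict-backed retriever and a length-based check.
corpus = {"who": ["A"], "follow-up": ["B"]}
search = lambda q: corpus.get(q, [])
is_sufficient = lambda q, ctx: len(ctx) >= 2
refine = lambda q, ctx: "follow-up"

result = gather_context("who", search, is_sufficient, refine)
capped = gather_context("who", search, is_sufficient, refine, max_rounds=1)
```

Returning the sufficiency flag alongside the context lets the caller decide whether to answer, abstain, or escalate when the budget is exhausted.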

MARKTECHPOST · Jun 7, 2026 · Highlight

Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b

Harness-1 is a 20B retrieval subagent that separates search decisions from bookkeeping by using a stateful harness. It achieves 0.730 average curated recall across eight benchmarks, outperforming other open models and nearing frontier performance. The model is trained with supervised fine-tuning for interface operation and reinforcement learning for search policy, using a finite set of tools and a working memory. Weights and harness code are publicly released on Hugging Face and GitHub.
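The split between search decisions and bookkeeping can be illustrated with a small sketch. It is not the released harness code. The `SearchHarness` class, its three tool names, and the dict-backed index are hypothetical, chosen only to show a policy picking from a finite tool set while the harness owns the working memory.

```python
class SearchHarness:
    """Toy stateful harness: the policy chooses tools, the harness keeps the state."""

    TOOLS = ("search", "keep", "finish")  # finite tool set (assumed names)

    def __init__(self, index):
        self.index = index  # hypothetical index: query string -> list of document ids
        self.seen = []      # working memory: every document id returned so far
        self.kept = []      # curated result set
        self.done = False

    def step(self, tool, arg=None):
        if tool not in self.TOOLS:
            raise ValueError(f"unknown tool: {tool}")
        if tool == "search":
            # Deduplicate against memory so the policy only sees new documents.
            hits = [d for d in self.index.get(arg, []) if d not in self.seen]
            self.seen.extend(hits)
            return hits
        if tool == "keep":
            # Only documents that were actually retrieved can be curated.
            if arg in self.seen and arg not in self.kept:
                self.kept.append(arg)
            return list(self.kept)
        self.done = True
        return list(self.kept)


harness = SearchHarness({"q1": ["d1", "d2"], "q2": ["d2", "d3"]})
first = harness.step("search", "q1")
second = harness.step("search", "q2")   # d2 was already seen, so it is filtered out
harness.step("keep", "d3")
harness.step("keep", "d9")              # ignored: d9 was never retrieved
final = harness.step("finish")
```

Because deduplication and the curated set live in the harness, a learned policy in this setup would only need to emit a tool name and an argument at each step.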