This tutorial walks through a full parsing pipeline that uses Docling Parse to extract text cells (words, characters, lines) with page-level coordinates from a multi-element test PDF. It covers environment setup; generating a PDF with columns, tables, vector shapes, and an embedded image; and extracting structured JSON/CSV outputs. The workflow also reconstructs a layout-aware reading order from word coordinates, renders cell overlays for inspection, and benchmarks threaded parallel parsing. The resulting pipeline is suitable for document AI tasks such as layout analysis, table extraction, and preparation for retrieval-augmented generation (RAG).
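To make the reading-order step concrete, here is a minimal sketch of one way to rebuild column-aware reading order from word boxes. It does not use the Docling Parse API: the `Word` record, the `reading_order` function, the `column_edges` parameter, and the `line_tol` tolerance are all hypothetical names chosen for illustration, and the coordinates are assumed to have a top-left origin. The tutorial's own heuristic may differ.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical word cell: text plus a bounding box (top-left origin assumed)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def _join(line):
    # Order the words of one line left to right and join them with spaces.
    return " ".join(w.text for w in sorted(line, key=lambda w: w.x0))


def reading_order(words, column_edges=(), line_tol=3.0):
    """Return text lines column by column, top to bottom within each column.

    column_edges: x positions separating columns (an assumption; a real
    pipeline would have to detect the gutters itself).
    line_tol: maximum vertical distance from a line's first word for a
    later word to still count as part of that line.
    """
    edges = sorted(column_edges)
    columns = {}
    for w in words:
        x_mid = (w.x0 + w.x1) / 2
        # bisect_right gives the index of the column containing x_mid.
        columns.setdefault(bisect_right(edges, x_mid), []).append(w)

    lines = []
    for col in sorted(columns):
        current, current_y = [], None
        ordered = sorted(columns[col], key=lambda w: ((w.y0 + w.y1) / 2, w.x0))
        for w in ordered:
            y_mid = (w.y0 + w.y1) / 2
            if current and abs(y_mid - current_y) > line_tol:
                lines.append(_join(current))
                current = []
            if not current:
                current_y = y_mid
            current.append(w)
        if current:
            lines.append(_join(current))
    return lines


# Words supplied out of order, on a two-column page split at x = 300.
sample = [
    Word("col", 355, 10, 380, 20),
    Word("second", 10, 30, 50, 40),
    Word("world", 45, 11, 80, 21),
    Word("Right", 310, 10, 350, 20),
    Word("Hello", 10, 10, 40, 20),
]
ordered_lines = reading_order(sample, column_edges=(300,))
```

Passing the column boundaries in explicitly keeps the sketch short; detecting gutters from whitespace histograms or clustering would be the natural next step for real pages.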
Google Research has introduced a new Agentic RAG framework integrated into the Gemini Enterprise Agent Platform. The framework features a Sufficient Context Agent that iteratively searches until it gathers complete context before generating a response. This multi-agent architecture breaks down complex queries into subtasks, improving accuracy by up to 34% on factuality datasets compared to standard RAG. Tested on the FramesQA benchmark, the system achieved 90.1% accuracy in cross-corpus retrieval while maintaining low latency. The feature, called Cross-Corpus Retrieval, is now in public preview.
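The iterative "search until the context is sufficient" idea can be sketched as a simple control loop. This is an illustration only, not the Gemini Enterprise Agent Platform API: `gather_context` and the `search`, `is_sufficient`, and `refine` callables are hypothetical, and the toy index stands in for real retrieval and a real sufficiency judge.

```python
def gather_context(query, search, is_sufficient, refine, max_rounds=4):
    """Collect passages until a sufficiency check passes or rounds run out.

    search(q) -> list of passages; is_sufficient(query, context) -> bool;
    refine(query, context) -> follow-up query. All three are assumed,
    caller-supplied callables.
    Returns (context, rounds_used).
    """
    context = []
    current_query = query
    for round_no in range(1, max_rounds + 1):
        for passage in search(current_query):
            if passage not in context:  # avoid duplicate passages
                context.append(passage)
        if is_sufficient(query, context):
            return context, round_no
        # Not enough evidence yet: ask a follow-up question.
        current_query = refine(query, context)
    return context, max_rounds


# Toy stand-ins for a retriever, a sufficiency judge, and a query rewriter.
toy_index = {
    "capital of france": ["Paris is the capital of France."],
    "currency of france": ["France uses the euro."],
}
context, rounds = gather_context(
    "capital of france",
    search=lambda q: toy_index.get(q, []),
    is_sufficient=lambda q, ctx: any("euro" in p for p in ctx),
    refine=lambda q, ctx: "currency of france",
)
```

The `max_rounds` cap is a design choice in this sketch: it bounds latency when the sufficiency check never passes, at the cost of sometimes answering from incomplete context.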
Harness-1 is a 20B-parameter retrieval subagent that separates search decisions from bookkeeping by using a stateful harness. It achieves 0.730 average curated recall across eight benchmarks, outperforming other open models and nearing frontier performance. The model is trained with supervised fine-tuning for interface operation and reinforcement learning for search policy, using a finite set of tools and a working memory. Weights and harness code are publicly released on Hugging Face and GitHub.
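A rough sketch of what separating search decisions from bookkeeping can look like: the harness owns the working memory and exposes a small fixed tool set, while the policy only picks the next tool call. This is not the released Harness-1 code. The `Harness` class, its three tool names, the scripted policy, and the plain `recall` function (which may not match how the benchmarks' curated recall is defined) are assumptions for illustration.

```python
class Harness:
    """Hypothetical stateful harness: tracks queries, seen and kept documents."""

    TOOLS = ("search", "keep", "stop")  # finite tool set (assumed names)

    def __init__(self, index):
        self.index = index   # toy retriever: query -> list of doc ids
        self.queries = []    # working memory: queries issued so far
        self.seen = []       # working memory: doc ids already returned
        self.kept = []       # working memory: doc ids selected as relevant
        self.done = False

    def step(self, tool, arg=None):
        """Apply one tool call and return the observation for the policy."""
        if tool not in self.TOOLS:
            raise ValueError(f"unknown tool: {tool}")
        if tool == "search":
            self.queries.append(arg)
            # Bookkeeping lives here: the policy never sees a doc twice.
            hits = [d for d in self.index.get(arg, []) if d not in self.seen]
            self.seen.extend(hits)
            return hits
        if tool == "keep":
            if arg in self.seen and arg not in self.kept:
                self.kept.append(arg)
            return list(self.kept)
        self.done = True  # tool == "stop"
        return list(self.kept)


def recall(kept, relevant):
    """Fraction of relevant doc ids that were kept (0.0 if none are relevant)."""
    relevant = set(relevant)
    return len(relevant & set(kept)) / len(relevant) if relevant else 0.0


# A scripted stand-in for the learned search policy.
harness = Harness({"q1": ["d1", "d2"], "q2": ["d2", "d3"]})
script = [("search", "q1"), ("keep", "d1"), ("search", "q2"), ("keep", "d3"), ("stop", None)]
observations = [harness.step(tool, arg) for tool, arg in script]
score = recall(harness.kept, ["d1", "d2", "d3", "d4"])
```

Because deduplication and history live in the harness, the policy's job reduces to choosing among a handful of actions.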