AI papers, releases, tools, and finance signals
Infogap feed
Curated items are read from the processed items table and served as a bilingual feed.
The paper introduces SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, comprising 31 datasets across 7 task types. Evaluation of 31 embedding models shows large instruction-tuned multilingual models perform best, while existing Slovak-specific NLU models transfer poorly to embedding tasks. The authors develop e5-sk-small (45M parameters) and e5-sk-large (365M) by vocabulary trimming and fine-tuning Multilingual E5 models. Despite size reductions of up to 62%, these open-source models achieve competitive performance with proprietary APIs and are suitable for local deployment in semantic search and RAG. The benchmark, models, datasets, and code are released openly, offering a replicable path for other under-resourced languages.
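The summary does not spell out how the vocabulary was trimmed, so the following is only a generic sketch of the idea: keep the embedding rows for tokens that occur in the target language and renumber them. The function name `trim_vocabulary`, the special-token list, and the toy vocabulary are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def trim_vocabulary(vocab, embeddings, keep_tokens, special_tokens=("<pad>", "<unk>")):
    """Keep only the rows of `embeddings` whose tokens are in `keep_tokens`
    (plus special tokens) and return a renumbered vocabulary.

    Hypothetical helper: `vocab` maps token -> row index in `embeddings`.
    """
    kept = [t for t in vocab if t in keep_tokens or t in special_tokens]
    kept.sort(key=vocab.__getitem__)          # preserve the original ordering
    old_ids = [vocab[t] for t in kept]
    new_vocab = {t: i for i, t in enumerate(kept)}
    return new_vocab, embeddings[old_ids]

# Toy example: a 5-token multilingual vocabulary with 2-dimensional embeddings.
vocab = {"<pad>": 0, "<unk>": 1, "dom": 2, "house": 3, "mesto": 4}
embeddings = np.arange(10, dtype=float).reshape(5, 2)

new_vocab, new_embeddings = trim_vocabulary(vocab, embeddings, keep_tokens={"dom", "mesto"})
reduction = 1 - new_embeddings.shape[0] / embeddings.shape[0]
```

In a real multilingual model the embedding matrix accounts for a large share of the parameters, which is why dropping unused rows can shrink the model substantially; the tokenizer would also need to be rebuilt to match the new ids.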
The paper introduces and formalizes the Recursive Agent Harness (RAH), a code-first extension of recursive language models where a parent agent generates executable scripts that spawn full subagent harnesses with filesystem tools, code execution, and planning. Controlled evaluation on Oolong-Synthetic (199 samples, context lengths up to 4M tokens) shows RAH with a fixed GPT-5 backbone improves the Codex coding-agent baseline from 71.75% to 81.36%. With a stronger backbone, Claude Sonnet 4.5, RAH achieves 89.77%, confirming the gains stem from the harness design rather than model scaling.
This paper investigates the geometric structure of recoverability in continual learning and introduces the Stable Recovery Manifold hypothesis. Using sequentially trained ResNet-18 on Split CIFAR-100, the authors define Recovery Subspace Dimensionality (k_t) as the minimum singular directions needed to retain 90% probe performance, and find it remains stable around a mean of 8.0 despite significant representational drift. Principal-angle drift between task subspaces strongly predicts recoverability (r = -0.862), and a simple geometric model explains 82.2% of recoverability variance. The results suggest catastrophic forgetting is primarily a problem of accessibility and manifold alignment, not information destruction, and that forgotten knowledge stays compactly decodable.
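For readers unfamiliar with principal angles, the sketch below shows the standard way to compute them between two subspaces: orthonormalize each basis and take the singular values of the product. It is a generic illustration, not the authors' pipeline; the function name `principal_angles` and the toy bases are assumptions.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (in radians, ascending) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)                   # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)                   # orthonormal basis for span(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Toy example in R^3: the two planes share one direction (e1) and
# are orthogonal in the other, so the angles are 0 and pi/2.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # span{e1, e2}
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])   # span{e1, e3}
angles = principal_angles(A, B)
```

Small angles mean the two subspaces are closely aligned; a drift measure could then be built by, for example, averaging the angles between the subspace after task t and the subspace after a later task, though the summary does not say which aggregate the paper uses.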
This paper introduces operads—mathematical structures modeling many-in, one-out operations—as a rigorous framework for question decomposition in LLMs. The authors define the questions operad Q in which operations are question templates and composition is substitution of sub-answers, and they interpret QA models as algebras over Q. A key contribution is operadic consistency, a metric that measures how well a model's answers agree across different partial collapses of a decomposition tree. Companion empirical work finds that operadic consistency strongly correlates with accuracy across twelve LLMs and four multi-hop QA datasets, outperforming temperature-based self-consistency baselines. The operadic perspective opens new analytical and improvement directions for multi-step reasoning.
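A toy sketch may help make "composition is substitution" concrete. Here a question template has numbered slots, a model is any function from question text to answer text, and two collapses of the same two-level tree are compared: answering the sub-question first, or substituting the sub-question itself and answering the composed question in one step. The dictionary-backed `toy_model`, the helper names, and the simple agreement check are all assumptions for illustration; the summary does not give the paper's exact definition of operadic consistency.

```python
from typing import Callable, Sequence

Model = Callable[[str], str]

def substitute(template: str, fillers: Sequence[str]) -> str:
    """Plug fillers into the numbered slots {0}, {1}, ... of a question template."""
    return template.format(*fillers)

def stepwise_answer(model: Model, template: str, sub_questions: Sequence[str]) -> str:
    """Answer each sub-question first, then ask the parent question about the answers."""
    sub_answers = [model(q) for q in sub_questions]
    return model(substitute(template, sub_answers))

def direct_answer(model: Model, template: str, sub_questions: Sequence[str]) -> str:
    """Collapse the tree into a single composed question and answer it in one step."""
    return model(substitute(template, sub_questions))

# Hypothetical stand-in for an LLM: a lookup table of question -> answer.
FACTS = {
    "the author of Hamlet": "William Shakespeare",
    "Who was the spouse of William Shakespeare?": "Anne Hathaway",
    "Who was the spouse of the author of Hamlet?": "Anne Hathaway",
}

def toy_model(question: str) -> str:
    return FACTS.get(question, "unknown")

template = "Who was the spouse of {0}?"
subs = ["the author of Hamlet"]
consistent = stepwise_answer(toy_model, template, subs) == direct_answer(toy_model, template, subs)
```

With a real model the two routes can disagree, and the rate of agreement across many trees and many ways of collapsing them is the kind of signal a consistency metric would aggregate.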
Ion Matei et al. present a framework for aerial wildfire suppression planning that integrates a hybrid neural-cellular automaton fire spread model with gradient-based optimization. The model predicts spatially varying fire behavior from terrain, fuel, and wind inputs, while the intervention module decides binary drop actions with continuous location and orientation parameters. Water and retardant are represented distinctly, reducing active burning immediately or persistently lowering future spread. Aleatoric uncertainty is captured via Monte Carlo sampling of daily fire states, and epistemic uncertainty via spatially correlated prediction-error perturbations. A case study on the 2020 Bear Fire demonstrates the framework's ability to generate coherent suppression schedules and support uncertainty-aware strategy analysis.