Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

5 items

REDDIT MACHINELEARNING | Jun 15, 2026

Cleo: Finetuning Qwen3.5-2B-Base into a Full Text-to-SQL Analyst with a Unified Harness

Cleo is an open-source text-to-SQL model finetuned from Qwen3.5-2B-Base and designed to pack full analyst behavior into a 2B-parameter model. The same structured harness is used for training, evaluation, and inference; it implements a gather-repair-answer contract that feeds live execution evidence into the search over candidate queries. Key design choices include co-optimizing the model contract, SQL safety layer, dialect handling, timeouts, and clarification behavior. The model, harness, and datasets are open-source on GitHub and Hugging Face. The project illustrates how tightly coupling training and inference in a single harness can let small models handle complex SQL generation and interactive debugging.
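
To make the gather-repair-answer idea concrete, here is a minimal sketch of such a loop against an in-memory SQLite database. It is not Cleo's harness: the function names, the toy safety check, and the `toy_repair` stand-in for the model are all assumptions made for illustration.

```python
import sqlite3

def is_read_only(sql):
    """Toy safety layer (assumption): allow a single SELECT statement only."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped

def gather_repair_answer(conn, candidates, repair, max_steps=5):
    """Try candidate queries; feed execution errors back into a repair step."""
    evidence = []  # execution evidence gathered during the search
    queue = list(candidates)
    for _ in range(max_steps):
        if not queue:
            break
        sql = queue.pop(0)
        if not is_read_only(sql):
            evidence.append((sql, "rejected by safety layer"))
            continue
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            evidence.append((sql, f"error: {exc}"))
            fixed = repair(sql, str(exc))  # hypothetical model call
            if fixed is not None:
                queue.append(fixed)
            continue
        evidence.append((sql, f"ok: {len(rows)} rows"))
        return rows, evidence
    return None, evidence

# Small demo database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 32.5)])

def toy_repair(sql, error):
    # Stand-in for the model: fix one known wrong column name.
    if "no such column: amount" in error:
        return sql.replace("amount", "total")
    return None

rows, evidence = gather_repair_answer(
    conn,
    ["DROP TABLE orders", "SELECT SUM(amount) FROM orders"],
    toy_repair,
)
```

In this toy run the destructive statement is rejected, the second candidate fails on a wrong column name, and the repaired query produces the answer; each step is recorded as evidence.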

REDDIT MACHINELEARNING | Jun 15, 2026

FeynRL: An Open-Source Framework for Transparent RL Post-Training of LLMs, VLMs, and Agents

Reddit user /u/summerday10 released FeynRL, an open-source framework designed to make reinforcement learning post-training for large language models, vision-language models, and agents fully transparent and modifiable. The framework exposes the entire training loop (data loading, rollout generation, reward computation, loss construction, optimization, and evaluation) so researchers can develop new algorithms without fighting hidden systems. It currently includes examples for supervised fine-tuning, DPO, and RL-style training, and it supports single-GPU, multi-GPU, and cluster setups. The project was motivated by the belief that open weights alone are insufficient: open training codebases that keep algorithms explicit and systems separate are necessary for advancing open ML/AI research.
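
As an illustration of what an explicit loss-construction step can look like, the sketch below computes the standard DPO loss for a single preference pair in plain Python. It is not taken from FeynRL; the function name and the example log-probabilities are assumptions.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    # How much more the policy prefers the chosen answer than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

# Policy already leans toward the chosen answer relative to the reference.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
# Policy identical to the reference: zero margin, loss equals log(2).
neutral = dpo_loss(-11.0, -11.0, -11.0, -11.0)
```

The loss falls below log(2) once the policy prefers the chosen answer more strongly than the reference model does.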

REDDIT MACHINELEARNING | Jun 10, 2026

Pyrecall: Open Source Tool for Detecting Catastrophic Forgetting During LLM Fine-Tuning

Pyrecall is a new open-source tool built to address the lack of practical tooling for continual learning research. It snapshots skill scores before and after fine-tuning, flags performance regressions, and supports rolling back LoRA adapters by name. The tool runs fully locally, is released under the MIT license at v0.1.0, and can be installed via pip. The developer is seeking community feedback on the benchmark design.
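
The before/after comparison can be sketched in a few lines. This is not Pyrecall's API: the function name, the skill names, the scores, and the tolerance are all hypothetical.

```python
def flag_regressions(before, after, tolerance=0.02):
    """Return skills whose score dropped by more than `tolerance`."""
    flagged = {}
    for skill, old_score in before.items():
        new_score = after.get(skill)
        if new_score is None:
            continue  # skill was not re-evaluated after fine-tuning
        drop = old_score - new_score
        if drop > tolerance:
            flagged[skill] = round(drop, 4)
    return flagged

# Hypothetical skill scores snapshotted before and after a fine-tuning run.
before = {"arithmetic": 0.81, "sql": 0.40, "summarization": 0.66}
after = {"arithmetic": 0.62, "sql": 0.75, "summarization": 0.65}
regressions = flag_regressions(before, after)
```

Here the fine-tune improves the `sql` score but costs a large drop in `arithmetic`, which is the kind of regression such a tool would flag; the small `summarization` change stays within tolerance.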

REDDIT MACHINELEARNING | Jun 6, 2026

Does it make sense to use alternative quantizations of QAT models? [D]

The post asks whether quantization-aware training (QAT) is tied to one specific quantization method, such as Google's own for Gemma-4, or whether alternative quantizations such as Unsloth's are also valid. Unsloth's quantizations of Gemma-4-QAT reportedly produce results closer to the QAT fine-tuned models. The author questions whether this closeness is beneficial or whether it undermines the purpose of QAT, which is to emulate a particular inference-time quantization. The discussion highlights a potential trade-off between preserving accuracy and adhering to the original quantization scheme.
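
A toy example shows why the target scheme matters. The sketch below assumes symmetric 4-bit rounding with either one scale per tensor or one scale per block; real formats (Google's or Unsloth's) are more involved, and the weights are made up.

```python
def fake_quantize(weights, bits=4, block_size=None):
    """Quantize-dequantize round trip with one symmetric scale per block."""
    qmax = 2 ** (bits - 1) - 1
    block_size = max(1, block_size or len(weights))
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # Guard against an all-zero block, where the scale would be 0.
        scale = (max(abs(w) for w in block) / qmax) or 1.0
        out.extend(round(w / scale) * scale for w in block)
    return out

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Weights that already sit on the per-tensor 4-bit grid (step 0.1),
# roughly where QAT would push them for that scheme.
weights = [-0.7, 0.5, -0.3, 0.2]

same_scheme = fake_quantize(weights, bits=4)                 # scheme used in training
other_scheme = fake_quantize(weights, bits=4, block_size=2)  # alternative scheme

err_same = max_error(weights, same_scheme)
err_other = max_error(weights, other_scheme)
```

Under the scheme the weights were trained for, the round trip is essentially lossless. The block-wise alternative uses a different grid for the second block, so the same weights pick up a rounding error even at the same bit width.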

REDDIT MACHINELEARNING | Jun 4, 2026 | Highlight

On-policy distillation: one of the hottest terms on PapersWithCode [R]

Niels from Hugging Face announces that on-policy distillation (OPD) has been added to PapersWithCode as a key term. OPD is a post-training technique used in models like Qwen 3.6, GLM-5.1, and DeepSeek-V4. The method involves injecting hint tokens to discourage specific errors during rollouts, without regenerating the rollouts. A whiteboard explanation by Sasha Rush is linked, and the post invites suggestions for other methods to add.