On a tight budget, rigidly structured Python code with a minimal LLM role is more reliable than a flexible agentic pipeline
Summary
A developer building a fully local agentic text extraction pipeline with quantized models (Gemma 4 31B, Qwen 3.5) found that letting the LLM make its own decisions led to results that varied from day to day, frequent errors, and high resource usage. They replaced the reasoning loops with rigid Python code that handles chunking, regex, API logic, and error routing, and limited the LLM to extracting three specific entities into a strict schema. The new pipeline ran for four days without logic failures, with higher speed and lower resource usage. The experience suggests that on consumer GPUs with small local models, a dumb, rigid script plus a narrowly focused LLM parser is more practical than a smart agent that needs constant supervision.
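To make the "strict schema" idea concrete, here is a minimal sketch of how code, rather than the model, might enforce the output shape. The source does not say which three entities are extracted or what reply format is used, so the field names (`person`, `organization`, `date`), the JSON reply format, and the names `Extraction`, `SchemaError`, and `parse_reply` are all hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical field names: the source never says which three entities are extracted.
REQUIRED_FIELDS = ("person", "organization", "date")


@dataclass(frozen=True)
class Extraction:
    person: str
    organization: str
    date: str


class SchemaError(ValueError):
    """Raised when a model reply does not match the fixed schema."""


def parse_reply(raw: str) -> Extraction:
    """Accept a reply only if it is a JSON object with exactly the three string fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("top-level value must be a JSON object")
    if set(data) != set(REQUIRED_FIELDS):
        raise SchemaError(f"expected keys {sorted(REQUIRED_FIELDS)}, got {sorted(data)}")
    for key in REQUIRED_FIELDS:
        if not isinstance(data[key], str):
            raise SchemaError(f"field {key!r} must be a string")
    return Extraction(**data)
```

Anything the model adds beyond the three fields (commentary, extra keys, malformed JSON) is rejected by code instead of being interpreted, which is one plausible way to keep the LLM's role as narrow as the summary describes.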
Key points
Initial approach: a local LLM agent with tools and full autonomy over text processing and extraction. The result was unstable, unpredictable behavior.
Replacement: a rigid Python workflow in which code handles chunking, regex, error routing, and execution flow. The LLM is stripped down to extracting three entities into a fixed schema from each 300-word chunk.
Result: processing speed increased, resource usage dropped, and the pipeline ran for four days without a single logic failure.
Conclusion: on a budget, with small quantized local models, a rigid script with a specialized LLM parser is more reliable than a flexible agentic system that requires babysitting.
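The control flow described in these points can be sketched roughly as follows. This is an illustration under stated assumptions, not the developer's actual code: `chunk_words` and `run_pipeline` are hypothetical names, the model call is passed in as a plain `extract` callable so no particular inference API is implied, and sending failed chunks to a separate list is just one simple reading of "error routing".

```python
from typing import Callable, Iterator, List, Tuple


def chunk_words(text: str, size: int = 300) -> Iterator[str]:
    """Split text into consecutive chunks of at most `size` whitespace-separated words."""
    words = text.split()
    for start in range(0, len(words), size):
        yield " ".join(words[start:start + size])


def run_pipeline(
    text: str,
    extract: Callable[[str], str],    # hypothetical: wraps the single LLM call
    validate: Callable[[str], object],  # hypothetical: raises ValueError on a bad reply
) -> Tuple[List[Tuple[int, object]], List[Tuple[int, str]]]:
    """Code owns the loop; the model only ever sees one chunk and returns one reply."""
    results: List[Tuple[int, object]] = []
    failures: List[Tuple[int, str]] = []
    for index, chunk in enumerate(chunk_words(text)):
        try:
            results.append((index, validate(extract(chunk))))
        except ValueError as exc:
            # Error routing is decided here, in code, never by the model.
            failures.append((index, str(exc)))
    return results, failures
```

With this shape, a bad model reply affects only one chunk: it lands in `failures` with its index, and the loop moves on without the model deciding what happens next.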