Hi, I've been trying to find the right local model to run on my PC with 32GB of RAM and an RTX 3090 with 24GB of VRAM.

What would you recommend?

Models Tested

1. Qwen3.5-35B-A3B (MoE, 3B active), UD-Q4_K_XL
Result: Tool-calling loops. It got stuck calling the todo tool endlessly.

Fix attempted: Disabled thinking (enable_thinking: false) and set repeat-penalty to 1.0.

Outcome: Still looped, just on different tools.

Known issue: Reddit threads confirm that Qwen3.5/3.6 models get stuck in reasoning loops with Hermes.

2. Gemma 4 26B-A4B (MoE, 3.8B active), UD-Q4_K_M
Result: Tool-calling loops. It got stuck calling delegate_task endlessly.

Fix attempted: Disabled delegate_task in config.

Outcome: Started looping on search_files instead. Every tool it touched, it looped on.

VRAM: 18GB. It fit well and left headroom.
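The 18GB figure is roughly what you would expect from the parameter count alone. Below is a back-of-the-envelope sketch, assuming about 4.8 bits per weight for a Q4_K-family quant (an approximation; Unsloth's UD quants mix bit widths, so the real number varies) and taking the total parameter count from the model name. For MoE models every expert has to be resident in memory, so the total count, not the active count, drives VRAM use. `weight_memory_gb` is a hypothetical helper, not part of any tool mentioned here.

```python
# Rough estimate of the memory taken by the quantized weights of a GGUF model.
# ASSUMPTION: ~4.8 bits per weight is a ballpark for Q4_K-family quants; the
# real figure depends on the quant recipe. KV cache and compute buffers come
# on top of this and grow with context length.


def weight_memory_gb(total_params_billion: float, bits_per_weight: float = 4.8) -> float:
    """Return the approximate weight size in decimal gigabytes (1 GB = 1e9 bytes).

    For MoE models, pass the TOTAL parameter count: all experts must be loaded,
    even though only a few are active for any given token.
    """
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Total parameter counts are read off the model names (26B and 35B).
    for name, params_b in [("Gemma 4 26B-A4B", 26.0), ("Qwen3.5-35B-A3B", 35.0)]:
        print(f"{name}: ~{weight_memory_gb(params_b):.1f} GB of weights at ~4.8 bpw")
```

Under these assumptions the 26B model needs about 15.6GB for weights, leaving a couple of GB for KV cache and buffers, which is consistent with the 18GB observed. The same estimate puts the 35B model at about 21GB of weights, which would be a much tighter fit on a 24GB card.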

3. Gemma 4 E4B (small, ~4B effective), UD-Q4_K_XL
Result: Mostly no looping, but it misuses tools. It tries web_extract on local file paths, calls text_to_speech instead of a file tool, and cannot figure out which tool to use for basic tasks.

Looped on: text_to_speech (it sent hundreds of WhatsApp voice messages).

What Works

Basic conversation (all models): "hi", "what can you do?", and memory recall are all fine.

Session search and memory work correctly.

WhatsApp gateway connects and delivers messages.

What Doesn't Work

Tool calling with local models is unreliable. Every model loops or misuses tools when given Hermes' full toolset (~15 tools).

Reducing tools helps slightly but doesn't eliminate the problem.

The issue is not model-specific; it's a fundamental limitation of local models with complex agentic tool-calling frameworks.
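Because the failure mode is the same call being issued over and over, one generic mitigation is a guard in the agent loop that refuses a call once it has repeated too often. This is a minimal sketch, assuming you can intercept each tool call (tool name plus JSON-serializable arguments) before it executes; `ToolLoopGuard`, `max_repeats`, and `window` are hypothetical names and not part of Hermes or any specific framework.

```python
from collections import deque
import json


class ToolLoopGuard:
    """Hypothetical guard that flags an agent repeating the same tool call.

    A call is identified by the tool name plus its arguments. Arguments are
    serialized with sorted keys so {"a": 1, "b": 2} equals {"b": 2, "a": 1}.
    """

    def __init__(self, max_repeats: int = 3, window: int = 10):
        self.max_repeats = max_repeats
        # Only the most recent `window` calls are remembered.
        self.recent = deque(maxlen=window)

    def check(self, tool_name: str, arguments: dict) -> bool:
        """Record a call; return True if it should be blocked as a loop."""
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self.recent.append(key)
        # Block once this exact call appears more than max_repeats times
        # within the window.
        return self.recent.count(key) > self.max_repeats


if __name__ == "__main__":
    guard = ToolLoopGuard(max_repeats=3)
    results = [guard.check("todo", {"action": "list"}) for _ in range(5)]
    print(results)  # [False, False, False, True, True]
```

This only catches exact repeats. A model that keeps varying its arguments, or hops to a different tool as described above, would need a looser key (for example, the tool name alone), and blocking a call does not by itself fix the model's underlying tool-selection problem.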

Source: reddit, domain: tech, retention score: 0.61