Testing 9 OpenCode Go models on a Delphi/FireDAC code generation task

Spanish-to-English assisted translation

With 30 hours left on my one-month OpenCode Go plan, I've only burned through 65% of my budget. That's what happens when you get hooked on DeepSeek V4 Flash.

I took the opportunity to stress-test the models with an extreme case of the actual work I throw at them daily. Many hours later, I now have a practical model roadmap for the months ahead.

Warning: this applies to me and my specific circumstances. Your results will likely differ. Please don't get mad.

Also keep in mind that these models are non-deterministic — the same prompt can produce different results on a different day due to server load, model updates, or fine-tuning changes on the provider side.

My takeaway: I need to start giving DeepSeek V4 Pro more work and stop over-relying on Flash.

AI Edit

The setup

A single, deliberately absurd task: generate a Delphi DataModule (.pas + .dfm) implementing a complex nested dataset hierarchy using TFDMemTable with TDataSetField parent-child relationships (the FireDAC nested dataset pattern).
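In the nested dataset pattern, each master record carries its detail records inside a dataset-typed field, instead of linking two flat tables by key. The Python sketch below is not FireDAC code; it is a language-neutral analog, assuming rows are plain dicts and a list-valued column stands in for a TDataSetField. The sample data and the helpers `nesting_depth` and `count_rows` are hypothetical; the real task went up to five levels, the sample only three.

```python
from typing import Any, Dict, List, Optional

# A row is a dict. A value that is itself a list of rows plays the role of a
# TDataSetField: the detail rows live *inside* the parent row.
Row = Dict[str, Any]


def nesting_depth(rows: List[Row]) -> int:
    """Levels of dataset-in-dataset nesting (1 = flat table, no dataset fields)."""
    deepest = 0
    for row in rows:
        for value in row.values():
            if isinstance(value, list):
                deepest = max(deepest, nesting_depth(value))
    return 1 + deepest


def count_rows(rows: List[Row], path: str,
               counts: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Total rows per nesting level, keyed by a dotted path such as 'Invoices.Lines'."""
    if counts is None:
        counts = {}
    counts[path] = counts.get(path, 0) + len(rows)
    for row in rows:
        for key, value in row.items():
            if isinstance(value, list):
                count_rows(value, f"{path}.{key}", counts)
    return counts


# Hypothetical three-level sample: Invoice -> Lines -> Taxes.
INVOICES: List[Row] = [
    {"ID": 1, "Lines": [
        {"LineNo": 1, "Amount": 100.0, "Taxes": [{"Code": "VAT", "Rate": 21.0}]},
        {"LineNo": 2, "Amount": 50.0, "Taxes": []},
    ]},
    {"ID": 2, "Lines": []},
]
```

Because every invoice owns its own `Lines` list, the second invoice simply has an empty detail set; no foreign-key column is needed to tell the two apart, which is the property that makes the nested pattern attractive for XML-shaped data.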

🧪 Reality check: This is not how we'd normally work. A sane developer would split this into multiple prompts, iterate, correct, and refine. We deliberately designed a stress test (single prompt, no do-overs, no sub-agents) to push models beyond their comfort zone and see where they break. Think of it as a benchmark torture test, not a production workflow.

⚠️ Disclaimer: This evaluates one specific task: generating FireDAC nested datasets from XSD schemas for a Delphi project, the exact type of work I use OpenCode Go for daily. The goal is practical: understand which models to use for which subtasks, not to crown a general winner. Results are specific to this domain, prompt design, and model configuration. Different ecosystems (Python, Java, web) or different task types (refactoring, debugging, testing) would likely produce different rankings. Take this as a data point for Delphi/FireDAC work, not a universal truth.

The model starts from a skeleton file (~2,700 lines of PAS + ~6,200 lines of DFM) and must add 20+ tables matching 5 XSD schemas with up to 5 levels of nesting, including elements with xsd:choice (no direct FireDAC equivalent), simpleContent with attributes (which must be flattened into multiple fields), and 1:1 vs 0:N cardinality decisions.
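The mapping decisions listed above can be made concrete with a small sketch. The Python code below uses the standard library's `xml.etree.ElementTree` to turn a toy XSD into table definitions under one possible set of rules, assumed for illustration and not necessarily the ones the prompt specified: `maxOccurs` other than 1 becomes a nested child table (0:N), a 1:1 complex element is inlined with a name prefix, simpleContent becomes a value field plus one field per attribute, and each xsd:choice keeps every branch as a field plus a discriminator. `TableDef`, `schema_to_tables`, the `ChoiceN` naming and the sample schema are all hypothetical, and the sketch handles inline anonymous types only (no named types, `ref=`, or `xs:complexContent`).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple

XS = "{http://www.w3.org/2001/XMLSchema}"  # ElementTree's Clark-notation prefix


@dataclass
class TableDef:
    """Hypothetical stand-in for one TFDMemTable definition."""
    name: str
    fields: List[Tuple[str, str]] = field(default_factory=list)  # (field name, kind)
    children: List["TableDef"] = field(default_factory=list)     # nested 0:N tables
    choice_count: int = 0


def add_element(el: ET.Element, table: TableDef, prefix: str) -> None:
    name = prefix + el.get("name", "")
    ctype = el.find(f"{XS}complexType")
    if el.get("maxOccurs", "1") != "1":
        # 0:N: own nested table; the parent gets a dataset field (TDataSetField in Delphi).
        child = TableDef(el.get("name", ""))
        if ctype is not None:
            add_particles(ctype, child, "")
        else:
            child.fields.append((child.name, "value"))
        table.children.append(child)
        table.fields.append((name, "dataset"))
    elif ctype is None:
        table.fields.append((name, "value"))  # simple 1:1 element -> one field
    elif ctype.find(f"{XS}simpleContent") is not None:
        # simpleContent + attributes: flatten to a value field + one field per attribute.
        table.fields.append((name, "value"))
        for attr in ctype.iter(f"{XS}attribute"):
            table.fields.append((f"{name}_{attr.get('name')}", "attr"))
    else:
        add_particles(ctype, table, name + "_")  # 1:1 complex: inline with a prefix


def add_particles(parent: ET.Element, table: TableDef, prefix: str) -> None:
    for node in parent:
        if node.tag == f"{XS}element":
            add_element(node, table, prefix)
        elif node.tag == f"{XS}sequence":
            add_particles(node, table, prefix)
        elif node.tag == f"{XS}choice":
            # No FireDAC equivalent: keep every branch as a field, plus a
            # discriminator field meant to record which branch the XML used.
            table.choice_count += 1
            table.fields.append((f"{prefix}Choice{table.choice_count}", "choice"))
            add_particles(node, table, prefix)
        elif node.tag == f"{XS}attribute":
            table.fields.append((prefix + node.get("name", ""), "attr"))


def schema_to_tables(xsd_text: str) -> List[TableDef]:
    """One TableDef per top-level xs:element (inline anonymous types only)."""
    tables = []
    for el in ET.fromstring(xsd_text.strip()).findall(f"{XS}element"):
        table = TableDef(el.get("name", ""))
        ctype = el.find(f"{XS}complexType")
        if ctype is not None:
            add_particles(ctype, table, "")
        tables.append(table)
    return tables


SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="Invoice"><xs:complexType><xs:sequence>
  <xs:element name="Total"><xs:complexType><xs:simpleContent>
   <xs:extension base="xs:decimal"><xs:attribute name="currency" type="xs:string"/></xs:extension>
  </xs:simpleContent></xs:complexType></xs:element>
  <xs:element name="Customer"><xs:complexType><xs:sequence>
   <xs:element name="Name" type="xs:string"/>
  </xs:sequence></xs:complexType></xs:element>
  <xs:choice>
   <xs:element name="TaxID" type="xs:string"/>
   <xs:element name="PassportNo" type="xs:string"/>
  </xs:choice>
  <xs:element name="Line" maxOccurs="unbounded"><xs:complexType><xs:sequence>
   <xs:element name="Amount" type="xs:decimal"/>
  </xs:sequence></xs:complexType></xs:element>
 </xs:sequence></xs:complexType></xs:element>
</xs:schema>
"""
```

On the sample, `schema_to_tables(SAMPLE_XSD)[0]` yields an `Invoice` table with `Total`, `Total_currency`, `Customer_Name`, `Choice1`, `TaxID`, `PassportNo` and a `Line` dataset field, plus one nested `Line` table holding `Amount`. Inlining 1:1 elements keeps the table count down; making every complex element its own table would be an equally valid choice, and it is exactly the kind of decision a model has to get right here.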

Single prompt. No sub-agents. No parallel execution. No reading files not explicitly listed.

What the model had to read first

Before writing a single line of code, the model ingested:

| Type | Content | Size |
|---|---|---|
| Delphi skills | FireDAC patterns (CachedUpdates, auto-inc, nested datasets) | ~600 lines |
| FireDAC skills | TFDMemTable, TDataSetField, persistence specifics | ~1,300 lines |
| Reference project | Working Datos.pas from a similar project (~3,300 lines) | 3,284 lines |
| XSD schemas | 5 schema files defining the XML structure | ~240 KB total |
| Project memory | Context files: architecture decisions, pending items | 967 lines |
| The prompt itself | Instructions, field specs, trap warnings, rules | 7,911 chars / ~129 … |
