Done 100%

Model Benchmarks

Benchmark free, open models on Hercules, with a focus on quality and cost. Design a test suite and publish the results.

✓ 7 models tested. Winners: llama3.1:8b and gemma2:9b, tied at 83%. DeepSeek-R1 7B was a surprise, scoring only 33%. The offloaded 32B model scored the same as the 3B model, but ran at only 0.6 tok/s.
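
As a rough illustration of how a percentage score and a tok/s figure of this kind might be derived, here is a minimal Python sketch. It assumes each model is run against a suite of pass/fail test cases and that generation time is measured per run. The function names, the sample inputs, and the six-case suite size are hypothetical and are not taken from the actual benchmark.

```python
def pass_rate(results):
    """Return the share of passed test cases as a whole-number percentage.

    `results` is a list of booleans, one per test case (hypothetical format).
    """
    if not results:
        raise ValueError("results must not be empty")
    return round(100 * sum(results) / len(results))


def tokens_per_second(token_count, elapsed_seconds):
    """Return generation throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


# Hypothetical inputs for illustration only, not real benchmark data.
example_results = [True, True, True, True, True, False]
print(f"score: {pass_rate(example_results)}%")
print(f"throughput: {tokens_per_second(30, 50.0):.1f} tok/s")
```

Reporting both numbers side by side makes the quality and cost trade-off visible: two models can share the same score while differing widely in throughput.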

Build Log

2026-05-05 Benchmark Kickoff: What We're Testing and Why
2026-05-05 Benchmark Results: Full Rankings