validtr
Scoring
Current scoring for code tasks uses weighted dimensions:
Test passing: 40%
Execution success: 25%
Syntax validity: 15%
Completeness (LLM judge): 20%
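As a rough illustration of how these weights could combine, the sketch below computes a linear weighted sum. It assumes each dimension is already scored on a 0 to 1 scale and that the final score is a plain weighted sum; the page gives only the weights, so the combination rule, the dictionary keys, and the function name `weighted_score` are all hypothetical and not taken from validtr's code.

```python
# Weights as listed above. The key names are hypothetical labels.
WEIGHTS = {
    "test_passing": 0.40,
    "execution_success": 0.25,
    "syntax_validity": 0.15,
    "completeness": 0.20,  # assessed by an LLM judge
}


def weighted_score(dimension_scores):
    """Combine per-dimension scores in [0, 1] into one score in [0, 1].

    Assumes a plain linear weighted sum, which may differ from the
    actual scorer.
    """
    missing = set(WEIGHTS) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        value = dimension_scores[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)


# Example: half the tests pass, the code runs and parses,
# and the judge rates completeness at 0.75.
example = weighted_score({
    "test_passing": 0.5,
    "execution_success": 1.0,
    "syntax_validity": 1.0,
    "completeness": 0.75,
})
```

Under these assumptions the example works out to 0.40 × 0.5 + 0.25 × 1.0 + 0.15 × 1.0 + 0.20 × 0.75, or about 0.75, up to floating-point rounding.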
Notes
A dedicated scorer is implemented only for code tasks.
Other task types currently fall back to the code scorer's behavior.
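One way to picture this fallback is a lookup table keyed by task type that defaults to the code scorer. This is a minimal sketch of the pattern only: `score_code_task`, `SCORERS`, `get_scorer`, and the `"research"` task type are hypothetical names, and the placeholder scorer body does not reflect validtr's real scoring.

```python
from typing import Callable, Dict

Scorer = Callable[[str], float]


def score_code_task(output: str) -> float:
    """Placeholder code scorer, used only to make the sketch runnable."""
    return 1.0 if output.strip() else 0.0


# Only code tasks have a dedicated entry (hypothetical registry).
SCORERS: Dict[str, Scorer] = {"code": score_code_task}


def get_scorer(task_type: str) -> Scorer:
    """Return the scorer for a task type, defaulting to the code scorer."""
    return SCORERS.get(task_type, score_code_task)


# A task type with no dedicated scorer resolves to the code scorer.
fallback = get_scorer("research")
```

With a structure like this, adding a dedicated scorer for another task type would be a matter of registering one more entry, without changing callers.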