Technical assessments for the AI era
Candidates work on a real repo with their own AI tools. They clarify under-specified requirements with a product owner (PO) chatbot and ship a PR. You score real work, not puzzles.
3 free pilots open for DACH engineering leaders.
The problem
LeetCode-style puzzles and take-homes stopped carrying signal.
Candidates finish them in 10 minutes with Claude or Copilot. You see who can operate a tool. Not who can solve problems.
The job of an engineer has shifted: read ambiguous requirements, work with AI, make decisions inside business constraints.
Classical assessments don't measure any of that.
Every company is hiring engineers who work with AI. None measure it properly. Arena closes that gap.
How Arena works
From clone to rubric-scored review
- 1. Candidate clones a real repo.
- 2. Task is intentionally under-specified. A PO chatbot answers clarifications but leaves ambiguity on purpose.
- 3. Candidate works in their own IDE, with their own AI tools, at their own pace.
- 4. Pushes a feature branch, opens a PR.
- 5. We score the work and hand you a rubric-based report.
What gets measured
Signals from real work
- Problem identification under ambiguity
- Requirements clarification and communication
- Code quality under real constraints
- Decisions, not syntax
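For readers who want to picture what a rubric-based report over these four signals could look like, here is a minimal sketch. It is illustrative only: the signal keys, the 0 to 4 level scale, the equal weights, and the `SignalScore` and `weighted_total` names are all assumptions made for this example, not Arena's actual rubric or scoring method.

```python
from dataclasses import dataclass

# Hypothetical keys mirroring the four signals listed above.
# Equal weights and a 0-4 scale are assumptions for illustration.
WEIGHTS = {
    "problem_identification": 0.25,
    "requirements_clarification": 0.25,
    "code_quality": 0.25,
    "decisions": 0.25,
}
MAX_LEVEL = 4


@dataclass
class SignalScore:
    signal: str    # one of the keys in WEIGHTS
    level: int     # reviewer-assigned level, 0..MAX_LEVEL
    evidence: str  # pointer to the PR or chat excerpt behind the level


def weighted_total(scores):
    """Return the weighted score as a fraction between 0.0 and 1.0."""
    signals = [s.signal for s in scores]
    # Require exactly one score per signal so nothing is skipped or double-counted.
    if sorted(signals) != sorted(WEIGHTS):
        raise ValueError(f"expected one score per signal, got {sorted(signals)}")
    for s in scores:
        if not 0 <= s.level <= MAX_LEVEL:
            raise ValueError(f"level out of range for {s.signal}: {s.level}")
    return sum(WEIGHTS[s.signal] * s.level / MAX_LEVEL for s in scores)


# Made-up example values, not real candidate data.
example = [
    SignalScore("problem_identification", 3, "PR description, first paragraph"),
    SignalScore("requirements_clarification", 4, "PO chat, questions 1-3"),
    SignalScore("code_quality", 2, "review comments on the diff"),
    SignalScore("decisions", 3, "trade-off note in the PR"),
]
print(f"{weighted_total(example):.2f}")  # prints 0.75
```

Attaching an evidence pointer to every level keeps the report reviewable: a hiring manager can trace each number back to something the candidate actually wrote.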
Questions
Three things buyers want to know.
- Won't candidates cheat with AI?
- Cheat at what? The job is using AI. Arena measures the parts AI doesn't do for them: choosing what to build, negotiating ambiguity, writing a PR description that survives review.
- How long does it take a candidate?
- About 60 to 120 minutes of work, in their own editor, on their own schedule. There is no timed window. The clock was always the wrong signal.
- How is Arena different from HackerRank or CoderPad?
- Browser sandboxes test recall. Arena tests work: a real repo, an under-specified ticket, a PO chat, and a PR your team would actually review. Output is easy. Judgment is what you're hiring for.
Pilot slots open
3 free pilot slots open. Run the trial with a real applicant, or with an engineer you have already hired as a calibration benchmark.