Harvey, the legal AI company, has released LAB, an open-source benchmark designed to measure how well artificial intelligence handles legal work. The new evaluation tool spans 24 practice areas and includes more than 1,200 individual tasks, offering a way to compare AI systems head-to-head.
What the Benchmark Covers
LAB doesn't just test one type of lawyering. It covers a broad spectrum, from corporate transactions and litigation to regulatory compliance and intellectual property. Each task is designed to mimic a real legal assignment, such as drafting a clause, summarizing a case, or identifying a risk in a contract. The 24 practice areas mean the benchmark can assess both general legal knowledge and specialized expertise.
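To make that structure concrete, benchmarks of this kind are usually distributed as structured task records: an assignment, the source material it refers to, and the criteria a good answer should meet. The sketch below is purely illustrative; the field names and example content are assumptions, not LAB's published schema.

```python
from dataclasses import dataclass, field

# Hypothetical shape of one benchmark task. The field names
# (practice_area, instruction, rubric) are illustrative assumptions,
# not LAB's actual format.
@dataclass
class LegalTask:
    task_id: str
    practice_area: str                  # e.g. "Corporate Transactions", "Litigation", "IP"
    instruction: str                    # the assignment given to the model
    source_material: str                # contract excerpt, case text, etc.
    rubric: list[str] = field(default_factory=list)  # points a strong answer should cover

example = LegalTask(
    task_id="contracts-0001",
    practice_area="Corporate Transactions",
    instruction="Identify the indemnification risk in the clause below.",
    source_material="Seller shall indemnify Buyer for all losses, without cap.",
    rubric=["flags the uncapped indemnity", "notes the absence of a survival period"],
)
```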
Why an Open-Source Standard Matters
Right now, there's no widely accepted way to judge legal AI. Different companies run their own tests, often keeping results private. Harvey's LAB is open-source, so anyone can inspect the tasks, run the tests, and submit results. That transparency could help law firms, in-house legal teams, and regulators make better comparisons. It also lets the legal tech community contribute new tasks and practice areas over time.
Potential Impact on the Legal Industry
For law firms evaluating AI tools, LAB provides a consistent yardstick. Instead of relying on vendor claims or limited demos, they can run the benchmark themselves. That could speed up adoption — or reveal gaps in current systems. For AI developers, the benchmark highlights where models struggle, pushing them to improve. Harvey itself uses LAB internally, but making it public invites broader scrutiny and collaboration.
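In practice, "running the benchmark" amounts to a loop: feed each task to the system under test, then score its output against that task's grading criteria. The sketch below shows one way such a harness could look. The task fields and the keyword-based scoring are assumptions made to keep the example self-contained; LAB's actual grading may rely on expert reviewers or model-based judges rather than string matching.

```python
from types import SimpleNamespace
from typing import Callable

# Hypothetical harness: run a model over benchmark tasks and report the
# share of rubric points its answers cover. Keyword matching is used here
# only to keep the sketch runnable on its own.
def evaluate(model: Callable[[str, str], str], tasks: list) -> float:
    earned = possible = 0
    for task in tasks:
        answer = model(task.instruction, task.source_material).lower()
        earned += sum(point.lower() in answer for point in task.rubric)
        possible += len(task.rubric)
    return earned / possible if possible else 0.0

# One illustrative task and a placeholder "model" that simply echoes its input.
task = SimpleNamespace(
    instruction="Flag the indemnification risk in the clause below.",
    source_material="Seller shall indemnify Buyer for all losses, uncapped.",
    rubric=["uncapped"],
)
print(f"Rubric coverage: {evaluate(lambda instr, src: src, [task]):.0%}")
```

A real harness would plug an actual model call into the `model` parameter and aggregate scores per practice area, which is what makes head-to-head comparisons across systems possible.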
What Comes Next
The benchmark is available now on GitHub under an open-source license. Harvey says it plans to update LAB regularly, adding new tasks as legal work evolves. The big question is whether other legal AI companies will adopt the benchmark — or build their own. Without widespread buy-in, any single benchmark's value is limited. For now, LAB gives the legal industry a place to start measuring AI performance, task by task.




