99% of organizations don't evaluate AI agents before production.
Microsoft just made that visible.
ASSERT is an open-source framework that turns natural-language policies into executable agent tests.
It doesn't just score outputs. It records tool calls, intermediate reasoning, and routing behavior. Then scores failures against the exact policy that produced them.
Works across LangChain, CrewAI, OpenAI Agents SDK, and any OTel-traced agent. One framework. All stacks.
Microsoft claims 80-90% judge-human agreement on scoring.
This isn't a benchmark. It's a compliance layer for agents that drift from policy in production.
Run ASSERT against your agent's policy requirements today.
If you can't show a scorecard for your agent's behavior, you're shipping blind.
Agentic AI
Microsoft just open-sourced ASSERT. Your agent evaluation gap is now visible.
1 views
0 Comments