Mercor’s APEX-Agents benchmark finds top AI models score under 25% accuracy on realistic consulting, legal, and finance tasks ...