Anthem
New member
The team recently pushed an update to the AI agent and it completely tanked the logic. Every query started producing massive hallucinations, making the bot totally useless for any real work. Had to roll everything back to the previous version just to keep things operational. Now the guys need to properly run the new model through the existing queries and tune the prompts for the specific version before attempting another deployment. What tools are available for this kind of testing and prompt optimization across different model versions?