Quality - Applied Language Understanding

A collection of 4 posts

Evaluating GPT-4-Turbo

Last Monday (November 6, 2023), OpenAI held their first developer day conference and unveiled several new features. To us, one of the most interesting announcements was the launch of the new GPT-4-Turbo LLM. If you've used GPT-4 for any length of time you know that, at least through

Evaluating Atlassian Intelligence

Back in April at their annual TEAM conference, Atlassian announced Atlassian Intelligence, a suite of AI-enabled features across the entire product line. This vision and level of commitment to AI's role in the future of every Atlassian product are very exciting. After a few months of eager anticipation,

Gaucho: our evaluation tool

We talked recently about the importance of evaluation in continuously improving Connie AI and keeping it honest. At the beginning, we were running our evaluations through command-line scripts. We would check the output files into git, and as part of the commit we would diff the results and visually check that

Zen and the art of Q.A.

Our initial RAG-based question answering system worked surprisingly well out of the box, a testament to how good foundation models are. They are also relatively flimsy: a change in a prompt may cause a whole category of questions to suddenly not be answered correctly. In order to make sure we