Updates: Which Large Language Models are best for regulatory work?

In February 2025, the Regulatory Institute posted this article on “Which Large Language Models are best for regulatory work?”. Here is the first update.

In a recent test aimed at drafting amendments to a bill, Mistral’s Le Chat, one of the two Large Language Models that had been highly rated so far, joined the group of underperformers alongside OpenAI’s various (Chat)GPT models and Google’s Gemini. For the first time, we also tested the Large Language Model Manus, which performed slightly better than the other highly rated model, Anthropic’s Claude Sonnet 3.7. Based on the new tests and past results, only one Large Language Model can be recommended for all drafting tasks: Claude Sonnet 3.7. Nevertheless, it is worthwhile to use some other Large Language Models in parallel and to compare the results. The recommendations in our previous article remain valid, but they need to be updated to include Manus as a top performer and nuanced with regard to Mistral’s Le Chat.

A few days after our last test, Anthropic released Claude Sonnet 4. It is said to have even better reasoning capabilities, and the initial results seem to confirm this. With Claude Sonnet 4, Anthropic could have an even greater lead over all other Large Language Models except Manus than it had with Claude Sonnet 3.7.

Finally, we can confirm the previously observed trend: Large Language Models are producing more “hallucinations”.

–

2nd Update — April 2026: Whilst we have not conducted a systematic comparative assessment of the various large language models (LLMs) since early 2025, we have on a number of occasions revisited the question of whether alternative LLMs approach the performance of Claude Sonnet — the model recommended by us for legislative drafting tasks. That has not proven to be the case. Claude Sonnet in its standard version already outperforms the other LLMs used by our staff; the “extended thinking” variant does so to an even greater degree.

