Now ask the same model the same question several times, each time in a different context, and count what percentage of the answers are correct.
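As a rough illustration, the counting step could be sketched as below. This is a minimal sketch, not a definitive harness: `ask_model`, `is_correct`, `stub_model`, and the sample contexts are all hypothetical names introduced here, and the stub stands in for whatever real model call you would use.

```python
from typing import Callable, Sequence


def percent_correct(
    ask_model: Callable[[str, str], str],  # hypothetical: (context, question) -> answer
    question: str,
    contexts: Sequence[str],
    is_correct: Callable[[str], bool],     # hypothetical: grades a single answer
) -> float:
    """Ask the same question once per context and return the percentage of correct answers."""
    if not contexts:
        raise ValueError("need at least one context")
    correct = sum(1 for ctx in contexts if is_correct(ask_model(ctx, question)))
    return 100.0 * correct / len(contexts)


# Stand-in for a real model (assumption for illustration only):
# it answers correctly unless the context contains the word "distract".
def stub_model(context: str, question: str) -> str:
    return "5" if "distract" in context else "4"


contexts = [
    "",
    "You are a helpful assistant.",
    "Here is a long distracting preamble.",
    "Answer briefly.",
]

score = percent_correct(stub_model, "What is 2 + 2?", contexts, lambda a: a.strip() == "4")
```

With a real model, `stub_model` would be replaced by the actual call, and the list of contexts would be long enough for the percentage to be meaningful.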