When people say GPT-X has "passed the Turing test," I am reminded of these glitches, which simply do not happen to a typical person. I have encountered other issues where a slight change in phrasing leads to totally different types of output, despite the meaning of the two phrasings being the same to a normal person.
> I have encountered other issues where a slight change in phrasing leads to totally different types of output, despite the meaning of the two phrasings being the same to a normal person.
I feel like a human might do the same thing: the words might have slightly different connotations, which make them think of different ideas. We can't test that with a human because we can't reset them to a previous state, whereas with an LLM you can.
You can do statistical analysis across groups of humans, look for where the model's output diverges from theirs, and then build a test covering all of the divergent criteria. A human might respond out of the norm on a few questions but would still fit closer to the human group overall.
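A toy sketch of that aggregate idea in Python (not a real test; the questions, response feature, baselines, and threshold below are all invented for illustration):

```python
import statistics

# Hypothetical per-question human baselines: mean and stdev of some numeric
# response feature (e.g. answer length) measured on a group of humans.
human_baseline = {
    "q1": (12.0, 3.0),
    "q2": (45.0, 10.0),
    "q3": (7.0, 2.0),
}

def divergence_score(responses):
    """Mean absolute z-score of a subject's features vs. the human group."""
    zs = []
    for q, value in responses.items():
        mean, stdev = human_baseline[q]
        zs.append(abs(value - mean) / stdev)
    return statistics.mean(zs)

# A human may be an outlier on one question but close to the group overall;
# a subject that diverges on many questions accumulates a high aggregate score.
human_subject = {"q1": 18.0, "q2": 46.0, "q3": 7.5}   # one odd answer
model_subject = {"q1": 25.0, "q2": 80.0, "q3": 15.0}  # diverges broadly

THRESHOLD = 2.0  # arbitrary cutoff for this toy example
for name, responses in [("human", human_subject), ("model", model_subject)]:
    score = divergence_score(responses)
    print(name, round(score, 2), "flagged" if score > THRESHOLD else "ok")
```

Here the human subject is two standard deviations off on q1 but still scores low overall, while the broadly divergent subject gets flagged, which is the point: aggregate over many probes rather than judging any single response.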
I am extremely cautious about drawing analogies between these systems and human brains. We decided to call them "neural networks" because it sounds nice, but they are actually extremely different from biological brains.