
Looks very much like infinite wisdom of neural Paul Graham.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Let me quote:

"...We can also play with the temperature of the Softmax during sampling. Decreasing the temperature from 1 to some lower number (e.g. 0.5) makes the RNN more confident, but also more conservative in its samples. Conversely, higher temperatures will give more diversity but at cost of more mistakes (e.g. spelling mistakes, etc). In particular, setting temperature very near zero will give the most likely thing that Paul Graham might say:

“is that they were all the same thing that was a startup is that they were all the same thing that was a startup is that they were all the same thing that was a startup is that they were all the same”

looks like we’ve reached an infinite loop about startups."
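The temperature trick described in the quote is just a rescaling of the logits before the softmax. A minimal sketch (NumPy, my own function name, not from the post): as temperature approaches zero the distribution collapses onto the argmax, which is exactly why sampling near zero produces the "most likely thing" and loops.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from logits after temperature scaling.

    temperature -> 0 approaches greedy argmax (the 'most likely thing');
    temperature > 1 flattens the distribution, giving more diversity
    at the cost of more mistakes.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

With temperature 0.01 and logits [2.0, 1.0, 0.0], the scaled gap between the top two logits becomes 100, so the sampler picks index 0 essentially every time.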

If I understand correctly, transformers are autoregressive machines (their output is fed back to them as input), so it is not unexpected to see them get overly fixated on something.
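The feedback loop is easy to see in a toy sketch (hypothetical `model_step`, not any real API): each generated token is appended to the context and fed back in, so once a confident model starts a cycle, nothing breaks it.

```python
def generate(model_step, context, n_tokens):
    """Autoregressive generation: the model sees its own output.

    model_step: callable taking the full sequence so far and
                returning the next token (hypothetical interface).
    """
    out = list(context)
    for _ in range(n_tokens):
        next_token = model_step(out)  # prediction conditioned on everything so far
        out.append(next_token)        # fed back in on the next step
    return out

# A toy deterministic "model" that cycles through three tokens,
# mimicking the repeating-loop behavior seen at low temperature:
cycling_model = lambda seq: (seq[-1] + 1) % 3
```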

In any case, the same phenomenon was observed a long time ago.



Aren't they allusive by nature, given what they learn? So it's always going to be 'recyclical' like that at the edges of the model.



