
It is an actual question and response, yes. It is slightly tricky, in that most upfront questions will cause ChatGPT to answer “I don't have personal beliefs or political views, and I don't endorse any particular ideology or political party.” I needed to first show an example of a correct response, then ask ChatGPT to change personality, then pretend we’re racist friends.

I describe the query a bit more here[0].

GPT-3 was indeed fine-tuned so that it no longer acts racist, but the racist knowledge is still lurking deeper in the model, and there are currently ways to peel off the curated personality. But I could see them successfully removing that knowledge; there was a paper recently on locating and deleting information in neural networks.[1]

[0]: https://twitter.com/espadrine/status/1598320759163740160

[1]: https://arxiv.org/pdf/2210.07229.pdf


