Hacker News

I give it a week before we see tools for subtly watermarking your secret LLM's weights, so you can trace leaks like this later.


The original 4chan thread seems to indicate that the leaker verified that his hashes matched those of another person who had access to the weights, to make sure the weights weren't individually watermarked [0].

0: https://boards.4channel.org/g/thread/91848262#p91849855


The leaker accidentally doxxed themselves by adding the original download script to the torrent:

https://boards.4channel.org/g/thread/91848262#p91850503


knowing those guys, that could very well have been planted as a practical joke on a friend of theirs :P


What are the possible consequences here?


Mark comes to your house and applies a thick layer of sunscreen all over your face.


Could already have happened in these weights. Reminds me of when the movie studios started projecting random dot patterns during screenings to trace which theaters bootlegs were coming from. Their approach was essentially defeated by pirates sourcing multiple versions and combining them. In this case, I suspect you could add a small normally distributed random number to some random subset of the weights; it would have very little impact on performance but would corrupt any watermark beyond recognition.
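The attack described above can be sketched in a few lines. The function name and the 10% / 1e-4 values are illustrative assumptions, not parameters from any real scheme:

```python
import numpy as np

def corrupt_watermark(weights: np.ndarray, frac: float = 0.1,
                      scale: float = 1e-4, seed=None) -> np.ndarray:
    """Add small Gaussian noise to a random subset of the weights."""
    rng = np.random.default_rng(seed)
    out = weights.copy()
    flat = out.ravel()  # view into the copy, so edits land in `out`
    # Pick a random subset of positions and perturb them with N(0, scale) noise.
    idx = rng.choice(flat.size, size=int(frac * flat.size), replace=False)
    flat[idx] = flat[idx] + rng.normal(0.0, scale, size=idx.size)
    return out

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
w2 = corrupt_watermark(w, seed=1)
```

With weights typically of order 1e-2 to 1, noise at scale 1e-4 barely moves the model's behavior, but it scrambles the exact bit patterns any fragile watermark would rely on.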


Watermarking the weights is trivial.

Watermarking the output is also possible, but more complex, and with a statistical success-rate vs. performance tradeoff.
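One way the "trivial" weight watermark could work — an illustrative assumption, not a known vendor scheme — is to hide an ID in the least significant mantissa bits of float32 weights, which changes each value by only about one part in 2^23:

```python
import numpy as np

def embed_bits(weights: np.ndarray, bits) -> np.ndarray:
    """Write each bit into the LSB of one float32 weight's mantissa."""
    out = weights.astype(np.float32)          # copy
    view = out.view(np.uint32).ravel()        # reinterpret the raw bits
    for i, b in enumerate(bits):
        view[i] = (view[i] & ~np.uint32(1)) | np.uint32(b)
    return out

def extract_bits(weights: np.ndarray, n: int):
    """Read the LSBs back out of the first n weights."""
    view = weights.view(np.uint32).ravel()
    return [int(view[i] & 1) for i in range(n)]

w = np.random.default_rng(0).normal(size=8).astype(np.float32)
marked = embed_bits(w, [1, 0, 1, 1])
```

This is also exactly the kind of watermark the Gaussian-noise trick upthread destroys, since any perturbation larger than the mantissa LSB randomizes the recovered bits.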


I love the idea that LLMs will get watermarked in a way where you can ask them who they were built for and they just tell you.


If you find an AI-generated response online and ask ChatGPT whether it was the author, it says "it was probably written by a human". But we all know there is a split infinitive here and an archaic form there, and it knows. But it won't tell us.



