Hacker News

They're certainly aware of the test, but a turtle doing a kickflip on a skateboard? I seriously doubt they train their models for that.

https://x.com/JeffDean/status/2024525132266688757

If anything, the disastrous Opus 4.7 pelican shows us they don't pelicanmaxx.



I think I found the leaked Claude Mythos version of the turtle benchmark: https://www.youtube.com/watch?v=l82XWTKLZuk



