
What's your suggestion for an alternative headline?


  Can O3 Beat a Master-Level GeoGuessr?
  How Good is O3 at GeoGuessr?
  EXIF Does Not Explain O3's GeoGuessr Performance
  O3 Plays GeoGuessr (EXIF Removed)
But honestly, OP had the foresight to remove EXIF data and memory from O3 to reduce contamination. The goal of the blog post was to show that O3 wasn't cheating. So by including search, they undermine the whole point of the post.

The problem really stems from a lack of foresight: a misunderstanding of the critiques they sought to address in the first place. A good engineer understands that when their users/customers/<whatever> make a critique, what the gripe is actually about may not be properly expressed. You have to interpret your users' complaints. Here, the complaint was "cheating", not "EXIF" per se. The EXIF complaints were just a guess at the mechanism by which it was cheating. But the complaint was still about cheating.


>The goal of the blog post was to show that O3 wasn't cheating.

No, the goal of the post was to show that o3 has incredible geolocation abilities. It's through the lens of a Geoguessr player who has experience doing geolocation, and my perspective on whether the chain of thought is genuine or nonsense.

In Simon's original post, people were claiming that o3 doesn't have those capabilities, and we were fooled by a chain of thought that was just rationalizing the EXIF data. It only had the _appearance_ of capability.

The ability to perform web search doesn't undermine the claim that o3 has incredible geolocation abilities, because it still needs to have an underlying capability in order to know what to search. That's not true for simply reading EXIF data.

This is the best way I knew to show that the models are doing something really neat. Disagreements over the exact wording of my blog post title seem to be missing the point.


  > No, the goal of the post was to 
I think you misinterpret my point. The goal of your post is distinct from how people will interpret it. Plenty of times people intend one thing and get a different thing. That's life.

  > In Simon's original post, people were claiming that o3 doesn't have those capabilities, and we were fooled by a chain of thought that was just rationalizing the EXIF data. It only had the _appearance_ of capability.
And this is the key part!

The people questioning O3's capabilities were concerned with cheating. Any mention of EXIF is a guess as to how it was cheating, but the suspicion is still that it is cheating. That's the critique!

If you framed the title as "O3 Does Not Need EXIF Data To Beat A Master-Level GeoGuessr" then I wouldn't have made my comment. The claim is much more specific and reflects the results of your post. You did in fact show that it doesn't need EXIF data to do what it does! BUT by framing it as "Beats a Master-Level" there is an implicit claim that both of you are playing the same game. The fact that you weren't is the issue.

Look at it this way. If I said I beat Tiger Woods at golf and then casually slipped in that I was playing with a handicap, wouldn't you feel a bit lied to? You'd think "Did Godelski really beat Tiger Woods?", and you would mean without the handicap. You'd have every right to be suspicious! And you'd have every right to dismiss me.

Most importantly, take a second here. My whole point is that you can make a much stronger claim! One where there wouldn't be a significant divergence between title and content. I get that it is frustrating to receive criticism, but even if you believe I'm wrong to do so, is it not more effective to show me up by just redoing it without search? If you do that, then you only end up with a stronger claim. But by disagreeing and arguing here you're just not convincing me. Even if you disagree with my interpretation of the title, you know full well that it is a valid interpretation. Given the pushback from other comments, I don't think you can deny that it is an expected one. So the only way to resolve this is to either change the title or change the data. Besides, you responded to the top comment saying that it was a fair criticism. All I've done is explain why the criticism was made in the first place!

And yes, it still undermines the result. Because that is entirely dependent on the (interpretation of the) claim that was made. Your results are still valid, but they only satisfy a weaker claim.

FWIW, I think the updated post is better. My comment here would only be that you could add clarity by showing the non-search scores (especially in the final table). In fact, the "study" being done with and without search makes a stronger post than had it only been one way. So kudos!


You've clearly thought this through, and I agree that had I been more precise at the start it would have avoided some confusion. I'm glad you like the updated post.



