
How will we know if it's AGI or not? (I don't think a simple app is gonna cut it here, haha)

What is the benchmark now that the Turing test has been blown out of the water?




Until recently, philosophy of artificial intelligence seemed to be mostly about arguments why the Turing test was not a useful benchmark for intelligence. Pretty much everyone who had ever thought about the problem seriously had come to the same conclusion.

The fundamental issue was the assumption that general intelligence is an objective property that can be determined experimentally. It's better to consider intelligence an abstraction that may help us to understand the behavior of a system.

A system where a fixed LLM provides answers to prompts is little more than a Chinese room. If we give the system agency to interact with external systems on its own initiative, we get qualitatively different behavior. The same happens if we add memory that lets the system scale beyond the fixed context window. Now we definitely have some aspects of general intelligence, but something still seems to be missing.
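
For concreteness, here is a rough sketch of the kind of loop I mean: agency (tool calls) and external memory bolted onto a fixed model. call_llm and run_tool are hypothetical stubs to be filled in, not any particular vendor's API:

    # Sketch of an agent loop: a fixed LLM plus tools and external memory.
    # call_llm and run_tool are hypothetical stubs, not a real library.
    def call_llm(prompt: str) -> dict:
        raise NotImplementedError("plug in a model client here")

    def run_tool(name: str, args: dict) -> str:
        raise NotImplementedError("plug in real tools here")

    def agent(task: str, memory: list[str], max_steps: int = 10):
        context = [f"Task: {task}"]
        for _ in range(max_steps):
            # crude retrieval: memory lets history grow past the fixed context window
            relevant = [m for m in memory if any(w in m for w in task.split())][-3:]
            action = call_llm("\n".join(relevant + context[-5:]))  # the weights never change
            if action.get("type") == "final":
                return action["answer"]
            observation = run_tool(action["tool"], action["args"])  # agency over external systems
            step = f"{action['tool']} -> {observation}"
            context.append(step)
            memory.append(step)  # persists across sessions, unlike the context window
        return None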

Current AIs are essentially symbolic reasoning systems that rely on a fixed model to provide intuition. But the system never learns. It can't update its intuition based on its experiences.

Maybe the ability to learn in a useful way is the final obstacle on the way towards AGI. Or maybe, once again, just as we start thinking we are close to solving intelligence, we will realize that there is more to it than we had thought.


The Turing test isn't as bad as people make it out to be. The naive version, where people just try to vibe out whether something is a human or not, is obviously wrong. On the other hand, if you set a good scientist loose on the Turing test, give them as many interactions as they want to come to a conclusion, and you let them build tools to assist in the analysis, it suddenly becomes quite interesting again.

For example, looking at the statistical distribution of the chat over long time horizons, and looking at input/output correlations in a similar manner, would out even the best current models in a "Pro Turing Test." Ironically, the biggest tell in such a scenario would be the excess capabilities the AI displays that a human would not be able to match.
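
To make that concrete, one crude version of such a check: log reply lengths and response latencies over a long transcript and compare them against a human baseline with a two-sample test. This is only a sketch; the record fields and the significance threshold are assumptions for illustration:

    # Rough "Pro Turing Test" check: compare long-horizon chat statistics
    # against a human baseline. Record fields are illustrative assumptions.
    from scipy.stats import ks_2samp

    def looks_human(candidate_replies, human_replies, alpha=0.01):
        cand_lengths = [len(r["text"].split()) for r in candidate_replies]
        human_lengths = [len(r["text"].split()) for r in human_replies]
        cand_latency = [r["seconds_to_reply"] for r in candidate_replies]
        human_latency = [r["seconds_to_reply"] for r in human_replies]

        length_test = ks_2samp(cand_lengths, human_lengths)
        latency_test = ks_2samp(cand_latency, human_latency)

        # Over a long enough transcript, a model that is "too good" (too fast,
        # too uniform) separates cleanly from the human distribution.
        return length_test.pvalue > alpha and latency_test.pvalue > alpha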


Why is LLM-generated writing so obvious?

I like the line of thinking from an earlier commenter: when an AI company no longer has any humans working, we'll know we're there.

I don't think this is a useful line of reasoning. All it would take to reach that bar is a moderate fall in AI stock prices.

I would consider something generally intelligent if it is capable of sustaining itself. So... self-sufficiency? I don't see why the bar would be much lower than that. And before people chime in that kids aren't self-sufficient, so by that definition I wouldn't consider them generally intelligent, which is obviously false... to that I would say... they're still in pre-training.

To my knowledge the Turing test has not been blown out of the water. The versions I saw were time-limited, and participants were not pushed hard to interrogate.

You have no idea whether you're talking to an LLM right now, and neither do I. That's good enough for me.

I dunno, I am rather certain your comment was not made by an LLM. Moreover, I am certain you knew mine wasn't either.

And that's before the interrogation, which is the entire point of the test.

IMO, the Turing test stands, but the experience you are referring to is basically a sub-human form of AGI.


It's crystal-clear that a model that was trained specifically to fool expert interrogators in a Turing test would, in fact, be able to do so. You'd have to sandbag the model just to keep it from tipping its hand by being too good.

We don't have any such models right now, AFAIK, so we can't run such a test. They wouldn't be much good for anything else, and would likely spark ethical concerns due to potential for misuse. But I have no doubt that it's possible to train for the Turing test.


I mean, is it though? The top reasoning models suggest walking to a car wash.

The top reasoning models suggest taking a car to the car wash.

Not 100% of the time, according to the comments.

SotA doesn't matter, though. Only the first couple of time derivatives matter. Looking good for the clankers, not so much for us...

Supranormal GDP growth is my bar: when it's actually able to get around bottlenecks and produce value on a societal level.

An agent need not have wants, so why would it try to increase its efficiency to obtain things?

I don't think that was the intent of the comment; more that true AGI should be so useful and transformative that it unlocks enough value and efficiency to boost GDP, much like the Industrial Revolution or harnessing electricity, rather than being just a fancy chatbot.

Increased productivity is not equivalent to intelligence.

Not equivalent, but I do think a necessary byproduct of actual AGI is that it will be able to solve real problems in the real world in a way that generates positive value on a large enough scale to show up in GDP.

No one said it is. Sometimes correlation does equal causation.

Just put "keep yourself alive" in the SOUL.md. Might be all that it takes.

I swear people don't know what's good for them.

There is a different way I look at this.

Humans will never accept that we created AI; they'll go so far as to say we were not intelligent in the first place. That is the true power of the AI effect.


And yet another way to look at it: maybe current LLM agents are AGI, but it turns out that AGI in this form is not that useful because of its many limitations, and solving those limitations will be a slow and gradual process.


