The AI rediscovered an interferometer technique the Russians found decades ago, optimized a graph in an unusual way, and came up with a formula to better fit a dark matter plot.
Ehhhhh, I'll say it's substantive and not just pure hype.
Yes, the AI "resurfaced" the work, but it also incorporated the Russians' theory into the practical design. At least enough to say "hey, make sure you look at this" - meaning the system produced a workable-something with enough of an improvement, or some benefit, that the researchers took it seriously and investigated. Evidently that yielded an actual design with a 10-15% improvement and a "wish we had this earlier" statement.
AFAICT the "AI" didn't "pay attention to the work" either. They built a representation of a set of possible experiments, defined an objective function quantifying what they wanted to optimise and used gradient descent to find the best experiment according to that objective function.
If I've understood it right, calling this AI is a stretch and arguably even misleading. Gradient descent is the primary tool of machine learning, but this isn't really using it the way machine learning uses it. It's more just an application of gradient descent to an optimisation problem.
The article and headline make it sound like they asked an LLM to make an experiment and it used some obscure Russian technique to make a really cool one. That isn't true at all. The algorithm they used had no awareness of the Russian research, or of language, or of experimental design. It wasn't "trained" in any sense. It was just a gradient descent program. It's the researchers who recognised the Russian technique when analysing the experiment the optimiser chose.
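To make concrete what that comment is describing, here is a toy sketch of the pattern: encode a candidate experiment as a parameter vector, score it with an objective function, and run plain gradient descent. Everything here is an invented stand-in (the parametrisation, the targets, the objective) - it only illustrates the technique, not the actual system or its figure of merit.

```python
# Toy illustration of "optimise an experiment with gradient descent":
# no training, no language model, just a search over a parameter vector.

def objective(params):
    # Hypothetical stand-in for "negative sensitivity" of a candidate
    # experiment; minimised at the made-up sweet spot (1.0, -2.0, 0.5).
    targets = [1.0, -2.0, 0.5]
    return sum((p - t) ** 2 for p, t in zip(params, targets))

def gradient(params, eps=1e-6):
    # Numerical gradient via central differences, so the sketch works
    # for any black-box objective.
    grads = []
    for i in range(len(params)):
        up = list(params); up[i] += eps
        dn = list(params); dn[i] -= eps
        grads.append((objective(up) - objective(dn)) / (2 * eps))
    return grads

def optimise(params, lr=0.1, steps=500):
    # Vanilla gradient descent: step against the gradient.
    for _ in range(steps):
        g = gradient(params)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params

best = optimise([0.0, 0.0, 0.0])
```

The "discovery" in the real case would then be a human looking at `best` and recognising that the optimiser's preferred configuration matches a known technique.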
The discovery itself doesn’t seem like the interesting part. If the discovery wasn’t in the training data, then it’s a sign AI can produce novel scientific research and experiments.
Your exchange has made me wonder. Yes, whatever AI produces is not genuine stuff. But there is something we could call "Shakespeare-ness", and maybe it is quantifiable.
What would a realistic Turing test for "Shakespeare-ness" look like?
Big experts on Shakespeare likely remember (at least vaguely) all his sonnets, so they cannot be part of a blinded study ("Did Shakespeare write this or not?"): they would realize that they have never seen those particular lines and answer based on that knowledge.
Maybe asking more general English Lit teachers could work.
Extra Terrible Lines are indeed fun. We've had 9 months of development since then, though; maybe it would make sense to repeat those experiments twice a year.
IIRC Scott Alexander is doing something similar with his "AI draws nontrivial prompts" bet, and the difference to last year's results was striking.
Also, this really needs blinding, otherwise the temptation to show off one's sophistication and subtlety is strong. Remember how oenologists consistently fail to distinguish between a USD 20 and a USD 2000 wine bottle when blinded.
AI companies stole massive amounts of information from every book they could get. Do you really believe there's any research they don't have input into their training sets?