The Allen Institute for AI today debuted AutoDiscovery, a new artificial intelligence system available as an experimental feature that helps scientific researchers figure out which questions to ask when they are overwhelmed with data.

Discovering patterns across papers can be a tremendous burden for scientific research, becoming the single most time-consuming portion of everyday work. However, the biggest bottleneck for research often isn’t the pile of papers and books that researchers read to get to the answers they need. It’s knowing the right question to ask.

AutoDiscovery, formerly AutoDS, is now available in AstaLabs, part of the scientific AI ecosystem from Ai2 named Asta that allows the analysis, summarization and search of more than 108 million academic abstracts and 12 million full-text papers.

Instead of starting with a question, the feature starts with data and asks its own questions: It generates hypotheses in natural language, proposes experiment plans, writes and executes its own Python code, interprets the statistical results and uses them to generate new hypotheses.
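To make that cycle concrete, here is a minimal Python sketch of a hypothesize, test and follow-up loop. It is an illustration only, not Ai2's implementation: the names `discovery_loop`, `run_experiment` and `propose_followups` are hypothetical, and the two callables stand in for the steps AutoDiscovery is described as performing itself (planning, writing and running code, interpreting results and proposing new hypotheses).

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def discovery_loop(
    seed_hypotheses: List[str],
    run_experiment: Callable[[str], float],
    propose_followups: Callable[[str, float], List[str]],
    max_experiments: int,
) -> List[Tuple[str, float]]:
    """Test hypotheses one at a time and queue the follow-ups each result suggests.

    run_experiment and propose_followups are hypothetical stand-ins supplied by
    the caller; nothing here reflects the real system's internals.
    """
    queue: Deque[str] = deque(seed_hypotheses)
    findings: List[Tuple[str, float]] = []
    while queue and len(findings) < max_experiments:
        hypothesis = queue.popleft()
        result = run_experiment(hypothesis)  # plan, code, execute, interpret
        findings.append((hypothesis, result))
        # Each result can spawn new hypotheses, which is what makes the search branch.
        queue.extend(propose_followups(hypothesis, result))
    return findings
```

The `max_experiments` budget is what would separate a quick run from an overnight one in this sketch. A plain first-in, first-out queue is the simplest possible ordering; it takes no account of which hypotheses look most promising.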

It essentially acts as its own researcher, capable of working through a branching statistical analysis of a structured dataset covering anywhere from a few papers to hundreds. According to Ai2, it can run a quick analysis or run overnight, and it returns a complete list of possible research directions, each of which is reproducible for further investigation.

“AutoDiscovery’s ability to reveal discoveries that may be hiding in plain sight is especially valuable in cancer research,” said Dr. Kelly Paulson, medical oncologist and head of the center for immuno-oncology at the Swedish Cancer Institute.

Just like a scientist, AutoDiscovery explores in an open-ended way, starting by generating a hypothesis. It then evaluates each result using what the company calls Bayesian surprise, a measure of how much the system’s beliefs change after seeing evidence.

Before running an experiment on a paper, it holds a prior belief, expressed as a probability distribution, about whether it expects the hypothesis to be true. This “belief” comes from the world knowledge already accessible to the model. After examining the results from the paper, the model updates its expectations, and the size of that shift is the “surprise” factor, which can be either positive (the hypothesis was confirmed) or negative (it was falsified).
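One common way to put a number on a belief shift like this is the Kullback-Leibler divergence between the belief after the evidence and the belief before it. The Python sketch below assumes the simplest possible case, a single probability that the hypothesis is true, and attaches a sign according to the direction of the shift. The function names are hypothetical, and the article does not specify the exact formula AutoDiscovery uses, so this should be read as an illustration of the idea rather than the system's method.

```python
import math


def update_belief(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    likelihood_ratio is P(evidence | hypothesis true) / P(evidence | hypothesis false).
    Requires 0 < prior < 1 and likelihood_ratio > 0.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def bernoulli_kl(q: float, p: float) -> float:
    """KL(q || p) in nats between two yes/no beliefs; requires 0 < q, p < 1."""
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def signed_surprise(prior: float, posterior: float) -> float:
    """Size of the belief shift, positive if belief rose and negative if it fell.

    The sign convention is an assumption made for this sketch.
    """
    magnitude = bernoulli_kl(posterior, prior)
    return magnitude if posterior >= prior else -magnitude


# Evidence four times likelier if the hypothesis is true moves a 50% belief to 80%.
posterior = update_belief(0.5, 4.0)
print(round(posterior, 3), round(signed_surprise(0.5, posterior), 3))
```

Under this measure, evidence that leaves the belief unchanged scores zero, and a shift from 50% to 20% scores the same magnitude as a shift from 50% to 80%, only with the opposite sign.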

What matters most about the surprise factor is not its direction but its magnitude. A disconfirmed hypothesis can be just as valuable as a confirmed one, especially if it completely uproots expectations or overturns assumptions.

History offers examples of hypotheses that completely shattered prior understanding. In the 1800s, it was believed that “miasma,” or bad air, caused illness. That belief was overturned and replaced by the germ theory of disease when it was discovered that specific germs, not miasma, caused specific illnesses. The shift took hold in the 1860s and 1870s, after Dr. John Snow mapped cholera cases in London in 1854 and demonstrated that the disease was spread through contaminated water, not bad air, challenging miasma theory.

The company said this is in line with scientific rigor: Results that meaningfully shift our expectations are often more interesting than those that simply confirm what we already assumed. By chasing surprise, AutoDiscovery attempts to gravitate toward the unexpected and to surface genuine discoveries rather than obvious patterns.

However, surprise alone is not enough. Exploring the breadth of the scientific discovery space requires intelligent search, so the system also implements Monte Carlo Tree Search, or MCTS, which balances exploring new hypotheses against prioritizing known leads. This steers computational effort toward the paths most likely to yield better information.
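The balance between exploring and following known leads is commonly handled in Monte Carlo Tree Search with the UCT selection rule, which adds an exploration bonus to each branch's average reward. The Python sketch below shows only the selection and backpropagation steps and assumes the reward fed back is a surprise score. The class and function names and the exploration constant `c` are hypothetical, and the article does not say which MCTS variant AutoDiscovery uses.

```python
import math
from typing import List, Optional


class HypothesisNode:
    """One hypothesis in the search tree (a hypothetical structure for this sketch)."""

    def __init__(self, statement: str, parent: Optional["HypothesisNode"] = None):
        self.statement = statement
        self.parent = parent
        self.children: List["HypothesisNode"] = []
        self.visits = 0
        self.total_reward = 0.0

    def add_child(self, statement: str) -> "HypothesisNode":
        child = HypothesisNode(statement, parent=self)
        self.children.append(child)
        return child


def uct_score(node: HypothesisNode, c: float = 1.4) -> float:
    """Average reward plus an exploration bonus that shrinks as visits grow.

    Intended for non-root nodes whose parent has already been visited.
    """
    if node.visits == 0:
        return math.inf  # untried hypotheses are always explored first
    exploit = node.total_reward / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore


def select_child(node: HypothesisNode, c: float = 1.4) -> HypothesisNode:
    """Pick the child with the highest UCT score."""
    return max(node.children, key=lambda child: uct_score(child, c))


def backpropagate(node: Optional[HypothesisNode], reward: float) -> None:
    """Credit a reward (here, a surprise score) to a node and all of its ancestors."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

A larger `c` makes the search spend more effort on rarely visited branches, while a smaller one concentrates it on branches that have already paid off.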

In the words of Ai2, it uses Bayesian surprise and MCTS to collaborate with researchers to answer the question: “What should be investigated next?”

“The ability to generate multiple hypotheses that can then be thoroughly evaluated by the user is extremely powerful,” said Dr. Fabio Favoretto, marine ecologist at the Scripps Institution of Oceanography.

Ai2 said the system changes the relationship between scientists and their data by transforming datasets from static repositories into collaborative partners. AutoDiscovery is available today as an experimental feature in Asta, an open-science, scholarly agentic AI framework.
