It’s Prime Time for Smarter Digital Libraries. Meet ORKG reborn.
By Lauren Snyder
It’s Friday night and you’ve finally plopped down on the couch after a long week. You can’t wait to unwind with your favorite show. But as Netflix opens with the familiar, “Duh dun!” you remember you watched the final episode yesterday. With a sigh, you begin searching for the next binge-worthy series. Fortunately, Netflix has a carousel of suggestions waiting for you, and in a matter of minutes you’re immersed in a new storyline. Phew, Friday night is saved.
Information retrieval systems, like the one helping you find your next great escape, also play a critical role in science. Just as you search for an interesting new show in Netflix’s digital library, researchers scour scientific digital libraries for information. Admittedly nerdier than surfing Netflix, but the concept’s the same. The difference is that while you’re searching through roughly 1,800 TV shows on Netflix, researchers have to sift through tens or even hundreds of thousands of scientific articles depending on the question they’re asking. It’s like looking for the proverbial needle in a haystack. This is why effective information retrieval systems are key for helping scientists find the information they’re looking for.
In fact, these systems are so important that entire scientific conferences are dedicated to them. One of them, the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, was recently held in Padova, Italy, from July 13 to 17. This year, close to 1,000 researchers gathered to discuss new information retrieval systems and techniques aimed at bringing order to vast amounts of information. As Saifeldin Mandour, a conference attendee from the University of Heidelberg, put it, “Information retrieval systems make overwhelming piles of data seem manageable. They bring structure to information that is dispersed and unorganized, making it more accessible and easier to understand.”
Many information retrieval systems do this by training machines to interpret human language, such as recognizing keywords or even interpreting longer passages of text from scientific articles. While these approaches continue to advance, machines still struggle to accurately interpret human-language text and can produce content that is irrelevant or made up.
With these challenges in mind, an emerging digital library presented at the conference, ORKG reborn, stood out for its unique approach to organizing scientific information. Rather than relying on machines to learn human language, the digital library publishes scientific information in a structured form that machines can already read. In other words, the content is produced machine-readable from the get-go. “Our goal is to make scientific research open and reproducible, and this influenced how we designed the system. The digital library is open source and the machine-readable information it houses is produced and published in a distributed manner. That means it can be collected and published by other systems,” explains Hadi Ghaemi, an ORKG reborn software developer and first author of the SIGIR conference paper that introduces the digital library.
The research team behind this work envisions a future where all science is produced machine-readable. “In a world with well-populated digital libraries of machine-readable scientific information, we really could leverage machines to help us sort through large volumes of information. While scientific information retrieval may never be as effortless as planning your next movie marathon, we would be one big step closer,” says Dr. Markus Stocker, innovator of ORKG reborn.
More about ORKG reborn:
“Advancing Scientific Knowledge Retrieval and Reuse with a Novel Digital Library for Machine-Readable Knowledge” offers a first, concise presentation of the ORKG reborn digital library. The full article was published at the 48th ACM SIGIR Conference on Research and Development in Information Retrieval in the demo track and is available at https://doi.org/10.1145/3726302.3730134
The authors are Hadi Ghaemi, Lauren Snyder, and Markus Stocker.
Explore the emerging digital library at https://reborn.orkg.org/
For more information, please contact:
Hadi Ghaemi, PhD Student in the Lab Knowledge Infrastructures at the TIB, hadi.ghaemi@tib.eu
Markus Stocker, Head of the Lab Knowledge Infrastructures at the TIB, markus.stocker@tib.eu
A special thank you to:
SIGIR attendees from the Kempelen Institute of Intelligent Technologies, Saarland University, the University of Amsterdam, and the University of Heidelberg for sharing their insights into information retrieval systems and providing easy-to-understand explanations of how they work.
Ricardo Perez Alvarez for an inspiring brainstorming discussion around this article.