High-Level Project Summary
We developed a "Concept Exploration" approach that lets researchers query the hundreds of thousands of STI documents using a concept description instead of a keyword search. Results are ranked by a relevance score against the queried concept, which helps researchers find and follow additional lines of research.
As an example, a query for "effective heat material" returns results such as "Improving Thermoelectric Properties Of (Si/Ge)/GaP Alloys", which contains none of the words from the query but is still relevant to the concept of effective heat materials.
Live Demonstration: https://spaceapps.etheredge.co/
Detailed Project Description
The project uses a transformer-based language model to generate an embedding for each document in the corpus. In our case, the corpus consists of all the titles and abstracts available in the NASA Technical Reports Server.
When a query is performed, the system converts the query concept into an embedding with the same model and performs a proximity search across the embedding space to retrieve the documents most relevant to the concept.
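The indexing and query steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: `toy_embed` is a hypothetical stand-in for the transformer model (it sums hand-made word vectors), `build_index` and `concept_search` are names invented for this sketch, the second document title is made up, and cosine similarity is assumed as the proximity measure since the description does not name one.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical 2-d word vectors standing in for a real transformer model.
# Words about heat and materials point one way, words about orbits another.
TOY_WORD_VECTORS = {
    "heat": (1.0, 0.0),
    "thermoelectric": (0.9, 0.1),
    "material": (0.8, 0.2),
    "alloys": (0.7, 0.3),
    "orbit": (0.0, 1.0),
    "trajectory": (0.1, 0.9),
}


def toy_embed(text: str) -> List[float]:
    """Sum the toy vectors of every known word in the text (assumption: demo only)."""
    total = [0.0, 0.0]
    for word in re.findall(r"[a-z]+", text.lower()):
        vec = TOY_WORD_VECTORS.get(word)
        if vec is not None:
            total[0] += vec[0]
            total[1] += vec[1]
    return total


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(
    documents: Sequence[Tuple[str, str]],
    embed: Callable[[str], Vector],
) -> List[Tuple[str, Vector]]:
    """Embed each document's title and abstract once, ahead of query time."""
    return [(title, embed(f"{title}. {abstract}")) for title, abstract in documents]


def concept_search(
    query: str,
    embed: Callable[[str], Vector],
    index: Sequence[Tuple[str, Vector]],
    top_k: int = 5,
) -> List[Tuple[float, str]]:
    """Embed the query, score every indexed document, and return the best matches."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Abstracts are left empty here; the real corpus would supply them.
documents = [
    ("Hypothetical Report On Orbit Trajectory Planning", ""),
    ("Improving Thermoelectric Properties Of (Si/Ge)/GaP Alloys", ""),
]
index = build_index(documents, toy_embed)
results = concept_search("effective heat material", toy_embed, index, top_k=2)
for score, title in results:
    print(f"{score:.3f}  {title}")
```

With these toy vectors the thermoelectric title ranks first even though it shares no words with the query, which is the behavior the concept search aims for. A linear scan like this is fine for a sketch; at the scale of hundreds of thousands of documents, an approximate nearest-neighbor index would likely be a better fit.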
We also extracted images from the PDFs in order to provide additional exploration with image search.
We used the Gradio library to build a quick UI suitable for a demonstration.
Space Agency Data
We used the API for the NASA Technical Reports Server.
Hackathon Journey
We chose the challenge because we have been working in adjacent fields using the same underlying theme.
For example, see https://github.com/HSV-AI/bug-analysis which is an attempt to support the maintenance of open source software.
Our approach was to use the AI tools we had, while learning the NASA API for the Technical Reports Server.
The main challenge we experienced was API errors and timeouts when downloading PDF files. We overcame that setback by relying on the titles and abstracts of the documents, which were available through other API calls.
References
Live Demonstration - https://spaceapps.etheredge.co/
Huntsville AI Website - https://hsv.ai
GitHub Page for Huntsville AI - https://github.com/HSV-AI
Tags
#transformers, #nlp, #concept, #exploration, #ai

