NTRS+: An AI-based extension of NTRS for easier searching and discovery of relevant contents

High-Level Project Summary

Our goal is to make it easier to locate relevant documents in the NASA Technical Reports Server (NTRS). The project has two main components. First, we built an AI model that automatically associates keywords with PDF documents in the NTRS. To do this, we collected PDF files and their annotations through the NTRS API, preprocessed the files, and split them into a training set and a test set. We used the training set to fine-tune a pre-trained NLP model, then used the resulting model to generate keywords for the documents in the test set. Second, we built a web app that lets users search and navigate documents by the generated keywords.
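The train/test split described above can be sketched as follows. This is a minimal illustration, not the code we ran during the hackathon: the function name `split_documents`, the 80/20 ratio, and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_documents(doc_ids, test_fraction=0.2, seed=42):
    """Shuffle document IDs reproducibly and split them into (train, test) lists.

    The 0.2 test fraction and the seed are illustrative defaults,
    not values taken from the project.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(doc_ids)
    # A dedicated Random instance keeps the split repeatable without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical document IDs standing in for downloaded NTRS PDFs.
    ids = [f"doc-{i}" for i in range(10)]
    train_ids, test_ids = split_documents(ids)
    print(len(train_ids), "training documents,", len(test_ids), "test documents")
```

Fixing the seed means the same documents land in the test set on every run, so keywords generated by different fine-tuning runs can be compared on identical documents.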

Link to Final Project

Detailed Project Description

Our project uses AI to automatically tag NTRS documents with relevant keywords. To demonstrate the result, we also built a web application that searches documents by the keywords we generated. We used the following tools for this project: Python, Roberta-Base (https://huggingface.co/roberta-base), KeyBERT (https://github.com/MaartenGr/KeyBERT/), Firebase, and JQuery. We hope that this project makes it easier to find relevant content in the NTRS.
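Searching documents by generated keywords can be illustrated with a small inverted index. The sketch below is written in Python for clarity and is only an assumption about how such a lookup could work; the web app itself was built with Firebase and JQuery, and the function names, document IDs, and keywords here are hypothetical.

```python
from collections import defaultdict


def build_keyword_index(doc_keywords):
    """Map each normalized keyword to the set of document IDs tagged with it."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            index[keyword.strip().lower()].add(doc_id)
    return index


def search(index, query_keywords):
    """Return sorted IDs of documents tagged with every query keyword."""
    matches = [index.get(kw.strip().lower(), set()) for kw in query_keywords]
    if not matches:
        return []
    return sorted(set.intersection(*matches))


if __name__ == "__main__":
    # Hypothetical model output: document ID -> generated keywords.
    generated = {
        "doc-1": ["Mars", "rover", "navigation"],
        "doc-2": ["Mars", "atmosphere"],
        "doc-3": ["rover", "wheel design"],
    }
    index = build_keyword_index(generated)
    print(search(index, ["mars"]))
    print(search(index, ["Mars", "rover"]))
```

Lowercasing keywords at both indexing and query time makes the lookup case-insensitive, and intersecting the per-keyword sets narrows the results as the user adds more keywords.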


Space Agency Data

NASA (NASA Technical Reports Server)

Hackathon Journey

It was a fun and fulfilling experience.

References

NTRS API (https://ntrs.nasa.gov/api/openapi/#/)

Python (https://www.python.org/)

Roberta-Base (https://huggingface.co/roberta-base)

KeyBERT (https://github.com/MaartenGr/KeyBERT/)

Firebase (https://firebase.google.com/)

JQuery (https://jquery.com/)

Tags

ntrs