High-Level Project Summary
ATHENA is a beautiful piece of software that generates an abstractive summary of any NASA research paper available in the NTRS database. It is built on an advanced Natural Language Processing model that understands information and summarizes it in its own words, so researchers do not have to spend long hours going through papers that may not be of use to them. It also provides a list of the top keywords used in the research paper. We have also built a PDF extractor that pulls all the text out of a PDF if the user wants to view the whole document's text. We have provided a beautiful User Interface for you to try out ATHENA!
Link to Final Project
Link to Project "Demo"
Detailed Project Description
Github: https://github.com/aadityayadav/NasaHacksAthena
Website: http://athena-10123.df.r.appspot.com/
NOTE: If the website gives a 502 Bad Gateway error, please clear your cache and reload!
What exactly does it do?
Our software addresses the challenge "Can AI preserve our science legacy?". We have successfully used NLP to provide a useful and comprehensive overview of any NASA research paper without having to go through, or even open, the paper itself.
How does it work?
To test our model, we have already uploaded a set of PDFs extracted from the NTRS database. You can choose any of the available PDFs on the website and easily try out our model. Upon selecting a PDF, you will see the summary along with the top keywords appear on the screen. If the summary aligns with your research needs, you can use the associated keywords to find related PDFs.
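The two outputs described above could be sketched as below. The frequency-based keyword extractor is an illustrative stand-in (the project's actual algorithm is not specified), and the summarizer assumes HuggingFace's transformers summarization pipeline:

```python
import re
from collections import Counter

# Small illustrative stopword list; a real deployment would use a larger one.
STOPWORDS = {"the", "and", "for", "that", "with", "this", "are", "was", "uses"}

def top_keywords(text: str, k: int = 5) -> list[str]:
    """Return the k most frequent non-stopword terms (simple illustrative scheme)."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

def summarize(text: str, max_length: int = 150) -> str:
    """Abstractive summary via HuggingFace; downloads a default model on first use."""
    from transformers import pipeline  # assumes the transformers library is installed
    summarizer = pipeline("summarization")
    return summarizer(text, max_length=max_length)[0]["summary_text"]
```

The keyword list then doubles as a search aid: each term can be fed back into NTRS to surface related documents.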
NOTE: Many of NASA's PDFs are not stored in a machine-readable format; they are blurry scanned images from which text is difficult to extract. Due to the time constraints of this hackathon, we built our NLP pipeline for well-documented PDFs. However, a Named Entity Recognition (NER) model could be integrated into this project's pipeline if it were expanded.
What benefits does it have?
This saves researchers a lot of time. If this were integrated with the NTRS database, it could use the NTRS API to pick PDFs dynamically (instead of having PDFs uploaded on the front end, which we have done for now for ease of testing our project).
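NTRS does expose a public search API, so the dynamic-fetch idea could start from something like the sketch below. The exact endpoint and parameter names are assumptions and should be verified against the official NTRS API documentation:

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names for the public NTRS search API;
# verify against the official API documentation before relying on them.
NTRS_SEARCH = "https://ntrs.nasa.gov/api/citations/search"

def ntrs_search_url(query: str, page_size: int = 10) -> str:
    """Build a search URL for pulling candidate documents dynamically."""
    return f"{NTRS_SEARCH}?{urlencode({'q': query, 'page.size': page_size})}"
```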
What do you hope to achieve?
We hope that this software will have a positive impact on the usability of the NTRS database. We sincerely believe that it will greatly ease the work of researchers who are skimming through research papers to decide which one suits them best.
What tools, coding languages, hardware, or software did you use to develop your project?
For the model, we used HuggingFace's extensive NLP library to find what suits our needs for abstractive summarization and keyword extraction. For the backend, we used Flask to integrate all our components, along with HTML and CSS for the front end.
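The Flask integration might look roughly like the sketch below. The route name and the placeholder NLP functions are illustrative stand-ins, not the project's actual code:

```python
# Minimal sketch of a Flask backend wiring the pieces together.
# The route and the two placeholder functions are hypothetical.
from flask import Flask, request

app = Flask(__name__)

def summarize(text: str) -> str:
    """Placeholder for the HuggingFace summarization step."""
    return text[:100]

def top_keywords(text: str, k: int = 5) -> list[str]:
    """Placeholder for the keyword-extraction step."""
    return sorted(set(text.lower().split()))[:k]

@app.route("/summarize", methods=["POST"])
def summarize_route():
    text = request.form["text"]
    # Flask serializes a returned dict to a JSON response.
    return {"summary": summarize(text), "keywords": top_keywords(text)}
```

In the real app these placeholders would be replaced by the HuggingFace model calls, and the response rendered into the HTML/CSS front end via a template.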
Space Agency Data
We strictly used the NASA data given in the resources.
NTRS Database: We collected a subset of PDFs from the NTRS database so that the judges can try them in our software and see that our code actually works on REAL NASA data.
Hackathon Journey
How would you describe your Space Apps experience? What did you learn?
I had an amazing experience with NASA Space Apps! It was a highly enriching weekend where I got to learn so much and experience a whole new tech stack.
What inspired your team to choose this challenge?
Our team members have a genuine passion for Machine Learning and find Natural Language Processing highly interesting, so this was a comfortable choice for us. We had a great time building it because we were genuinely driven by the problem we were trying to solve.
What was your approach to developing this project?
Ojas started off by building the NLP model for abstractive summarization while Aaditya built the Flask backend for the model. Once both were complete, the components were integrated along with the keyword-extraction algorithm and a couple of bug fixes.
How did your team resolve setbacks and challenges?
Our team faced numerous challenges, especially in Flask, since neither of us is proficient in it. Initially, the model was not integrating well with Flask, and it took us a while to understand the underlying errors causing this. We worked and brainstormed together through all the challenges and successfully completed the project!
References
Data:
NTRS Database
Resources:
All of the software was built by us.
Tools:
NLP API from HuggingFace
Flask
Python
HTML
CSS
JavaScript
Jinja
Tags
#Athena #MachineLearning #NLP #AI #Software #Advanced #Flask

