High-Level Project Summary
The NTRS (NASA Technical Reports Server) holds decades of NASA's work: a diverse collection of textual and scanned PDF documents that deserves responsible preservation and accessible tooling for today's researchers and for future generations. To that end, we developed an application that makes these documents accessible through summarization and textual analysis. Our deep-learning pipeline takes a PDF or scanned document as input and produces a summary of what the document contains. This application will be useful to researchers, students, development organizations, and the general public.
Link to Final Project
Link to Project "Demo"
Detailed Project Description
This project began with our team's existing involvement in space- and AI-related activities. As soon as we learned that the NASA Space Apps Challenge had begun, we started researching deep-learning NLP methods for condensing a long PDF or scanned document into a few understandable sentences that capture its central idea. The pipeline takes a PDF as input and produces textual analysis in the form of word counts, a generated title, and a summary. We use the BART transformer architecture, which is well suited to distilling hundreds of input variables down to the relevant ones, and we present data analytics as a keyword cloud with its visualization. Our aim is a text summarizer and title generator that makes the historical archive easier to access, so that researchers, university students, development organizations, and the general public can use older PDFs and text files more efficiently for research, academic, or other work. Ultimately, our goal is to increase the accessibility of science and to preserve humanity's scientific legacy, connecting today's readers to the knowledge of earlier generations through a web application. We also plan a recommendation engine that refers users to relevant related documents.
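The summarization step described above has to cope with documents far longer than a transformer's input window (BART accepts on the order of 1,024 tokens). A minimal sketch of the pre-processing we describe, assuming a hypothetical `chunk_text` helper that splits extracted PDF text into pieces small enough to feed a summarizer one chunk at a time:

```python
def chunk_text(text, max_words=400):
    """Split extracted document text into word-bounded chunks of at
    most max_words, so each chunk fits a summarizer's input limit."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# Each chunk would then be passed to a summarization model; with the
# Hugging Face library this is typically done via
# pipeline("summarization", ...), and the per-chunk summaries are
# concatenated into the final document summary.
sample = "word " * 1000
chunks = chunk_text(sample, max_words=400)
print(len(chunks))  # 1000 words at 400 per chunk -> 3 chunks
```

This is a sketch only; the real project may split on sentence boundaries rather than raw word counts.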
In this process, we used the Python programming language to build the algorithm, along with NLTK, Transformers (Hugging Face), and spaCy. We aim to add a backend built with Django and a UI based on HTML/CSS/JavaScript.
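The keyword-cloud analytics mentioned above boil down to counting non-trivial words in the extracted text. A self-contained sketch of that idea, using a hypothetical minimal stop-word set in place of the full NLTK/spaCy stop-word lists the project relies on:

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list for illustration; the project
# would use NLTK's or spaCy's full English stop-word lists instead.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}

def keyword_frequencies(text, top_n=10):
    """Return the top_n most frequent non-stop-word tokens.
    These (word, count) pairs are what a word-cloud library
    would render as the keyword visualization."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)

print(keyword_frequencies("the orbit of the orbit and lander", top_n=2))
# -> [('orbit', 2), ('lander', 1)]
```

The resulting frequency pairs can be handed directly to a plotting or word-cloud library for the visualization step.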
Space Agency Data
We used NTRS PDF documents tagged by CDMS, which include data from institutions such as the Canadian Space Agency, the European Space Agency, the Indian Space Research Organization, and the Japan Aerospace Exploration Agency.
Hackathon Journey
The hackathon was full of excitement, and we learned a great deal about subjects such as space science, artificial intelligence, and their subdomains. We learned to work with transformers, advanced Python, astronomy jargon, teamwork, and more. We would like to thank NASA for its data and for holding this competition on such a large scale.
References
https://ntrs.nasa.gov/search
https://huggingface.co/models?pipeline_tag=summarization&sort=downloads
Tags
#artificialintelligence #nlp #nasa #spaceapp #NTRS #alphaintelligence #isst

