Alexandria

High-Level Project Summary

Alexandria is an artificial intelligence app that will improve the accessibility and discoverability of records in the public NASA Technical Reports Server. Alexandria can read and summarize documents, generate text analytic data, and produce a list of topic keywords to help researchers find the information they need.

Detailed Project Description

After reviewing several publicly available documents on the NASA Technical Reports Server, I created a test corpus from some Legacy CDMS PDF files. Then, using Jupyter Notebook and the open-source NLP library NLTK, I processed the files, generated a summary, extracted keywords, and plotted a bar chart with the Seaborn library.
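The summary and keyword steps can be illustrated with a frequency-based approach. This is a minimal sketch using only the Python standard library, not the code from the project's notebook: the function names, the tiny stopword set, and the regex-based sentence splitter are all assumptions made for illustration. In an NLTK-based notebook, the library's tokenizers and stopword corpus would normally take their place.

```python
import re
from collections import Counter

# Small illustrative stopword set (an assumption); NLTK ships a much larger list.
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on",
    "with", "that", "this", "it", "as", "by", "was", "were", "be", "at",
    "from", "or",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Return lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stopword tokens with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def summarize(text, max_sentences=2):
    """Keep the sentences whose words are most frequent in the whole text.

    Each sentence is scored by the average document-wide frequency of its
    tokens; the best-scoring sentences are returned in their original order.
    """
    freq = Counter(tokenize(text))
    sentences = split_sentences(text)
    scored = []
    for idx, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        score = sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0
        scored.append((score, idx))
    # Highest score first; ties resolved by position in the document.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    keep = sorted(idx for _, idx in best)
    return " ".join(sentences[i] for i in keep)
```

The keyword counts returned by extract_keywords are the kind of data that could feed a bar chart, with the words on one axis and their frequencies on the other.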



Jupyter notebook capture



Then, using the Django Python framework, I built a website to host the data, along with an interactive application that lets the user navigate through the test corpus.



app capture



Finally, I created a page where users can read all the information extracted from each document.






You can find the full code for both the Jupyter Notebook and the Django app at this link:

https://github.com/barrancocarlos/alexandria-spaceapps

Space Agency Data

The first stop was the NASA Technical Reports Server (NTRS), where I browsed several dozen documents, looking for PDF files with searchable text.


After that, I consulted the NASA Scope and Subject Category Guide to get an idea of the type of keywords the app should handle.


Finally, I built a test corpus from a few randomly chosen PDF files that proved to be searchable with the NLTK library.
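Filtering for searchable PDFs can be sketched as follows. This is a hedged illustration, not the project's actual selection code: it assumes PyPDF2 (listed in the references) is used for text extraction through its PdfReader class, and the helper names and the 100-letters-per-page threshold are hypothetical choices. A scanned PDF with no text layer typically yields little or no extracted text, so it would be skipped.

```python
def extract_page_texts(pdf_path):
    """Return the extracted text of every page in the PDF, using PyPDF2."""
    # Third-party import kept local so the helpers below work without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def is_searchable(page_texts, min_chars_per_page=100):
    """Heuristic check (threshold is an assumption, not a standard value).

    A document counts as searchable when the average number of alphabetic
    characters extracted per page reaches min_chars_per_page.
    """
    if not page_texts:
        return False
    letters = sum(ch.isalpha() for text in page_texts for ch in text)
    return letters / len(page_texts) >= min_chars_per_page


def build_corpus(pdf_paths, extractor=extract_page_texts):
    """Map each searchable PDF path to its full extracted text."""
    corpus = {}
    for path in pdf_paths:
        pages = extractor(path)
        if is_searchable(pages):
            corpus[path] = "\n".join(pages)
    return corpus
```

Passing the extractor as a parameter keeps the filtering logic testable without real PDF files; by default build_corpus reads from disk through PyPDF2.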

Hackathon Journey

I think this is one of the most gratifying experiences I've had as a developer. I had to study a lot about NLP, frameworks, and libraries before I could start creating the app. I approached the problem as a user who needs the information but is not familiar with the NASA Technical Reports Server site, so I tried to make the app as simple and visual as possible.

References

Python Language. Available at http://www.python.org


Django Framework. Retrieved from https://www.djangoproject.com/.


Small Apps Theme

Copyright © 2022. Designed & Developed by Themefisher


NTRS - NASA Technical Reports Server


NASA Scope and Subject Category Guide


NLTK: Natural Language Toolkit


PyPDF2 2.11.0


Jupyter Notebook


Seaborn


Pandas


Matplotlib

Tags

#NLP #AI #Python