Northern Lights: Preserving Science for a Better Future

High-Level Project Summary

The challenge was to create an AI that sorts through the NASA repository and generates reports containing statistical analysis and a summary. Our project pulls data from the NASA repository and turns it into PDF reports using natural language processing and data visualization. I think our project does solve the challenge: each report contains a summary of the data, and the statistical analysis is presented in a fun way through data visualization. This challenge matters because sorting through databases is tedious and time-consuming, which makes it a perfect task to hand off to an AI, something that never gets tired and processes data much faster than we can.

Detailed Project Description

Our project retrieves data from the NASA STI Repository through its RESTful API. It then processes that data with NLTK's natural language processing tools and analyzes it for weighted word frequency. The processed data is visualized as bar graphs and word clouds, and these charts, along with a summary of each document, are used to generate PDF reports. I think it is a good solution to the problem because the documents in the database are tedious to sort through, and reading plain text over and over gets mundane. Colorful charts can add some color to a researcher's day while still conveying useful data. We built the project in Python, using the Pandas, NumPy, Matplotlib, Seaborn, regex, heapq, requests, os, PIL, ReportLab, and NLTK libraries.
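To make the pipeline concrete, here is a minimal sketch of the weighted-word-frequency summarization, the bar-chart visualization, and the PDF assembly steps. It assumes an abstract has already been pulled from the repository as plain text; the function names, file names, and page layout are illustrative, not copied from our actual codebase, and the word-cloud step is omitted for brevity.

```python
import heapq
import re

import matplotlib.pyplot as plt
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.pdfgen import canvas

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)


def word_weights(text):
    """Normalized word frequencies, ignoring punctuation and stopwords."""
    clean = re.sub(r"[^a-zA-Z\s]", " ", text).lower()
    stop_words = set(stopwords.words("english"))
    counts = {}
    for word in word_tokenize(clean):
        if word not in stop_words:
            counts[word] = counts.get(word, 0) + 1
    max_count = max(counts.values(), default=1)
    return {word: n / max_count for word, n in counts.items()}


def summarize(text, weights, num_sentences=3):
    """Score each sentence by the weights of its words and keep the top few."""
    scores = {}
    for sentence in sent_tokenize(text):
        for word in word_tokenize(sentence.lower()):
            if word in weights:
                scores[sentence] = scores.get(sentence, 0) + weights[word]
    return " ".join(heapq.nlargest(num_sentences, scores, key=scores.get))


def plot_top_words(weights, chart_path, top_n=10):
    """Save a bar chart of the most heavily weighted words."""
    top = heapq.nlargest(top_n, weights.items(), key=lambda kv: kv[1])
    words, values = zip(*top)
    plt.figure(figsize=(8, 4))
    plt.bar(words, values)
    plt.ylabel("Weighted frequency")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.savefig(chart_path)
    plt.close()


def build_report(title, summary, chart_path, out_path):
    """Lay out the title, summary, and chart on a single-page PDF."""
    page = canvas.Canvas(out_path, pagesize=letter)
    width, height = letter
    page.setFont("Helvetica-Bold", 14)
    page.drawString(1 * inch, height - 1 * inch, title)
    page.setFont("Helvetica", 10)
    text = page.beginText(1 * inch, height - 1.4 * inch)
    for line in summary.split(". "):  # crude line wrapping for the sketch
        text.textLine(line.strip())
    page.drawText(text)
    page.drawImage(chart_path, 1 * inch, 2 * inch, width=6 * inch, height=3 * inch)
    page.save()


if __name__ == "__main__":
    abstract = open("abstract.txt").read()  # plain-text abstract fetched earlier
    weights = word_weights(abstract)
    plot_top_words(weights, "top_words.png")
    build_report("Sample STI Report", summarize(abstract, weights),
                 "top_words.png", "report.pdf")
```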

Space Agency Data

We extracted NASA's data from the CDMS database through the NASA Open API STI Repository. We took the text versions of the PDFs in the database, used NLP to process and summarize their contents, and visualized the results with graphs.
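As a rough sketch of the retrieval step, the snippet below queries the STI repository search API with requests. The endpoint, query parameter, and response field names are assumptions based on the public NTRS (NASA STI Repository) API rather than an exact copy of our script, so they may need adjusting to the actual schema.

```python
import requests

# NOTE: the endpoint and field names are assumptions based on the public
# NTRS (NASA STI Repository) search API; adjust to the actual schema.
NTRS_SEARCH_URL = "https://ntrs.nasa.gov/api/citations/search"


def fetch_abstracts(query, limit=10):
    """Return (title, abstract) pairs for documents matching a search query."""
    response = requests.get(NTRS_SEARCH_URL, params={"q": query}, timeout=30)
    response.raise_for_status()
    records = response.json().get("results", [])

    documents = []
    for record in records[:limit]:
        documents.append((record.get("title", ""), record.get("abstract", "")))
    return documents


if __name__ == "__main__":
    for title, abstract in fetch_abstracts("solar wind"):
        print(title)
```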

Hackathon Journey

This was an amazing and inspirational learning experience. We learned a lot about making different libraries work together, and we learned more about natural language processing. Our team was inspired to join this challenge because we are really interested in artificial intelligence and believe AI can make the world a better place. Our approach was to first build a framework of code we thought would work and then fill in the required pieces afterwards. Our biggest challenge was using the NLTK library: the raw data needs to be cleaned thoroughly and our regex skills weren't that strong, so we pulled the abstract out of each document and processed that with NLP instead of the full text.

References

Stack Overflow (stackoverflow.com)

Matplotlib documentation

Seaborn documentation

NLTK documentation

Google (google.com)

NASA Open API STI Repository

CDMS center

Stack Abuse (stackabuse.com)

Tags

#ai, #python, #scripts, #NLP, #data visualization, #data analysis, #science, #NASA, #API