AstroCodex: Summarization and Analysis of Documents

High-Level Project Summary

AstroCodex improves the NTRS by:

- Using Natural Language Processing (NLP) to automatically read NTRS documents, summarize them, generate text-analytic data, and produce a list of topic keywords to help researchers find the documents they need.

- Improving the Graphical User Interface (GUI) of the NTRS website to better support User Experience (UX).

- Building a visual search on our keywords and summaries that allows users to explore concepts, see related concepts, and drill down into the data.

The project has already been picked up for an internal NASA big data. Our purpose is to get people, especially the younger generations, interested in astronomy and physics through an easy and captivating path.

Detailed Project Description

We had a problem with server access when interacting with the client, so we proceeded as follows:

First, we modified the Graphical User Interface (GUI) by splitting it into two interfaces: one for professionals (the standard website) and one for students, designed as a simplified UI that is more accessible to them.

Second, we added a new tab to the navigation bar for summarizing documents and research papers by extracting their main keywords and declarative abstracts. The pipeline behind it uses Natural Language Processing (NLP): PyPDF2 reads the documents, the YAKE library extracts keywords, and the Summa library summarizes the documents. We also used the Seaborn library to plot the five most frequent keywords so that the subject can be categorized.

Finally, we displayed the summary and the plots, and then uploaded the NLP model to Kaggle.
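The summarization steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's actual notebook: the function names, the parameter values (n=2, top=20, ratio=0.2), and the way keyword frequency is counted are all hypothetical choices. The third-party libraries (PyPDF2, yake, summa, seaborn) are imported inside the functions that need them.

```python
import re
from collections import Counter


def read_pdf_text(path):
    """Concatenate the text of every page of a PDF using PyPDF2."""
    from PyPDF2 import PdfReader  # third-party

    reader = PdfReader(path)
    # extract_text() may return an empty value for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_keywords(text, top=20):
    """Return YAKE keyword strings, most relevant first."""
    import yake  # third-party

    extractor = yake.KeywordExtractor(lan="en", n=2, top=top)
    # Recent YAKE releases return (keyword, score) pairs; lower score = more relevant.
    pairs = sorted(extractor.extract_keywords(text), key=lambda p: p[1])
    return [keyword for keyword, _score in pairs]


def summarize(text, ratio=0.2):
    """Extractive TextRank summary from the summa library."""
    from summa import summarizer  # third-party

    return summarizer.summarize(text, ratio=ratio)


def top_keyword_counts(text, keywords, k=5):
    """Count whole-phrase, case-insensitive occurrences; return the k most frequent."""
    lowered = text.lower()
    counts = Counter()
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        counts[keyword] = len(re.findall(pattern, lowered))
    return counts.most_common(k)


def plot_top_keywords(pairs, out_path="keywords.png"):
    """Save a horizontal bar chart of (keyword, count) pairs with seaborn."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    import seaborn as sns  # third-party

    labels = [keyword for keyword, _ in pairs]
    values = [count for _, count in pairs]
    ax = sns.barplot(x=values, y=labels)
    ax.set_xlabel("Occurrences")
    plt.tight_layout()
    plt.savefig(out_path)
    plt.close()


def run_pipeline(pdf_path):
    """Hypothetical end-to-end run: read, extract keywords, summarize, plot."""
    text = read_pdf_text(pdf_path)
    keywords = extract_keywords(text)
    top_five = top_keyword_counts(text, keywords, k=5)
    plot_top_keywords(top_five)
    return summarize(text), top_five
```

Calling run_pipeline on a downloaded NTRS PDF (the path is up to the caller) would return the summary text together with the five most frequent keywords, and write the bar chart to keywords.png.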

Space Agency Data

Our team members are keen on learning about astronomy and the bizarre universe. We wanted to turn this passion into something useful for the public. During the development of this project, we used:

1- NASA Technical Reports Server (NTRS) 

2- OpenAPI-Data-Dictionary-062021.pdf (nasa.gov)

3- STI_Open_API_Documentation_20210426.pdf (nasa.gov)

4- NASA Data Home - https://data.nasa.gov/

Hackathon Journey

Space Apps was not always the largest annual global hackathon in the world. Over time, the program experienced rapid growth in numbers, breadth, and depth, and that encouraged us to participate and gain experience. We have become more collaborative, creative, and innovative. We applied critical thinking to the challenges, particularly the NTRS challenge, and we also developed team spirit.

We learned how to work under stress and meet deadlines. Everyone on the team did their best to deliver a polished, high-quality project. Each member was responsible for a specific part of the project so that we could deliver every part on time. In summary, we learned time management, team spirit, how to form a team, problem-solving, and working collaboratively.

We wanted a challenge that was unique and, at the same time, related to AI, so we chose the NTRS challenge. One of our goals is to help users quickly find the documents they need, together with their abstracts, through a suitable GUI.

AstroCodex improved the NTRS by:

- Using Natural Language Processing (NLP) and a Support Vector Machine (SVM) to automatically read NTRS documents, summarize them, generate text-analytic data, and produce a list of topic keywords to help researchers find the documents they need.

- Improving the Graphical User Interface (GUI) of the NTRS website to better support User Experience (UX).

- Uniformly extracting natural keywords from more than 26,000 datasets. It is hard to search for keywords if you do not already know what they are or how they relate.

- Building a visual search on our keywords that allows users to explore concepts, see related concepts, and drill down directly into the data.

We included other government data, as NASA does not live in a void. There are more datasets tagged ‘space’ among non-NASA datasets than among NASA datasets. The cross-over is particularly strong with the National Science Foundation and the Department of Energy, both significant funding bodies for space-based research. Now, for the first time, it is easy to see how concepts are being discussed across different agencies.

 Once we had a uniform process for extracting core concepts from the documents, we needed to provide users with an easy way to query that data. Searching for a keyword does not solve the problem - you usually need to know the keyword beforehand. Instead, we allowed for a fuzzy search on the extracted concepts and then surfaced the most common keyword and the constellation of related concepts. You may be interested in Space Science, but you may not know that you can search directly for ‘Pulsars’ or ‘Neutron Stars’ to narrow the universe of results.
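The fuzzy search and the "constellation of related concepts" can be illustrated with a small sketch. It assumes a hypothetical mapping from document IDs to extracted keywords. It uses Python's standard difflib for approximate matching and simple co-occurrence counts for relatedness, which may differ from the matching and ranking used in the actual project.

```python
from collections import Counter
from difflib import get_close_matches
from itertools import combinations


def build_cooccurrence(doc_keywords):
    """Count how often each pair of keywords is attached to the same document."""
    related = {}
    for keywords in doc_keywords.values():
        unique = sorted({keyword.lower() for keyword in keywords})
        for a, b in combinations(unique, 2):
            related.setdefault(a, Counter())[b] += 1
            related.setdefault(b, Counter())[a] += 1
    return related


def fuzzy_concept_search(query, related, n=3, cutoff=0.6, top_related=5):
    """Map each approximately matching concept to its most co-occurring concepts."""
    matches = get_close_matches(query.lower(), list(related), n=n, cutoff=cutoff)
    return {
        match: [keyword for keyword, _ in related[match].most_common(top_related)]
        for match in matches
    }


# Hypothetical sample data: document ID -> extracted keywords.
sample_docs = {
    "doc-1": ["Space Science", "Pulsars", "Neutron Stars"],
    "doc-2": ["Pulsars", "Neutron Stars"],
    "doc-3": ["Space Science", "Solar Wind"],
}
sample_related = build_cooccurrence(sample_docs)
sample_result = fuzzy_concept_search("pulsar", sample_related)
```

Here the inexact query "pulsar" still resolves to the concept "pulsars", and the related concepts are ordered by how many documents they share with it, so a user does not need to know the exact keyword in advance.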

The visual graph-based search makes it easy to search datasets, understand the related concepts, and access the data directly from the source. The final similarity score between these two projects is then 2.5444444444444447.

We would like to thank our mentors and all those who worked on Space Apps in Cairo, as they gave us the chance to compete with students from all over the world and to propose solutions to the challenges.

References

  1. Swagger UI (nasa.gov)
  2. NASA Scope and Subject Category Guide
  3. Socrata APIs - http://dev.socrata.com/
  4. Convert XLSX Spreadsheet to JSON - http://oss.sheetjs.com/js-xlsx/
  5. Kaggle
  6. https://aclanthology.org/W97-0711.pdf
  7. https://aclanthology.org/P98-1009.pdf
  8. https://ieeexplore.ieee.org/abstract/document/8554831/
  9. https://ieeexplore.ieee.org/abstract/document/9358703

Tags

#AI #NLP #natural_language_processing #text_manipulation