NTRS NLP

High-Level Project Summary

Our product, NTRS NLP, is a web application that aims to improve the accessibility and discoverability of the 300,000+ records in the NTRS (NASA Technical Reports Server). Scientific and historical research communities need fast, trouble-free access to the data stored on the NTRS. Without it, they may have to “reinvent the wheel” or lose valuable time that could be spent on further research and development. The web application provides search tools that let users find files by keywords and collocations, understand the general content of an uploaded file without reading it in full, and generate statistical reports of language use.

Detailed Project Description


NTRS NLP


Overview:


NTRS NLP is a web application that lets users upload a file, analyzes it with NLP (Natural Language Processing), and generates a detailed language use report. It also saves each uploaded file so that users can search for it by keywords and collocations. (See the image below) Users can also view and download previously uploaded files.
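To make the idea of a language use report more concrete, here is a minimal sketch of the kind of statistics such a report could contain: token counts, the most frequent words, and the most frequent adjacent word pairs (a simple stand-in for collocations). It is not our actual analyzer. The project uses NLTK, while this sketch uses only the Python standard library so that it runs without extra downloads. The names `tokenize`, `language_use_report`, and `STOP_WORDS` are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption: a real analyzer would use a
# fuller list, such as the one shipped with NLTK).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def language_use_report(text, top_n=3):
    """Return basic language statistics for a piece of text."""
    tokens = tokenize(text)
    # Drop stop words so the counts reflect content words only.
    content = [t for t in tokens if t not in STOP_WORDS]
    # Pair each content word with the next one (adjacent after filtering).
    bigrams = list(zip(content, content[1:]))
    return {
        "token_count": len(tokens),
        "unique_tokens": len(set(tokens)),
        "top_words": Counter(content).most_common(top_n),
        "top_bigrams": Counter(bigrams).most_common(top_n),
    }


if __name__ == "__main__":
    sample = (
        "The wind tunnel test was completed. "
        "Results of the wind tunnel test are reported."
    )
    for key, value in language_use_report(sample).items():
        print(key, value)
```

Counting adjacent pairs is only a rough approximation of collocations. A statistical association measure, such as the ones NLTK offers, would be a more robust choice for real reports.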



Future improvements our team wants to make:


Improve performance by:

  • Fixing bugs in the Celery tasks (this part of the project is currently commented out due to bugs and errors)
  • Moving to Elasticsearch for text search
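As background for the planned move: search engines such as Elasticsearch are built on inverted indexes, which map each term to the documents that contain it, so a query does not have to scan every stored file. The sketch below illustrates that idea in plain Python. It is not Elasticsearch code and not our current search implementation, and the class and method names (`InvertedIndex`, `add`, `search`) are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index a document under every term that appears in it."""
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self._postings.get(term, set())
            # Intersect postings so only documents matching all terms remain.
            result = set(ids) if result is None else result & ids
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "Apollo guidance computer report")
    index.add(2, "Wind tunnel test report")
    print(index.search("wind report"))
```

A real search engine adds relevance ranking, stemming, and persistence on top of this structure, which is the main reason to adopt one rather than maintain a custom index.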


New features:

  • Improve the text analyzer (add summarization)
  • Add an automatic file downloader and analyzer using the NASA Open API
  • Use the NASA STI Scope and Subject Category guide for file categorization and sorting
  • Add more file filters
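The summarization feature listed above is not implemented yet. One simple approach we could start from is frequency-based extractive summarization: score each sentence by the average frequency of its content words and keep the highest-scoring sentences in their original order. The sketch below shows this under stated assumptions. The function names, the stop-word list, and the scoring rule are illustrative choices, not a final design.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption, not the project's real list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Return lowercase alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average word frequency, so long sentences are not favored by length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    report = (
        "The engine test failed. "
        "The engine test was repeated after the engine was repaired. "
        "Lunch was served."
    )
    print(summarize(report, max_sentences=1))
```

This approach only selects existing sentences and does not rephrase anything, so it is best seen as a baseline to compare against more advanced summarizers.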


Tools, coding languages, hardware, and software we used:

  • Python - The main programming language used
  • JavaScript - Frontend scripting
  • Flask - Web framework
  • SQLAlchemy - ORM for database
  • SQLite - Database (Temporary, will change to PostgreSQL in production) 
  • NLTK (Python) - Natural Language Toolkit for Python. (Used for text analysis)
  • GitHub, Git - Code hosting and version control
  • Docker - For the development server
  • HTML, CSS


(Present, but not active due to bugs)

  • Celery - Distributed task queue


Software:

  • PyCharm - IDE
  • Docker - Development Server


Space Agency Data

How we used space agency data in our project: 

We used portions of the data from the NASA Technical Reports Server, as well as the NASA NTRS OpenAPI, to understand the general file structure and content and to identify other distinguishing characteristics. We applied this knowledge and these resources to generate language use reports for files.

Hackathon Journey

Our Hackathon Journey: 


Our team chose this challenge because of our passion for AI and software development. Having people with different backgrounds and skill sets on the team made the challenge easier to solve. These skill sets include basic software engineering, script writing, graphic design, and more. Overall, we had an enjoyable and educational experience during the hackathon.


We would like to thank the hackathon organizers for the amazing challenges and everything else that kept us busy during the weekend.

References

Data: NASA Technical Reports Server - https://ntrs.nasa.gov/


Resources/Articles: NLTK Python - https://realpython.com/nltk-nlp-python/


Software:

  • PyCharm - IDE
  • Docker - Development Server
  • Canva - Presentation
  • Figma - Design, Logos …


Tags

#software, #Webapp, #NLP, #AI, #ML