Word Space Explorer

Kaohsiung | Can AI Preserve Our Science Legacy?

Awards & Nominations

Word Space Explorer has received the following awards and nominations. Way to go!

Global Nominee

Word Space Explorer

High-Level Project Summary

The Word Space Explorer is a web application that helps researchers locate the information they need in an ever-growing repository. It features two tiers of A.I. models (a Search Engine and a Tag Recommender) built with Natural Language Processing (NLP) and trained on years of scientific reports from the NASA Technical Report Server (NTRS). When a user's query comes in, it is preprocessed, tokenised, and vectorised with the pre-trained word vector space model. Because this is an unsupervised learning model, our search engine compares the query against all the trained documents and returns a ranked list of documents for the user to choose from.
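The query flow described above (preprocess, tokenise, vectorise, rank) can be illustrated with a minimal sketch. It is not the project's actual code: the three-dimensional vectors, the document titles, and the helper names (tokenise, embed, cosine, rank_documents) are all hypothetical, and averaging word vectors is just one common way to turn a text into a single vector.

```python
import math
import re

# Toy 3-dimensional word vectors standing in for a pre-trained embedding
# (hypothetical values, for illustration only).
WORD_VECTORS = {
    "rocket": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "propulsion": [0.85, 0.15, 0.05],
    "climate": [0.1, 0.9, 0.2],
    "ocean": [0.0, 0.8, 0.3],
    "temperature": [0.2, 0.85, 0.1],
}

# Hypothetical document collection: identifier -> text.
DOCUMENTS = {
    "doc-propulsion": "Rocket engine propulsion",
    "doc-climate": "Ocean temperature and climate",
}


def tokenise(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Average the vectors of all in-vocabulary tokens; None if there are none."""
    vectors = [WORD_VECTORS[t] for t in tokenise(text) if t in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query):
    """Return (document id, similarity) pairs, most similar first."""
    query_vec = embed(query)
    if query_vec is None:
        return []  # no query word is in the vocabulary
    scored = []
    for doc_id, text in DOCUMENTS.items():
        doc_vec = embed(text)
        if doc_vec is not None:
            scored.append((doc_id, cosine(query_vec, doc_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling rank_documents("rocket engine") in this toy setup puts "doc-propulsion" first, because its averaged vector points in nearly the same direction as the query vector.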

Link to Final Project

https://word-space-explorer.azurewebsites.net/

Link to Project "Demo"

https://www.youtube.com/watch?v=Kt1u5CHpbEw&ab_channel=chengjoseph

Detailed Project Description

The Word Space Explorer is a web application hosted on Azure. It is built with a frontend (HTML, CSS, JavaScript) and a backend (Python), and features an AI engine written in Python. The most important open data we used was the pre-trained GloVe word embedding model, on which we based Search Engine Version 1. It serves as a vectoriser for comparing the similarity between the scientific documents the engine was trained on and the user's query. Furthermore, we trained our own word vector space by feeding in raw text files harvested from the NTRS. Other open-source libraries (Gensim, spaCy, TensorFlow Keras) were mainly used for data pre-processing.

The application is intended to let users access the documents they want easily, from anywhere and on any hardware device, as long as a modern browser and an internet connection are available.

The application offers researchers an easy way to locate the right NASA documents with the help of the AI engine. Here are the exact steps:

Open your favourite browser and go to the Word Space Explorer website: https://word-space-explorer.azurewebsites.net/
Type any question you have related to NASA or to topics within the scope of the "NASA Scope and Subject Category Guide".
Your query is then sent to the AI engine, and the Tag Query Enhancer shows relevant tags on the screen. Select the tags that guide you closer to your desired documents.
The search results list only the most relevant documents based on your filter. You can then open the right documents and enjoy.
Note: If you enter a random query, the web application might return an error message.
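The tag-selection steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the real Tag Recommender is a trained model, whereas the keyword lookup below is a hypothetical stand-in, and the document titles, tags, function names, and the UnrecognisedQueryError class are all invented for the example.

```python
# Hypothetical documents, each carrying a set of subject tags.
DOCUMENTS = [
    {"title": "Ion thruster performance", "tags": {"propulsion", "spacecraft"}},
    {"title": "Sea surface temperature trends", "tags": {"oceanography", "climate"}},
    {"title": "Solar sail propulsion concepts", "tags": {"propulsion"}},
]

# Hypothetical keyword-to-tag lookup standing in for the trained Tag Recommender.
KEYWORD_TAGS = {
    "thruster": "propulsion",
    "engine": "propulsion",
    "satellite": "spacecraft",
    "ocean": "oceanography",
    "warming": "climate",
}


class UnrecognisedQueryError(ValueError):
    """Raised when no word in the query maps to a known tag."""


def recommend_tags(query):
    """Suggest tags for a query, or raise if nothing in it is recognised."""
    words = query.lower().split()
    tags = sorted({KEYWORD_TAGS[w] for w in words if w in KEYWORD_TAGS})
    if not tags:
        raise UnrecognisedQueryError(f"No tags found for query: {query!r}")
    return tags


def filter_documents(selected_tags):
    """Keep only the documents that carry every tag the user selected."""
    selected = set(selected_tags)
    return [doc["title"] for doc in DOCUMENTS if selected <= doc["tags"]]
```

In this sketch, recommend_tags("satellite engine design") suggests the tags "propulsion" and "spacecraft", and selecting both narrows the list to the single document that carries both. A query with no recognisable word raises an error, which mirrors the note about random queries.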

Note: The code repository for model training and the web application is located here: https://github.com/josephwccheng/NASA_Word_Space_Explorer

Space Agency Data

NASA Scope and Subject Category Guide https://ntrs.nasa.gov/api/citations/20000025197/downloads/20000025197.pdf

Extracted all the subjects and categories, which were used to query NTRS documents and obtain diversified data for our training.

The NASA Technical Report Server (NTRS) https://ntrs.nasa.gov/

Extracted a subset of the NTRS documents (5,500 out of the 380,000), covering all the topics mentioned in the Scope and Subject Category Guide, to train our A.I. Tag Recommender.

Stanford GloVe data http://nlp.stanford.edu/data/glove.6B.zip

A 600 MB+ pre-trained word embedding developed by Stanford, with a vocabulary of over 400,000 words and dimensions ranging from 50 to 200.
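For readers unfamiliar with the GloVe download, each line of the extracted text files holds a word followed by its space-separated vector components. A minimal loader could look like the sketch below. The function name load_glove and the two sample lines are made up for illustration, and this is not necessarily how the project itself reads the data.

```python
import io


def load_glove(lines):
    """Parse GloVe text lines of the form 'word v1 v2 ... vd' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Two made-up lines in GloVe text format, used in place of the real file.
SAMPLE = io.StringIO("nasa 0.1 -0.2 0.3\norbit 0.4 0.5 -0.6\n")
vectors = load_glove(SAMPLE)
```

With the real data, the same function can be given an open file handle instead of the sample, for example the 50-dimensional file from the archive (assumed here to be named glove.6B.50d.txt after extraction).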

Hackathon Journey

We had a lot of fun learning new things, coming up with different solutions, and connecting with people around the world who share our interest in exploring technology. The vectorization techniques in Natural Language Processing (NLP) were among the new things we learned. Brainstorming was the most challenging part of the process, yet also the most exciting.

The motivation was clear: to provide an AI solution that allows users to quickly and easily find information of genuine interest, without wading through large amounts of irrelevant content. We used word embeddings to improve the accessibility and discoverability of records in the NASA Technical Report Server (NTRS), with the aim of providing a better search experience for researchers working with scientific and technical records. We’ve learned new technology and been inspired by the NASA Hackathon to explore the unknown. It was worth the time and effort to participate in all the challenges and awesomeness.

We would like to thank the NASA Earth Science Division and the sponsors who initiated the event and made it possible. Special thanks go to the staff of NASA Hackathon Taiwan for the well-organized instructions that guided us through the Hackathon and for the valuable feedback that made our project even better.

References

Data

NASA Scope and Subject Category Guide https://ntrs.nasa.gov/api/citations/20000025197/downloads/20000025197.pdf
The NASA Technical Report Server (NTRS) https://ntrs.nasa.gov/
Stanford GloVe data http://nlp.stanford.edu/data/glove.6B.zip

Tools

Python http://www.python.org
tqdm https://github.com/tqdm/tqdm
requests https://requests.readthedocs.io/en/latest/
pandas https://pandas.pydata.org/
numpy https://numpy.org/
tensorflow https://tensorflow.org
scipy https://scipy.org/citing-scipy/
gensim https://github.com/RaRe-Technologies/gensim
flask_restful https://github.com/flask-restful/flask-restful
Flask https://flask.palletsprojects.com/
HTML, CSS and JavaScript (W3C)
Microsoft Azure https://azure.microsoft.com/
Photopea (image editing tool) https://www.photopea.com/

Videos, images and font

Star Wars intro https://www.youtube.com/watch?v=K9Xjj35PcRQ&ab_channel=PRINCETUTORIAL
Star Wars font https://www.dafont.com/star-jedi.font
Earth image http://clipart-library.com/free/earth-drawing-png.html
Image of vector space https://stock.adobe.com/search/images?k=3d%20grid