Awards & Nominations
Word Space Explorer has received the following awards and nominations. Way to go!

The Word Space Explorer is a web application that helps researchers locate desired information faster in an ever-growing repository. It features two tiers of A.I. models (a Search Engine and a Tag Recommender) built with Natural Language Processing (NLP) and trained on years of scientific reports from the NASA Technical Report Server (NTRS). When a user's query comes in, it is preprocessed, tokenised, and mapped into the pre-trained word vector space model. Because this is an unsupervised learning model, our search engine compares the query against all the trained documents and returns a ranked list of documents for the user to choose from.
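The query flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code. The toy three-dimensional vectors, the `TOY_VECTORS` table, and the function names are all assumptions made for the example. It also assumes that a text is represented by the average of its word vectors and that ranking uses cosine similarity, which is one common way to compare texts in a word vector space.

```python
import math
import re

# Hypothetical toy embeddings standing in for a pre-trained word vector model.
TOY_VECTORS = {
    "mars": [0.9, 0.1, 0.0],
    "rover": [0.8, 0.2, 0.1],
    "ocean": [0.0, 0.9, 0.2],
    "salinity": [0.1, 0.8, 0.3],
    "engine": [0.2, 0.1, 0.9],
    "thrust": [0.1, 0.0, 0.95],
}


def tokenise(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, vectors):
    """Average the vectors of all known tokens; return None if none are known."""
    known = [vectors[tok] for tok in tokenise(text) if tok in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query, documents, vectors):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    query_vec = embed(query, vectors)
    if query_vec is None:
        return []
    scored = []
    for doc_id, text in documents.items():
        doc_vec = embed(text, vectors)
        if doc_vec is not None:
            scored.append((doc_id, cosine(query_vec, doc_vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents, used only to exercise the sketch.
documents = {
    "a": "Mars rover mobility",
    "b": "Ocean salinity measurements",
    "c": "Engine thrust tests",
}
ranking = rank_documents("rover on Mars", documents, TOY_VECTORS)
```

In this sketch, words missing from the vocabulary are simply skipped, so a query made up entirely of unknown words returns an empty ranking.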
The Word Space Explorer is hosted on Azure and built with a frontend (HTML, CSS, JavaScript) and a backend (Python), with an AI engine written in Python. The most important open data we used was the pre-trained GloVe word embedding model, on which we based Version 1 of our Search Engine. We used it as a vectoriser to compare the similarity of the trained scientific documents against the user's query. Furthermore, we trained our own word vector space by feeding in raw text files harvested from the NTRS. Other open-source libraries (Gensim, spaCy, TensorFlow Keras) were mainly used for data pre-processing.
The application is expected to let users access their desired documents easily from anywhere and on any hardware device, as long as a modern browser and an internet connection are available.
The application offers researchers an easy way to locate the right NASA documents with the help of an AI engine. Here are the exact steps:
Note: The code repository for model training and the web application is located here: https://github.com/josephwccheng/NASA_Word_Space_Explorer
Extracted all the subjects and categories, which are used to query NTRS documents and obtain diversified data for our training.
Extracted a subset of the NTRS documents (5,500 out of the 380,000) to train our A.I. Tag Recommender, covering all the topics mentioned in the Scope and Subject Category Guide.
A 600 MB+ pre-trained word embedding developed by Stanford, with a vocabulary of over 400,000 words and dimensions ranging from 50 to 200.
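For readers who want to use the pre-trained embedding themselves, here is a minimal sketch of reading GloVe vectors. It relies on the plain-text layout in which GloVe vectors are distributed: one word per line, followed by its space-separated float components. The function name `load_glove`, the sample lines, and the file name in the comment are assumptions for illustration and are not taken from the project repository.

```python
def load_glove(lines):
    """Parse GloVe-style text lines into a {word: [float, ...]} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            # Skip blank or malformed lines that carry no vector components.
            continue
        word, values = parts[0], parts[1:]
        vectors[word] = [float(value) for value in values]
    return vectors


# Tiny in-memory sample in the same layout as a GloVe text file.
sample_lines = [
    "the 0.1 -0.2 0.3",
    "nasa 0.5 0.4 -0.1",
]
vectors = load_glove(sample_lines)

# With a real file (the path below is hypothetical), the same function
# accepts the open file handle directly:
#   with open("glove.6B.50d.txt", encoding="utf-8") as handle:
#       vectors = load_glove(handle)
```

Loading the full vocabulary into plain Python lists takes a lot of memory. In practice a library such as Gensim or NumPy would be a more compact choice.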
We had a lot of fun learning new things, coming up with different solutions, and connecting with people around the world who share our interest in exploring technology. The vectorization techniques in Natural Language Processing (NLP) were among the new things we learned. The brainstorming process was the most challenging, yet also the most exciting, part.
The motivation was clear: to provide an AI solution that allows users to quickly and easily find information of genuine interest, without wading through large amounts of irrelevant content. We used word embeddings to improve the accessibility and discoverability of records in the NASA Technical Report Server (NTRS), with the aim of providing a better search experience for researchers working with scientific and technical records. We learned new technology and were inspired by the NASA Hackathon to explore the unknown. It was worth the time and effort to participate in all the challenges and awesomeness.
We would like to thank the NASA Earth Science Division and the sponsors who initiated the event and made it possible. Special thanks go to the staff of NASA Hackathon Taiwan for the well-organized instructions that guided us through the Hackathon and for the valuable feedback that made our project even better.
#nlp #ai #searchengine #ntrs
The NASA Technical Report Server (NTRS) includes hundreds of thousands of items containing scientific and technical information (STI) created or funded by NASA. Imagine how difficult it can be to locate desired information in such a large repository! Your challenge is to develop a technique using Artificial Intelligence (AI) to improve the accessibility and discoverability of records in the public NTRS.
