SpacePapers

High-Level Project Summary

Our NLP-powered search engine comes with semi-supervised topic modeling, entity recognition (ER), part-of-speech (POS) tagging, speech-to-text (STT), text-to-speech (TTS), recommendations, and more, making NTRS easier to explore than ever. By extracting information from the PDFs and all the web content of NTRS, we have built this web app, SpacePapers, which intelligently offers just the information one seeks. Explore via topics, entities, keywords, and other custom search features, and even explore within a PDF via NLP. The vast Socian Corpus has made the task possible and fun! We have added features like Automated Speech Recognition with 93% accuracy and a basic Recommendation Engine. There is much more to do if given the opportunity, but we enjoyed the 24 hours!

Detailed Project Description

What?

We understand there are several obstacles to exploring such vast arrays of datasets, and how vital those articles, videos, and resources are. Problems include unmatched keywords, differing synonyms, multiple search factors, misspellings, difficulty extracting data from documents, and so on. Luckily, we enjoy making search, recommendation, and the user experience as convenient as gazing at the palm of one's hand via Natural Language Processing. Our pipeline extracts topics, keywords, entities, POS tags, and more to offer a thoroughly convenient search experience, as sketched below. All you have to do is search, and the platform does the work. If you prefer, you can also use logic-oriented searching and filters. You can search with your voice and on your terms. We made things easy.
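As a concrete illustration, here is a minimal sketch of the kind of per-document extraction that feeds those search facets, using spaCy. The sample abstract, the en_core_web_sm model, and the variable names are assumptions for illustration, not our exact production pipeline.

import spacy

# Small English model; any spaCy pipeline with NER and a POS tagger would do.
nlp = spacy.load("en_core_web_sm")

# Illustrative abstract text, not a real NTRS record.
abstract = (
    "The Mars Exploration Rover mission was operated by NASA's Jet "
    "Propulsion Laboratory to study the surface geology of Mars."
)

doc = nlp(abstract)

# Named entities become filterable facets (organizations, locations, missions, ...).
entities = [(ent.text, ent.label_) for ent in doc.ents]

# Nouns and proper nouns serve as candidate keywords for the index.
keywords = [tok.lemma_.lower() for tok in doc if tok.pos_ in {"NOUN", "PROPN"}]

print(entities)
print(keywords)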

How?

We used FastText for topic modeling, and spaCy, Flair, and scikit-learn for entity recognition. We utilized the complete NTRS data and the API. For the database, Elasticsearch and MongoDB were our go-to choices. The backend is mainly Node.js and Django, with Kaldi used for speech recognition. The Socian Corpus was a huge resource that boosted our capabilities 10x.
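To show how these pieces fit together, here is a minimal sketch of the topic-tagging and indexing step, under a few assumptions: "topics.train" is a fastText supervised training file in "__label__<topic> <text>" format, Elasticsearch is reachable locally, and the index name "ntrs-papers" and the helper function are illustrative rather than our exact code.

import fasttext
from elasticsearch import Elasticsearch

# Train a supervised topic classifier from a labelled text file (assumed to exist).
model = fasttext.train_supervised(input="topics.train", epoch=25, wordNgrams=2)

# Recent Elasticsearch Python clients accept the node URL directly.
es = Elasticsearch("http://localhost:9200")

def index_paper(doc_id, title, abstract):
    # Predict the top 3 topic labels for the abstract.
    labels, _scores = model.predict(abstract.replace("\n", " "), k=3)
    topics = [label.replace("__label__", "") for label in labels]
    # Store the paper with its predicted topics so they can be used as search filters.
    es.index(
        index="ntrs-papers",
        id=doc_id,
        document={"title": title, "abstract": abstract, "topics": topics},
    )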

Why?

Apart from our love of hackathons and interest in space, we at Socian, an NLP startup from Bangladesh, wanted to prove our competitiveness and passion to NASA. For me, a teenager, leading my startup in a NASA hackathon seemed like an excellent opportunity. We would love to connect and grow.

What else?

We have much more to offer but only had 24 hours, since we learned about the event late. We could add a complete Recommendation Engine, fully custom ER, TTS offerings, computer-vision-powered graphs, and much, much more. But we did our utmost within the timeline!

Space Agency Data

We have utilized the complete NTRS dataset. The models have been trained on all the information downloadable from NTRS. The topics used in topic modeling are inspired by the datasets found in the NASA Open Data Portal and others, selected based on downloads/frequency. The NASA NTRS open API has been an absolute lifesaver!
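For reference, below is a minimal sketch of pulling records with the requests library; the citations-search endpoint, query parameters, and the "results" field are assumptions based on the public NTRS API and should be checked against its current documentation.

import requests

# Assumed NTRS citations-search endpoint and parameters.
NTRS_SEARCH_URL = "https://ntrs.nasa.gov/api/citations/search"

resp = requests.get(NTRS_SEARCH_URL, params={"q": "topic modeling", "page.size": 10})
resp.raise_for_status()

# Print the id and title of each returned record (field names assumed).
for record in resp.json().get("results", []):
    print(record.get("id"), record.get("title"))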

Hackathon Journey

We learned a lot and had the most fun in 24 hours! The excitement, fused with curiosity, was high. We learned some of the best approaches to data modeling and tackled a distinct challenge: making search easy. The process was all about constant communication and collaboration with each member. Within 24 hours, we may have had 240 speed meets and 24,000 Slack messages. The first 8 hours were primarily online and the rest in the office. What an experience! Adore it. Special thanks to my CEO, Tanvir H. Sourov, for working all night. Also, to NASA for the opportunity, and to my mother for the love and coffee that kept us going.

References

Tools Used: PyCharm, VSCode

Data: NASA NTRS Open API Data (those with PDF available)

Resources: NLTK, spaCy, scikit-learn, FastText, Elasticsearch (customized), Kaldi, Tacotron 2, RecBole


All of the above are open-source libraries that we can use. Some of Socian's own data, models, and fine-tuned hyperparameters, built up over time, were also used.


Public Repository: https://github.com/Socian-Ltd/socian-space-papers


Tags

#nlp #searchengine #convenience #teenstartup #smartsearch #spacepapers #EntityRecognition #NTRS #24hoursoffun