Teratos Semantic Search

High-Level Project Summary

Our project is built on two main components: a collection of NLP techniques and a semantic search platform. The former is a series of NLP models, including keyword extraction, summarization and intent recognition. The latter is a vector search engine that lets researchers search with natural language, images, emojis and document extracts, so they can find NASA documents in whichever way they are most comfortable with. The two blocks are connected as follows: user inputs are automatically processed with the NLP techniques, and the results are used to query the vector database. Finally, the user receives a curated set of the documents that best match their expectations.

Detailed Project Description

The NLP block covers four transformer-based tasks: Intent Recognition, Named Entity Recognition, Summarization and Keyword Extraction, all using Huggingface transformer models. For Summarization, we built train, test and validation CSV files from NTRS documentation in order to fine-tune the pre-trained BART model.
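As an illustration of the CSV preparation step, here is a minimal sketch of how a list of records could be split into train, validation and test files. The column names `text` and `summary`, the 80/10/10 ratio, the fixed seed and the file names are all assumptions made for the example, not the values we necessarily used.

```python
import csv
import random
from pathlib import Path


def split_records(records, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle records and split them into train, validation and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left goes to the test set
    return train, val, test


def write_csv(path, rows, fieldnames=("text", "summary")):
    """Write dictionaries with the assumed columns to a CSV file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)


def build_splits(records, out_dir):
    """Create train.csv, validation.csv and test.csv inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    train, val, test = split_records(records)
    for name, rows in (("train", train), ("validation", val), ("test", test)):
        write_csv(out / f"{name}.csv", rows)
    return {"train": len(train), "validation": len(val), "test": len(test)}
```

Each record here is a dictionary pairing a document body with a reference summary; the resulting files can then be loaded by whichever fine-tuning script is used for BART.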

These services will be applied to the user input so that the search engine will receive queryable values.

For the search engine, we used Weaviate as the vector search engine, so that each NTRS document is reduced to a set of document fields such as author, title, abstract, documentId, etc.
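To make the idea of document fields concrete, the sketch below writes a class definition in the JSON shape that Weaviate schemas use, plus a small helper that trims a raw record down to those fields. The class name `Document`, the choice of `text` for every data type and the helper `to_weaviate_object` are hypothetical; only the four field names come from our description, and the real schema holds more fields than these.

```python
# Hypothetical class definition in the JSON shape Weaviate schemas use.
DOCUMENT_CLASS = {
    "class": "Document",  # assumed class name
    "description": "An NTRS document reduced to its searchable fields",
    "properties": [
        {"name": "documentId", "dataType": ["text"]},
        {"name": "title", "dataType": ["text"]},
        {"name": "author", "dataType": ["text"]},
        {"name": "abstract", "dataType": ["text"]},
    ],
}


def to_weaviate_object(record):
    """Keep only the fields declared in DOCUMENT_CLASS, filling gaps with ''."""
    names = [prop["name"] for prop in DOCUMENT_CLASS["properties"]]
    return {name: str(record.get(name, "")) for name in names}
```

Anything in the raw record that is not declared in the class definition is dropped, so every object sent to the engine has the same shape.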

This allows us to search for the nearest neighbors of a given query and retrieve the kind of documents the user was looking for in the first place.
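Nearest-neighbor search can be illustrated with a tiny brute-force version. This is only a sketch of the idea, not how Weaviate does it internally: a real vector engine uses an approximate index instead of scanning every vector. The function names and the toy two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vector, indexed, k=3):
    """Return the k document ids most similar to query_vector, best first."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in indexed.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy index: document id -> embedding vector (values invented for illustration).
toy_index = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [0.9, 0.1]}
closest = nearest_neighbors([1.0, 0.0], toy_index, k=2)
```

In the toy index, the query points in the same direction as `doc-a` and almost the same direction as `doc-c`, so those two come back ahead of `doc-b`.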

To conclude, the client interaction with the platform is straightforward: just upload images, or write natural language text, document extracts or even emojis! All these search types are, again, processed by our NLP techniques.

Space Agency Data

For this project we went over NTRS documents at center CDMS.

Hackathon Journey

Space Apps is a great experience for beginners such as ourselves. It allowed us to learn great soft skills such as communication and teamwork and... sure, we wrote some great code!

When selecting the challenge, we wanted one that would let us get down to hands-on work while still having something more abstract to think about.

We overcame some difficulties, such as managing to read some NTRS files and fine-tuning our pre-trained BART model.

References

We mainly used Huggingface transformers -> https://huggingface.co/docs/transformers

and Weaviate with Docker -> https://weaviate.io/

https://www.docker.com/

Tags

#Machine Learning #NLP #NTRS #Transformers #Semantic Search