Berthym

High-Level Project Summary

We developed a prototype that makes the publications on the NTRS platform more accessible. Its added value is that it finds information regardless of syntactic errors or word order, and it offers content based on similar context or characteristic terms. In short, it aims to surface knowledge quickly and in a way that matches what the user expects.

Detailed Project Description

Our solution can be summarized in four steps:

1) Data collection (reports from January 1, 2000 to the present)

2) Data processing (PDF to text conversion)

3) Artificial intelligence (conversion of text to vectors grouped by grammatical context)

4) Cognitive search (estimating how relevant each PDF document is to the search query)


Berthym Project Flow


We designed this solution to process the PDFs and automatically extract features from their content.
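As a rough illustration of the PDF processing step, the sketch below converts each page to an image with pdf2image and reads the text with easyocr, the two libraries listed under References. It is not the project's actual code: the function names `join_fragments` and `pdf_to_text`, the `dpi` value, and the English-only language list are assumptions made for this example. pdf2image also needs poppler installed on the system.

```python
def join_fragments(fragments):
    """Merge the OCR text fragments of one page into a single string."""
    return " ".join(part.strip() for part in fragments if part.strip())


def pdf_to_text(pdf_path, languages=("en",), dpi=200):
    """Return one text string per page of the PDF at pdf_path.

    The dpi value and the language list are illustrative assumptions.
    """
    # Imported lazily so join_fragments works without the OCR stack installed.
    import numpy as np
    import easyocr
    from pdf2image import convert_from_path  # relies on poppler

    reader = easyocr.Reader(list(languages))
    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images
    # detail=0 makes readtext return only the recognized strings.
    return [
        join_fragments(reader.readtext(np.asarray(page), detail=0))
        for page in pages
    ]
```

Calling `pdf_to_text("report.pdf")` on a local file (a hypothetical path) would give a list with the text of each page, ready for the vectorization step.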

It performs searches by building, from the characteristics of the documents, a sentence similar to the search text.
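The relevance estimation can be pictured with a small ranking sketch. To stay self-contained it does not use BERT: a bag-of-words count vector stands in for the document vectors, and cosine similarity stands in for the relevance score. Both are assumptions for illustration, as are the names `embed`, `cosine`, and `rank` and the two sample texts.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for the BERT vectors: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (title, score) pairs sorted from most to least relevant."""
    q = embed(query)
    scored = [(title, cosine(q, embed(text))) for title, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample texts, not real NTRS records.
documents = {
    "report_a": "thermal protection system for atmospheric reentry vehicles",
    "report_b": "soil moisture measurements from satellite radar",
}
results = rank("vehicles reentry thermal protection", documents)
```

Because the toy vectors ignore word order, the query above still ranks `report_a` first even though its words are shuffled. A contextual model such as BERT would additionally bring documents with similar meaning but different wording closer together.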







Code link:

https://www.kaggle.com/code/maximilianoalarcon/nasa-spacechallenge-2022/

Space Agency Data

NTRS - NASA Technical Reports Server

We used it to develop the modules that will be part of the system.

Hackathon Journey

The experience gave us insight into our strengths and weaknesses when tackling a problem with an innovative solution.


It was somewhat exhausting, but we were pleased to reach the end. The complexity of the challenge motivates us to repeat this experience for as many years as necessary.


We learned to use tools that we can reuse in other events.


We will be eagerly waiting for next year!

References

BERT, Hugging Face (large-base-cased)

It was used to tokenize the text extracted from the PDFs.

https://www.kaggle.com/datasets/sauravmaheshkar/huggingface-bert-variants
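BERT models accept at most 512 tokens per input, so the text of a long report has to be split before it is encoded. The sketch below shows one possible way to do that with the Hugging Face `transformers` tokenizer. It is an assumption about how this could be done, not the project's code: the checkpoint name `bert-base-cased`, the function names, and the simple non-overlapping windows are all chosen for illustration.

```python
def chunk_token_ids(token_ids, max_len=510):
    """Split a flat list of token ids into windows of at most max_len ids.

    510 leaves room for the [CLS] and [SEP] tokens within BERT's 512 limit.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]


def tokenize_document(text, model_name="bert-base-cased"):
    """Tokenize text and return id windows wrapped in [CLS] ... [SEP].

    model_name is an assumed checkpoint; swap in the variant actually used.
    """
    # Imported lazily: transformers is a heavy optional dependency.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    ids = tokenizer.encode(text, add_special_tokens=False)
    return [
        [tokenizer.cls_token_id] + window + [tokenizer.sep_token_id]
        for window in chunk_token_ids(ids)
    ]
```

Each returned window can then be passed to the model separately, and the resulting vectors combined (for example by averaging) into one vector per document.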


Library pdf2image

It was used to convert the PDF pages to images.


Library easyocr

It was used to extract text from the images.


Library poppler

It is required by the pdf2image library to render PDF pages.


BERT documentation

https://huggingface.co/docs/transformers/model_doc/bert


Evolution of coherence using neural networks models

I was inspired by the section about the semantic similarity graph

https://www.mdpi.com/2076-3417/11/7/3210


Canva template - Credits to: Jimena Domech

https://www.canva.com

Tags

#SpaceApps #AI #Software #STI #NLP #NTRS #Accessibility