Universalization and accessibility of the NASA STI Repository (NTRS) through NLP models

High-Level Project Summary

The project aims to use advances in NLP to democratize access to scientific information. To do so, it combines the Sumy library, which finds the most important sentences, with the Pegasus summarization model, which produces an abstractive summary, and then translates the result into four different languages. The project matters because most people cannot comprehend the technical language found in most scientific articles and therefore feel left out of new discoveries, and may even feel that science has lost touch with them.

Link to Final Project

Link to Project "Demo"

Detailed Project Description

Image: Made for this project.


What exactly does it do?

In short, it makes scientific articles more accessible to the general public through simplified summaries and translations.


For example:

Left: Abstract of "Interactions of stars and interstellar matter in Scorpio Centaurus" by de Geus, E. J. Right: Simplified text made by our team.


Simplified texts translated into Chinese, Spanish and Portuguese.


How does it work?

First, we collect the abstracts and titles of many articles in the NTRS. Then we use Sumy to select the most impactful sentences in each abstract. Using the transformer model Pegasus, we produce a simple but effective abstractive summary of those sentences, which is then translated into Chinese, Spanish and Portuguese with the help of the python_translator library.
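
The steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: a simple word-frequency ranking stands in for the Sumy step, and the Pegasus and python_translator steps are passed in as plain callables (the names abstractive and translate are hypothetical), so the sketch runs with the Python standard library only.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Sequence


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def top_sentences(text: str, count: int = 2) -> List[str]:
    """Extractive step. ASSUMPTION: a word-frequency score stands in for Sumy."""
    sentences = split_sentences(text)
    words = re.findall(r"[a-z]+", text.lower())
    # Crude stop-word filter: ignore words of three letters or fewer.
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:count])  # restore the original sentence order
    return [sentences[i] for i in keep]


def simplify_and_translate(
    abstract: str,
    abstractive: Callable[[str], str],     # hypothetical wrapper around Pegasus
    translate: Callable[[str, str], str],  # hypothetical wrapper around the translator
    languages: Sequence[str] = ("chinese", "spanish", "portuguese"),
    count: int = 2,
) -> Dict[str, str]:
    """Select key sentences, summarize them, then translate the summary."""
    key_text = " ".join(top_sentences(abstract, count))
    summary = abstractive(key_text)
    result = {"summary": summary}
    for language in languages:
        result[language] = translate(summary, language)
    return result
```

Keeping the summarizer and the translator as parameters means each step can be swapped or tested in isolation, for example with stub functions before the real models are loaded on the HPC.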



What benefits does it have?

Better accessibility and dissemination of empirical knowledge results in a better-informed population that is less likely to propagate fake news. It also helps new scientific work spread rapidly through the academic community, for example.


Image: Our website.


What do we expect?

Through this work, the group hopes to obtain applicable and reproducible results, as science requires, that lead to a wider propagation of NTRS knowledge in society and in the scientific community.


Which tools, programming languages, hardware or software did we use in the project?

Programming language: Python

Libraries: Pandas, Sumy, Requests, re, json, Python Translator, NLTK, Transformers and Torch.

Software: Jupyter Lab, GitHub and VS Code.

Hardware: HPC (ILUM)

Space Agency Data

We used the NASA STI Repository (NTRS).


How did we use it?

We used the repository to get the ID, title and abstract of scientific articles, books and other publications.
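
As a rough illustration of this step, the sketch below pulls the three fields out of a JSON response. The payload shape (a top-level "results" list whose items carry "id", "title" and "abstract" keys) is an assumption made for the example, not a confirmed description of the NTRS response, and the sample record is made up.

```python
import json
import re
from typing import Dict, List


def clean_text(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def extract_records(payload: str) -> List[Dict[str, str]]:
    """Keep only the ID, title and abstract of each record.

    ASSUMPTION: the payload is a JSON object with a "results" list and
    the key names used below; adjust them to the real response.
    """
    data = json.loads(payload)
    records = []
    for item in data.get("results", []):
        abstract = item.get("abstract")
        if not abstract:
            continue  # nothing to summarize, so skip the record
        records.append({
            "id": str(item.get("id", "")),
            "title": clean_text(item.get("title", "")),
            "abstract": clean_text(abstract),
        })
    return records


# Made-up payload used only to show the call.
sample = json.dumps({"results": [
    {"id": 1, "title": " Example  title ", "abstract": "First line.\nSecond line."},
    {"id": 2, "title": "Record without an abstract"},
]})
records = extract_records(sample)
```

Records without an abstract are skipped here because the later summarization step has nothing to work on for them.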


How did it inspire the project?

The repository showed us the importance of valuing knowledge production and the need to bring access to it to more and more people.

Hackathon Journey


How would you describe your experience in Space Apps?

The experience was beneficial and delightful for every member of the group. It helped us meet new people during the event, work as a team and contribute ideas toward the goal of valuing and spreading the knowledge in NTRS.


What did you learn?

The group learned how to use, or make better use of, each of the libraries in our code, such as Sumy and Torch, and how to divide the work well among ourselves.


What inspired your team to choose this challenge?

We feel that scientific communication is essential to spreading knowledge to everyone. We strongly feel that, for that information to reach more people in more areas, there must be a self-explanatory and accessible way not only to access it but also to comprehend it. We are also interested in NLP, which was one of the reasons for our choice of topic, along with the chance to apply knowledge we had gained in other projects.


What was your approach to developing this project?

Our approach was to use web scraping, NLP and AI to translate the content in NTRS and make it accessible to different people.

How did your team handle setbacks and challenges?

We solved problems by working together: each of us has more affinity with certain subjects in the project, which enabled us to help one another whenever somebody needed it.


Do you want to thank someone? Why?

We would like to thank Victor Sofe, from Orion Soluções Digitais, who helped us create the group logo, and James Morais de Almeida, who helped us navigate the interface of the HPC on which we ran the summarization algorithm.

Tags

#NLP, #Science, #translation, #articles