GPT-NASA

High-Level Project Summary

A fine-tuned model aimed at preserving all NASA Technical Reports Server (NTRS) knowledge. Due to time constraints, we limited ourselves to the titles and abstracts of manuscripts, and to fine-tuning an existing model rather than training a language model from scratch on the entire metadata set for this task alone.

Detailed Project Description

In the GitHub repo you'll find all the code and datasets needed to locally train an AI model that, given the title of a manuscript in the NASA database, generates its abstract. This streamlines knowledge discovery and innovation by making it easy to surface interesting ideas.
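As a sketch of the data-preparation step, title–abstract pairs can be serialized into a single training text file with delimiter tokens. The column names (`title`, `abstract`) and the token strings here are illustrative assumptions, not settings taken from the repo:

```python
import csv

def make_training_text(csv_path: str, out_path: str) -> None:
    """Turn a CSV of NTRS title/abstract pairs into one training text file.

    Each record becomes a single line:
        <|title|>...<|abstract|>...<|endoftext|>
    so the model learns to continue a title prompt with its abstract.
    """
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as out:
        for row in csv.DictReader(src):
            out.write(f"<|title|>{row['title'].strip()}"
                      f"<|abstract|>{row['abstract'].strip()}"
                      f"<|endoftext|>\n")
```

At inference time, prompting with `<|title|>Some Title<|abstract|>` then asks the model to complete the abstract.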


We used a generative model, GPT-2, though GPT-Neo can also be used. To make training straightforward, we relied on a popular AI text-generation library. We also added a Gradio version that provides a web UI, although we have not deployed it ourselves yet due to time constraints; we hope to do so a few days after the hackathon ends.
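A minimal sketch of what the Gradio wrapper could look like, assuming the aitextgen library for generation. The prompt format, model folder name, and generation length are illustrative assumptions, not the repo's actual settings:

```python
# Sketch only: special tokens, model folder, and max_length are our
# assumptions, not values taken from the project repo.

def build_prompt(title: str) -> str:
    """Format a manuscript title the same way the training data was formatted."""
    return f"<|title|>{title.strip()}<|abstract|>"

def launch_demo(model_folder: str = "trained_model") -> None:
    """Serve the fine-tuned model behind a simple Gradio text UI."""
    # Heavy imports are deferred so build_prompt stays importable
    # even without aitextgen/gradio installed.
    from aitextgen import aitextgen
    import gradio as gr

    ai = aitextgen(model_folder=model_folder)

    def generate_abstract(title: str) -> str:
        return ai.generate_one(prompt=build_prompt(title), max_length=256)

    gr.Interface(fn=generate_abstract, inputs="text", outputs="text").launch()
```

Deferring the third-party imports keeps the module usable for data work on machines that only need the prompt helper.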


Programming Language: Python 3.8 and up

Hardware: Google Colab, Azure ML, and a local machine (8 GB RAM + Intel 11th-gen processor + GPU)


Space Agency Data

We have processed the entire NASA (NTRS) database.

Hackathon Journey

Thank you for giving us such an amazing opportunity; without it we would not have had the chance to help humanity by creating something everyone can use to draw value from a vast source of human knowledge. Whatever the outcome of the hackathon, we as a team have decided to carry this project forward and build a robust, open-source language model that anyone can use.


We also met really smart people from different backgrounds along the way.


Our future plan is to train a model from scratch on NASA (NTRS) data and deploy it with Gradio as open source under the MIT License.

References

We thank aitext for their open-source GPT template and NASA for providing the data.

Tags

#GPT #Language_Model