High-Level Project Summary
The solution we developed for the Art in Our Worlds challenge is a website that allows users to input short phrases (as text or voice), matches that input to NASA science data and imagery, and then displays the output in the following formats:
•Documents.
•Images.
•3D images that virtual reality headsets can access.
•Images manipulated in an artistic manner.
•Funny filters applied to the output images.
•Videos.
•Images generated from text.
The user can also share all of this data with others.
Link to Final Project
Link to Project "Demo"
Detailed Project Description
NASA spends billions of dollars every year (25.2 billion dollars in fiscal year 2021), and this money has landed rovers on Mars and even humans on the Moon, building along the way a treasure of knowledge made with manpower and lots of time. But what good is this billion-dollar knowledge if it is not easily accessible and not everyone can benefit from it? The following pipeline is how we hunt for this treasure.
1.Extracting images and their descriptions from the NASA Earth book
-We needed a dataset containing images, their descriptions, and where to find them. We extracted this dataset from NASA's Earth book, found in the challenge's resources; it is used later when filtering the images.
-This is implemented using the Python programming language, as sketched below.
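A minimal sketch of this extraction step, assuming the Earth book is available as an EPUB file and that each image's caption is the paragraph that follows it; the file name earth_book.epub and the helper extract_image_captions are illustrative, not the exact code we shipped.

from ebooklib import epub, ITEM_DOCUMENT
from bs4 import BeautifulSoup
import pandas as pd

def extract_image_captions(epub_path):
    """Walk the Earth book EPUB and pair every <img> tag with nearby caption text."""
    book = epub.read_epub(epub_path)
    rows = []
    for item in book.get_items_of_type(ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        for img in soup.find_all("img"):
            caption = img.find_next("p")  # assumption: the caption is the next paragraph
            rows.append({"chapter": item.get_name(),
                         "image": img.get("src"),
                         "description": caption.get_text(strip=True) if caption else ""})
    return pd.DataFrame(rows)

# Example: extract_image_captions("earth_book.epub").to_csv("earth_images.csv", index=False)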
2.Speech to Text Transcription
-There are many methodologies for designing a product; one of them is design thinking, whose principles include user-centricity and empathy. With this in mind, we wanted to target visually impaired people and facilitate their use of our website.
-It takes voice input from the user and maps it to the corresponding text.
-It is implemented with machine learning using the DeepSpeech model, as sketched below.
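A minimal sketch of the transcription step, assuming a 16 kHz mono WAV recording and a pre-trained DeepSpeech acoustic model file (the file name below is illustrative).

import wave
import numpy as np
import deepspeech

def transcribe(wav_path, model_path="deepspeech-0.9.3-models.pbmm"):
    """Run a 16 kHz mono WAV recording through a pre-trained DeepSpeech model."""
    model = deepspeech.Model(model_path)
    with wave.open(wav_path, "rb") as wav:
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)
    return model.stt(audio)  # returns the transcribed text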
3.Summarization model
-This machine learning model summarizes image and video descriptions so that the input to the similarity model is shorter and therefore faster to process.
-The summarization is built with spaCy and the pytextrank extension, as sketched below.
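A minimal sketch of the summarization step with spaCy and pytextrank, assuming the en_core_web_sm pipeline is installed; keeping two sentences is an illustrative choice.

import spacy
import pytextrank  # registers the "textrank" pipeline component

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank")

def summarize(description, sentences=2):
    """Keep only the highest-ranked sentences of a long image/video description."""
    doc = nlp(description)
    return " ".join(sent.text for sent in doc._.textrank.summary(limit_sentences=sentences))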
4.Documents Similarity
-The core of the solution is allowing users to input short text phrases that get matched with NASA science data or imagery.
-It takes input from the user, which can be a single word or a short text phrase.
-This feature is implemented with machine learning using BERT (Bidirectional Encoder Representations from Transformers), which is built on the Transformer, an attention mechanism that learns contextual relations between words in a text. It returns a list of indices containing the top 10 matches for the input text; a sketch is shown below.
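A minimal sketch of the matching step, assuming the bert-base-uncased checkpoint from Hugging Face, mean-pooled token embeddings, and cosine similarity from sklearn; the checkpoint and pooling choice are illustrative.

import numpy as np
import tensorflow as tf
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TFAutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Mean-pool BERT's last hidden state into one vector per input text."""
    tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")
    hidden = model(**tokens).last_hidden_state                 # (batch, seq, 768)
    mask = tf.cast(tokens["attention_mask"][:, :, tf.newaxis], hidden.dtype)
    return (hidden * mask).numpy().sum(axis=1) / mask.numpy().sum(axis=1)

def top_matches(query, descriptions, k=10):
    """Return the indices of the k descriptions most similar to the query."""
    sims = cosine_similarity(embed([query]), embed(descriptions))[0]
    return np.argsort(sims)[::-1][:k]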
5.Creative Distortion of retrieved Images
-In this feature, we wanted to target space lovers by showing them the picture retrieved by the similarity model (if one is found) in several ways, creatively distorting it. We implemented three techniques for this, sketched after the list below.
1.Accessing a single color channel of the image and changing its color map (needs one picture)
2.Taking two pictures and creatively blending them together (the mix-up technique from data augmentation)
3.Taking two pictures and generating a third one using Magenta, a pre-trained style transfer network
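A minimal sketch of the three distortion techniques, using OpenCV for the color-map and blending steps and Magenta's arbitrary-image-stylization model from TensorFlow Hub for the third; images are assumed to be same-sized uint8 BGR arrays as OpenCV loads them, and the color map is an illustrative choice.

import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

def colormap_channel(img):
    """Technique 1: take a single color channel and re-color it with a color map."""
    channel = img[:, :, 0]  # blue channel in OpenCV's BGR order
    return cv2.applyColorMap(channel, cv2.COLORMAP_JET)

def mixup_blend(img_a, img_b, alpha=0.5):
    """Technique 2: mix-up style blending of two same-sized images."""
    return cv2.addWeighted(img_a, alpha, img_b, 1.0 - alpha, 0)

def magenta_style_transfer(content_bgr, style_bgr):
    """Technique 3: arbitrary style transfer with Magenta's TF Hub model."""
    hub_model = hub.load(
        "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2")
    to_tensor = lambda im: tf.cast(
        cv2.cvtColor(im, cv2.COLOR_BGR2RGB)[np.newaxis, ...], tf.float32) / 255.0
    stylized = hub_model(to_tensor(content_bgr), to_tensor(style_bgr))[0]
    return cv2.cvtColor((stylized[0].numpy() * 255).astype(np.uint8), cv2.COLOR_RGB2BGR)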
6.Text to Speech
-This feature is a continuation of the speech-to-text feature, to aid visually impaired people with the search results.
-The model used is Google Text-to-Speech (gTTS), combined with language detection to classify the language of the input text, as sketched below.
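A minimal sketch of this step using gtts and langdetect; the output file name is illustrative.

from gtts import gTTS
from langdetect import detect

def speak(text, out_path="result.mp3"):
    """Detect the language of the text and synthesize it with Google Text-to-Speech."""
    lang = detect(text)  # e.g. "en" or "fr"
    gTTS(text=text, lang=lang).save(out_path)
    return out_path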
Tech Stack:
Our solution is implemented with:
Programming Languages : Python 3.9, JavaScript
Markup Language: HTML
Styling Language: CSS
Libraries used: ebooklib, numpy, pandas, wget, warnings, os, BeautifulSoup, IPython, deepspeech, wave, nltk, sklearn, tensorflow_hub, matplotlib, cv2, PIL, random, zipfile, flask, pytextrank, diffusers, transformers, scipy, mediapy, deep_translator, gtts, langdetect
System packages: libasound2-dev, portaudio19-dev, libportaudio2, libportaudiocpp0, ffmpeg
Space Agency Data
Data Source
We used data from the NASA API Portal, where NASA data, including imagery, is accessible to application developers. We used two APIs: the Astronomy Picture of the Day (APOD) API, and the NASA Image and Video Library API, which provides access to the NASA Image and Video Library at images.nasa.gov.
Data usage techniques
We used the APOD API by making requests for every date from 1950 until today and saving the results in a CSV file. We also used the NASA Image and Video Library API's search endpoint, passing the user's search text as a parameter. The retrieved data is in the Collection+JSON format, which we convert to a Python dictionary. A sketch of both calls is shown below.
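A minimal sketch of both calls, using the requests library (an assumption, since it is not in the tech stack above) against the public api.nasa.gov and images-api.nasa.gov endpoints; the DEMO_KEY placeholder stands in for a real API key.

import requests

def fetch_apod(date, api_key="DEMO_KEY"):
    """Fetch one Astronomy Picture of the Day entry for a YYYY-MM-DD date."""
    resp = requests.get("https://api.nasa.gov/planetary/apod",
                        params={"date": date, "api_key": api_key})
    resp.raise_for_status()
    return resp.json()  # dict with title, explanation, url, ...

def search_nasa_library(query):
    """Search the NASA Image and Video Library; the Collection+JSON reply parses to a dict."""
    resp = requests.get("https://images-api.nasa.gov/search",
                        params={"q": query, "media_type": "image"})
    resp.raise_for_status()
    return [item["data"][0] for item in resp.json()["collection"]["items"]]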
Hackathon Journey
It was a fun, stressful, exciting, and great experience. We were stressed about providing the most creative and efficient solution, but at the same time there were a lot of fun activities and joyful moments.
References
- The NASA API Portal, where NASA data, including imagery, is accessible to application developers.
- Types of transcription models
- DeepSpeech GitHub
- Show and Tell: A Neural Image Caption Generator, 2015.
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015.
- Long-term recurrent convolutional networks for visual recognition and description, 2015.
- Deep Visual-Semantic Alignments for Generating Image Descriptions, 2015.
- Automatic image annotation on Wikipedia
- Show and Tell: image captioning open sourced in TensorFlow, 2016.
- Presentation: Automated Image Captioning with ConvNets and Recurrent Nets, Andrej Karpathy and Fei-Fei Li (slides).
- Project: Deep Visual-Semantic Alignments for Generating Image Descriptions, 2015.
- NeuralTalk2: Efficient Image Captioning code in Torch, runs on GPU, Andrej Karpathy
Tags
#ComputerVision #MachineLearning #DocumentsSimilarity #VR #ImageDistortion #TextToSpeech #SpeechToText

