The JThankAi Solution

High-Level Project Summary

Our Solution: A Space-Themed AI Text to Image Art Generator

With the rise of generative adversarial networks, the capabilities of deepfake technology and AI art generators have become scarily brilliant. Inspired by applications like DALL-E and Imagen, we thought it would be interesting to create a theme-based AI art generator that takes descriptive input and applies it stylistically to space images. In addition, because images from the space station, apart from supporting scientific research, mostly show up in movies, YouTube productions, and advertising, we have also decided to add image customization features to our application to help with content creation and spark creativity.

Detailed Project Description

Solution Framework:

- UX Flow:

Chatbot Interaction => Unique Text to Generate AI Space Art => Image Processing and Customization => Download and Share Content!  

- Data Flow:


- Frontend Chatbot Component: with Azure and Node.js

A chatbot that caters to the user's needs and guides the user through the application.

With funding, the Chatbot can be trained with Azure. Even if no funding is granted, since both James and Hank are working on their senior projects in the Information Knowledge Management Laboratory at National Cheng Kung University in Taiwan, we firmly believe we will be able to build a competent Chatbot.

*Knowledge base and QA-Pairs work in progress*
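
As a rough, hypothetical sketch of what the QA-pair knowledge base could look like once it is filled in (the pairs and the keyword-matching rule below are made up for illustration and are not our actual chatbot logic):

```python
# Hypothetical QA-pair sketch: answer a question by keyword overlap with a
# hand-written knowledge base. The pairs below are placeholders.
import re

QA_PAIRS = {
    "how do i generate an image": "Type a description in the search bar and press Generate.",
    "how do i download my image": "Use the download option; non-subscribers get a watermark.",
    "what data do you use": "We use NASA imagery retrieved through the NASA APIs.",
}

def answer(question: str) -> str:
    words = set(re.findall(r"[a-z]+", question.lower()))
    # Pick the stored question that shares the most words with the user's question.
    best = max(QA_PAIRS, key=lambda q: len(words & set(q.split())))
    if not words & set(best.split()):
        return "Sorry, I don't know that yet."
    return QA_PAIRS[best]

print(answer("How do I download my image?"))
```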

- Frontend UI Component: with HTML, JavaScript, and CSS

The user interface where users interact with our application. 

It will include a search bar to search for images, a menu for image processing and editing, a window pane to interact with the Chatbot, and other services such as downloading images.

*For our project demo submission, we used PyQt5 in Python to build a simple interactive GUI for demonstration purposes only.*
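
For illustration only, here is a minimal PyQt5 sketch of a window with a search bar and a result area, in the spirit of the demo GUI; it is not the actual demo code, and the widget layout is a simplification:

```python
# Minimal PyQt5 sketch (illustrative only): a search bar, a Generate button,
# and a label where a generated image would eventually be shown.
import sys
from PyQt5.QtWidgets import (
    QApplication, QWidget, QVBoxLayout, QLineEdit, QPushButton, QLabel
)

class DemoWindow(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("JThankAi Demo (sketch)")
        self.search_bar = QLineEdit()
        self.search_bar.setPlaceholderText("Describe your space art...")
        self.generate_btn = QPushButton("Generate")
        self.result_label = QLabel("Generated image appears here")
        layout = QVBoxLayout(self)
        for widget in (self.search_bar, self.generate_btn, self.result_label):
            layout.addWidget(widget)
        # In the real application this would call the backend; here we just echo the text.
        self.generate_btn.clicked.connect(
            lambda: self.result_label.setText(f"Query: {self.search_bar.text()}")
        )

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = DemoWindow()
    window.show()
    sys.exit(app.exec_())
```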

- Frontend Image Processing Toolkit: with OpenCV

A simple image processing tab that allows you to tweak and edit the image to fit your liking.

This is an image processing service provided by our application for you to edit and customize images before downloading them. Please keep in mind that if you are not a subscriber, all images downloaded from our application will have a watermark.

*A small demo can be seen if you run the ImageProcessingExtensions.py file on GitHub.*

Examples (a short OpenCV code sketch follows this list):



  • Original and Grayscale

  • Blurring with Bilateral Filter

  • Splitting into the R, G, and B channels

  • Edge Detection
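
Below is a minimal OpenCV sketch of the four example operations listed above, plus the watermark overlay mentioned earlier. It is not taken from ImageProcessingExtensions.py, and the file name space.jpg is a placeholder:

```python
# Minimal OpenCV sketch of the example operations; assumes a local "space.jpg".
import cv2

img = cv2.imread("space.jpg")                                           # original
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)                            # grayscale
blurred = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)   # bilateral blur
b, g, r = cv2.split(img)                                                # split into B, G, R channels
edges = cv2.Canny(gray, threshold1=100, threshold2=200)                 # edge detection

# Watermark for non-subscribers: blend semi-transparent text onto a copy.
watermarked = img.copy()
overlay = img.copy()
cv2.putText(overlay, "JThankAi", (20, img.shape[0] - 20),
            cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 255, 255), 3)
cv2.addWeighted(overlay, 0.4, watermarked, 0.6, 0, watermarked)

for name, out in [("gray", gray), ("blurred", blurred),
                  ("edges", edges), ("watermarked", watermarked)]:
    cv2.imwrite(f"space_{name}.jpg", out)
```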

- Backend Cloud Drive: with Azure

Stores temporary data of NASA space imagery for AI model input. 

NASA is moving its data to the cloud; therefore, cloud drive management is key to making sure that our application runs smoothly, especially since components such as our Text to Image AI model rely heavily on data.

*For our project demo submission, all the data used is stored locally by the WebCrawler Component.*

- Backend WebCrawler Component: with Python requests 2.28.1

Used to access online datasets and APIs, as well as to download data and keep the database up to date.

Even though most of the data can be downloaded and saved beforehand, there are still datasets and API services that are only available online. So, in addition to routinely downloading new data and updating the database, the WebCrawler Component is also used to call external APIs.

*In our project demo submission, the WebCrawler Component, NasaAPICrawler.py, is used to collect and query data from the NASA Image and Video Library website.*

- Backend Text to Image AI Component: with PyTorch

This model takes a space image and a descriptive text as input, then generates that space image rendered in the style described by the text.

The methodology of our approach is based on CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback (Lee et al., CVPR 2021).

*Due to the two-day time constraint, our Text to Image AI Space Art Generator model could not be trained in time, which is why we have temporarily substituted it with a Content-Style Image Synthesization AI model solely for demonstration purposes.*
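
To give a rough sense of the idea behind this kind of text-conditioned modulation, here is a highly simplified, hypothetical PyTorch sketch of modulating image features with a text embedding. It is not the CoSMo architecture, and all dimensions and names are made up:

```python
# Toy sketch: scale and shift image features using parameters predicted from a
# text embedding (FiLM-style conditioning). Illustrative only.
import torch
import torch.nn as nn

class TextConditionedModulation(nn.Module):
    def __init__(self, img_dim=512, txt_dim=256):
        super().__init__()
        self.to_gamma = nn.Linear(txt_dim, img_dim)  # per-channel scale from text
        self.to_beta = nn.Linear(txt_dim, img_dim)   # per-channel shift from text
        self.norm = nn.LayerNorm(img_dim)

    def forward(self, img_feat, txt_feat):
        # img_feat: (batch, img_dim) image features; txt_feat: (batch, txt_dim) text features
        gamma = self.to_gamma(txt_feat)
        beta = self.to_beta(txt_feat)
        return self.norm(img_feat) * (1 + gamma) + beta

# Random tensors stand in for real image/text encoder outputs.
img_feat = torch.randn(4, 512)
txt_feat = torch.randn(4, 256)
print(TextConditionedModulation()(img_feat, txt_feat).shape)  # torch.Size([4, 512])
```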

- Substitute Content-Style Image Synthesization AI model: with PyTorch

A temporary substitute of the Backend Text to Image AI Component for demonstration. 

This algorithm combines two images using a pretrained VGG model: the resulting image blends the content of one image with the style of the other, similar to a customized style filter based on the image.

*This algorithm is implemented in the StylizeImg.py file on GitHub.*
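
For illustration, here is a minimal Gatys-style neural style transfer sketch using a pretrained VGG, which captures the general idea behind this substitute model. It is not StylizeImg.py, and the image paths are placeholders:

```python
# Minimal neural style transfer sketch: optimize an image so its VGG features
# match the content image while its Gram matrices match the style image.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = models.vgg19(pretrained=True).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(256), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def load(path):
    return preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

def features(x, layers=(0, 5, 10, 19, 28)):
    feats, h = [], x
    for i, layer in enumerate(vgg):
        h = layer(h)
        if i in layers:
            feats.append(h)
    return feats

def gram(f):
    _, c, hh, ww = f.shape
    f = f.view(c, hh * ww)
    return f @ f.t() / (c * hh * ww)

content = load("content_space.jpg")   # placeholder path
style = load("style_painting.jpg")    # placeholder path
target = content.clone().requires_grad_(True)
optimizer = torch.optim.Adam([target], lr=0.02)

content_feats = features(content)
style_grams = [gram(f) for f in features(style)]

for step in range(200):
    optimizer.zero_grad()
    target_feats = features(target)
    content_loss = F.mse_loss(target_feats[-1], content_feats[-1])
    style_loss = sum(F.mse_loss(gram(f), g) for f, g in zip(target_feats, style_grams))
    (content_loss + 1e3 * style_loss).backward()
    optimizer.step()
```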

Example:



  • Content Image + Style Image = Synthesized Image


- Application Frontend-to-Backend Management: with Azure

With the extensive cloud computing, database management, and AI/ML resources it provides, Azure is easily one of the best choices for implementing our solution.

Space Agency Data

Prologue

In order to easily generate space-related pictures according to the input text, we need to gather paired text and images for training. We can easily access the images and their metadata through the powerful NASA API Portal.

There is a vast amount of diverse data, such as beautiful pictures, detailed weather data, and current technology developments, all of which sparked our creativity when brainstorming. However, because our challenge is related to images and some APIs are currently out of order, we picked the two APIs mentioned below to access text and image data for our Text to Image AI application.

The APOD API:

APOD stands for Astronomy Picture of the Day, a popular NASA website. Through this API we can use the date, start_date, and end_date parameters to retrieve NASA imagery, and along with each image we also get a title and an explanation. This text could help us train an ImageLabeler and assign keywords to images.
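
A minimal sketch of how the APOD API can be queried with Python requests (using the public DEMO_KEY; the date range is arbitrary):

```python
# Query APOD for a small date range; each entry has an image URL, title, and explanation.
import requests

resp = requests.get(
    "https://api.nasa.gov/planetary/apod",
    params={"api_key": "DEMO_KEY", "start_date": "2022-09-28", "end_date": "2022-10-01"},
)
resp.raise_for_status()
for entry in resp.json():
    print(entry["date"], entry["title"], entry["url"])
```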

The NASA Image and Video Library API:

Our challenge is The Art in Our Worlds: we take text as input and produce an image as output. This API retrieves relevant images from text queries, which is a perfect fit for what we need in our application.

In our demo, we have a search bar that the user can use to input search terms. Once the text is submitted, our WebCrawler Component queries the API and downloads the first result for further reference and editing.
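
A minimal sketch of how such a search-and-download step could look with the NASA Image and Video Library API; the query and output path are placeholders, and this is not our actual NasaAPICrawler.py code:

```python
# Search the NASA Image and Video Library and save the first result's preview image.
import requests

query = "nebula"
search = requests.get(
    "https://images-api.nasa.gov/search",
    params={"q": query, "media_type": "image"},
)
search.raise_for_status()
items = search.json()["collection"]["items"]

if items:
    # Each item's "links" list holds a preview image URL we can download directly.
    first_href = items[0]["links"][0]["href"]
    image = requests.get(first_href)
    image.raise_for_status()
    with open(f"{query}_first_result.jpg", "wb") as f:
        f.write(image.content)
```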

Future Works

In the future, we will use our WebCrawler Component to crawl and query the NASA APIs and databases for images, text, and other metadata such as dates for our application.

We'd also like to take this opportunity to thank NASA for organizing and providing all these resources; we found them very helpful!

Hackathon Journey

Opportunity

To be honest, this opportunity sort of came out of the blue for James Hankathon. James and Hank, the two members of James Hankathon and close friends since their first year of college, had just joined the Information Knowledge Management Laboratory at National Cheng Kung University and were starting to prepare for their senior projects. Then one day, their mentor, Professor Hung-Yu Kao, broke the news about the 2022 Space Apps Challenge, and thus the journey began for James Hankathon.

Journey

The journey for James Hankathon wasn't entirely smooth sailing. With only 2 of a possible 6 members, James and Hank struggled to digest the vast amount of resources available and to split the heavy workload of the project. However, in the process, both James and Hank learned a tremendous amount, including not only technical knowledge related to the challenge but also soft skills such as cooperation, time management, and, most importantly, how to work more efficiently.

Reflection

The two-day hackathon may not have been the easiest experience mentally, and we nearly broke down multiple times, but for James Hankathon it was surely an experience to remember. We gave it our best and did all that we could to find the best possible solution we could come up with. We may not be the best, but we definitely did our best, and whether or not we advance to the next stage, you can be sure we'll be back for more!


References

Tang, Yi-Dar (湯沂達). “Text-Driven Image Manipulation/Generation with CLIP.” Medium, January 20, 2022. https://changethewhat.medium.com/text-driven-image-manipulation-generation-with-clip-d16568f7c16f.

dem108. “Deploy Machine Learning Models - Azure Machine Learning.” Microsoft Learn. Accessed October 2, 2022. https://learn.microsoft.com/en-us/azure/machine-learning/v1/how-to-deploy-and-where?tabs=azcli.

Duraj, Maciej. “How to Build a Chatbot with Azure.” Pluralsight, May 1, 2020. https://www.pluralsight.com/guides/how-to-build-a-basic-chatbot-using-microsoft-azure.

Lee, Seungmin, Dongwan Kim, and Bohyung Han. “CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback.” CVF Open Access, CVPR 2021. https://openaccess.thecvf.com/content/CVPR2021/html/Lee_CoSMo_Content-Style_Modulation_for_Image_Retrieval_With_Text_Feedback_CVPR_2021_paper.html.

“NASA Image and Video Library.” NASA. Accessed October 2, 2022. https://images.nasa.gov/.

NASA. “NASA/APOD-API: Astronomy Picture of the Day API Service.” GitHub. Accessed October 2, 2022. https://github.com/nasa/apod-api.

O'Connor, Ryan. “MinImagen - Build Your Own Imagen Text-to-Image Model.” AssemblyAI Blog, September 1, 2022. https://www.assemblyai.com/blog/minimagen-build-your-own-imagen-text-to-image-model/.

Rashad, Fathy. “How I Built an AI Text-to-Art Generator.” Medium. Towards Data Science, October 8, 2021. https://towardsdatascience.com/how-i-built-an-ai-text-to-art-generator-a0c0f6d6f59f. 

Tags

#ai #ml #nlp #deeplearning #computervision #texttoimage #art #software #chatbot #jameshankathon #jthankaisolution