High-Level Project Summary
SA-CORP's project AIM is an artificial intelligence pipeline that uses past data from the DSCOVR Faraday Cup (FC) instrument to predict the data for an upcoming time span. The predicted data is then classified into groups based on solar flare intensity, which gives warning of a devastating solar flare hours or even days earlier than today's standard. The earlier the warning the better, because an intense solar flare today could disrupt global infrastructure and push the world into chaos.
Detailed Project Description
Detailed pipeline
We take a data set from the DSCOVR spacecraft containing 4 solar wind parameters over a given time span: proton velocity, proton thermal speed, scalar proton temperature, and solar wind density. We feed it to our multivariate time series prediction AI model, which outputs the data for the upcoming 79 minutes (about an hour and twenty minutes). The predicted output has the same 4 parameters as the input data set. Because there are 4 parameters, a solar flare is hard to detect directly from the raw values, so we use a time series classifier to classify its intensity from all 4 parameters over the given time interval.
How it works
We downloaded the CDF files from the gsfc.nasa.gov website with Python's urllib.request library and read them with cdflib. We worked with the 2019 data files and joined them all together into a single data frame. We cleaned the data by isolating the parameters, removing the labels and the missing values, converting all the data to float, calculating the velocity vector magnitude, sorting the data frame by the timestamp index (year-day-ms), and normalizing the feature values to the range (-1, 1).
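The cleaning steps can be sketched with pandas as below. This is a minimal illustration, not the team's code: it assumes the joined frame is indexed by timestamp and uses hypothetical column names (`vx`, `vy`, `vz`, `thermal_speed`, `temperature`, `density`), and it assumes missing values are marked with the ISTP-style fill value -1e31, which should be checked against each file's FILLVAL attribute.

```python
import numpy as np
import pandas as pd

# Hypothetical column names: the real DSCOVR FC CDF variables may be named differently.
VELOCITY_COLS = ["vx", "vy", "vz"]                        # proton velocity vector components
FEATURES = ["speed", "thermal_speed", "temperature", "density"]
FILL_VALUE = -1e31                                        # assumed ISTP-style fill value


def clean(raw, fill_value=FILL_VALUE):
    """Return the 4 features, sorted by timestamp and scaled to [-1, 1]."""
    df = raw.astype(float)                                # every column as float
    df = df.mask(df <= fill_value)                        # fill values -> NaN
    df = df.replace([np.inf, -np.inf], np.nan).dropna()   # drop rows with missing values
    # Collapse the velocity vector into a single magnitude (proton speed).
    df["speed"] = np.sqrt((df[VELOCITY_COLS] ** 2).sum(axis=1))
    df = df[FEATURES].sort_index()                        # chronological order
    # Min-max scale each feature independently to the range [-1, 1].
    lo, hi = df.min(), df.max()
    return 2 * (df - lo) / (hi - lo) - 1
```

scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` performs the same scaling; fitting it on the training portion only, and reusing it for the test portion, keeps test statistics from leaking into training.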
For the neural network pipeline, we built a multivariate time series prediction model that takes 158 minutes of the 4 solar wind parameters as input and predicts the following 79 minutes as output. The data set was split into 85% for training and 15% for testing; we then divided each 158-minute window and its consecutive 79-minute window into 2 arrays, resulting in 4 dataframes: train_input, train_out, test_input, test_out.
The output is a 2D array holding the values of the 4 solar wind parameters for each minute.
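The split and windowing described above can be sketched in NumPy as follows. Two choices here are assumptions rather than details from the project: consecutive windows overlap with a stride of 1 minute, and the chronological 85/15 split happens before windowing so that no window straddles the train/test boundary.

```python
import numpy as np

N_IN, N_OUT = 158, 79        # minutes of input and of predicted output
TRAIN_FRACTION = 0.85


def make_windows(series, n_in=N_IN, n_out=N_OUT):
    """Pair each n_in-minute window of a (minutes, params) array with the
    n_out minutes that immediately follow it (stride of 1 minute)."""
    inputs, outputs = [], []
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        outputs.append(series[start + n_in:start + n_in + n_out])
    return np.array(inputs), np.array(outputs)


def split_and_window(series, train_fraction=TRAIN_FRACTION):
    """Chronological 85/15 split, then windowing of each part separately."""
    cut = int(len(series) * train_fraction)
    train_input, train_out = make_windows(series[:cut])
    test_input, test_out = make_windows(series[cut:])
    return train_input, train_out, test_input, test_out
```

Each output window has shape (79, 4), the per-minute 2D array of the 4 parameters mentioned above.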
The architecture consists of 2 LSTM layers with 100 cells each.
We trained it for 25 epochs with a batch size of 32.
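A Keras sketch of this architecture is shown below. Only the two 100-cell LSTM layers, 25 epochs, and batch size of 32 come from the text; the Dense + Reshape output head that turns the last LSTM state into 79 x 4 values, the Adam optimizer, and the mean-squared-error loss are assumptions.

```python
N_IN, N_OUT, N_FEATURES = 158, 79, 4
FIT_KWARGS = {"epochs": 25, "batch_size": 32}     # training settings from the text


def build_model(n_in=N_IN, n_out=N_OUT, n_features=N_FEATURES, units=100):
    """Two stacked LSTM layers mapping (158, 4) inputs to (79, 4) outputs."""
    # Imported here so the constants above can be used without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(n_in, n_features)),
        keras.layers.LSTM(units, return_sequences=True),  # passes the whole sequence on
        keras.layers.LSTM(units),                          # returns only its final state
        keras.layers.Dense(n_out * n_features),            # assumed output head
        keras.layers.Reshape((n_out, n_features)),         # back to (minutes, parameters)
    ])
    model.compile(optimizer="adam", loss="mse")            # assumed optimizer and loss
    return model
```

Training would then be `build_model().fit(train_input, train_out, validation_data=(test_input, test_out), **FIT_KWARGS)`, and `model.predict(test_input)` returns an array of shape (n, 79, 4).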
After training, we used the test data to predict output results, which we then fed into the classifier: a clustering algorithm for unlabeled multivariate time series data.
The classifier assigns each predicted result to one of a few categories determined during training.
This gives us an early warning in case of high-intensity solar wind.
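The clustering step can be sketched as plain k-means on flattened windows, in the spirit of the k-means article listed in the References; the exact algorithm, the number of clusters (k = 3 by default here), and the farthest-point initialisation are assumptions for illustration. Centroids are learned from training windows, and each predicted window is then assigned to its nearest centroid.

```python
import numpy as np


def assign(windows, centroids):
    """Index of the nearest centroid (Euclidean distance) for each window."""
    x = windows.reshape(len(windows), -1)                 # flatten (minutes, params)
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def fit_kmeans(windows, k=3, n_iter=100, seed=0):
    """Cluster (n, minutes, params) windows with plain k-means; return centroids."""
    x = windows.reshape(len(windows), -1).astype(float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: a random first centroid, then repeatedly the
    # window farthest from every centroid chosen so far.
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = assign(x, centroids)
        # Move each centroid to the mean of its members (keep it if it has none).
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

K-means itself does not say which cluster means "high intensity"; one option is to rank the centroids by their mean scaled speed and raise a warning when a predicted window lands in the top-ranked cluster.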
Benefits
We can predict a solar flare approximately 2 hours before it happens, giving us more time and options to protect our electronic devices and systems.
We get clearer data from the spacecraft, with the background noise and measurement errors removed.
Detection of patterns in the data collected about the sun, such as cycles, seasons, and trends in the solar wind.
Tools
Google Colab was our main development environment; it made everything easier, especially managing the large quantities of data used to train the model.
Adobe Premiere Pro and Adobe Photoshop were used to create and edit our pictures, logos, and videos.
We coded the project in Python 3.7 using the following libraries:
scipy, h5py, numpy, pandas, cdflib, xarray, keras, sklearn
Space Agency Data
We used NASA's DSCOVR data, namely the Faraday Cup (FC) instrument data from the spacecraft, to train our model.
We split it into training and testing sets.
https://cdaweb.gsfc.nasa.gov/pub/data/dscovr/h1/faraday_cup/2019/
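Fetching the files with urllib.request, as described in "How it works", might look like the sketch below. It assumes the CDF files sit directly in the 2019 directory above; the file names are not given here and would have to be copied from the directory listing, and the destination folder name is hypothetical.

```python
import os
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://cdaweb.gsfc.nasa.gov/pub/data/dscovr/h1/faraday_cup/2019/"


def file_url(filename, base_url=BASE_URL):
    """Full URL of one CDF file in the 2019 Faraday Cup directory."""
    return urljoin(base_url, filename)


def download(filenames, dest_dir="dscovr_fc_2019"):
    """Download each named CDF file into dest_dir (hypothetical folder name),
    skipping files that are already present; return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for name in filenames:
        path = os.path.join(dest_dir, name)
        if not os.path.exists(path):
            urllib.request.urlretrieve(file_url(name), path)
        paths.append(path)
    return paths
```

Each downloaded path can then be opened with `cdflib.CDF(path)`; its `cdf_info()` method lists the variables in the file, and `varget(name)` returns a variable's values.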
For inspiration we also used the paper "Design and Early Observations From the DSCOVR Solar Wind Faraday Cup". We surveyed many space agency resources to learn how to open CDF files and what data is shared, and to compare the datasets and choose one for our project. The sites and files that we used for our research:
https://www.swpc.noaa.gov/communities/space-weather-enthusiasts-dashboard
Hackathon Journey
We had fun choosing the challenge, picking the one we judged most impactful and useful for the Earth.
In addition, we researched the idea and found out how solar flares are currently detected. We also learned more about the sun and solar flares. But it was hard making sense of the data, and the AI model took up a very big part of our research.
We worked on cleaning the data to train the model. The best part was presenting the project.
At the end of this journey we felt like we didn't want it to end, because of how much it affected our daily lives.
We would definitely relive this experience.
References
We used the following sites as inspiration for our AI model:
https://medium.com/mlearning-ai/multivariate-time-series-forecasting-using-rnn-lstm-8d840f3f9aa7
https://stackabuse.com/solving-sequence-problems-with-lstm-in-keras-part-2/
https://towardsdatascience.com/how-to-apply-k-means-clustering-to-time-series-data-28d04a8f7da3
Tags
#ai #prediction #classification #coding #DSCOVR #solar_wind

