Awards & Nominations

What's new? has received the following awards and nominations. Way to go!

Global Winner
Best Use of Science

The solution that makes the best and most valid use of science and/or the scientific method.

What's new in Carrington Event?

High-Level Project Summary

In this project, we propose a machine learning pipeline to predict the probability of a solar storm event. We divide this challenging problem into three subproblems: (1) mapping DSCOVR’s magnetic field data to Wind’s magnetic field data, (2) transforming the magnetic field data into solar proton data, and (3) predicting the probability that a storm occurs based on the proton data. Each transform uses a time-sequence model to capture sequential features. By partitioning the process into subproblems, each intermediate representation becomes more meaningful and can be reused in downstream tasks. The results demonstrate the applicability of the pipeline and its success in predicting solar storms.

Detailed Project Description

Our members

Our team


What exactly does it do? 

In this project, we present a machine learning-based method to predict and evaluate the probability of a solar storm event, as shown in Figure 1.

Figure 1. Illustration of the overall method.


Since we want to make the prediction based on DSCOVR’s magnetic field measurements, the input of our method is the Bx, By, and Bz magnetic field data from DSCOVR, and the output is the probability that a storm occurs. Although directly feeding in the magnetic field as input and whether an event occurs as output is a possible solution, it is not an optimal choice: distortion and time shifts make the features less representative. Moreover, solar proton behavior is more strongly correlated with solar storms than the magnetic field data is.

Figure 2. How we determined the approach to take.


As a result, it is more appropriate and representative to predict the probability step by step instead of in an end-to-end way.

Our method can be divided into four parts:





  1. Data preprocessing. DSCOVR’s magnetic field data and Wind’s magnetic field and proton data are sampled at different time resolutions, and the number of timesteps is too large for prediction. Our preprocessing therefore concatenates the sequences, applies a moving average, downsamples, and then crops the sequence data based on R2 similarity (a minimal sketch follows this list). The cropped sequences form our dataset.
  2. Train the first model. Train a sequence-to-sequence GRU (gated recurrent unit) model to map DSCOVR’s magnetic data to Wind’s ground-truth magnetic data. This step removes the distortion and noise in DSCOVR’s measurements and improves the quality of the magnetic data.
  3. Train the second model. After the mapping, train a second sequence-to-sequence GRU model to map the transformed DSCOVR magnetic data to Wind’s proton data. This model learns the relationship between the magnetic field data and the density, velocity, and temperature of the protons.
  4. Train the third model. Finally, by integrating the disturbance storm time (DST) index measured on the ground, train a sequence-to-value (many-to-one) GRU model to predict the probability that a storm occurs given the proton data. A softmax activation is used in the final layer of the model to produce the probability.
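As a rough illustration of the preprocessing in step 1, here is a minimal NumPy sketch. The window size, downsampling factor, and synthetic input are placeholders rather than the exact settings we used:

    import numpy as np

    def moving_average(x, window=60):
        # Smooth a 1-D time series with a simple boxcar moving average.
        kernel = np.ones(window) / window
        return np.convolve(x, kernel, mode="valid")

    def downsample(x, factor=60):
        # Keep every `factor`-th sample to reduce the number of timesteps.
        return x[::factor]

    def r2_similarity(y_true, y_pred):
        # Coefficient of determination between two aligned sequences,
        # used to decide where to crop the data.
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
        return 1.0 - ss_res / ss_tot

    # Example: smooth and downsample a synthetic Bx component.
    bx = np.random.randn(86400)                             # placeholder for one day of 1 Hz data
    bx_coarse = downsample(moving_average(bx), factor=60)   # roughly 1-minute resolution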

The three models we trained for this challenge are shown in Figure 3.

Figure 3. The three models that we have trained in this challenge.


In sum, our prediction pipeline consists of three GRU models. They are trained separately and chained together after training. DSCOVR’s magnetic field data is first transformed by the first GRU model, the result becomes the input of the second GRU model to generate the proton mapping, and finally that output is fed into the third GRU model for probability prediction.
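A minimal sketch of what the chained inference could look like in PyTorch is shown below; mag_model, proton_model, and storm_model stand in for the three trained GRU networks, and the tensor shapes are illustrative only:

    import torch

    @torch.no_grad()
    def predict_storm_probability(dscovr_mag, mag_model, proton_model, storm_model):
        # dscovr_mag: tensor of shape (batch, timesteps, 3) holding Bx, By, Bz.
        wind_like_mag = mag_model(dscovr_mag)   # step 1: correct distortion and noise
        proton = proton_model(wind_like_mag)    # step 2: density, velocity, temperature
        return storm_model(proton)              # step 3: storm probability per sequence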



How does it work?

Time-sequence neural network models such as RNNs (recurrent neural networks), GRUs (gated recurrent units), and LSTMs (long short-term memory networks) accept a sequence of features as input. The output of a sequence model can be either a single value (many-to-one) or another sequence (many-to-many). Compared with traditional machine learning methods that only look at one time step at a time, these sequence models can digest the features of previous steps and find the features that contribute most to future outputs. Our project uses sequence models to capture the behavior of the magnetic field and the protons over time.
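As a concrete example, the following PyTorch sketch shows the two output modes described above; the layer sizes are illustrative and are not the hyperparameters used in our project:

    import torch
    import torch.nn as nn

    class Seq2SeqGRU(nn.Module):
        # Many-to-many: emit one output vector per input timestep.
        def __init__(self, in_dim=3, hidden=64, out_dim=3):
            super().__init__()
            self.gru = nn.GRU(in_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, out_dim)

        def forward(self, x):                 # x: (batch, timesteps, in_dim)
            out, _ = self.gru(x)              # out: (batch, timesteps, hidden)
            return self.head(out)             # (batch, timesteps, out_dim)

    class Seq2ValueGRU(nn.Module):
        # Many-to-one: summarize the whole sequence into class probabilities.
        def __init__(self, in_dim=3, hidden=64, n_classes=2):
            super().__init__()
            self.gru = nn.GRU(in_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x):
            _, h = self.gru(x)                # h: (1, batch, hidden), last hidden state
            return torch.softmax(self.head(h[-1]), dim=-1)   # (batch, n_classes)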



What benefits does it have? 

First of all, GRU models provide a strong capability for modeling nonlinear functions. The relationships between the magnetic field data from DSCOVR and Wind, and between the magnetic data and the proton behavior, are hard to express explicitly with hand-crafted or linear functions. Second, dividing an end-to-end training process into three steps is more useful and carries more physical meaning. Even though our target here is to predict the event probability, the individual GRU models can be reused in other situations or downstream cases. For example, the first GRU model can be seen as a filter that removes instrument noise and corrects the distortion.




What do you hope to achieve?

Overall, we hope to correctly predict solar storm events using DSCOVR’s magnetic field data. In detail, first, we hope the first GRU model maps DSCOVR’s magnetic data to Wind’s magnetic data correctly. Second, we expect the second GRU model to predict the protons’ density, temperature, and velocity correctly from the magnetic data. Third, we hope the third GRU model predicts the probability and points out upcoming events precisely, and saves the Earth!


Our project has great results on each of the subtasks and the overall mission:




  • For mapping DSCOVR’s magnetic data to Wind’s magnetic data, the R2 score on the testing dataset is 0.9929. Figure 4 shows that the R2 score increased from 0.551 to 0.978 after the first GRU model’s mapping.

Figure 4. The prediction results.





  • For transforming the magnetic field data into the protons’ density, temperature, and velocity, the second GRU model achieves an R2 score of 0.9456 on the testing dataset.
  • For predicting the probability of a solar storm, the accuracy is 0.9551, the recall is 1.0000, and the precision is 0.4478 (metric definitions are sketched below). Although the model is conservative and raises some false alarms, which explains the lower precision, it still detects all of the events in the case below.


Figure 5. Geomagnetic Storm Risk Forecasting
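For reference, here is a minimal NumPy sketch of the metric definitions behind these numbers; binary storm / no-storm labels are assumed, and the thresholding of the predicted probability is not shown:

    import numpy as np

    def classification_metrics(y_true, y_pred):
        # Accuracy, recall, and precision for binary storm / no-storm labels.
        y_true = np.asarray(y_true, dtype=bool)
        y_pred = np.asarray(y_pred, dtype=bool)
        tp = np.sum(y_true & y_pred)      # storms correctly predicted
        fp = np.sum(~y_true & y_pred)     # false alarms
        fn = np.sum(y_true & ~y_pred)     # missed storms
        tn = np.sum(~y_true & ~y_pred)    # quiet periods correctly predicted
        accuracy = (tp + tn) / len(y_true)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        return accuracy, recall, precision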


Tools, Languages, Hardware


Languages:

We use Python as our coding language and we speak Mandarin.


Some of our members' hardware lists:

Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz

NVIDIA GeForce RTX 3060 Ti

-------

Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz

NVIDIA GeForce RTX 2080 Ti

RAM 64GB


Software:

Visual Studio Code for coding

Microsoft PowerPoint for presentation slides


The Python libraries one of our members used:

appnope==0.1.3

asttokens==2.0.8

attrs==22.1.0

backcall==0.2.0

cdflib==0.4.7

certifi==2022.9.24

contourpy==1.0.5

cycler==0.11.0

debugpy==1.6.3

decorator==5.1.1

dtw==1.3

dtw-python @ file:///Users/runner/miniforge3/conda-bld/dtw-python_1662356919013/work

entrypoints==0.4

executing==1.1.0

fonttools==4.37.3

h5py==3.7.0

ipykernel==6.16.0

ipython==8.5.0

jedi==0.18.1

jupyter-core==4.11.1

jupyter_client==7.3.5

kiwisolver==1.4.4

matplotlib==3.6.0

matplotlib-inline==0.1.6

nest-asyncio==1.5.5

numpy @ file:///Users/runner/miniforge3/conda-bld/numpy_1662888927969/work

packaging==21.3

parso==0.8.3

pexpect==4.8.0

pickleshare==0.7.5

Pillow==9.2.0

prompt-toolkit==3.0.31

psutil==5.9.2

ptyprocess==0.7.0

pure-eval==0.2.2

Pygments==2.13.0

pyparsing==3.0.9

python-dateutil==2.8.2

pyzmq==24.0.1

scipy==1.9.1

six==1.16.0

spacepy==0.4.1

stack-data==0.5.1

torch==1.12.1

tornado==6.2

tqdm @ file:///private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_2adqcbsqqd/croots/recipe/tqdm_1664392689227/work

traitlets==5.4.0

typing_extensions==4.3.0

wcwidth==0.2.5

wget==3.2

Space Agency Data

  1. The Wind Mission’s Magnetic Field Data Sets, BW(t), https://cdaweb.gsfc.nasa.gov/pub/data/wind/mfi/mfi_h2/2022/
  2. The DSCOVR Magnetic Field Data Sets, BD(t), https://cdaweb.gsfc.nasa.gov/pub/data/dscovr/h0/mag/2022/
  3. The Wind Mission’s Ion Parameters, https://cdaweb.gsfc.nasa.gov/pub/data/wind/swe/swe_h1/2022/
  4. DST Data, https://wdc.kugi.kyoto-u.ac.jp/dstdir/index.html

Hackathon Journey

Timelapse link:

https://drive.google.com/file/d/1hAoBtOVZxhMN0GP-UMnGqyhe0R1aD_87/view?fbclid=IwAR2QHhA4mrhkJUnjKzRIILr07Z0T5EyRZ7nSbaCJEs7Gdc8CIDd-WrHk3Ig


Picture:

We thank the organizer of the Taipei NASA Hackathon for the great pizza, and you can see the tallest building (Taipei 101) in the background.



It's our first time participating in a NASA hackathon, and it has been a fascinating journey for all of us to join this great competition. We chose this challenge because most of us work on machine learning-based research for our graduate theses. The challenge was very difficult for us: although we have some background knowledge in machine learning, we knew little about the solar wind and these datasets. Therefore, we spent tons of time digging into what is in these '.cdf' files and what the variables actually mean, and reading all of the resources NASA has provided online. It was a tough path, but we overcame it. Hope to see you on the rocket launch day.



We thank the AI center at National Taiwan University (NTU) for providing an excellent place for us to discuss and solve this challenge.

References


  1. PyTorch official website.
  2. Microsoft PowerPoint for presentation slides
  3. Visual Studio Code for coding
  4. T. Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. J. Stat. Soft., doi:10.18637/jss.v031.i07.
  5. The Wind Mission’s Magnetic Field Data Sets, BW(t), https://cdaweb.gsfc.nasa.gov/pub/data/wind/mfi/mfi_h2/2022/
  6. The DSCOVR Magnetic Field Data Sets, BD(t), https://cdaweb.gsfc.nasa.gov/pub/data/dscovr/h0/mag/2022/
  7. The Wind Mission’s Ion Parameters, https://cdaweb.gsfc.nasa.gov/pub/data/wind/swe/swe_h1/2022/
  8. DST Data, https://wdc.kugi.kyoto-u.ac.jp/dstdir/index.html



Icons & Images

  1. https://www.nesdis.noaa.gov/current-satellite-missions/currently-flying/dscovr-deep-space-climate-observatory
  2. https://www.flaticon.com/free-icons/computer, title="computer icons", Computer icons created by winnievinzence
  3. https://www.flaticon.com/free-icons/wind, title="wind icons", Wind icons created by Freepik
  4. https://www.flaticon.com/free-icons/box, title="box icons", Box icons created by Freepik
  5. https://www.flaticon.com/free-icons/box, title="box icons", Box icons created by bqlqn


Tags

#machine learning, #Taiwan, #National Taiwan University, #Taipei, #RNN, #NTU