Awards & Nominations
Kamakani o ka lā has received the following awards and nominations. Way to go!

We develop a deep neural network anomaly detector and use it to detect anomalous DSCOVR data. Our model is an autoencoder trained on one day of WIND magnetic field vector data. We use reconstruction loss (mean absolute error, MAE) with an anomaly threshold to identify anomalous DSCOVR magnetic field vector data; training on the full WIND data set may significantly improve performance. This project is important because the WIND and DSCOVR spacecraft observe the solar magnetic field, which is critical for forecasting "space weather" on Earth caused by solar coronal mass ejections.
We implement an anomaly detector trained on WIND mfi data based on a Keras example of anomaly detection [1]. We use the "astro-ft" data transfer script [2] to download the data in parallel from the NASA GSFC public data website using curl.
We use the CDF library [3] and the pycdf package [4] to access the downloaded data in Python. Following the example, we preprocess the data into successive, contiguous 300-second sequences to form a training data set of 86101 examples. We use a 90%/10% training/validation split, yielding 77491 training and 9610 validation examples. We adapted a useful Python example [5] to convert WIND fractional day-of-year values into seconds. The WIND time sampling interval differs from the DSCOVR time sampling interval, which posed a challenge for us. We empirically determined a linear function that produces the index of the WIND "Time_PB5" value closest to each DSCOVR whole-second interval: idx_wind = idx_dscovr * 10.828726851851853 + 5, giving a good linear fit [Fig. 1].
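The windowing and time-index mapping above can be sketched as follows. This is a minimal sketch, not our exact notebook code: the function names are ours, and the stride-1 sliding windows are an assumption (consistent with 86400 - 300 + 1 = 86101 examples from one day of one-second samples, as in the Keras example [1]).

```python
import numpy as np

# Sketch of the preprocessing described above. The stride-1 sliding
# windows are an assumption, consistent with 86400 - 300 + 1 = 86101
# training examples from one day of one-second samples.

def make_sequences(b_field, seq_len=300):
    """Split an (N, 3) magnetic field array into overlapping
    (seq_len, 3) training sequences with stride 1."""
    n = len(b_field) - seq_len + 1
    return np.stack([b_field[i:i + seq_len] for i in range(n)])

def wind_index(idx_dscovr):
    """Empirical linear map from a DSCOVR whole-second index to the
    nearest WIND Time_PB5 sample index [Fig. 1]."""
    return int(round(idx_dscovr * 10.828726851851853 + 5))
```

With 86400 one-second samples per day, make_sequences yields 86101 sequences, matching the count above.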
Model Architecture
Our anomaly detector is a 1D convolutional autoencoder [Fig. 2] with an information bottleneck in the first two layers. The information bottleneck constrains the model to learn a compressed latent representation of the data (encoder). The next two 1D transpose convolutional layers learn to reproduce the input from the latent representation (decoder). The final output transpose convolutional layer outputs (300, 3) values, matching the input dimension of each training sequence.
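A minimal Keras sketch of this architecture follows. The filter counts, kernel sizes, and activations are our assumptions (following the Keras timeseries anomaly detection example [1]); only the (300, 3) input/output shape and the conv/transpose-conv layout come from the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sketch of the autoencoder; filter counts and kernel
# sizes are assumptions, following the Keras anomaly detection example.
model = keras.Sequential([
    layers.Input(shape=(300, 3)),
    # Encoder: strided convolutions form the information bottleneck
    layers.Conv1D(32, 7, strides=2, padding="same", activation="relu"),
    layers.Conv1D(16, 7, strides=2, padding="same", activation="relu"),
    # Decoder: transpose convolutions reconstruct from the latent code
    layers.Conv1DTranspose(16, 7, strides=2, padding="same", activation="relu"),
    layers.Conv1DTranspose(32, 7, strides=2, padding="same", activation="relu"),
    # Output layer: (300, 3), matching the input dimension
    layers.Conv1DTranspose(3, 7, padding="same"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mae")
```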

Fig. 2 Neural network model architecture - multilayer autoencoder
We experimented with batch sizes of (128, 256, 512), learning rates of (1e-4, 5e-4, 1e-3), and dropout fractions of (0, 0.2, 0.5). Our final parameters were batch size 256, learning rate 1e-3, and no dropout. Notably, and somewhat counterintuitively, removing dropout significantly reduced overfitting and the loss at convergence, and increased the number of epochs completed before early stopping. We trained for up to 50 epochs with early stopping patience 5.
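A self-contained sketch of this training setup is below. The tiny random array is only a stand-in for the WIND sequences, and the small model here is illustrative; the hyperparameters (batch size 256, learning rate 1e-3, no dropout, early stopping patience 5) are the ones reported above.

```python
import numpy as np
from tensorflow import keras

# Illustrative training setup: batch size 256, learning rate 1e-3,
# no dropout, early stopping patience 5. The random data and the
# small model are stand-ins for the real WIND sequences and network.
x_train = np.random.randn(512, 300, 3).astype("float32")
model = keras.Sequential([
    keras.layers.Input(shape=(300, 3)),
    keras.layers.Conv1D(8, 7, strides=2, padding="same", activation="relu"),
    keras.layers.Conv1DTranspose(3, 7, strides=2, padding="same"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mae")
history = model.fit(
    x_train, x_train,                # autoencoder reconstructs its input
    validation_split=0.1,            # 90%/10% train/validation split
    epochs=2,                        # 50 in the real run
    batch_size=256,
    callbacks=[keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)],
    verbose=0,
)
```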
Our best model converged at epoch 48 with a training loss of 4.1072e-04 and a validation loss of 6.4473e-04. We achieved a very good fit to the training data set [Fig. 3]. More significantly, the fit to the out-of-distribution DSCOVR data is nearly as good as for the WIND training set [Fig. 4].
Fig. 3 Fit on training data (WIND mfi)
Fig. 4 Fit on unseen inference data (DSCOVR mfi)
We use the reconstruction MAE loss distribution [Fig. 5] to set an anomaly threshold equal to the maximum MAE on the training set. To filter a DSCOVR data sequence, we use the model to reconstruct (predict) the input sequence from itself. The reconstruction loss is the mean absolute error between the reconstructed input and the "real" input. If this loss exceeds the anomaly threshold, the sequence is flagged as an anomaly. We visualize the detected anomalies in a DSCOVR sample by overplotting them in red [Fig. 6].
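The thresholding logic can be sketched in NumPy. The "reconstructions" below are synthetic stand-ins for model.predict output, and the shapes and noise scales are illustrative.

```python
import numpy as np

# NumPy sketch of the anomaly threshold. The reconstructions here are
# synthetic stand-ins for the autoencoder's model.predict output.

def reconstruction_mae(recon, seqs):
    """Per-sequence mean absolute error, shape (n_sequences,)."""
    return np.abs(recon - seqs).mean(axis=(1, 2))

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 300, 3))
train_recon = train + rng.normal(scale=0.01, size=train.shape)

# Anomaly threshold: maximum reconstruction MAE on the training set
threshold = reconstruction_mae(train_recon, train).max()

# A sequence whose reconstruction error exceeds the threshold is
# flagged as an anomaly
test_seq = rng.normal(size=(1, 300, 3))
test_recon = test_seq.copy()
test_recon[0, :50] += 5.0            # simulate a poorly reconstructed glitch
is_anomaly = reconstruction_mae(test_recon, test_seq) > threshold
```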
Fig. 5 Reconstruction loss distribution of WIND magnetic field vector training data
Fig. 6 DSCOVR magnetic field vector data with anomalies overplotted in red
We demonstrate how to use the anomaly detector as a filter by removing the anomalous data points from the DSCOVR example and replotting the "corrected" DSCOVR and WIND data [Fig. 7].
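The filtering step can be sketched with a hypothetical helper (our name, not from the notebook), assuming stride-1 windows so that window i covers samples i through i+299.

```python
import numpy as np

# Hypothetical helper for using the detector as a filter: mark every
# sample covered by an anomalous window, then keep the rest. Assumes
# stride-1 windows of length seq_len.

def sample_keep_mask(window_is_anomalous, n_samples, seq_len=300):
    bad = np.zeros(n_samples, dtype=bool)
    for i in np.flatnonzero(window_is_anomalous):
        bad[i:i + seq_len] = True    # window i covers samples i..i+seq_len-1
    return ~bad                      # True where the sample is kept

# Example: 302 samples, 3 windows; window 1 is anomalous
keep = sample_keep_mask(np.array([False, True, False]), n_samples=302)
```

The "corrected" series is then simply the data indexed by the keep mask before replotting.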
Fig. 7 DSCOVR mfi data with anomalies removed (top, green) and same-day WIND data (bottom, magenta)
We used open source software, the most significant of which includes Python, Jupyter, TensorFlow, Keras, Matplotlib, and pycdf. We used Canva to prepare our presentation and GitHub as a code repository and for version control. We ran our Jupyter notebook in an NVIDIA Docker container using an RTX 3090 GPU.
We used the following space agency data for this challenge.
https://cdaweb.gsfc.nasa.gov/cgi-bin/eval2.cgi
https://cdaweb.gsfc.nasa.gov/pub/data/wind/mfi/
https://cdaweb.gsfc.nasa.gov/pub/data/wind/swe/swe_h1/
https://cdaweb.gsfc.nasa.gov/pub/data/wind/mfi/mfi_h2/2022/wi_h2_mfi_20220514_v04.cdf
https://cdaweb.gsfc.nasa.gov/pub/data/dscovr/h0/mag/
https://cdaweb.gsfc.nasa.gov/pub/data/dscovr/h0/mag/2022/dscovr_h0_mag_20220514_v01.cdf
Our team had a lot of fun in this hackathon. We collaborated in Zoom with "stand up" meetings every 3 hours. We tended to just leave the Zoom running and hang out while we worked together. Some of us learned about the Carrington event and CMEs and how they could impact the Earth. Some focused on creative aspects of making a slide show to present our solution. Others sharpened their coding skills, especially with CDF, matplotlib, TensorFlow and Keras. The limited time was the biggest challenge. Some of us worked late into the night with little sleep.
A technical challenge we encountered was the different time sampling for WIND and DSCOVR mfi data. We analyzed the data and ultimately used an empirical approach to find a solution.
The overall approach we took was first to select a high-level strategy (anomaly detection), then to adapt an existing neural network architecture to the challenge data set and objective. We set out to accomplish more than just an anomaly detector but ran out of time, so we rescoped to focus on submitting early and then iterating to improve both our solution and our submission.
[1] AI for Anomaly Detection
https://keras.io/examples/timeseries/timeseries_anomaly_detection/
[2] Data transfer
https://github.com/edubergeek/astro-ft
[3] CDF
https://spdf.gsfc.nasa.gov/pub/software/cdf/dist/cdf38_1/
[4] pycdf (SpacePy)
Larsen, B. A., Morley, S. K., Niehof, J. T., and Welling, D. T., SpacePy, Zenodo, doi:10.5281/zenodo.3252523
https://doi.org/10.5281/zenodo.3252523
[5] Time Conversion
https://www.geeksforgeeks.org/python-program-to-convert-seconds-into-hours-minutes-and-seconds/
Websites
https://www.mssl.ucl.ac.uk/grid/iau/extra/local_copy/SP_coords/geo_sys.htm
https://en.wikipedia.org/wiki/Wind_(spacecraft)
https://en.wikipedia.org/wiki/Deep_Space_Climate_Observatory
Creative Commons Attributions
Credit: Mahendra awale, CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0), via Wikimedia Commons
https://commons.wikimedia.org/wiki/File:Example_of_a_deep_neural_network.png, BrunelloN, 2021
#ai, #spaceweather, #ml, #solarphysics, #software, #autoencoder
If a major space weather event like the Carrington Event of 1859 were to occur today, the impacts to society could be devastating. Your challenge is to develop a machine learning algorithm or neural network pipeline to correctly track changes in the peak solar wind speed and provide an early warning of the next potential Carrington-like event.
