ρBerTO

High-Level Project Summary

ρBerTO describes a Machine Learning approach to the comprehension of Carrington Event data. We propose an innovative approach for predicting symptoms of possible Carrington Events, leveraging a powerful Deep Learning architecture. The proposed solution is based on a comparison between the data coming from the two main observatory spacecraft currently employed, WIND and DSCOVR. It aims to improve the quality of the DSCOVR spacecraft's measurements, with a particular focus on an intelligent analysis of solar wind behavior, by addressing the double task of "next event prediction" and "data trend analysis".

Detailed Project Description

Data Interpretation and Pre-processing


The first phase of the overall process involves a human evaluation of the data collected from the two studied spacecraft, WIND and DSCOVR, based on the documentation provided in the Resources section of the Challenge, making sure to truly understand the meaning of the data. The main focus of this phase is learning how to merge and filter the huge quantity of data provided, in order to create an accurate pre-processing pipeline for the selected sources. The result of this analysis enabled us to define a feasible goal for the entire project and to filter the data according to that goal.

Specifically, we first removed outliers that could mislead our comprehension of the data. We then re-aligned the DSCOVR samples with the WIND data, which were originally acquired at different, non-constant sampling times and in different capture windows, so that the two data sequences ultimately contain the same number of elements. The re-alignment was accomplished by averaging, for each time instant of the WIND samples, the one or two closest DSCOVR values (depending on data availability) of proton velocity, thermal speed, and proton density. The data were finally split into training and testing partitions and parsed into a format suitable for our deep learning model.
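The nearest-neighbor averaging used in the re-alignment can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name, the assumption that timestamps arrive as NumPy arrays of seconds, and the fixed `k=2` window are all our own choices.

```python
import numpy as np

def realign_to_wind(wind_times, dscovr_times, dscovr_values, k=2):
    """For each WIND timestamp, average the (up to) k nearest DSCOVR
    samples in time, so both sequences end up with the same length."""
    aligned = np.empty(len(wind_times))
    for i, t in enumerate(wind_times):
        # indices of the k DSCOVR samples closest in time to t
        nearest = np.argsort(np.abs(dscovr_times - t))[:k]
        aligned[i] = dscovr_values[nearest].mean()
    return aligned

# toy example: 3 WIND instants, 4 irregularly-sampled DSCOVR values
wind_t = np.array([0.0, 1.0, 2.0])
dscovr_t = np.array([0.1, 0.9, 1.1, 2.2])
dscovr_v = np.array([10.0, 20.0, 30.0, 40.0])
print(realign_to_wind(wind_t, dscovr_t, dscovr_v))  # [15. 25. 35.]
```

In practice the same re-alignment would be applied independently to each of the three quantities (proton velocity, thermal speed, proton density), reusing the index computation.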



Model Training and Testing


Once the data-preparation pipeline has been completed, the pre-processed inputs flow into an advanced deep learning architecture based on the "bert-base-uncased" model from the Hugging Face organization. The overall training algorithm relies mainly on two tasks learned in parallel by means of two ad-hoc loss functions. Both training objectives are borrowed from the Natural Language Processing field and are commonly known as Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). The algorithm takes as input two sequences for each measurement, one coming from the DSCOVR pre-processed samples and one from the WIND ones. The MLM-like task randomly masks one of the tokens that build the input produced by the DSCOVR pre-processing pipeline: the goal is to predict the missing token from the model's output values while retaining the original information. The NSP-like task, on the other hand, involves predicting whether the sequence coming from the WIND pre-processing pipeline corresponds to the same time window as the DSCOVR one. All the elements of the deep learning architecture were built or imported using the PyTorch framework, which enabled us to start from an existing training checkpoint for the selected model. The further training phase was performed on a single RTX A6000 48 GB GPU, on a dataset comprising values collected between June 2016 and December 2018. A test phase ran after each completed epoch, based on a test dataset of values sampled in 2019.
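The BERT-style random masking behind the MLM-like task can be sketched like this; it is a minimal, self-contained illustration in which the helper name, the mask-token id of 0, and the 15% masking probability are our own illustrative choices, not values taken from the project's code:

```python
import random

def mask_tokens(tokens, mask_id=0, mask_prob=0.15, seed=None):
    """BERT-style masking: replace a random subset of tokens with
    mask_id, keeping the original ids as labels only at the masked
    positions. Label -100 marks positions the cross-entropy loss
    should ignore (the PyTorch convention)."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_id)   # hide the token from the model...
            labels.append(tok)       # ...but ask it to predict it back
        else:
            masked.append(tok)
            labels.append(-100)      # not part of the MLM loss
    return masked, labels

seq = list(range(1, 21))             # a toy token sequence
masked, labels = mask_tokens(seq, seed=7)
```

During training, the masked sequence is fed to the model and the labels to the MLM loss head; for standard text inputs, the Hugging Face `DataCollatorForLanguageModeling` class implements this same logic.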

Space Agency Data

[WIND] Solar Wind Experiment (SWE) Dataset (available at https://cdaweb.gsfc.nasa.gov/pub/data/wind/swe/swe_h1/ ): we collected the entries regarding the velocity, the thermal speed, and the density of the protons.


[DSCOVR] Faraday Cup Dataset (available at https://cdaweb.gsfc.nasa.gov/pub/data/dscovr/h1/faraday_cup/): we again collected only the entries concerning the velocity, the thermal speed and the density of the protons.

Hackathon Journey

It has truly been a challenging experience, which allowed us to face several complex problems and look for innovative and interesting solutions within a very limited amount of time. It was the first time we approached a space-related scenario, and yet we were able to find a solution that tackled the problem at hand. We studied and understood the data we were provided with, analysing the different fields composing such a complex dataset. We also trained a model we had never used before, "transferring", in a sense, our knowledge of AI techniques to a completely different scenario: that of space applications.

We were thrilled about choosing this challenge, among all the ones proposed, since we all agreed it was the most "challenging" indeed, and the closest to humanity's future needs. We encountered several difficulties and setbacks during this 24-hour rush, and yet we overcame them by putting in more effort, never giving up, and sharing thoughts and ideas to get past each problem. Whenever we were stuck, we trusted each other's knowledge and background expertise to get over the hump, and this was the common thread that guided us throughout the whole competition: trust, ideas, and collaboration.

We would like to thank the fifth member of our team, who first introduced us to one another, and AIKO for having believed in us and in our ideas.

Tags

#bert #ml #ai #multitasklearning #carringtonevent #a6000 #gpus #datapreprocessing #dscovr #wind #spacecraft