High-Level Project Summary
Importance: Studying the Earth's changing environments and the natural distribution of its mineral, water, biota, and energy resources provides methods for predicting and mitigating the effects of geologic hazards such as earthquakes, volcanic eruptions, and floods.
Developing: Data provide a wealth of information to aid in our understanding of Earth's interrelated processes, in developing innovative solutions for real-world challenges, and in making data-based decisions.
Solving: Earth data scientists use programming languages like R and Python to analyze Earth and environmental data from data sources including satellites, drones, social media, field studies, and surveys.
Link to Final Project
Link to Project "Demo"
Detailed Project Description
Infrastructural support:
Most big Earth data analytical systems have been or are being migrated to a cloud
computing environment for rapid prototyping, result sharing, and reproducible research
(Peng, 2011). Some choose the private cloud as it allows for full control (Doelitzscher,
Sulistio, Reich, Kuijs, & Wolf, 2011), but most adopt the public cloud where a third-party
cloud provider performs the updates and maintenance of computing resources (Varia &
Mathew, 2014). For example, Mapbox uses Landsat on Amazon Web Services to power
Landsat-live, a browser-based map that is constantly refreshed with the latest imagery
from the Landsat 8 satellite (Yang, Yu, Hu, Jiang, & Li, 2017a). 
Cloud computing can support sustainable archiving, access to different computing node
types, virtual desktops, and collaboration on data analytics. But large-scale, tightly
coupled big data analytics or modeling still call for high-performance computing, which
supports colocation of computing and data, data assimilation, and inverse problems
(Huang et al., 2013). For example, NASA has been planning to support up to 1.6 exabytes
of climate data with a 0.75 km resolution and global coverage (Lee, 2018). This means
integrating datasets from the global Goddard Earth Observing System Model (GEOS), the Global
Modeling and Assimilation Office (GMAO), and other sources with sufficient computing and
storage capacity to a) provide data/analytical/knowledge services, b) support artificial
intelligence/machine learning/deep learning for inference, and c) engage PB-level data to
support comprehensive analytics and data fusion.
Graphics processing unit (GPU) computing has boosted the simulation and analytics of
Earth and space phenomena, demonstrating significant speedups over conventional central
processing units (CPUs) (Madhukar, 2019). For example, the calculation of aerosol optical
depth from Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data
using a GPU can be 43 times faster than the same calculation on a CPU (Liu et al.,
2016). Numerical simulation can also be accelerated using GPU computing. A large-scale
simulation of seismic wave propagation on GPUs was 45-fold faster than on CPUs whilst
maintaining accuracy (Okamoto, Takenaka, Nakamura, & Aoki, 2013).
Recent computing advancements also distribute some computing tasks to the edge of
the infrastructure, for example, the smart things at the edge of the Internet of Things and
the mobile devices of the mobile Internet. These approaches, termed edge computing and
mobile computing, conduct early processing or preprocessing of data collected at the sensor
side, provide end visualization, and facilitate user interaction.
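As a rough illustration of sensor-side preprocessing, the sketch below reduces raw readings to one mean per fixed-size window before anything would be sent upstream. The function name `window_means`, the window size, and the sample values are assumptions made for this example, not part of any system described here.

```python
from statistics import mean

def window_means(readings, window):
    """Reduce raw sensor readings to one mean per fixed-size window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        mean(readings[i:i + window])
        for i in range(0, len(readings), window)
    ]

# Hypothetical temperature samples collected on the device.
raw = [20.0, 20.5, 21.0, 21.5, 30.0, 30.5]

# Only the reduced summary would leave the device.
summary = window_means(raw, window=2)
print(summary)
```

With a window of 2, six readings shrink to three values, which is the kind of volume reduction that early processing at the edge aims for.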
While the computing infrastructure powers big data analytics, the network and security
infrastructure, together with the monitoring, scheduling, managing, and integration
infrastructure, enables the computing and analytics to operate in a smooth, dynamic,
safe, and easy-to-use fashion.
Data sources, ingestion, and store:
Another important module of the system architecture is the data store, which is responsible
for archiving Earth data and providing access to them. Traditionally, Earth science data can
be categorized into atmosphere, ocean, land, hydrology, and socio-economic data according
to their disciplines (Acker & Leptoukh, 2007). In the Big Data era, data sources have
expanded to include real-time location tracking, observations of the urban environment, and social media data from citizens (Mayer-Schönberger & Cukier, 2013).
Depending on the nature and usage of Earth data, they are traditionally stored in
a file system, a relational database, or a NoSQL database. For example, real-time location
tracking data are usually stored in a Relational Database Management System (RDBMS) (Tian,
Jiang, Chen, Li, & Mu, 2014). Several efforts have been made to store geospatial
coverages structured as arrays in array-based databases, as coverages
are not well suited to traditional RDBMSs (Baumann, 2014).
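To make the array-versus-relational point concrete, here is a minimal sketch that extracts the same spatial window from a small coverage held two ways: as a nested list (standing in for an array store) and as one row per cell in SQLite (standing in for an RDBMS). The grid values, table layout, and function names are hypothetical and do not reflect how any particular array database works internally.

```python
import sqlite3

# Hypothetical 3x4 coverage, e.g. a small raster of temperatures.
grid = [
    [10, 11, 12, 13],
    [14, 15, 16, 17],
    [18, 19, 20, 21],
]

def array_window(g, r0, r1, c0, c1):
    """Array view: a spatial window is a plain slice."""
    return [row[c0:c1] for row in g[r0:r1]]

# Relational view: one row per cell (assumed schema for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cell (r INTEGER, c INTEGER, v INTEGER, PRIMARY KEY (r, c))"
)
conn.executemany(
    "INSERT INTO cell VALUES (?, ?, ?)",
    [(r, c, v) for r, row in enumerate(grid) for c, v in enumerate(row)],
)

def table_window(db, r0, r1, c0, c1):
    """Relational view: range query, then regroup the rows into a 2-D block."""
    rows = db.execute(
        "SELECT r, v FROM cell "
        "WHERE r >= ? AND r < ? AND c >= ? AND c < ? ORDER BY r, c",
        (r0, r1, c0, c1),
    ).fetchall()
    block = {}
    for r, v in rows:
        block.setdefault(r, []).append(v)
    return [block[r] for r in sorted(block)]

a = array_window(grid, 1, 3, 1, 3)
t = table_window(conn, 1, 3, 1, 3)
print(a, t)
```

Both calls return the same block, but the relational version needs a per-cell table, a range query, and a regrouping step, which hints at why coverages map awkwardly onto rows and columns.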
Data discovery and analytics:
As a prior step to performing any data analytical tasks, traditional data discovery relies on
open source technologies such as Solr and Elasticsearch (Nogueras-Iso, Zarazaga-Soria,
Béjar, Álvarez, & Muro-Medrano, 2005). Metadata of these data are often stored in a
full-text search engine (e.g. Apache Lucene) (Jiang, Yang, Xia, & Liu, 2016), which can be
searched much like Google. Recent endeavors have started to integrate smart capabilities,
e.g. query understanding, ranking, and recommendation, based on advances in artificial
intelligence (Jiang et al., 2018, 2017; Li, Goodchild, & Raskin, 2014; Wiegand &
García, 2007). Common Earth data analytical functions range in complexity from simple
numerical functions to raster and vector operations, visualization and exploration, and
machine learning. More details of analytical functions will be reviewed in the next section.
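The idea behind full-text metadata search can be sketched with a toy inverted index. This is not the Lucene, Solr, or Elasticsearch API; the records, identifiers, and the term-count ranking are assumptions chosen to keep the example small.

```python
from collections import defaultdict

# Hypothetical metadata records; real catalogs hold far richer fields.
records = {
    "d1": "MODIS aerosol optical depth daily",
    "d2": "Landsat 8 surface reflectance",
    "d3": "MODIS land surface temperature daily",
}

def build_index(docs):
    """Map each lower-cased term to the set of records containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank records by how many query terms they match (ties by id)."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

index = build_index(records)
hits = search(index, "MODIS surface temperature")
print(hits)
```

Production search engines add tokenization, stemming, and relevance models on top of this structure; the smart capabilities mentioned above (query understanding, ranking, recommendation) would replace the simple term-count score.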
Distributed computing technologies are widely adopted across different existing
systems (Agrawal, Das, & El Abbadi, 2011) for big data analytics. Apache Spark and
Hadoop MapReduce are two typical open source distributed solutions for big data
analytics. The former is usually much faster because the latter reads from and writes to
disk more often (Zaharia et al., 2012). For example, Li et al. (2016) proposed a workflow to
accelerate the web log mining process using Spark.
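The map and reduce pattern that both frameworks build on can be sketched in plain Python. This is neither the Spark nor the Hadoop API, and the log format and partitioning are hypothetical; the point is only that per-partition counts are computed independently and then merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical web log lines in the form "<client> <requested path>".
log_lines = [
    "10.0.0.1 /data/modis",
    "10.0.0.2 /data/landsat",
    "10.0.0.1 /data/modis",
    "10.0.0.3 /data/modis",
]

def map_partition(lines):
    """Map step: count requested paths within one partition."""
    return Counter(line.split()[1] for line in lines)

def merge(left, right):
    """Reduce step: combine two partial counts."""
    return left + right

# Two partitions stand in for data spread across worker nodes.
partitions = [log_lines[:2], log_lines[2:]]
partials = [map_partition(p) for p in partitions]
totals = reduce(merge, partials, Counter())
print(totals)
```

Here the partial counts stay in memory between the two steps, loosely mirroring how Spark keeps intermediate data in memory, whereas a disk-based pipeline would persist them between stages.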
Big Earth data analytics:
Big Earth data analytics include the analytical lifecycle of preparing, reducing, analyzing,
mining, and visualizing large amounts of spatiotemporal and spectral data, encompassing
a variety of data types (Kempler & Mathews, 2017). The volume, velocity, variety, and veracity
of the acquired data pose grand challenges in data processing for value (Yang et al., 2017a).
The analytical process enables the discovery of patterns, correlations, principles, knowledge
and other information for better understanding our Earth system and responding to problems
induced by global and regional changes (Bhattacharyya & Ivanova, 2017). The following
sections summarize the literature from different aspects of big Earth data analytics.
Data analytical methods:
After preprocessing, the main focus of data analytics is to reveal hidden patterns,
unknown correlations, and other useful information from a large volume of heterogeneous
data to facilitate Earth science study. Big Earth data analytics support all aspects
of Earth science research, such as hypothesis-driven and data discovery-driven methods,
dynamical models, and goal-driven decisions (Kempler & Mathews, 2017). The involved
methods can be categorized into model simulation and prediction, statistics, machine
learning, and deep learning.
Space Agency Data
Machine learning methods:
Evolving from artificial intelligence, machine learning methods develop models that are
based on characteristics and features learned from empirical data and can infer unknown
problems and discover unknown patterns (Sellars et al., 2013). Machine learning methods
generally have an advantage over traditional statistical methods in capturing non-linear
relationships, and this advantage can be leveraged to model high-dimensional and
non-linear data with complex interactions and missing values, which is particularly the case
for big Earth data (Thessen, 2016). Regression, classification, and clustering, which derive
from statistical methods, can also be used as machine learning methods, so the exact division
between machine learning and statistical methods is not always clear. For example, Artificial
Neural Networks can perform regression to approximate and predict ecological conditions
(Franceschini et al., 2019). Machine learning classifiers including Random Forest, Support
Vector Machines, and Bayesian Classifiers can produce the probability of an observation
belonging to a specific class of Earth process, such as landslide (Hong et al., 2016).
Clustering can group observations based on similarity, which is useful in detecting rare
events such as fire (Chakraborty & Paul, 2010; Khatami et al., 2017). Fuzzy inference and
some tree-based machine learning methods (e.g. Decision Tree) can extract a set of rules
from the observation to make predictions, such as forest cover and change (Sexton et al.,
2016).
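As a minimal sketch of how clustering can surface rare events, the example below runs a one-dimensional k-means (Lloyd's algorithm) on a handful of values and treats the smallest cluster as the candidate anomaly. The brightness temperatures, the choice of two clusters, and the initial centers are assumptions for illustration; this is not the method of the cited studies.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on scalars with caller-supplied initial centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster received no points.
        centers = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# Hypothetical brightness temperatures (K); two pixels are much hotter.
pixels = [300.0, 301.0, 299.0, 302.0, 298.0, 360.0, 362.0]

centers, groups = kmeans_1d(pixels, centers=[min(pixels), max(pixels)])

# The smallest cluster is the candidate rare event.
rare = min(groups, key=len)
print(centers, rare)
```

Real fire detection works on multi-band imagery with many more features, but the principle is the same: observations that group far from the bulk of the data are flagged for closer inspection.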
Deep learning methods:
Deep learning methods, evolving from machine learning, offer unique capabilities in
extracting and presenting features at different and detailed levels from the Earth data
(Manning, 2015; LeCun, Bengio, & Hinton, 2015). These features and characteristics are
extremely important in Earth data classification and segmentation tasks. Due to its more
powerful expression and parameter optimization capability, deep learning has achieved
great performance in computer vision, natural language processing, recommendation
systems, and others (Collobert & Weston, 2008; Krizhevsky et al., 2012; Schmidhuber,
2015). For example, deep convolutional neural networks (CNNs), e.g. AlexNet
(Krizhevsky et al., 2012), VGGNet (Chatfield et al., 2014), and PlacesNet (Zhou et al.,
2014), can achieve satisfactory results in classifying scenes from high-resolution remote
sensing imagery into categories such as airport, bridge, desert, forest, and so on. Beyond
image classification, objects can be detected and segmented from Earth datasets using
deep learning techniques (Cimpoi et al., 2015; Girshick et al., 2014). Deep learning
methods can also help increase the computational efficiency of numerical simulations
(e.g. weather prediction) whilst maintaining reasonable accuracy (Wang et al., 2018).
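Feature extraction in a CNN rests on sliding small kernels over an image. The sketch below applies a single hand-set kernel to a tiny tile to show how one feature map responds to a vertical edge. The tile, the kernel weights, and the function name are assumptions for this example; a trained network such as those named above would learn many such kernels from data.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical 4x4 tile: dark on the left, bright on the right.
tile = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-set vertical-edge kernel; a trained CNN would learn such weights.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(tile, edge_kernel)
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where dark meets bright. Stacking many learned kernels across several layers is what lets a CNN describe imagery at different levels of detail.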
We selected popular tools to analyze how they support different big Earth data
analytics and compared them (Table 3) from aspects of scalability, analytical methods,
programming languages, and graphical user interface (GUI).
Natural resources & environment:
Natural resources have been over-exploited by humankind, causing loss and degradation
of habitats and depleting biological diversity (Smil, 2013). Human beings, especially
marginalized and vulnerable communities, need to adapt to the rapidly changing
environment and its adverse circumstances, which draws attention to
natural resource conservation and the sustainable use of biological diversity (Collen et al.,
2013). The capability to monitor the impact of biological diversity and global environmental
change is crucial to designing effective adaptation and mitigation strategies to
prevent further loss of natural resources (Pettorelli et al., 2014). This requires the
scientific community to obtain datasets and assess the spatiotemporal changes in the
distribution of atmospheric, ocean, and land surface conditions, and the distribution and
function of natural resources. Big Earth data are the source for mapping the distribution
of natural resources, especially over large areas, including forest cover change
(Hansen et al., 2013), vegetation cover (Karnieli et al., 2013), and biodiversity dynamics
(Jeltsch et al., 2013; Kuenzer et al., 2014).
Monitoring and assessing environmental pollution over the long term requires big Earth data.
Satellite observations, for example, were used to analyze European nighttime lights
over 15 years, showing complex patterns of light pollution (Bennie et al., 2014), and provide
insight into global long-term changes in air, water, and soil pollution (Fingas & Brown, 2014;
Lehmann et al., 2015; Lin et al., 2015; Schmidt et al., 2015; Van Donkelaar et al., 2015).
Hackathon Journey
It has been amazing working with you. I know a lot of space-oriented theory, so thank you for this opportunity.
With regards
Sri vinayak .N
References
Acker, J. G., & Leptoukh, G. (2007). Online analysis enhances use of NASA earth science data. Eos,
Transactions American Geophysical Union, 88(2), 14–17.
Agrawal, D., Das, S., & El Abbadi, A. (2011, March). Big data and cloud computing: Current state and
future opportunities. Proceedings of the 14th International Conference on Extending Database
Technology (pp. 530–533). Uppsala, Sweden: ACM.
Ahmad, A., Paul, A., Rathore, M., & Chang, H. (2016). An efficient multidimensional big data fusion
approach in machine-to-machine communication. ACM Transactions on Embedded Computing
Systems (TECS), 15(2), 39.
Alpaydin, E. (2014). Introduction to machine learning. Cambridge, MA: MIT Press.
Apache (2017). The Science Data Analytics Platform (SDAP) proposal [online]. Retrieved from
https://wiki.apache.org/incubator/SDAPProposal
Asner, G. P., Knapp, D. E., Boardman, J., Green, R. O., Kennedy-Bowdoin, T., Eastwood, M., . . .
Field, C. B. (2012). Carnegie Airborne observatory-2: Increasing science data dimensionality via
high-fidelity multi-sensor fusion. Remote Sensing of Environment, 124, 454–465.
Bambacus, M., Yang, C. P., Leung, R. Y., Barbee, B., Nuth, J. A., Seery, B., . . . Xu, M. (2017). A Planetary
Defense Gateway for Smart Discovery of relevant Information for Decision Support.
Batty, M. (2007). Cities and complexity: Understanding cities with cellular automata, agent-based
models, and fractals. Cambridge, MA: MIT Press.
Baumann, P. (2014). Rasdaman: Array databases boost spatio-temporal analytics. Computing for
Geospatial Research and Application (COM. Geo), 2014 Fifth International Conference (p. 54).
Washington, DC.
Bendig, J., Bolten, A., & Bareth, G. (2012). Introducing a low-cost mini-UAV for thermal-and
multispectral-imaging. International Archives of the Photogrammetry, Remote Sensing and
Spatial Information Sciences, 39(B1), 345–349.
Bennie, J., Davies, T. W., Duffy, J. P., Inger, R., & Gaston, K. J. (2014). Contrasting trends in light
pollution across Europe based on satellite observed night time lights. Scientific Reports, 4, 3789.
Bernhardt, K. (2007). Agent-based modeling in transportation. Artificial Intelligence in
Transportation, 72(E-C113).
Bhattacharyya, S., & Ivanova, D. (2017). Scientific computing and big data analytics: Application in
climate science. In S. Mazumder, R. S. Bhadoria & G. C. Deka (Eds.), Distributed computing in big
data analytics (pp. 95–106). Cham: Springer.
Binkowski, F. S., & Roselle, S. J. (2003). Models-3 Community Multiscale Air Quality (CMAQ) model
aerosol component 1. Model description. Journal of Geophysical Research: Atmospheres, 108, D6.
Borradaile, G. J. (2013). Statistics of earth science data: Their distribution in time, space and
orientation. Berlin, Germany: Springer Science & Business Media.
Caldwell, P. M., Bretherton, C. S., Zelinka, M. D., Klein, S. A., Santer, B. D., & Sanderson, B. M. (2014).
Statistical significance of climate sensitivity predictors obtained by data mining. Geophysical
Research Letters, 41(5), 1803–1808.
Camara, G., Assis, L. F., Ribeiro, G., Ferreira, K. R., Llapa, E., & Vinhas, L. (2016, October). Big earth
observation data analytics: Matching requirements to system architectures. Proceedings of the
5th ACM SIGSPATIAL International Workshop on Analytics For Big Geospatial Data (pp. 1–6).
Burlingame, CA: ACM.
Candiago, S., Remondino, F., De Giglio, M., Dubbini, M., & Gattelli, M. (2015). Evaluating
multispectral images and vegetation indices for precision farming applications from UAV images.
Remote Sensing, 7(4), 4026–4047.
Chakraborty, I., & Paul, T. K. (2010, June). A hybrid clustering algorithm for fire detection in video
and analysis with color based thresholding method. In 2010 International Conference on
Advances in Computer Engineering (pp. 277–280). Bangalore, India: IEEE.
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014, September 1–5). Return of the devil
in the details: Delving deep into convolutional nets. Proceedings of the British Machine Vision
Conference, Nottingham, UK.
Chini, M., Piscini, A., Cinti, F. R., Amici, S., Nappi, R., & DeMartini, P. M. (2013). The 2011 Tohoku
(Japan) Tsunami inundation and liquefaction investigated through optical, thermal, and SAR
data. IEEE Geoscience and Remote Sensing Letters, 10(2), 347–351.
Chun, B., & Guldmann, J. M. (2014). Spatial statistical analysis and simulation of the urban heat
island in high-density central cities. Landscape and Urban Planning, 125, 76–88.
Cimpoi, M., Maji, S., & Vedaldi, A. (2015, June 7–12). Deep filter banks for texture recognition and
segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
Boston, MA, USA. (pp. 3828–3836).
Collen, B., Pettorelli, N., Baillie, J. E., & Durant, S. M. (Eds.) (2013). Biodiversity monitoring and
conservation: Bridging the gap between global commitment and local action. Cambridge, UK:
John Wiley & Sons, Wiley-Blackwell.
Collobert, R., & Weston, J. (2008, July). A unified architecture for natural language processing: Deep
neural networks with multitask learning. In Proceedings of the 25th international conference on
Machine learning (pp. 160–167). Helsinki, Finland: ACM.
Courtier, P., Thépaut, J. N., & Hollingsworth, A. (1994). A strategy for operational implementation of
4D-Var, using an incremental approach. Quarterly Journal of the Royal Meteorological Society, 120
(519), 1367–1387.
Cressie, N. (2015). Statistics for spatial data. Hoboken, NJ: John Wiley & Sons.
de Jong, R., de Bruin, S., Schaepman, M., & Dent, D. (2011). Quantitative mapping of global land
degradation using Earth observations. International Journal of Remote Sensing, 32(21),
6823–6853.
De Lannoy, G. J., Reichle, R. H., Arsenault, K. R., Houser, P. R., Kumar, S., Verhoest, N. E., &
Pauwels, V. R. (2012). Multiscale assimilation of advanced microwave scanning radiometer–
Tags
#Earth #Space #Data #Analysis

