ResearchSpace

Algorithmic design considerations for geospatial and/or temporal big data


dc.contributor.author Van Zyl, T
dc.date.accessioned 2014-09-30T13:25:34Z
dc.date.available 2014-09-30T13:25:34Z
dc.date.issued 2014-02
dc.identifier.citation Van Zyl, T. 2014. Algorithmic design considerations for geospatial and/or temporal big data. In: Big Data: Techniques and Technologies in Geoinformatics, CRC Press, London, UK, pp. 117-132 en_US
dc.identifier.isbn 978-1-4665-8655-0
dc.identifier.uri http://www.crcnetbase.com/doi/abs/10.1201/b16524-7
dc.identifier.uri http://hdl.handle.net/10204/7704
dc.description Copyright: 2014 CRC Press, London, UK. Abstract only attached. en_US
dc.description.abstract In order to frame the geospatial temporal big data conversation, it is important to discuss such data within the context of the three Vs (velocity, variety, and volume) of big data. Each of the Vs brings its own technical requirements to the algorithmic design process, and each of these requirements needs to be considered. It is also important to acknowledge that some of the challenges facing the broader big data community have always existed within the geospatial temporal data analytics community and will continue to do so. Especially relevant are the big data challenges relating to data volume, as presented by large quantities of raster data, point clouds, and even vector data. Spatial data mining has long endeavored to unlock information from large databases with spatial attributes, and in these cases algorithmic approaches have been adapted to overcome the data volume. Although the problem of big data is well acknowledged and long studied, the broader data community's sudden awareness of spatial big data presents an opportunity for deeper insight and a more formal and rigorous treatment of the subject. Spatial data can be categorized into three major forms: raster, vector, and areal. Historically, raster data presented the large-volume challenge, but this trend is changing, and none of these categories maps neatly onto any one of big data's Vs. For example, a large volume of vector data is now plausible once the Internet of Things is considered, and such data may also place velocity constraints on the algorithms if near-real-time processing is required. Additionally, high-variety unstructured data may arrive at high velocity, or in any of the many other permutations. What is clear across all these permutations is that careful consideration must be given to the time and space complexity of the algorithms required to process these data. In addition, each of the three Vs places added constraints on the others, and increasingly the three Vs need to be considered together. For example, unstructured data increases the time complexity of the algorithms needed to process the data chunks, while high volumes of the same unstructured data increase space complexity. To gain a true sense of the overall challenge faced by the geospatial big data community, couple these classical big data challenges with the added time and space complexity of spatial data algorithms. First, it is important to note that independent and identical distribution (IID) is not a reasonable assumption for either temporal or spatial data. The assumption fails because both kinds of data are autocorrelated; indeed, Tobler's first law of geography states exactly this: near things are more related than distant things. Because an IID assumption cannot be made in most cases, the time complexity of spatial and temporal algorithms is higher than that of their traditional counterparts. For example, Spatial Auto-Regression is more complex than Linear Regression, Geographically Weighted Regression is more computationally demanding than ordinary Regression, and Co-Location Pattern Mining, which requires spatial predicates, is more complex than Association Rule Mining. In addition, ignoring the spatiotemporal autocorrelation in the data can lead to spurious results, for instance the salt-and-pepper effect when clustering.
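To make the cost of losing the IID assumption concrete, consider ordinary regression versus Geographically Weighted Regression. The sketch below is an added illustration, not code from the chapter; the Gaussian kernel, the bandwidth value, and all variable names are assumptions. Where ordinary least squares performs a single O(n p^2) fit, GWR re-fits a distance-weighted model at each of the n locations, touching all n points each time, for roughly O(n^2 p^2) work overall.

import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
coords = rng.uniform(0, 100, size=(n, 2))            # point locations
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)

# Ordinary least squares: one global fit, O(n * p^2).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Geographically weighted regression: one weighted fit per location,
# each using kernel weights over all n points -> O(n^2 * p^2) overall.
bandwidth = 20.0                                      # assumed kernel bandwidth
beta_gwr = np.empty((n, p))
for i in range(n):
    d = np.linalg.norm(coords - coords[i], axis=1)    # distances to site i
    w = np.exp(-(d / bandwidth) ** 2)                 # Gaussian kernel weights
    Xw = X * w[:, None]                               # row-weighted design matrix
    beta_gwr[i] = np.linalg.solve(X.T @ Xw, Xw.T @ y) # local weighted fit

The n-fold blow-up in the loop is exactly the extra time complexity that autocorrelated, non-IID data imposes.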
The solution to the big data challenge is simple to describe yet, in most cases, not easily achieved. Simply put: insofar as possible, minimize space complexity, aiming for at most linear space, and target a time complexity that is log-linear or better. However, this is often not possible, and other techniques are required. All is not lost: spatial data does not only present increased challenges in the big data arena but also provides additional exploitable opportunities for overcoming some of them. For example, spatial autocorrelation allows data to be aggregated and filtered within fixed windows, reducing the total number of points under consideration without excessive loss of information, as sketched below. It also allows the algorithm designer to treat points at a sufficient distance as a single cluster, thus reducing the number of computations. en_US
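As a concrete illustration of the fixed-window aggregation idea, the sketch below (again an added illustration, not the chapter's code; the function name aggregate_to_grid and the cell size are assumptions) bins autocorrelated points into grid cells and keeps one representative per cell, reducing the point count in linear time and space, consistent with the at-most-linear budget argued for above.

import numpy as np

def aggregate_to_grid(coords, values, cell_size):
    # Replace all points falling in the same fixed window (grid cell)
    # by the cell mean; spatial autocorrelation means little information
    # is lost, because nearby points carry similar values.
    cells = np.floor(coords / cell_size).astype(np.int64)
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.ravel()                         # keep the group index 1-D
    counts = np.bincount(inverse)                     # points per occupied cell
    sums = np.zeros((counts.size, coords.shape[1]))
    np.add.at(sums, inverse, coords)                  # per-cell coordinate sums
    mean_values = np.bincount(inverse, weights=values) / counts
    return sums / counts[:, None], mean_values

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1000, size=(100_000, 2))
vals = np.sin(pts[:, 0] / 100) + rng.normal(scale=0.05, size=len(pts))
small_pts, small_vals = aggregate_to_grid(pts, vals, cell_size=10.0)
print(len(pts), "->", len(small_pts), "points")       # roughly 100000 -> 10000

The same reasoning underlies treating sufficiently distant points as a single cluster: a far-field group is summarized by one representative such as its centroid, in the spirit of Barnes-Hut-style approximations, so the expensive algorithm runs on far fewer effective points.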
dc.language.iso en en_US
dc.publisher CRC Press en_US
dc.relation.ispartofseries Workflow;13338
dc.subject Geospatial big data en_US
dc.subject Geoinformatics en_US
dc.subject Spatial data en_US
dc.subject Independent and identically distributed en_US
dc.subject IID en_US
dc.subject Linear Regression en_US
dc.subject Geographically Weighted Regression en_US
dc.title Algorithmic design considerations for geospatial and/or temporal big data en_US
dc.type Book Chapter en_US
dc.identifier.apacitation Van Zyl, T. (2014). Algorithmic design considerations for geospatial and/or temporal big data. <i>Workflow;13338</i>. CRC Press. http://hdl.handle.net/10204/7704 en_ZA
dc.identifier.chicagocitation Van Zyl, T. "Algorithmic design considerations for geospatial and/or temporal big data" In <i>WORKFLOW;13338</i>, n.p.: CRC Press. 2014. http://hdl.handle.net/10204/7704. en_ZA
dc.identifier.vancouvercitation Van Zyl T. Algorithmic design considerations for geospatial and/or temporal big data. Workflow;13338. [place unknown]: CRC Press; 2014. [cited yyyy month dd]. http://hdl.handle.net/10204/7704. en_ZA
dc.identifier.ris TY - Book Chapter AU - Van Zyl, T DA - 2014-02 DB - ResearchSpace DP - CSIR KW - Geospatial big data KW - Geoinformatics KW - Spatial data KW - Independent identical distribution KW - IID KW - Linear Regression KW - Geographically Weighted Regression LK - https://researchspace.csir.co.za PY - 2014 SM - 978-1-4665-8655-0 T1 - Algorithmic design considerations for geospatial and/or temporal big data TI - Algorithmic design considerations for geospatial and/or temporal big data UR - http://hdl.handle.net/10204/7704 ER - en_ZA

