ResearchSpace

Algorithmic design considerations for geospatial and/or temporal big data


dc.contributor.author Van Zyl, T
dc.date.accessioned 2014-09-30T13:25:34Z
dc.date.available 2014-09-30T13:25:34Z
dc.date.issued 2014-02
dc.identifier.citation Van Zyl, T. 2014. Algorithmic design considerations for geospatial and/or temporal big data. In: Big Data: Techniques and Technologies in Geoinformatics, CRC Press, London, UK, pp. 117-132 en_US
dc.identifier.isbn 978-1-4665-8655-0
dc.identifier.uri http://www.crcnetbase.com/doi/abs/10.1201/b16524-7
dc.identifier.uri http://hdl.handle.net/10204/7704
dc.description Copyright: 2014 CRC Press, London, UK. Abstract only attached. en_US
dc.description.abstract In order to frame the geospatial temporal big data conversation, it is important to discuss such data within the context of the three Vs (velocity, variety, and volume) of big data. Each of the Vs brings its own technical requirements to the algorithmic design process, and each of these requirements needs to be considered. It is also important to acknowledge that some of the challenges facing the broader big data community have always existed within the geospatial temporal data analytics community and will continue to do so. Especially relevant are the big data challenges relating to data volume, as presented by large quantities of raster data, point clouds, and even vector data. Spatial data mining has long endeavored to unlock information from large databases with spatial attributes, and in these cases algorithmic approaches have been adapted to overcome the data volume. Although the problem of big data is well acknowledged and long studied, the broader data community's sudden awareness of spatial big data presents an opportunity for deeper insight and a more formal and rigorous treatment of the subject. Spatial data can be categorized into three major forms: raster, vector, and areal. Historically, raster data presented the large-volume challenge, but this trend is changing, and none of these categories maps neatly onto any one of big data's Vs. For example, a large volume of vector data is now plausible once the Internet of Things is considered, and such data may also place velocity constraints on the algorithms if near-real-time processing is required. Additionally, high-variety unstructured data may arrive at high velocity, or in any of the many other permutations. What is clear across all these permutations is that careful consideration must be given to the time and space complexity of the algorithms required to process these data. In addition, each of the three Vs places added constraints on the others, and increasingly the three Vs need to be considered together. For example, unstructured data increases the time complexity of the algorithms needed to process the data chunks, while high volumes of the same unstructured data increase space complexity. To gain a true sense of the overall challenge faced by the geospatial big data community, couple these classical big data challenges with the added time and space complexity of spatial data algorithms. First, it is important to note that independent and identical distribution (IID) is not a reasonable assumption for either temporal or spatial data. The assumption fails because both kinds of data are autocorrelated; indeed, Tobler's first law of geography states exactly this: near things are more related than distant things. Because an IID assumption cannot be made in most cases, the time complexity of spatial and temporal algorithms is higher than that of their traditional counterparts. For example, Spatial Auto-Regression is more complex than Linear Regression, Geographically Weighted Regression is more computationally demanding than ordinary Regression, and Co-Location Pattern Mining, which requires spatial predicates, is more complex than Association Rule Mining. In addition, ignoring the spatiotemporal autocorrelation in the data can lead to spurious results, for instance the salt-and-pepper effect when clustering.
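To make the cost of losing the IID assumption concrete, consider ordinary regression versus Geographically Weighted Regression. The sketch below is an added illustration, not code from the chapter; the Gaussian kernel, the bandwidth value, and all variable names are assumptions. Where ordinary least squares performs a single O(n p^2) fit, GWR re-fits a distance-weighted model at each of the n locations, touching all n points each time, for roughly O(n^2 p^2) work overall.

import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
coords = rng.uniform(0, 100, size=(n, 2))            # point locations
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)

# Ordinary least squares: one global fit, O(n * p^2).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Geographically weighted regression: one weighted fit per location,
# each using kernel weights over all n points -> O(n^2 * p^2) overall.
bandwidth = 20.0                                      # assumed kernel bandwidth
beta_gwr = np.empty((n, p))
for i in range(n):
    d = np.linalg.norm(coords - coords[i], axis=1)    # distances to site i
    w = np.exp(-(d / bandwidth) ** 2)                 # Gaussian kernel weights
    Xw = X * w[:, None]                               # row-weighted design matrix
    beta_gwr[i] = np.linalg.solve(X.T @ Xw, Xw.T @ y) # local weighted fit

The n-fold blow-up in the loop is exactly the extra time complexity that autocorrelated, non-IID data imposes.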
The solution to the big data challenge is simple to describe yet, in most cases, not easily achieved. Simply put: insofar as possible, minimize space complexity, aiming for at most linear space, and target a time complexity that is log-linear or better. However, this is often not possible, and other techniques are required. All is not lost: spatial data does not only present increased challenges in the big data arena but also provides additional exploitable opportunities for overcoming some of them. For example, spatial autocorrelation allows data to be aggregated and filtered within fixed windows, reducing the total number of points under consideration without excessive loss of information, as sketched below. It also allows the algorithm designer to treat points at a sufficient distance as a single cluster, thus reducing the number of computations. en_US
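As a concrete illustration of the fixed-window aggregation idea, the sketch below (again an added illustration, not the chapter's code; the function name aggregate_to_grid and the cell size are assumptions) bins autocorrelated points into grid cells and keeps one representative per cell, reducing the point count in linear time and space, consistent with the at-most-linear budget argued for above.

import numpy as np

def aggregate_to_grid(coords, values, cell_size):
    # Replace all points falling in the same fixed window (grid cell)
    # by the cell mean; spatial autocorrelation means little information
    # is lost, because nearby points carry similar values.
    cells = np.floor(coords / cell_size).astype(np.int64)
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.ravel()                         # keep the group index 1-D
    counts = np.bincount(inverse)                     # points per occupied cell
    sums = np.zeros((counts.size, coords.shape[1]))
    np.add.at(sums, inverse, coords)                  # per-cell coordinate sums
    mean_values = np.bincount(inverse, weights=values) / counts
    return sums / counts[:, None], mean_values

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1000, size=(100_000, 2))
vals = np.sin(pts[:, 0] / 100) + rng.normal(scale=0.05, size=len(pts))
small_pts, small_vals = aggregate_to_grid(pts, vals, cell_size=10.0)
print(len(pts), "->", len(small_pts), "points")       # roughly 100000 -> 10000

The same reasoning underlies treating sufficiently distant points as a single cluster: a far-field group is summarized by one representative such as its centroid, in the spirit of Barnes-Hut-style approximations, so the expensive algorithm runs on far fewer effective points.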
dc.language.iso en en_US
dc.publisher CRC Press en_US
dc.relation.ispartofseries Workflow;13338
dc.subject Geospatial big data en_US
dc.subject Geoinformatics en_US
dc.subject Spatial data en_US
dc.subject Independent and identically distributed en_US
dc.subject IID en_US
dc.subject Linear Regression en_US
dc.subject Geographically Weighted Regression en_US
dc.title Algorithmic design considerations for geospatial and/or temporal big data en_US
dc.type Book Chapter en_US
dc.identifier.apacitation Van Zyl, T. (2014). Algorithmic design considerations for geospatial and/or temporal big data. <i>Workflow;13338</i>. CRC Press. http://hdl.handle.net/10204/7704 en_ZA
dc.identifier.chicagocitation Van Zyl, T. "Algorithmic design considerations for geospatial and/or temporal big data" In <i>WORKFLOW;13338</i>, n.p.: CRC Press. 2014. http://hdl.handle.net/10204/7704. en_ZA
dc.identifier.vancouvercitation Van Zyl T. Algorithmic design considerations for geospatial and/or temporal big data. Workflow;13338. [place unknown]: CRC Press; 2014. [cited yyyy month dd]. http://hdl.handle.net/10204/7704. en_ZA
dc.identifier.ris TY - Book Chapter AU - Van Zyl, T DA - 2014-02 DB - ResearchSpace DP - CSIR KW - Geospatial big data KW - Geoinformatics KW - Spatial data KW - Independent identical distribution KW - IID KW - Linear Regression KW - Geographically Weighted Regression LK - https://researchspace.csir.co.za PY - 2014 SM - 978-1-4665-8655-0 T1 - Algorithmic design considerations for geospatial and/or temporal big data TI - Algorithmic design considerations for geospatial and/or temporal big data UR - http://hdl.handle.net/10204/7704 ER - en_ZA

