Big Data Platform Considerations
This article highlights some considerations to take into account when designing a big data platform, in terms of database selection and hardware choice.
The Challenge
To create a scalable data architecture that meets the requirements to serve a particular data set within the required speed, size and budget constraints.
RDBMS vs NoSQL vs Hadoop
Hadoop and NoSQL databases are great, but where the data structure is complex, with a deep hierarchy of object relations, a traditional RDBMS has some significant advantages:
- Simplicity: It is very easy to navigate a hierarchical structure using basic SQL.
- Talent Pool: It is easier to hire a DBA experienced in SQL-based databases than in other technologies.
- Ease of Use: Data scientists are able to run complex analyses in SQL that are often complicated to translate into code.
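As a minimal sketch of the simplicity point, the example below walks a parent/child hierarchy with a single recursive SQL query. It uses Python's built-in sqlite3 module only so that it is self-contained; the `node` table, its columns and the sample rows are hypothetical and not taken from any real schema.

```python
import sqlite3

# In-memory database with a hypothetical parent/child table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO node VALUES
        (1, NULL, 'network'),
        (2, 1, 'region-a'),
        (3, 2, 'site-1'),
        (4, 2, 'site-2'),
        (5, 1, 'region-b');
""")

# Recursive common table expression: start at one node, then keep
# joining children onto the rows found so far.
DESCENDANTS_SQL = """
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM node WHERE id = ?
        UNION ALL
        SELECT n.id, n.name, s.depth + 1
        FROM node AS n JOIN subtree AS s ON n.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth, name
"""

def descendants(root_id):
    """Return (name, depth) for the root node and everything below it."""
    return conn.execute(DESCENDANTS_SQL, (root_id,)).fetchall()

print(descendants(2))
```

The same query shape works for a hierarchy of any depth, which is the kind of navigation that tends to need hand-written traversal code in a non-SQL store.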
For the purpose of this article we will focus on an RDBMS based solution.
Lack of Parallelisation
One of the limitations of traditional RDBMSs is the lack of true MPP (Massively Parallel Processing) capability for large queries.
To address this, a number of different approaches have been taken, led by enterprise solutions:
Netezza
An older example of an attempt to achieve a fast database through smart storage techniques on spinning disks and low-level parallelism.
Netezza stored the “hot data” on the outer tracks of the disk platter to take advantage of their faster access times, and used the inner tracks for non-hot data and duplication.
Parallelism was achieved by dividing query execution into smaller worker tasks and combining their outputs into the final result.
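The divide-and-combine idea can be sketched in a few lines. This is only an illustration of the general pattern, not Netezza's implementation: the function names and sample rows are hypothetical, and threads stand in for the worker units that a real MPP system would run on separate processes or nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partial_sum_by_date(rows):
    """Worker task: aggregate one slice of the table."""
    totals = Counter()
    for date, value in rows:
        totals[date] += value
    return totals

def parallel_sum_by_date(rows, workers=4):
    """Split rows into slices, aggregate each slice, then merge the partials."""
    size = max(1, -(-len(rows) // workers))  # ceiling division
    slices = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_by_date, slices))
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # adds the per-date partial sums together
    return dict(combined)

rows = [
    ("2024-01-01", 10),
    ("2024-01-02", 5),
    ("2024-01-01", 7),
    ("2024-01-03", 1),
    ("2024-01-02", 2),
]
result = parallel_sum_by_date(rows, workers=2)
print(result)
```

The pattern works here because a sum can be computed per slice and merged afterwards; operations without that property need a different combine step.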
- Netezza was acquired by IBM in 2010.
- Greenplum was acquired by EMC in 2010.
- Teradata was spun off from NCR in 2007 and remains an independent company.
- EMC was acquired by Dell in 2016.
The Enterprise Market Solutions
Getting meaningful analytics from data warehouses requires a significant amount of processing. Consider, for example, a sort or aggregation by date on a billion-row table of numeric values.
Despite the schema designer’s best efforts to keep the database as clean and as close to first normal form as possible, there is no getting around the fact that in most cases we will be waiting for a single CPU thread to execute the operation sequentially.
Enabling parallel operations on a database is not all or nothing: it can be enabled on selected operations where appropriate.
Meaningful data insights on large datasets using a traditional DBMS!
- Rapid Response
- Low Latency
- Scalable User
By leveraging the significantly improved web capabilities of HTML5, Avinton has delivered rich web-based applications which allow users to visualise a wealth of information on their assets in the field, such as:
- Configuration Parameters
- Connectivity to other nodes
- Performance Metrics
- Alarm Status
GIS Analytics
Analytics on the geographical data allow statistics to be aggregated by area, region, city or country, as well as by user-defined areas. Our geo platform enables analyses against demographic datasets such as population density and land use type. This data can either be overlaid on the map or reported in table form.
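As a rough sketch of aggregating statistics by user-defined area, the example below assigns each measurement to a rectangular area and reports a count and mean per area. The area names, their bounding boxes and the sample points are hypothetical, and real areas would normally be polygons rather than boxes.

```python
from collections import defaultdict

# Hypothetical user-defined areas: name -> (min_lon, min_lat, max_lon, max_lat).
AREAS = {
    "west": (0.0, 0.0, 5.0, 5.0),
    "east": (5.0, 0.0, 10.0, 5.0),
}

def area_of(lon, lat):
    """Return the first area whose box contains the point, or None."""
    for name, (min_lon, min_lat, max_lon, max_lat) in AREAS.items():
        if min_lon <= lon < max_lon and min_lat <= lat < max_lat:
            return name
    return None

def aggregate_by_area(points):
    """points is an iterable of (lon, lat, value); returns count and mean per area."""
    values = defaultdict(list)
    for lon, lat, value in points:
        name = area_of(lon, lat)
        if name is not None:  # points outside every area are ignored
            values[name].append(value)
    return {
        name: {"count": len(vals), "mean": sum(vals) / len(vals)}
        for name, vals in values.items()
    }

points = [(1, 1, 10), (2, 2, 20), (7, 1, 4), (20, 20, 99)]
stats = aggregate_by_area(points)
print(stats)
```

The resulting dictionary maps directly onto either a table report or a per-area map overlay.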
Overlays
Heat maps and other calculated data sets, such as analysis results or external layers, can be overlaid onto the map. By animating the data over a time series, one is able to visualise trends in how the spatial data changes.
Map Data Support
Map data of any form can be imported into the system. Vector transformation to the base projection is trivial, and we have in-house tools for raster reprojection when the geo data contains more than one base projection.
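To illustrate why vector transformation is straightforward, here is a minimal sketch that reprojects vector coordinates point by point. It assumes the source data is WGS84 longitude/latitude and the base projection is spherical Web Mercator (EPSG:3857); the article does not say which projections are actually used, and the function names are hypothetical.

```python
import math

# Sphere radius used by Web Mercator (the WGS84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert one WGS84 lon/lat pair (degrees) to Web Mercator x/y (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def reproject_polyline(coords):
    """Reproject every vertex of a vector geometry independently."""
    return [lonlat_to_web_mercator(lon, lat) for lon, lat in coords]

line = [(0.0, 0.0), (180.0, 0.0)]
projected = reproject_polyline(line)
print(projected)
```

Each vertex is transformed on its own, so vector data needs no resampling. Raster data is harder because every output pixel has to be resampled from the source grid, which is where dedicated reprojection tooling comes in.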
GIS Consulting
Avinton provides end-to-end GIS software development, delivery and support. We also provide an advisory service for those looking for support in defining their application stack and choosing the frameworks that will best suit their needs.
Fleet Tracking & Management
GPS data from personnel, vehicles or assets in transit allows near real-time tracking and visualisation. This data can be hooked into existing call centre or internal operations software to make it location-aware.