– Hardened APIs and best-practice security measures appropriate to the environment.
– Consistent responses from the data store to read and write requests.
– IO throughput at the speed the application demands, to maintain user experience and data concurrency.
– An appropriate storage solution (e.g. DAS / SAN / NAS) that can scale to the application's future demands within the constraints of the environment.
– A robust backup design with a minimal hardware footprint and appropriate point-in-time recovery, system restore time, geo-redundancy and replication factor.
– Rugged fault tolerance built into both the hardware and software layers.
– Data kept as close as possible to where it is being used.
[Figure: data access tiers, from hottest to coldest]
– Extremely high frequency: semi-structured transactional data, preprocessing subsets
– High frequency: structured subsets
– Low frequency: structured subsets
– Very low frequency: structured, compressed subsets (archive)
It is necessary to identify how access patterns differ across the various pieces of data in order to choose the appropriate storage solution for each type.
Data which is accessed or updated frequently can be classed as hot; data which is touched only occasionally can be classified as cold, with warm somewhere in between.
These classifications allow us to further differentiate the replication factor and access speed required for the different areas of data.
In the case of BLOB (binary large object) storage we can use large volumes (~100GB) with an in-memory index. A 100GB volume can store a large number of images, say, with each image's location within the volume recorded in an index held in the storage node's memory for quick access.
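As a minimal sketch of this technique in Python (the volume path, blob IDs and append-only record layout below are illustrative assumptions, not a description of any particular product):

```python
import os

class BlobVolume:
    """One large volume file plus an in-memory index mapping each blob ID
    to its (offset, length), so a read costs a single seek. A production
    index would also be checkpointed so it can be rebuilt after a restart."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # blob_id -> (offset, length), held in the node's RAM
        open(path, "ab").close()  # create the volume file if missing

    def put(self, blob_id, data):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)      # append-only: new blobs go at the end
            offset = f.tell()
            f.write(data)
        self.index[blob_id] = (offset, len(data))

    def get(self, blob_id):
        offset, length = self.index[blob_id]
        with open(self.path, "rb") as f:
            f.seek(offset)              # one seek + one read per blob
            return f.read(length)

# usage: store two small "images" and read one back
vol = BlobVolume("/tmp/volume-0001.dat")
vol.put("img-42", b"\x89PNG...")
vol.put("img-43", b"\xff\xd8JPEG...")
assert vol.get("img-42") == b"\x89PNG..."
```

Because the index lives in memory, locating a blob never touches the disk; only the final read of the blob itself does.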
At Avinton we design our solutions so that hot data is placed on SSD arrays and cold data on spinning disks. Where it is not immediately apparent which data is hot or cold, we gather metadata on the files or tables (number of reads, updates, index scans and so on), which allows us to isolate the hot data.
In some cases the classification (hot / warm / cold) is relative to age, so the newest data is hot while older data is expired onto the cold storage area.
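As a sketch of what that metadata gathering can look like on PostgreSQL, which exposes per-table read and write counters in its standard pg_stat_user_tables statistics view, the script below ranks tables by accumulated activity. The connection string and the hot/cold cut-offs are illustrative assumptions:

```python
import psycopg2

# Rank tables by accumulated activity using PostgreSQL's statistics
# collector (pg_stat_user_tables is a built-in system view).
conn = psycopg2.connect("dbname=warehouse user=analyst")  # hypothetical DSN

with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT relname,
               seq_scan + COALESCE(idx_scan, 0)   AS reads,
               n_tup_ins + n_tup_upd + n_tup_del  AS writes
        FROM pg_stat_user_tables
        ORDER BY seq_scan + COALESCE(idx_scan, 0)
               + n_tup_ins + n_tup_upd + n_tup_del DESC
    """)
    for relname, reads, writes in cur.fetchall():
        # The cut-offs are arbitrary illustration values; in practice they
        # come from profiling the workload over a known time window.
        activity = reads + writes
        if activity > 100_000:
            tier = "HOT  -> SSD array"
        elif activity > 1_000:
            tier = "WARM"
        else:
            tier = "COLD -> spinning disk"
        print(f"{relname:30} reads={reads:>10} writes={writes:>10} {tier}")
conn.close()
```

Because these counters accumulate from the last statistics reset, sampling them at two points in time gives the access rate rather than the lifetime total.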
In scenarios where the data volumes are very large we also use in-memory indexes, typically in the form of key-value pairs. With the recent improvements in the reliability of persistent in-memory key-value stores, we are able to achieve significant performance gains with minimal risk.
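One widely used example of such a store is Redis, which can persist its in-memory data via an append-only file. The sketch below, which assumes a local Redis server with appendonly persistence enabled and uses hypothetical key names, keeps a blob-location index in Redis so that it survives process restarts:

```python
import redis

# Keep the blob-location index in Redis rather than in process memory,
# so it survives restarts (assumes "appendonly yes" in redis.conf).
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def index_blob(volume, blob_id, offset, length):
    # One Redis hash per volume file; field = blob ID, value = "offset:length".
    r.hset(f"blobindex:{volume}", blob_id, f"{offset}:{length}")

def locate_blob(volume, blob_id):
    entry = r.hget(f"blobindex:{volume}", blob_id)
    if entry is None:
        return None
    offset, length = entry.split(":")
    return int(offset), int(length)

index_blob("volume-0001", "img-42", 0, 11)
print(locate_blob("volume-0001", "img-42"))  # -> (0, 11)
```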
Done right, the application of such techniques improves data response speed significantly and is often part of the solution for long-running queries. In some cases we are able to improve storage performance while avoiding a costly hardware upgrade.
No storage solution can rely on these techniques alone, however. In the case of an RDBMS data warehouse, a good schema is key to a responsive solution. Other areas to look at are bottlenecks on the data input and output interfaces (be it SCSI / SAS / IP).
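As one example of how schema design and data placement interact, PostgreSQL's declarative partitioning can be combined with tablespaces so that recent (hot) partitions sit on SSD while older ones are expired onto spinning disk. The table, columns and tablespace names below are hypothetical, and the two tablespaces must already have been created with CREATE TABLESPACE:

```python
import psycopg2

# Hypothetical schema: a range-partitioned events table whose current
# month lives on an SSD-backed tablespace while an older month sits on
# a tablespace backed by spinning disks.
ddl = """
CREATE TABLE events (
    event_id   bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_05 PARTITION OF events
    FOR VALUES FROM ('2024-05-01') TO ('2024-06-01')
    TABLESPACE ssd_fast;    -- hot: current month on the SSD array

CREATE TABLE events_2024_04 PARTITION OF events
    FOR VALUES FROM ('2024-04-01') TO ('2024-05-01')
    TABLESPACE hdd_cold;    -- cold: older month on spinning disk
"""

conn = psycopg2.connect("dbname=warehouse user=dba")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute(ddl)
conn.close()
```

Expiring an aging partition onto cold storage is then a single statement: ALTER TABLE events_2024_05 SET TABLESPACE hdd_cold.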
Avinton are by no means pioneers in this area – similar techniques are used by Google, Facebook, Yahoo and many other big players. We have simply mastered them, having applied them over the years starting from our early telecom monitoring solutions, which are still in use today.
To design a good data storage solution, the following are necessary:
– Know your data (hot vs cold, structure, size, types, etc.)
– Know your users (number of simultaneous users, types of queries)
– Detailed knowledge of the hardware (server vendor-specific options)
– Good working knowledge of the storage technique in use (be it DB or file-based storage)
– Appropriate storage solution selection (DAS / SAN / NAS)
Having a scalable big data storage solution that allows you to leverage data insights efficiently is fundamental, since a large volume of data that is slow to retrieve diminishes in value.
Avinton have designed and delivered a variety of data solutions, including both RDBMS (PostgreSQL & ORACLE) and hybrid RDBMS & file-based solutions on HDFS (Hadoop).
We offer an end-to-end service: Design > Dimensioning > Implementation > Deployment > SLA-based Support.
A common theme throughout this article is that Avinton's solutions feature design considerations for improved IO performance at both the software and hardware levels. This stems from our philosophy that to design high-performance big data solutions one has to have a good understanding of the underlying hardware.
Our research, development and testing work at our development and training centre in Yokohama is where we test new hardware configurations and combine them with well-known big data solutions, as in our latest project with Spark on Hadoop. This experience enables us to bring our clients tailored solutions based on test result data.
We are passionate about data and welcome any enquiries in this regard.