– Hardened APIs and best-practice security measures appropriate to the environment.
– Consistent responses from the data store to read and write requests.
– IO throughput at the speed the application demands, to maintain user experience and data concurrency.
– An appropriate storage solution (e.g. DAS / SAN / NAS) that can scale to the application's future demands within the constraints of the environment.
– A robust backup design with a minimal hardware footprint and appropriate point-in-time recovery, system restore time, geo-redundancy and replication factor.
– Rugged fault tolerance built into both the hardware and software layers.
– Data kept as close as possible to where it is being used.
[Figure: data access tiers, from hottest to coldest]
– Extremely high frequency: semi-structured transactional data, preprocessing subsets
– High frequency: structured subsets
– Low frequency: structured subsets
– Very low frequency: structured, compressed subsets (archive)
It is necessary to identify how access patterns differ across the various pieces of data in order to choose the appropriate storage solution for each type.
Data which is accessed or updated frequently can be classed as hot; data which is touched only occasionally can be classified as cold, with warm somewhere in between.
These classifications allow us to further differentiate the replication factor and access speed required for the different areas of data.
In the case of BLOB (binary large object) storage we can use large volumes (~100GB) with an in-memory index. A 100GB volume can store a large number of images, say, with each image's location within the volume recorded in an index held in the storage node's memory for quick access.
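As a minimal sketch of this technique in Python (the volume path, blob IDs and append-only record layout below are illustrative assumptions, not a description of any particular product):

```python
import os

class BlobVolume:
    """One large volume file plus an in-memory index mapping each blob ID
    to its (offset, length), so a read costs a single seek. A production
    index would also be checkpointed so it can be rebuilt after a restart."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # blob_id -> (offset, length), held in the node's RAM
        open(path, "ab").close()  # create the volume file if missing

    def put(self, blob_id, data):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)      # append-only: new blobs go at the end
            offset = f.tell()
            f.write(data)
        self.index[blob_id] = (offset, len(data))

    def get(self, blob_id):
        offset, length = self.index[blob_id]
        with open(self.path, "rb") as f:
            f.seek(offset)              # one seek + one read per blob
            return f.read(length)

# usage: store two small "images" and read one back
vol = BlobVolume("/tmp/volume-0001.dat")
vol.put("img-42", b"\x89PNG...")
vol.put("img-43", b"\xff\xd8JPEG...")
assert vol.get("img-42") == b"\x89PNG..."
```

Because the index lives in memory, locating a blob never touches the disk; only the final read of the blob itself does.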
At Avinton we design our solutions so that hot data is placed on SSD arrays and cold data on spinning disks. Where it is not immediately apparent which data is hot or cold, we gather metadata on the files or tables (number of reads, updates, index scans and so on), which allows us to isolate the hot data.
In some cases the classification (hot / warm / cold) is relative to age, so the newest data is hot while older data is expired onto the cold storage area.
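As a sketch of what that metadata gathering can look like on PostgreSQL, which exposes per-table read and write counters in its standard pg_stat_user_tables statistics view, the script below ranks tables by accumulated activity. The connection string and the hot/cold cut-offs are illustrative assumptions:

```python
import psycopg2

# Rank tables by accumulated activity using PostgreSQL's statistics
# collector (pg_stat_user_tables is a built-in system view).
conn = psycopg2.connect("dbname=warehouse user=analyst")  # hypothetical DSN

with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT relname,
               seq_scan + COALESCE(idx_scan, 0)   AS reads,
               n_tup_ins + n_tup_upd + n_tup_del  AS writes
        FROM pg_stat_user_tables
        ORDER BY seq_scan + COALESCE(idx_scan, 0)
               + n_tup_ins + n_tup_upd + n_tup_del DESC
    """)
    for relname, reads, writes in cur.fetchall():
        # The cut-offs are arbitrary illustration values; in practice they
        # come from profiling the workload over a known time window.
        activity = reads + writes
        if activity > 100_000:
            tier = "HOT  -> SSD array"
        elif activity > 1_000:
            tier = "WARM"
        else:
            tier = "COLD -> spinning disk"
        print(f"{relname:30} reads={reads:>10} writes={writes:>10} {tier}")
conn.close()
```

Because these counters accumulate from the last statistics reset, sampling them at two points in time gives the access rate rather than the lifetime total.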
In scenarios where the data volumes are very large we also use in-memory indexes, typically in the form of key-value pairs. With the recent improvements in the reliability of persistent in-memory key-value stores, we are able to achieve significant performance gains with minimal risk.
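One widely used example of such a store is Redis, which can persist its in-memory data via an append-only file. The sketch below, which assumes a local Redis server with appendonly persistence enabled and uses hypothetical key names, keeps a blob-location index in Redis so that it survives process restarts:

```python
import redis

# Keep the blob-location index in Redis rather than in process memory,
# so it survives restarts (assumes "appendonly yes" in redis.conf).
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def index_blob(volume, blob_id, offset, length):
    # One Redis hash per volume file; field = blob ID, value = "offset:length".
    r.hset(f"blobindex:{volume}", blob_id, f"{offset}:{length}")

def locate_blob(volume, blob_id):
    entry = r.hget(f"blobindex:{volume}", blob_id)
    if entry is None:
        return None
    offset, length = entry.split(":")
    return int(offset), int(length)

index_blob("volume-0001", "img-42", 0, 11)
print(locate_blob("volume-0001", "img-42"))  # -> (0, 11)
```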
Done right, the application of such techniques improves data response speed significantly and is often part of the solution for long-running queries. In some cases we are able to improve storage performance while avoiding a costly hardware upgrade.
No storage solution can rely on these techniques alone, however. In the case of an RDBMS data warehouse, a good schema is key to a responsive solution. Other areas to look at are bottlenecks on the data input and output interfaces (be it SCSI / SAS / IP).
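As one example of how schema design and data placement interact, PostgreSQL's declarative partitioning can be combined with tablespaces so that recent (hot) partitions sit on SSD while older ones are expired onto spinning disk. The table, columns and tablespace names below are hypothetical, and the two tablespaces must already have been created with CREATE TABLESPACE:

```python
import psycopg2

# Hypothetical schema: a range-partitioned events table whose current
# month lives on an SSD-backed tablespace while an older month sits on
# a tablespace backed by spinning disks.
ddl = """
CREATE TABLE events (
    event_id   bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_05 PARTITION OF events
    FOR VALUES FROM ('2024-05-01') TO ('2024-06-01')
    TABLESPACE ssd_fast;    -- hot: current month on the SSD array

CREATE TABLE events_2024_04 PARTITION OF events
    FOR VALUES FROM ('2024-04-01') TO ('2024-05-01')
    TABLESPACE hdd_cold;    -- cold: older month on spinning disk
"""

conn = psycopg2.connect("dbname=warehouse user=dba")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute(ddl)
conn.close()
```

Expiring an aging partition onto cold storage is then a single statement: ALTER TABLE events_2024_05 SET TABLESPACE hdd_cold.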
Avinton are by no means pioneers in this area – similar techniques are used by Google, Facebook, Yahoo and many other big players. We have simply mastered them, having applied them over the years starting from our early telecom monitoring solutions, which are still in use today.
To design a good data storage solution, the following are necessary:
– Know your data (hot vs cold, structure, size, types, etc.)
– Know your users (number of simultaneous users, types of queries)
– Detailed knowledge of the hardware (server vendor-specific options)
– Good working knowledge of the storage technique in use (be it DB or file-based storage)
– Appropriate storage solution selection (DAS / SAN / NAS)
Having a scalable big data storage solution that allows you to leverage data insights efficiently is fundamental, since a large volume of data that is slow to retrieve diminishes in value.
Avinton have designed and delivered a variety of data solutions, including both RDBMS (PostgreSQL & ORACLE) and hybrid RDBMS & file-based solutions on HDFS (Hadoop).
We offer an end-to-end service: Design > Dimensioning > Implementation > Deployment > SLA-based Support.
A common theme throughout this article is that Avinton's solutions feature design considerations for improved IO performance at both the software and hardware levels. This stems from our philosophy that to design high-performance big data solutions one has to have a good understanding of the underlying hardware.
Our research, development and testing work at our development and training centre in Yokohama is where we test new hardware configurations and combine them with well-known big data solutions, as in our latest project with Spark on Hadoop. This experience enables us to bring our clients tailored solutions based on test result data.
We are passionate about data and welcome any enquiries in this regard.