
Big Data Solutions – Storage


    By James Cauchi | Tech Articles | 16 March, 2016

    Considerations

    Security

    Use hardened APIs and best-practice security measures appropriate to the environment.

    Reliability

    The Data Store is required to respond consistently to Read and Write requests.

    Speed!

    The storage solution needs to handle IO operations at the speed the application demands in order to maintain user experience and data concurrency.

    Scalability

    An appropriate storage solution selection (e.g. DAS / SAN / NAS) that can scale to meet the future demands of the application within the constraints of the environment.

    Backup

    A robust backup design with a minimal hardware footprint and appropriate point-in-time recovery, system restore time, geo-redundancy and replication factor.

    Fault Tolerance

    Rugged fault tolerance built into both the hardware and software layers.

    How can we achieve this?

     

     – Storage Path Optimisation

     

    Keep the data as close as possible to where it’s being used.

     

     – Accelerated Data Access

    • Multidimensional Caching
      • Hardware Level (ex. Storage Controller Cache / SAN Cache Pool)
      • Storage Fabric Level
      • System Service Level
      • Application Data Access Layer / API Level
    • In-Memory Datasets and Indexes
    • Adaptive Data Compression, De-duplication, Preallocation
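
    The idea of caching at several levels can be sketched as a chain of caches that is checked from fastest to slowest before falling back to the backing store. This is a minimal illustration rather than a description of any particular product: the class names, the level names ("application", "service") and the capacities are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class LRUCache:
    """A small fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class TieredCache:
    """Checks each cache level in order, then falls back to the backing store."""

    def __init__(self, levels: List[LRUCache],
                 load_from_store: Callable[[str], bytes]) -> None:
        self.levels = levels  # ordered fastest first
        self.load_from_store = load_from_store

    def get(self, key: str) -> Tuple[bytes, str]:
        """Return the value and the name of the level that served it."""
        for depth, level in enumerate(self.levels):
            value = level.get(key)
            if value is not None:
                # Promote the value into every faster level above this one.
                for faster in self.levels[:depth]:
                    faster.put(key, value)
                return value, level.name
        value = self.load_from_store(key)  # slowest path: the actual storage
        for level in self.levels:
            level.put(key, value)
        return value, "store"


# Hypothetical backing store that records how often it is actually read.
store_reads: List[str] = []


def slow_store(key: str) -> bytes:
    store_reads.append(key)
    return f"payload-for-{key}".encode()


cache = TieredCache([LRUCache("application", 2), LRUCache("service", 8)], slow_store)
first = cache.get("a")   # served by the store, then cached at both levels
second = cache.get("a")  # served by the fastest level
```

    In this sketch only the first request for a key reaches the store; repeated requests are answered from the fastest level that still holds the value.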

     

    – Data Classification for performance and cost efficiency 

    • Classification can range from simple access-frequency or age-based rules to complex pattern-based or statistical predictive algorithms,
    • Or it can be based on data type, for Object, Block and File Storage
    • Example:

      • Hot Data ~ In Memory: extremely high frequency; semi-structured transactional data; preprocessing subsets
      • Warm Data ~ Flash Disk (SSD): high frequency; structured subsets
      • Cold Data ~ Fast Disk Array: low frequency; structured subsets
      • Icy Data ~ Slow Disk Array / Remote: very low frequency; structured; compressed subsets
      • Frozen Data ~ Tape Library: archive

    It is necessary to identify differences in the access patterns of the various pieces of data in order to ensure that the appropriate storage solution is chosen for each type.
    Data which is accessed or updated frequently can be classed as hot.
    Data which is accessed or updated only occasionally can be classed as cold, with warm data falling somewhere in between.

    These classifications allow us to set the replication factor and access speed required for each data area separately.
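
    A frequency-based classification of this kind can be sketched as a simple threshold lookup. The tier names and storage media follow the example above; the access thresholds and replication factors are purely illustrative assumptions and would need to be tuned to the workload.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    name: str
    medium: str
    min_accesses_per_day: float  # assumed threshold, not a recommendation
    replication_factor: int      # assumed value, not a recommendation


# Ordered from hottest to coldest so the first matching tier wins.
TIERS: List[Tier] = [
    Tier("hot", "in memory", 10_000, 3),
    Tier("warm", "flash disk (SSD)", 1_000, 3),
    Tier("cold", "fast disk array", 10, 2),
    Tier("icy", "slow disk array / remote", 0.1, 2),
    Tier("frozen", "tape library", 0.0, 1),
]


def classify(accesses_per_day: float) -> Tier:
    """Return the first tier whose threshold the access rate reaches."""
    for tier in TIERS:
        if accesses_per_day >= tier.min_accesses_per_day:
            return tier
    return TIERS[-1]  # anything below every threshold goes to the coldest tier


tier = classify(2_500)
print(tier.name, tier.medium, tier.replication_factor)
```

    More sophisticated schemes replace the fixed thresholds with pattern-based or predictive rules, but the output is the same: a tier that determines where the data lives and how it is replicated.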

     

     – Reduce Disk IO needed for each data request

    For BLOB (binary large object) storage we can use large volumes (~100GB) with an in-memory index.
    A 100GB volume can store a large number of images, for example, with the location of each image within the volume recorded in an index held in the storage node’s memory, so a request costs a memory lookup plus a single read from the volume.
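
    The volume-plus-index layout can be sketched as follows. This is a simplified illustration under stated assumptions: the `BlobVolume` class, its method names and the file name are hypothetical, and the index here lives only in memory (a real system would also need a way to rebuild or persist it).

```python
import os
import tempfile
from typing import Dict, Tuple


class BlobVolume:
    """Append-only volume file with an in-memory index of blob locations."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, Tuple[int, int]] = {}  # key -> (offset, length)
        open(path, "ab").close()  # create the volume file if it is missing

    def put(self, key: str, data: bytes) -> None:
        with open(self.path, "ab") as volume:
            offset = volume.seek(0, os.SEEK_END)  # blobs are appended at the end
            volume.write(data)
        self.index[key] = (offset, len(data))

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]  # memory lookup, no disk IO
        with open(self.path, "rb") as volume:
            volume.seek(offset)           # jump straight to the blob
            return volume.read(length)


with tempfile.TemporaryDirectory() as workdir:
    volume = BlobVolume(os.path.join(workdir, "volume_0001.dat"))
    volume.put("cat.jpg", b"\xff\xd8 cat bytes")
    volume.put("dog.jpg", b"\xff\xd8 dog image bytes")
    cat_location = volume.index["cat.jpg"]
    dog_bytes = volume.get("dog.jpg")
```

    Because many blobs share one large file, the storage node avoids a per-file directory and metadata lookup for each request; the index tells it exactly where to read.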

     

     – Using SSDs

    At Avinton we design our solutions to place hot data on SSD arrays and cold data on spinning disks. Where it is not immediately apparent which data is hot or cold, we gather metadata on the files or tables (number of reads, updates, index scans and so on), which allows us to isolate the hot data.
    In some cases the classification (hot / warm / cold) depends on age: newer data is hot, while older data is expired to the cold storage area.
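
    One way to turn such metadata into a hot set is to rank tables by total activity and keep the smallest group that accounts for most of it. The sketch below is an assumption about how this could be done, not a description of Avinton’s tooling: the counter names, the sample figures and the 80% cut-off are all hypothetical.

```python
from typing import Dict, List


def activity_score(counters: Dict[str, int]) -> int:
    """Combine the gathered counters into a single activity figure."""
    return counters["reads"] + counters["updates"] + counters["index_scans"]


def isolate_hot(tables: Dict[str, Dict[str, int]], share: float = 0.8) -> List[str]:
    """Return the busiest tables that together cover `share` of all activity."""
    scores = {name: activity_score(c) for name, c in tables.items()}
    total = sum(scores.values())
    hot: List[str] = []
    covered = 0
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if total == 0 or covered >= share * total:
            break
        hot.append(name)
        covered += score
    return hot


# Hypothetical counters gathered over an observation window.
table_stats = {
    "sessions": {"reads": 90_000, "updates": 40_000, "index_scans": 20_000},
    "orders": {"reads": 30_000, "updates": 5_000, "index_scans": 5_000},
    "audit_log": {"reads": 5_000, "updates": 4_000, "index_scans": 1_000},
    "archive_2014": {"reads": 0, "updates": 0, "index_scans": 0},
}
hot_tables = isolate_hot(table_stats)
print(hot_tables)
```

    The tables returned are candidates for the SSD arrays; everything else can remain on spinning disks until its access pattern changes.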

     

     – In-Memory Index

    Where data volumes are very large we also use in-memory indexes, typically in the form of key-value pairs. With the recent improvements in the reliability of in-memory key-value solutions with persistence, we are able to achieve significant performance gains with minimal risk.
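
    The principle behind an in-memory key-value index with persistence can be illustrated with an append-only log that is replayed on start-up. This is a toy sketch to show the idea only; the `PersistentIndex` class and the log format are assumptions, and production key-value stores implement persistence far more efficiently.

```python
import json
import os
import tempfile
from typing import Dict


class PersistentIndex:
    """In-memory key-value index backed by an append-only log file."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.entries: Dict[str, str] = {}
        if os.path.exists(log_path):
            # Replay the log so the index survives a restart.
            with open(log_path, "r", encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    self.entries[record["key"]] = record["value"]

    def set(self, key: str, value: str) -> None:
        # Write to disk first, then update memory, so no acknowledged write is lost.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.entries[key] = value

    def get(self, key: str) -> str:
        return self.entries[key]  # served entirely from memory


with tempfile.TemporaryDirectory() as workdir:
    log_file = os.path.join(workdir, "index.log")
    index = PersistentIndex(log_file)
    index.set("customer:42", "volume_0001:1048576")
    index.set("customer:42", "volume_0002:0")  # a later write overrides the earlier one
    restored = PersistentIndex(log_file)       # simulates a restart
    restored_value = restored.get("customer:42")
```

    Reads never touch the disk, while the log keeps the risk of data loss low, which is the trade-off described above.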

     

    Conclusions

     – Improved Application Performance

    Done right, these techniques significantly improve data response times and are often part of the solution for long-running queries. In some cases we are able to improve storage performance while avoiding a costly hardware upgrade.

     

     – Data Schema Considerations

    No storage solution can rely on these techniques alone. For an RDBMS data warehouse, a good schema is key to a responsive solution. Other areas to examine are bottlenecks on the data input and output interfaces (be it SCSI / SAS / IP).

     

     – Why Avinton?

    Avinton are by no means pioneers in this area; similar techniques are used by Google, Facebook, Yahoo and many other big players. We have mastered these techniques by applying them over the years, starting with our early Telecom monitoring solutions, which are still in use today.

     

    Final Thoughts

    To design a good data storage solution, the following are necessary:
    – Know your data (hot vs cold, structure, size, types etc.)
    – Know your users (number of simultaneous users, types of queries)
    – Detailed knowledge of the hardware (server vendor specific hardware options)
    – Good working knowledge of the storage technique in use (be it DB or file based storage)
    – Appropriate storage solution selection (DAS / SAN / NAS)

    A scalable Big Data storage solution that lets you draw on data insights efficiently is fundamental, because a large volume of data that is slow to retrieve loses much of its value.

    Avinton have designed and delivered various data solutions, including both RDBMS (PostgreSQL and Oracle) and hybrid RDBMS and file based solutions on HDFS (Hadoop).
    We offer an end-to-end service: Design > Dimensioning > Implementation > Deployment > SLA based Support.

    A common theme throughout this article is that Avinton’s solutions are designed for improved IO performance at both the software and hardware level. This stems from our philosophy that designing high performance big data solutions requires a good understanding of the underlying hardware.

    At our development and training centre in Yokohama we test new hardware configurations and combine them with well-known big data solutions, as in our latest project with Spark on Hadoop. This allows us to bring our clients tailored solutions based on test result data.

    Our research, development and testing experience enables us to:

    • Deliver optimised HW / SW platform combinations
    • Reduce time to market
    • Heavily tailor the solution to our client’s design requirements
    • Provide SLA based HW & application support

    We are passionate about data and welcome any enquiries in this regard.
