Tuesday, January 5, 2021

NoSQL Opensource Database: Architecture, Tools, Algorithm, Design, Interchange and Distributed architecture

Created on 2020-08-21 11:15

Published on 2020-08-22 14:05

Cloud platforms (AWS, Azure, Google Cloud) offer a range of non-hierarchical, non-RDBMS database services (not necessarily NoSQL): wide-column, document, time-series, graph, and analytics. Powered by open source and launched on cloud platforms, these services grew 52% in 2019, with projected revenues exceeding 25% by 2022.

Next-generation cloud-native application architectures and the hunger for data variety in analytics/AI will drive rapid adoption of these data management services, provided one understands the underlying algorithms, architecture, and design principles. Here is a compendium of the same:

Data Interchanges - Added 7th Sept 2020

Traditional, bulky interchange formats for services, web services, RPC, streams, messages, events, and actor-based data pipelines: JSON, XML, CSV. These may not be the best choice for data-intensive apps that work over the network. Fast-forward and consider these options:

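Protobuf: If you are considering RPC as the interchange mechanism, potentially with gRPC
Thrift: Originally developed at Facebook for its services, now an Apache project
Apache Avro: Open source, built for Hadoop and large data; field-order agnostic, since fields are matched by name between the writer's and the reader's schema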

The challenge with all of these interchange formats is maintaining forward and backward compatibility as schemas evolve.
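To make forward and backward compatibility concrete, here is a minimal sketch of name-based schema resolution in the spirit of Avro's writer/reader schemas. It is not the Avro library: the schemas `SCHEMA_V1` and `SCHEMA_V2`, the `REQUIRED` sentinel, and the `resolve` helper are all hypothetical names made up for illustration.

```python
# Illustrative only: a toy name-based resolver, not the real Avro API.
REQUIRED = object()  # sentinel meaning "this field has no default"

# Hypothetical schemas: field name -> default value (or REQUIRED).
SCHEMA_V1 = {"id": REQUIRED, "name": REQUIRED}
SCHEMA_V2 = {"id": REQUIRED, "name": REQUIRED, "email": "unknown"}  # adds a field with a default


def resolve(record, reader_schema):
    """Project a record (written under any schema) onto the reader's schema.

    Fields are matched by name, so the order in which the writer emitted
    them does not matter. Fields the reader does not know are dropped;
    fields the writer did not send are filled from the reader's defaults.
    """
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is not REQUIRED:
            resolved[field] = default
        else:
            raise ValueError(f"missing required field: {field}")
    return resolved


# Backward compatibility: a new reader (v2) reads old data (v1).
old_record = {"name": "Ada", "id": 1}  # note the different field order
upgraded = resolve(old_record, SCHEMA_V2)

# Forward compatibility: an old reader (v1) reads new data (v2).
new_record = {"id": 2, "name": "Lin", "email": "lin@example.com"}
downgraded = resolve(new_record, SCHEMA_V1)
```

The design point is that a new field is only safe to add when it carries a default; adding a field without one would make old data unreadable by the new reader.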

Architecture: Replication, Partitions, Transaction - Added 9th Sept 2020

Distributed databases will become increasingly common as we move to the cloud. It's important to understand and account for the following challenges when devising applications and algorithms:

1. Replication: challenges and solutions:

2. Partitioning/Sharding: Horizontal scaling is enabled by key-range, hash-based, and primary/secondary-index-based partitioning, among others. The challenges are primarily around rebalancing and request routing; many large databases solve them efficiently.

3. Transaction: Applications have to account for the fact that databases implement the Atomicity, Consistency, Isolation and Durability properties inconsistently. Weak isolation leads to dirty reads, dirty writes, lost updates, and read/write skew (timing anomalies). Databases address these hard problems with mechanisms that applications must apply explicitly: snapshot isolation with "BEGIN TRANSACTION", explicit row locking with "SELECT ... FOR UPDATE" (materializing the conflict first when there is no row to lock), and serializable isolation methods: poorly performing serial execution (Redis), poorly scaling two-phase locking (MySQL, using predicate, index-range, shared, and exclusive locks), and the more recent serializable snapshot isolation (PostgreSQL), where conflicting transactions are detected and aborted at commit.
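The lost-update anomaly from the transaction item above can be sketched without a database. The example below is a simplified in-memory stand-in, assuming a single versioned value; `VersionedStore` and both demo functions are hypothetical names. It illustrates version-checked (compare-and-set) writes, which is one way to detect a lost update, and is not how any particular database implements isolation.

```python
class VersionedStore:
    """A single value with a version counter (toy stand-in for a row)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write_blind(self, value):
        # Overwrites whatever is there, like an unprotected read-modify-write.
        self.value = value
        self.version += 1

    def write_if_unchanged(self, value, expected_version):
        # Compare-and-set: refuse the write if someone else wrote in between.
        if self.version != expected_version:
            return False
        self.value = value
        self.version += 1
        return True


def lost_update_demo():
    store = VersionedStore(100)
    a, _ = store.read()          # transaction A reads 100
    b, _ = store.read()          # transaction B reads 100
    store.write_blind(a + 10)    # A writes 110
    store.write_blind(b + 10)    # B also writes 110; A's update is lost
    return store.value


def safe_update_demo():
    store = VersionedStore(100)
    a, version_a = store.read()
    b, version_b = store.read()
    store.write_if_unchanged(a + 10, version_a)          # A succeeds
    if not store.write_if_unchanged(b + 10, version_b):  # B is rejected
        b, version_b = store.read()                      # B re-reads and retries
        store.write_if_unchanged(b + 10, version_b)
    return store.value


lost = lost_update_demo()
safe = safe_update_demo()
```

In a real database, the pessimistic equivalent would be taking a row lock before the read, so that transaction B waits instead of retrying.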
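The rebalancing and routing concerns from the partitioning item above can be sketched as follows. This is a minimal illustration assuming a fixed number of partitions chosen up front, with whole partitions moved between nodes; the partition count, the node names, and the helpers `partition_for`, `assign`, `add_node`, and `route` are all hypothetical.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 16  # assumed value, fixed up front and never changed


def partition_for(key):
    """Map a key to a partition with a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def assign(nodes):
    """Initial round-robin assignment of partitions to nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(assignment, new_node):
    """Rebalance by moving whole partitions from overloaded nodes to new_node."""
    result = dict(assignment)
    nodes = set(result.values()) | {new_node}
    target = NUM_PARTITIONS // len(nodes)
    load = Counter(result.values())
    for p in sorted(result):
        if load[new_node] >= target:
            break
        owner = result[p]
        if load[owner] > target:
            result[p] = new_node
            load[owner] -= 1
            load[new_node] += 1
    return result


def route(key, assignment):
    """Routing: find the node that currently owns the key's partition."""
    return assignment[partition_for(key)]


before = assign(["node-a", "node-b", "node-c"])
after = add_node(before, "node-d")
moved = [p for p in before if before[p] != after[p]]
```

Because the key-to-partition mapping never changes, adding a node only moves the partitions handed to it; keys in every other partition keep routing to the same node.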

About the Site

Pearls of wisdom are formed as we journey through our careers. On this journey we encounter unique experiences that leave an indelible mark on our minds and often evoke passions and emotions in us. My website is nothing more than the deep insights I have formed from such experiences over the course of my career. They are most valuable when shared openly, so that others with similar but different experiences can challenge and criticize them freely.

The Insightful Journey
