Database - NoSQL

1 - About

NoSQL is a class of distributed databases that prioritize availability over consistency. In an always-on world where downtime is unacceptable, a NoSQL application chooses to sacrifice consistency rather than availability (i.e. AP in CAP terms). Developers then need to enforce consistency in their own code, which is extremely difficult.

NoSQL vs consistent systems

  • Consistent Databases: “Everyone MUST see the same thing, either old or new, no matter how long it takes.”
  • NoSQL: “For large applications, we can’t afford to wait that long, and maybe it doesn’t matter anyway”
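
The trade-off above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Replica`, `ConsistentStore`, `AvailableStore` and `partition_demo` are hypothetical names invented for this example, and a "network partition" is simulated by a simple `reachable` flag. The consistent store refuses a write when a replica cannot be reached, while the available store accepts it and lets the replicas diverge.

```python
class Replica:
    """A hypothetical in-memory replica; `reachable` simulates a partition."""

    def __init__(self):
        self.data = {}
        self.reachable = True


class ConsistentStore:
    """Accepts a write only if every replica can be updated (CP-style)."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        if not all(r.reachable for r in self.replicas):
            raise RuntimeError("unavailable: a replica is unreachable")
        for r in self.replicas:
            r.data[key] = value


class AvailableStore:
    """Always accepts a write on the replicas it can reach (AP-style)."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for r in self.replicas:
            if r.reachable:
                r.data[key] = value


def partition_demo():
    cp_replicas = [Replica(), Replica()]
    ap_replicas = [Replica(), Replica()]
    cp = ConsistentStore(cp_replicas)
    ap = AvailableStore(ap_replicas)

    # Both stores start from the same state.
    cp.write("k", "old")
    ap.write("k", "old")

    # Simulate a partition: the second replica of each store is cut off.
    cp_replicas[1].reachable = False
    ap_replicas[1].reachable = False

    try:
        cp.write("k", "new")
        cp_accepted = True
    except RuntimeError:
        cp_accepted = False  # consistency kept, availability lost

    ap.write("k", "new")  # availability kept, replicas now disagree

    cp_values = [r.data["k"] for r in cp_replicas]
    ap_values = [r.data["k"] for r in ap_replicas]
    return cp_accepted, cp_values, ap_values
```

In this sketch the consistent store still shows the same value everywhere but rejected the update, whereas the available store accepted it and now has one stale replica that application code would have to reconcile.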

The NoSQL field actually has four different types of databases:

Many NoSQL customers are startups. Enterprise customers are few, because enterprise applications are mostly traditional OLTP on structured data.

There is a clear trend towards re-introducing schemas, languages and transactions at full scale (for example, Google’s Spanner system).

NoSQL:

3 - Features

3.1 - Pro

  • Scalability: “It's hard to scale out my RDBMS in a distributed environment”
  • Ability to horizontally scale the throughput of “simple operations” (key lookups, reads and writes of one or a few records) over many servers
  • High-throughput reads and writes
  • Flexibility: “My data doesn’t conform to a rigid schema”
  • The ability to replicate and partition data over many servers (sharding)
  • Efficient use of distributed indexes and RAM for data storage
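
Partitioning data over many servers, as listed above, is often done by hashing the key. The following is a minimal sketch, not the scheme of any particular product: `shard_for` and `ShardedStore` are hypothetical names, each "server" is just an in-process dictionary, and plain modulo hashing is assumed (real systems often prefer consistent hashing so that adding a server moves fewer keys).

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical key-value store partitioned over several in-memory shards."""

    def __init__(self, num_shards: int):
        # Each dict stands in for one server.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # A simple operation touches exactly one shard.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)
```

Because a key lookup or a single-record write only ever contacts one shard, adding servers adds throughput for exactly the kind of "simple operations" mentioned in the list.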

3.2 - Cons

  • No ACID means no mission-critical data
  • No high-level query language: only a simple API
  • NoSQL means no standards
  • A weaker concurrency model than ACID: BASE (Basically Available, Soft state, Eventually consistent)

“I guess what I'm saying is that my decision to use NoSQL, and I'm guessing others' decisions to do so, has less to do with the fact that we can't squeeze a few thousand writes a second out of MySQL and more to do with management and cost overhead. NoSQL solutions allow us to serve absurd amounts of data for a really, really low price.”

4 - Major Impact Systems (Rick Cattell)

  • memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.
  • Dynamo pioneered the idea of [using] eventual consistency as a way to achieve higher availability and scalability: data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.
  • BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.
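
The eventual consistency idea described for Dynamo can be sketched with a toy cluster. This is an illustration under loud assumptions, not Dynamo's actual design: `Node` and `Cluster` are hypothetical names, updates are ordered by a single shared counter (a real distributed system has no such global clock and needs something like vector clocks), and propagation is an explicit `propagate()` call instead of a background process.

```python
class Node:
    """Hypothetical replica holding key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, version, value):
        # Keep only the newest version seen so far.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


class Cluster:
    """Toy cluster: a write lands on one node, the others get it later."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.pending = []  # queued updates: (target node, key, version, value)
        self.clock = 0     # simplifying assumption: one global version counter

    def write(self, key, value, coordinator=0):
        self.clock += 1
        self.nodes[coordinator].apply(key, self.clock, value)
        for i, node in enumerate(self.nodes):
            if i != coordinator:
                self.pending.append((node, key, self.clock, value))

    def propagate(self):
        # Deliver every queued update; afterwards all nodes agree.
        while self.pending:
            node, key, version, value = self.pending.pop(0)
            node.apply(key, version, value)
```

Between `write()` and `propagate()`, a read on a non-coordinator node may return stale data, which mirrors "data fetched are not guaranteed to be up-to-date". After `propagate()`, every node returns the latest value.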

5 - List

6 - Note

A NoSQL environment should be used as a storage area for raw text documents and XML files, where these objects are then properly keyed and distributed.

7 - Terminology

  • Document = nested values, extensible records (think XML or JSON)
  • Extensible record = families of attributes have a schema, but new attributes may be added
  • Key - Value object = a set of key - value pairs. No schema, no exposed nesting
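
The three terms above can be contrasted with plain Python structures. This is one possible reading, offered as a sketch: the sample data and the `ExtensibleRecord` class are hypothetical, and the class assumes that "families have a schema" means the set of families is fixed up front while attributes inside a family stay open.

```python
# Key-value object: a flat set of key-value pairs, no schema, no exposed nesting.
key_value = {"user:42:name": "Ada", "user:42:city": "London"}

# Document: nested values, similar to JSON.
document = {
    "id": 42,
    "name": "Ada",
    "address": {"city": "London", "country": "UK"},
    "tags": ["math", "computing"],
}


class ExtensibleRecord:
    """Hypothetical record: fixed attribute families, open attributes."""

    def __init__(self, families):
        # The families are declared once; this is the "schema" part.
        self.families = {name: {} for name in families}

    def set(self, family, attribute, value):
        if family not in self.families:
            raise KeyError(f"unknown attribute family: {family}")
        # New attributes may be added freely inside a known family.
        self.families[family][attribute] = value

    def get(self, family, attribute, default=None):
        return self.families[family].get(attribute, default)
```

For example, `ExtensibleRecord(["profile", "stats"])` accepts `set("profile", "nickname", "ada")` without any prior declaration of `nickname`, but rejects a write to an undeclared family.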

8 - Documentation / Reference

data/database/nosql.txt · Last modified: 2019/10/30 17:45 by gerardnico