MongoDB

a sceptic’s view

Is it really as great as they say it is?
Distributed systems, the CAP theorem, and NoSQL (Mongo): promise versus delivery.

Tomasz Borek, JAP head and leader since 2016, Mongo certified in 2012

About this talk and myself

New Mongo releases sparked another wave of interest and inquiries. Several of my customers decided to use MongoDB for their applications, and some asked for my advice. Hence the research that led to this talk.

TL;DR

Mongo is nice and has nice documentation. Small projects will like it. Larger projects, or projects that skimped on research, may find themselves scrambling for PostgreSQL later on.

Mongo’s marketing is VERY persuasive and powerful. Do your own legwork, or you and your project may suffer. Employ Jepsen and study its analyses.

Is Mongo popular?

Mongo is VERY popular. Widely popular. The M in MEAN is for Mongo. Mongo was among the first NoSQL DBs supported by Spring Data (if not the very first!). It’s incredibly easy to set up. Mongo’s marketing needs this popularity, and drives it.

MEAN

MEAN: MongoDB, Express, Angular, Node.js

The real reason to learn MEAN stack: Employability

— Free Code Camp Blog post from 2014

Languages

While there are drivers for querying it from many languages, Mongo (written in C++, JS and Python) is tied to JS (or perhaps to front-end / web development):

Mongo shell is written in JS and uses JS
Mongo uses JSON / BSON
MEAN

Hype

NoSQL == hype
Distributed == Microservices == Hype
Mongo User Group talks
Blog posts (some from Mongo employees)
Hackathons
MEAN (web-tied)
Easy to install, dockerized… need I say more?

Surprise! Did you know that Mongo…

had a global lock crippling intense multithreaded workloads (still has one, but…)
had its "safe saves" feature hidden and turned off by default
loses data sometimes
1. even lost data during READ!
doesn’t pass Jepsen tests

doesn’t support relations (almost) at all
doesn’t offer full ACID transactions
has non-isolated transactions and is plagued by anomalies
has unsafe defaults compromising everything for marketability
had a security disaster with 30k DBs being taken over / ransomed

requires the working set to fit in RAM (or crawls)
may roll back your data unless you take great pains not to
actually discourages arbiters, because of sharding and 'reasons'
has transaction defaults that override the settings on a collection or DB?

I could go on. When it comes to Mongo, the list of surprises isn’t short.

Do you really need it?

do you even need that kind of scale?
what have you tried with RDBMSes so far?

do you need active-active?
do you need distributed transactions?

why are you after Mongo?

About me

On the Internet

What I do

audits: code, infra, components, systems
tests and audits of performance or security
diving into DBs, GNU/Linuxes, net or security
programmer for hire
talks, workshops, consulting, trainings

Junior Java Academy Online

November - February
9h of lectures a week, Mon-Wed-Fri 8-11
then a project, the Academy, or just a lot of knowledge and some experience, depending on your results
mainly Kraków or Gdańsk

https://epa.ms/PreAcademy - apply now

https://epa.ms/subPreJAP - subscribe to know more later (set location if you want to)

Questions?

What was promised?

And what was offered? To understand, we need to delve a bit into what is what.

NoSQL and CAP and Mongo
Distributed systems, scaling, transactions and Mongo

Databases are hard

Relational DBs had decades to get to where they are now. They postulated the 3rd normal form, holding data in the DB, relations (so, math: relational algebra) and indexes (and much more, but I’ll stop here). And for years they kept repeating two sentences: "you don’t want active-active clusters" and "you don’t want distributed transactions". Then along came NoSQL…

What does NoSQL mean?

Take your pick:

no SQL, none, nada, zilch
not only SQL, cause we have polyglot persistence
no idea actually
new SQL
no, SQL

Figure 1. pic from Mark Maddsen’s talk at StrataCon 2013

The original CAP theorem

once upon a time the systems were divided

you cannot ignore partition tolerance

— Wise Man

and so, the schism began: you were either CP (Mongo) or AP… or you were Even Worse™ - CA!
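To make the schism concrete, here is a toy five-replica register split 3 | 2 by a partition. It is a deliberately simplified sketch, not how Mongo or any real DB works: `Replica`, `write` and `simulate` are hypothetical names, "CP" here just means "refuse writes without a majority" and "AP" means "accept writes wherever you are".

```python
# Toy model of the CP/AP schism: five replicas, split 3 | 2 by a partition.
# Hypothetical names throughout; this is not how Mongo or any real DB works.


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.value = None  # last value this replica accepted


def write(side: list, all_replicas: list, value: str, mode: str) -> bool:
    """Try to write `value` on one side of the partition.

    mode "CP": accept only if this side holds a strict majority of replicas.
    mode "AP": always accept locally, even though the other side can't see it.
    """
    has_majority = len(side) * 2 > len(all_replicas)
    if mode == "CP" and not has_majority:
        return False  # stay consistent by giving up availability
    for replica in side:
        replica.value = value
    return True


def simulate(mode: str):
    replicas = [Replica(n) for n in "ABCDE"]
    big_side, small_side = replicas[:3], replicas[3:]  # {A,B,C} | {D,E}
    accepted_big = write(big_side, replicas, "x=1", mode)
    accepted_small = write(small_side, replicas, "x=2", mode)
    accepted_values = {r.value for r in replicas if r.value is not None}
    return accepted_big, accepted_small, accepted_values


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        print(mode, simulate(mode))
```

In CP mode the small side simply refuses the write (it is unavailable); in AP mode both sides accept, and the cluster now holds two conflicting values of x that somebody has to reconcile once the partition heals.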

Database Systems according to the CAP Theorem

Figure 2. Credits: Security and Privacy Implications on Database Systems in Big Data Era: A Survey

The NoSQL promise

no relational DBs (or fewer of them, or ones we can work with)
distributed, active-active
scalable (in, out, up, sideways, you name it)
big data
easy

Whatever they wrote on the tin. So… anything. Everything.

Mongo’s promise?

Mongo is CP, so it is consistent - yet watch out for how they define consistency (also, wait a few slides)
Mongo is for big data, cause hu-MONGO-us - but it’s also RAM-limited and sharding is only now being worked out
Mongo is good for financial things - until 3 busts with bitcoin startups
"we live in a post-transactional era" - no transactions. Oh wait, we don’t: transactions, but local ones, not for sharding. Oh wait, FOR sharding, BUT with limits…

When does Mongo complete a write?


Figure 3. Credits: HackingDistributed, a blog of Emin Gün Sirer
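Why the moment of acknowledgment matters can be sketched as a toy failover. `w: 1` and `w: "majority"` are real MongoDB write concern values; everything else here (the `failover` function, both secondaries lagging identically, the simple rollback rule) is an assumption made to keep the example tiny.

```python
# Toy model of a 3-member replica set (primary + 2 secondaries) to show why
# *when* a write is acknowledged matters. Assumption: both secondaries lag
# identically; real replication, elections and rollback are far richer.


def failover(write_concern, unreplicated: int, total_writes: int = 5):
    """Return (acknowledged, lost) write ids after the primary dies.

    write_concern: 1          -> ack once the primary alone has the write
                   "majority" -> ack once 2 of the 3 members have it
    unreplicated:  how many of the newest writes never reached a secondary
    """
    writes = list(range(1, total_writes + 1))
    on_secondaries = writes[: total_writes - unreplicated]

    if write_concern == 1:
        acknowledged = writes
    elif write_concern == "majority":
        acknowledged = on_secondaries  # the rest are still waiting for an ack
    else:
        raise ValueError("write_concern must be 1 or 'majority'")

    # A secondary becomes the new primary holding only `on_secondaries`;
    # when the old primary rejoins, its extra writes are rolled back.
    lost = [w for w in acknowledged if w not in on_secondaries]
    return acknowledged, lost


if __name__ == "__main__":
    for wc in (1, "majority"):
        acked, lost = failover(wc, unreplicated=2)
        print(f"w={wc!r}: acknowledged={acked}, acknowledged-but-lost={lost}")
```

With two writes stuck on the old primary, `w: 1` told the client that writes 4 and 5 succeeded and then lost them; under `w: "majority"` those two were never acknowledged, so the client at least knows to retry.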

Replication and sharding

replication relies on keeping backup copies, so that you don’t lose data
1. Mongo may lose data in replication, due to 'roll-backs', crazy defaults and election problems
sharding is horizontal partitioning of data across servers, good for scaling
1. if you use sharding, some things won’t work in Mongo or will be limited
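Horizontal partitioning, and one reason sharding limits you, can be sketched as a toy router. Assumptions: a fixed shard count and "hash modulo N" placement, with hypothetical names `shard_for` and `ShardedCollection`; MongoDB itself assigns ranges (chunks) to shards and rebalances them, which this does not model.

```python
# Toy hash-sharding router. Assumptions: a fixed number of shards and
# "hash modulo N" placement; MongoDB itself assigns ranges (chunks) to shards
# and rebalances them, which this sketch does not model.
import hashlib

NUM_SHARDS = 3


def shard_for(shard_key: str) -> int:
    """Stable placement: the same key always maps to the same shard."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class ShardedCollection:
    def __init__(self):
        self.shards = [{} for _ in range(NUM_SHARDS)]  # _id -> document

    def insert(self, doc: dict) -> None:
        # "user_id" plays the role of the shard key in this sketch.
        self.shards[shard_for(doc["user_id"])][doc["_id"]] = doc

    def find_by_user(self, user_id: str):
        """Targeted query: the shard key tells us the single shard to ask."""
        shard = self.shards[shard_for(user_id)]
        docs = [d for d in shard.values() if d["user_id"] == user_id]
        return docs, 1  # (results, shards contacted)

    def find_by_field(self, field: str, value):
        """Scatter-gather: without the shard key, every shard must be asked."""
        docs = [d for s in self.shards for d in s.values() if d.get(field) == value]
        return docs, len(self.shards)


if __name__ == "__main__":
    coll = ShardedCollection()
    for i in range(10):
        coll.insert({"_id": i, "user_id": f"u{i % 4}", "country": "PL"})
    print(coll.find_by_user("u1"))
    print(coll.find_by_field("country", "PL")[1])
```

A query on the shard key touches one shard; anything else fans out to all of them, which is where the "some things will be limited" part starts to bite.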

Distributed ain’t easy!

Distributed systems are hard

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.

— Leslie Lamport

Network Fallacies

Figure 4. Denise Yu illustrated the 8 Fallacies from Peter L. Deutsch and others

Active-passive cluster

you have a cluster? you have a distributed system
add network problems
add clustering problems
add your own system problems (your system tries to do things, right?)

Transactions in SQL

ACID

MVCC

Isolation levels
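What MVCC buys you on a single node can be sketched in a few lines: every transaction reads a snapshot taken when it started, and conflicting writers are aborted ("first committer wins"). This is a textbook snapshot-isolation sketch with a hypothetical `MVCCStore` class, not the algorithm of any particular engine; it also does not prevent every anomaly (write skew, for one).

```python
# Minimal MVCC store with snapshot isolation and "first committer wins".
# A textbook sketch under simplifying assumptions (single process, no GC of
# old versions, no locks); real engines implement this very differently.
import itertools


class MVCCStore:
    def __init__(self):
        self.versions = {}               # key -> [(commit_ts, value), ...]
        self.clock = itertools.count(1)  # shared logical clock

    def begin(self) -> dict:
        """A transaction reads the snapshot as of its start timestamp."""
        return {"start_ts": next(self.clock), "writes": {}}

    def read(self, tx: dict, key: str):
        if key in tx["writes"]:          # read your own uncommitted writes
            return tx["writes"][key]
        visible = [v for ts, v in self.versions.get(key, []) if ts <= tx["start_ts"]]
        return visible[-1] if visible else None  # newest version in the snapshot

    def write(self, tx: dict, key: str, value) -> None:
        tx["writes"][key] = value        # buffered until commit

    def commit(self, tx: dict) -> int:
        # Abort if anyone committed one of our keys after our snapshot was taken.
        for key in tx["writes"]:
            if any(ts > tx["start_ts"] for ts, _ in self.versions.get(key, [])):
                raise RuntimeError(f"write-write conflict on {key!r}, aborting")
        commit_ts = next(self.clock)
        for key, value in tx["writes"].items():
            self.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts


if __name__ == "__main__":
    store = MVCCStore()
    setup = store.begin()
    store.write(setup, "balance", 100)
    store.commit(setup)

    t1, t2 = store.begin(), store.begin()
    store.write(t2, "balance", 50)
    store.commit(t2)
    print(store.read(t1, "balance"))  # prints 100: t1 still reads its snapshot
    store.write(t1, "balance", 70)
    try:
        store.commit(t1)
    except RuntimeError as err:
        print(err)                    # t1 lost the race, instead of losing t2's update
```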

Active-passive cluster with transactions

cluster == distributed system == network+clustering+your system problems
you also have transactions? try distributed MVCC or distributed ATOMICity, with non-zero latency, or a changed route
add rollback, a distributed one: you partially applied the changes, now you roll them back from the entire cluster, with replication, oh, and your network has just partitioned
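Distributed atomicity is classically done with two-phase commit (2PC). The toy below, with hypothetical `Participant` and `two_phase_commit` names and no networking, timeouts or recovery logs, shows its best-known weak spot: a coordinator that dies mid-decision.

```python
# Toy two-phase commit (2PC), the classic textbook protocol for distributed
# atomicity. Hypothetical classes; no networking, timeouts or recovery logs.


class Participant:
    def __init__(self, name: str, votes_yes: bool = True):
        self.name = name
        self.votes_yes = votes_yes
        self.state = "init"  # init -> prepared -> committed | aborted

    def prepare(self) -> bool:
        """Phase 1: vote. A 'yes' means: I promise to commit if told to."""
        self.state = "prepared" if self.votes_yes else "aborted"
        return self.votes_yes

    def finish(self, commit: bool) -> None:
        """Phase 2: apply the coordinator's decision."""
        if self.state == "prepared":
            self.state = "committed" if commit else "aborted"


def two_phase_commit(participants, crash_after=None):
    """Run 2PC; if crash_after is n, the coordinator dies after telling
    only the first n participants the decision."""
    votes = [p.prepare() for p in participants]  # ask everyone before deciding
    decision = all(votes)
    told = participants if crash_after is None else participants[:crash_after]
    for p in told:
        p.finish(decision)
    return decision, {p.name: p.state for p in participants}


if __name__ == "__main__":
    nodes = [Participant("A"), Participant("B"), Participant("C")]
    print(two_phase_commit(nodes, crash_after=1))
```

A has committed, while B and C are stuck "prepared": they promised to commit, still hold their locks, and cannot safely decide on their own until the coordinator comes back. Now add replication and a partition to that picture.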

You don’t want active-active

active-passive problems
add multiple concurrent/simultaneous writes - everybody accepts writes now
do you want your fries, ahem, transactions with that?
CAP
Consistency, Availability, Partition Tolerance, you can’t sacrifice the last one
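When everybody accepts writes, someone has to merge conflicting versions afterwards. A sketch using last-write-wins (LWW), one common conflict-resolution policy, picked here for illustration only; `Version`, `lww_merge` and `concurrent_increment` are hypothetical names, not any particular database's API.

```python
# Active-active toy: two nodes accept writes concurrently, replicas are later
# merged with last-write-wins (LWW), one common conflict-resolution policy.
# Hypothetical names; not a description of any particular database.
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: int
    clock: float  # wall-clock timestamp taken on the node that accepted it
    node: str


def lww_merge(a: Version, b: Version) -> Version:
    """Keep the version with the later timestamp; node name breaks ties."""
    return max(a, b, key=lambda v: (v.clock, v.node))


def concurrent_increment(start: int, clock_a: float, clock_b: float) -> int:
    """Both nodes read `start`, each adds 1 and accepts the write locally."""
    on_a = Version(start + 1, clock_a, "A")
    on_b = Version(start + 1, clock_b, "B")
    return lww_merge(on_a, on_b).value


if __name__ == "__main__":
    # Two increments were acknowledged, yet the merged counter only went up by one.
    print(concurrent_increment(10, clock_a=100.0, clock_b=100.5))
    # Clock skew: B wrote later in real time, but its clock runs behind A's.
    print(lww_merge(Version(1, 200.0, "A"), Version(2, 199.0, "B")).value)
```

Both failures are silent: every client got an "ok", and one of the acknowledged updates is simply gone.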

Jepsen

A fantastic piece of engineering: a tool that checks whether a distributed system actually handles itself well when a partition happens, under load, etc. Shout out to Kyle Kingsbury for his incredible tool, and please head over to jepsen.io.

http://jepsen.io/analyses/mongodb-4.2.6

What does it do?

Tests systems' behaviour, especially when a network partition occurs.

Tests partition tolerance.

And tests what happens when a partition HEALS.
That’s where the devil is, right there, in the details.
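The core idea of such a check fits in a few lines. This is a sketch in the spirit of Jepsen's set-style tests, not Jepsen's code (Jepsen is written in Clojure): clients add unique values while partitions come and go, and after the partition heals, the final contents are compared with what clients were told. The `Op` and `check` names and the dict layout are assumptions; the ok / fail / info outcome names mirror Jepsen's terminology.

```python
# A tiny checker in the spirit of Jepsen's set-style tests (not Jepsen's code,
# which is written in Clojure). Clients add unique values while partitions come
# and go; after the partition heals, we read the final set and compare.
from typing import Iterable, NamedTuple


class Op(NamedTuple):
    value: int
    outcome: str  # "ok" = acknowledged, "fail" = definitely not applied,
    # "info" = unknown (e.g. the request timed out)


def check(history: Iterable[Op], final_read: set) -> dict:
    ops = list(history)
    acked = {op.value for op in ops if op.outcome == "ok"}
    failed = {op.value for op in ops if op.outcome == "fail"}
    unknown = {op.value for op in ops if op.outcome == "info"}
    lost = acked - final_read          # the client was told "ok", data is gone
    unexpected = final_read & failed   # the client was told "fail", data is there
    return {
        "valid": not lost and not unexpected,
        "lost": sorted(lost),
        "unexpected": sorted(unexpected),
        "indeterminate_present": sorted(final_read & unknown),
    }


if __name__ == "__main__":
    history = [Op(1, "ok"), Op(2, "ok"), Op(3, "fail"), Op(4, "info"), Op(5, "ok")]
    print(check(history, final_read={1, 2, 4}))
```

Timed-out ("info") operations may or may not have landed, so they are reported but never counted against the database; an acknowledged value missing from the final read is the real red flag.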

Mongo clusters

active + passive(s) + arbiter (replication)
active + passives (replication)
sharded replica sets, one replica set for a shard (I’m skipping config here for simplicity) - that’s for big data / scaling
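The arbiter trade-off is mostly quorum arithmetic. A simplified sketch with hypothetical `majority` and `quorum` helpers; it ignores priorities, non-voting, hidden and delayed members, and assumes the data-bearing members alone can form a majority.

```python
# Quorum arithmetic for a replica set made of voting data-bearing members plus
# arbiters. Simplified: ignores priorities, non-voting, hidden and delayed
# members, and assumes data-bearing members alone can form a majority.


def majority(voters: int) -> int:
    return voters // 2 + 1


def quorum(data_bearing: int, arbiters: int) -> dict:
    voters = data_bearing + arbiters
    need = majority(voters)
    if data_bearing < need:
        raise ValueError("this sketch assumes data-bearing members can form a majority")
    return {
        "voters": voters,
        "majority": need,
        # Any `need` voters (arbiters included) can elect a primary.
        "down_ok_for_election": voters - need,
        # Arbiters hold no data, so they cannot acknowledge a majority write.
        "down_ok_for_majority_writes": data_bearing - need,
    }


if __name__ == "__main__":
    print("P-S-A", quorum(data_bearing=2, arbiters=1))
    print("P-S-S", quorum(data_bearing=3, arbiters=0))
```

In this model both layouts survive one failure for elections, but with one data-bearing member down, the primary + secondary + arbiter set can no longer acknowledge majority writes, while three data-bearing members still can.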

Again: when does Mongo complete a write?

Figure 5. Credits: HackingDistributed, a blog of Emin Gün Sirer

Now what if…

a partition happened
and the primary went down
and it healed in the middle of an election
or right as it finished with a new primary
and the partition healed, showing that the other part of the network had also elected a primary?
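Majority votes plus terms are the usual defence against a lasting split brain. The toy below is Raft-flavoured and heavily simplified, not MongoDB's actual election protocol; `Node`, `elect` and `heal` are hypothetical. It shows that two nodes can briefly both believe they are primary: the old one has not noticed the partition yet.

```python
# Toy majority election with terms (Raft-flavoured, heavily simplified; not
# MongoDB's actual protocol). Hypothetical names throughout.


class Node:
    def __init__(self, name: str):
        self.name = name
        self.term = 0
        self.role = "secondary"
        self.votes = {}  # term -> name of the candidate this node voted for

    def request_vote(self, candidate: str, term: int) -> bool:
        """Grant at most one vote per term, and never to a stale term."""
        if term < self.term:
            return False
        if term > self.term:
            self.term, self.role = term, "secondary"  # newer term: stop leading
        return self.votes.setdefault(term, candidate) == candidate


def elect(candidate: Node, reachable: list, cluster_size: int) -> bool:
    """The candidate asks every node it can reach (itself included) for a vote."""
    term = candidate.term + 1
    granted = sum(n.request_vote(candidate.name, term) for n in reachable)
    if granted * 2 > cluster_size:  # strict majority of the WHOLE cluster
        candidate.role = "primary"
        return True
    return False


def heal(nodes: list) -> None:
    """Once everyone can talk again, a primary with a stale term steps down."""
    newest = max(n.term for n in nodes)
    for n in nodes:
        if n.term < newest:
            n.term, n.role = newest, "secondary"


if __name__ == "__main__":
    a, b, c, d, e = (Node(x) for x in "ABCDE")
    everyone = [a, b, c, d, e]
    elect(a, everyone, 5)  # A becomes primary in term 1
    # Partition {A, B} | {C, D, E}: the majority side elects C in term 2,
    # while A has not noticed yet and still calls itself primary.
    elect(c, [c, d, e], 5)
    print([n.name for n in everyone if n.role == "primary"])
    heal(everyone)
    print([n.name for n in everyone if n.role == "primary"])
```

Whatever the stale primary A accepted in that window never reached a majority; in a real system, such writes are exactly the candidates for rollback when it rejoins.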

Questions?

Summarizing

MongoDB is nice and their marketing promises the sky and beyond. It is VERY popular.

Many have failed, but this has scarcely left a dent on Mongo’s sites or in its materials.

Be VERY careful!

Overpromised

Distributed systems are VERY HARD, transactions are hard
RDBMS makers didn’t want to do active-active clusters or distributed transactions or massive scaling, which NoSQL promised
NoSQL is not very well defined, some definitions definitely have overpromised
CAP theorem has been revised since its inception - the original division being too strict (and ignoring latency?!)

Mongo? WARNING! WARNING! WARNING!

Mongo promises to be good for all use cases
Read the small print in their docs (go deeper)
Read and understand Jepsen analyses
Do your own tests - or pay Jepsen to
Consider unusable scenarios (transactions? low RAM?)

NoSQL? Careful!

NoSQL is now "Not Only SQL" - consider scenarios for chosen technology
CAP theorem was revised - are you big data? When are you CA or AP or CP?
Replication is difficult in active-active, do you need it?
Sharding is even more so
Distributed transactions are hard and dangerous

Silver lining?

We do move forward, and today’s NoSQL DBs are much better than their first generation. But do read the fine print, and do use them for their intended scenarios. Also, consider LOTSA tests for your use case, using the things outlined here as an idea generator. :)

Mongo itself is maturing, undeniably. Version 5.0 is much better than 2.0.

Also, each time Jepsen tests Mongo, their docs get better and they correct something.

Thank you!

Thank you, TJB out. I’ll gladly take badges if you liked this… or emails / chats if you did not. :-)
