Featured Post

Event Sourcing Video from Michael Ploed

Event Sourcing I want to share a great video I found few days ago that describes very well what Event Sourcing is.

Monday, February 5, 2018

MongoDB Shards and Replica Set

Data Model

Mongo DB uses a flexible schema: a document store. 

  • Document: A document is a BSON object that consists of an ordered list of element, each element has a field, type and value.
  • Collection:  a group of documents. Something equivalent to a RDBMS table.

Data References 

Two types of references:

  1. Links or references from one document to another
  2. Embedded Data


MongoDB  supports write atomicity at the document level. With a denormalized structure a single write  operation can allow to insert or update a document, therefore a normalized structure implies to split operation into multiple write operations (not atomic).


Sharding is a mechanism used by MongoDB to scale horizontally. Data are spread in the cluster based on the sharding key.
A client of a sharded cluster uses mongos in order to query the cluster. Client applications do not connect directly with the shards.
In a sharded cluster all mongod instance should be able to connect to all other instances in the cluster

In a sharded cluster consider the following elements:

  1. Shard Instance: that contains a subset of the sharded data.
  2. Mongos: is a query router that decouples client from the sharded instances
  3. Config Server: mongod instance that stores metadata and configuration (generally in a replica set configuration)

  • Mongos is able to direct a query to a specific shard if the query include the shard key otherwise a broadcast query is sent to every shard;
  • Shard is a decision to make at collection level; it's possibile to have a mix of sharded and unsharded collections: sharded collections are spread in shards, while unsharded collections are stored in the primary shard (generally the mongos having less data)
  • Collections are divided into chunk based on the shard key across shards in the cluster (default chunk size is 64 MB)

 Replica Set

MongoDB supports redundancy and availability via Replica Set: a group of mongod instances with the same data set.

In a group of mondod instances there are two types: primary node, and secondary node.

  • A primary node  receives all write operations and records changes into its dataset in the oplog.
  • A secondary node uses the primary's oplog and apply the operations to their data sets. If the primary node becomes unavailable a leader/primary election runs.
  • An extra mongod instance (arbiter) is used to maintain a quorum in a replica set for the election requests
Read and write operations are managed by the primary node, but it's possibile to specify read preferences in order to query directly a secondary node giving up consistency.

No comments :

Post a Comment