Is Deploying MongoDB that different from deploying an RDBMS?


After completed the development of your new MongoDB-based application, and are now preparing to deploy into production. There are some key questions you should be discussing with your System Engineering team:
  • What are the deployment best practices?
  • What are the key metrics that need to be monitored to ensure the application is meeting its required service levels?
  • How will you know when it’s time to add shards? [May no be applicable for all applications]
  • What tools do you have to backup and restore the database? [May no be applicable for all applications]
  • And what about securing access to all that new real-time big data? [May no be applicable for all applications]


Lets cover few of the topics, Hardware selection, Scaling, High Availability and Monitoring. 
System performance and capacity planning are two important topics that should be addressed in any deployment, whether it’s an RDBMS or a NoSQL database. Part of your planning should involve establishing baselines on data volume, system load, performance (throughput and latency), and capacity utilization. These baselines should reflect the workloads you expect the database to perform in production, and they should be revisited periodically as the number of users, application features, performance SLA, or any other factors change.
Baselines will help you understand when the system is operating as designed, and when issues begin to emerge that may affect the quality of the user experience or other factors critical to the system.
The following section discusses key deployment considerations, including hardware, scaling and HA, and discusses what you need to monitor to maintain optimum system performance.

When prioritizing hardware budget for MongoDB deployments
  • RAM should be at or near the top of the list.
  • Ensuring you have defined appropriate index coverage for your queries during the schema design phase of the project will minimize the risk of this happening.
  • Dev Ops teams can track the number of pages accessed by the instance over a given period, and the elapsed time from the oldest to newest document in the working set. By tracking these metrics, it is possible to detect when the working set is approaching current RAM limits and proactively take action to ensure the system is scaled.


Brief: MongoDB makes extensive use of RAM for low latency database operations. In MongoDB, all data is read and manipulated through memory-mapped files. Reading data from memory is measured in nanoseconds and reading data from disk is measured in milliseconds; and so reading from memory is approximately 100,000 times faster than reading from disk.
The set of data and indexes that are accessed most frequently during normal operations is called the working set, which ideally should fit in RAM. It may be the case that the working set represents a fraction of the entire database, such as applications where data related to recent events or popular products is accessed most commonly.
Page faults occur when MongoDB attempts to access data that has not been loaded in RAM. If there is free memory then the operating system will locate the page on disk and load it into memory directly. However, if there is no free memory the operating system must write a page that is in memory to disk and then read the requested page into memory. This process will be slower than accessing data that is already in memory.
Some operations may inadvertently purge a large percentage of the working set from memory, which adversely affects performance. For example, a query that scans all documents in the database, where the database is larger than the RAM on the server, will cause documents to be read into memory and the working set to be written out to disk. 

Storage and Disk I/O
  • MongoDB does not require shared storage. It can use local attached storage as well as solid state drives (SSDs).
  • Most MongoDB deployments should use RAID-10. RAID-5 and RAID-6 do not provide sufficient performance. RAID-0 provides good write performance, but limited read performance and insufficient fault tolerance.


Brief: Most disk access patterns in MongoDB do not have sequential properties, and as a result, customers may experience substantial performance gains by using SSDs. Good results and strong price to performance have been observed with SATA SSD and with PCI. Commodity SATA spinning drives are comparable to higher cost spinning drives due to the non-sequential access patterns of MongoDB: rather than spending more on expensive spinning drives, that budget may be more effectively spent on more RAM or SSDs.
While data files benefit from SSDs, MongoDB’s journal files are good candidates for fast, conventional disks due to their high sequential write profile.
While your MongoDB system should be designed so that its working set fits in memory, disk I/O is still a key performance consideration. MongoDB regularly flushes writes to disk and commits to the journal, so under heavy write load, the underlying disk subsystem may become overwhelmed. The iostat command can be used to show high disk utilization and excessive queuing for writes.

CPU Selection – Speed or Cores?
  • MongoDB performance is typically not CPU-bound. As MongoDB rarely encounters workloads able to leverage large numbers of cores, it is preferable to have servers with faster clock speeds than numerous cores with slower clock speeds.


Brief: As with any system, measuring CPU utilization is important. If high utilization is observed without other issues such as disk saturation or pagefaults, there may be an unusual issue in the system. For example, a MapReduce job with an infinite loop, or a query that sorts and filters a large number of documents from working set without good index coverage, might cause a spike in CPU without triggering issues in the disk system or pagefaults. Tools for monitoring CPU utilization are discussed below.
MongoDB provides horizontal scale-out for databases using a technique called 'Sharding'. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.

Scaling your Database – When and How?
  • RAM Limitation: The size of the system’s active working set will soon exceed the capacity of the maximum amount of RAM in the system.
  • Disk I/O Limitation: The system has a large amount of write activity, and the operating system cannot write data fast enough to meet demand; and/or I/O bandwidth limits how fast the writes can be flushed to disk.
  • Storage Limitation: The data set approaches or exceeds the storage capacity of a single node in the system.


https://res.infoq.com/articles/mongodb-deployment-monitoring/en/resources/1fig1.jpg
MongoDB Auto-Sharding, with Application Transparency
It is far easier to implement sharding before the resources of the system become limited, so capacity planning and proactive monitoring are important elements in successfully scaling the application
Users should consider deploying a sharded MongoDB cluster in the following situations:
One of the goals of sharding is to uniformly distribute data across multiple servers. If the utilization of server resources is not approximately equal there may be an underlying issue that is problematic for the deployment. For example, a poorly selected shard key can result in uneven data distribution. In this case, most if not all of the queries will be directed to the single mongod that is managing the data.
Furthermore, MongoDB may be attempting to redistribute the documents to achieve a more ideal balance across the servers. While redistribution will eventually result in a more desirable distribution of documents, there is substantial work associated with rebalancing the data and this activity itself may interfere with achieving the desired performance SLA.
By running db.currentOp() you will be able to determine what work is currently being performed by the cluster, including rebalancing of documents across the shards.

High Availability with MongoDB Replica Sets
MongoDB uses its native replication to maintain multiple copies of data across replica sets. Replica sets help prevent downtime by detecting failures (server, network, OS or database) and automatically initiating failover. It is recommended that all MongoDB deployments should be configured with replication.
https://res.infoq.com/articles/mongodb-deployment-monitoring/en/resources/fig2.jpg
Self-healing Recovery with MongoDB Replica Sets
Operations that modify a database on the primary are replicated to the secondaries with a log called the oplog. The oplog contains an ordered set of idempotent operations that are replayed on the secondaries. The size of the oplog is configurable and by default 5% of the available free disk space.
Replication lag is something to be monitored as part of normal operations. This is the amount of time it takes a write operation on the primary to replicate to a secondary. Some amount of delay is normal, but as replication lag grows, issues may arise. Typical causes of replication lag include network latency or connectivity issues, and disk latencies such as the throughput of the secondaries being inferior to that of the primary.
Configuring MongoDB
Users should store configuration options in mongod’s configuration file. This allows sysadmins to implement consistent configurations across entire clusters. The configuration files support all options provided as command line options for mongod. Installations and upgrades should be automated through popular tools such as Chef and Puppet, and the MongoDB community provides and maintains example scripts for these tools.
A basic MongoDB configuration file looks like the following:
·         fork = true
·         bind_ip = 127.0.0.1
·         port = 27017
·         quiet = true
·         dbpath = /srv/mongodb
·         logpath = /var/log/mongodb/mongod.log
·         logappend = true
·         journal = true

The documentation will enable you to learn more about MongoDB configuration options.
The latest suggestions on specific configurations for operating systems, file systems, storage devices and other system-related topics are maintained on the MongoDB documentation Production Notes page.


Comments

Popular posts from this blog

Interview Questions to Ask the Employer

Place .NET DLL in GAC

Windows Communication Foundation - FAQ