Is Deploying MongoDB That Different from Deploying an RDBMS?
You have completed the development of your new
MongoDB-based application and are now preparing to deploy it into production.
There are some key questions you should be discussing with your systems
engineering team:
- What are the deployment best practices?
- What are the key metrics that need to be monitored to ensure the application is meeting its required service levels?
- How will you know when it’s time to add shards? [May not be applicable to all applications]
- What tools do you have to back up and restore the database? [May not be applicable to all applications]
- And what about securing access to all that new real-time big data? [May not be applicable to all applications]
Let's cover a few of these topics: hardware
selection, scaling, high availability, and monitoring.
System performance and capacity planning are two
important topics that should be addressed in any deployment, whether it’s an
RDBMS or a NoSQL database. Part of your planning should involve establishing
baselines on data volume, system load, performance (throughput and latency),
and capacity utilization. These baselines should reflect the workloads you
expect the database to perform in production, and they should be revisited
periodically as the number of users, application features, performance SLA, or
any other factors change.
Baselines will help you understand when the
system is operating as designed, and when issues begin to emerge that may
affect the quality of the user experience or other factors critical to the
system.
The following section discusses key deployment
considerations, including hardware, scaling and HA, and discusses what you need
to monitor to maintain optimum system performance.
When prioritizing hardware budget for MongoDB deployments:
- RAM should be at or near the top of the list.
- Defining appropriate index coverage for your queries during the schema design phase of the project minimizes the risk of a query scanning large numbers of documents and purging the working set from RAM.
- DevOps teams can track the number of pages accessed by the instance over a given period, and the elapsed time from the oldest to newest document in the working set. By tracking these metrics, it is possible to detect when the working set is approaching current RAM limits and proactively take action to ensure the system is scaled.
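The last point can be sketched as a simple headroom check. This is a minimal illustration, not a MongoDB API: the function names, the 4 KB page size, and the 85% warning threshold are all assumptions chosen for the example, and the page count is whatever your monitoring reports for the working set.

```python
# Assumed OS page size in bytes; verify the value on your platform.
PAGE_SIZE_BYTES = 4096


def working_set_fraction(pages_in_memory: int, ram_bytes: int,
                         page_size: int = PAGE_SIZE_BYTES) -> float:
    """Return the estimated working set size as a fraction of total RAM."""
    if ram_bytes <= 0:
        raise ValueError("ram_bytes must be positive")
    return pages_in_memory * page_size / ram_bytes


def approaching_ram_limit(pages_in_memory: int, ram_bytes: int,
                          warn_at: float = 0.85) -> bool:
    """True when the working set uses at least `warn_at` of RAM.

    `warn_at` is a hypothetical threshold; pick one that matches your SLA.
    """
    return working_set_fraction(pages_in_memory, ram_bytes) >= warn_at


if __name__ == "__main__":
    ram = 16 * 1024**3  # a hypothetical server with 16 GiB of RAM
    for pages in (2_000_000, 3_700_000):
        share = working_set_fraction(pages, ram)
        print(f"{pages:>9,} pages: {share:.0%} of RAM, "
              f"scale soon: {approaching_ram_limit(pages, ram)}")
```

Sampling the page count regularly and alerting on the trend, rather than on a single reading, gives earlier warning that the system needs to be scaled.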
Brief: MongoDB makes extensive use of RAM
for low latency database operations. In MongoDB, all data is read and
manipulated through memory-mapped files. Reading data from memory is measured
in nanoseconds, while reading data from disk is measured in milliseconds, so
reading from memory is approximately 100,000 times faster than reading from
disk.
The set of data and indexes that are accessed
most frequently during normal operations is called the working set, which
ideally should fit in RAM. It may be the case that the working set represents a
fraction of the entire database, such as applications where data related to
recent events or popular products is accessed most commonly.
Page faults occur when MongoDB attempts to
access data that has not been loaded in RAM. If there is free memory then the
operating system will locate the page on disk and load it into memory directly.
However, if there is no free memory the operating system must write a page that
is in memory to disk and then read the requested page into memory. This process
will be slower than accessing data that is already in memory.
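To see why even a small page fault rate matters, the following sketch computes the average access time for a given fraction of reads that fault. The latency constants are assumptions (about 100 ns for memory and 10 ms for a spinning disk, which gives the 100,000x ratio), and the model is simplified: it ignores the extra cost of writing an evicted page to disk first.

```python
MEMORY_ACCESS_S = 100e-9  # assumed: roughly 100 ns per in-memory read
DISK_ACCESS_S = 10e-3     # assumed: roughly 10 ms per spinning-disk read


def effective_access_time(fault_rate: float,
                          memory_s: float = MEMORY_ACCESS_S,
                          disk_s: float = DISK_ACCESS_S) -> float:
    """Average seconds per access when `fault_rate` of reads page-fault."""
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * memory_s + fault_rate * disk_s


if __name__ == "__main__":
    for rate in (0.0, 0.001, 0.01, 0.1):
        slowdown = effective_access_time(rate) / MEMORY_ACCESS_S
        print(f"fault rate {rate:>5.1%}: {slowdown:,.0f}x slower than pure RAM")
```

Under these assumed numbers, a fault rate of only 1% already makes the average access roughly a thousand times slower than serving everything from RAM.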
Some operations may inadvertently purge a large
percentage of the working set from memory, which adversely affects performance.
For example, a query that scans all documents in the database, where the
database is larger than the RAM on the server, will cause documents to be read
into memory and the working set to be written out to disk.
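The effect of a full scan can be illustrated with a toy simulation. The sketch below assumes a strict least-recently-used page cache, which is a simplification of how a real operating system manages memory, and all sizes and names are hypothetical.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy least-recently-used page cache holding at most `capacity` pages."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages = OrderedDict()
        self.faults = 0

    def access(self, page_id: int) -> None:
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return
        self.faults += 1                     # page fault: page not resident
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used
        self.pages[page_id] = True


def resident_fraction(cache: LRUPageCache, working_set) -> float:
    """Fraction of the working set's pages currently held in the cache."""
    return sum(1 for p in working_set if p in cache.pages) / len(working_set)


def simulate_full_scan(ram_pages=1000, db_pages=5000, hot_pages=800):
    """Warm the cache with the working set, then scan the whole database."""
    cache = LRUPageCache(ram_pages)
    working_set = range(hot_pages)           # hypothetical hot pages
    for page in working_set:
        cache.access(page)
    before = resident_fraction(cache, working_set)
    for page in range(db_pages):             # the scan touches every page
        cache.access(page)
    after = resident_fraction(cache, working_set)
    return before, after


if __name__ == "__main__":
    before, after = simulate_full_scan()
    print(f"working set resident before scan: {before:.0%}, after: {after:.0%}")
```

In this model the scan leaves only the most recently scanned pages in memory, so every subsequent access to the working set faults until it has been reloaded.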
Storage and Disk I/O
- MongoDB does not require shared storage. It can use locally attached storage, including solid state drives (SSDs).
- Most MongoDB deployments should use RAID-10. RAID-5 and RAID-6 do not provide sufficient performance. RAID-0 provides good write performance, but limited read performance and insufficient fault tolerance.
Brief: Most disk access patterns in MongoDB
do not have sequential properties, and as a result, customers may experience
substantial performance gains by using SSDs. Good results and strong price to
performance have been observed with SATA SSD and with PCI. Commodity SATA
spinning drives are comparable to higher cost spinning drives due to the
non-sequential access patterns of MongoDB: rather than spending more on
expensive spinning drives, that budget may be more effectively spent on more
RAM or SSDs.
While data files benefit from SSDs, MongoDB’s
journal files are good candidates for fast, conventional disks due to their
high sequential write profile.
While your MongoDB system should be designed so
that its working set fits in memory, disk I/O is still a key performance
consideration. MongoDB regularly flushes writes to disk and commits to the
journal, so under heavy write load, the underlying disk subsystem may become
overwhelmed. The iostat command can be used to show high disk utilization
and excessive queuing for writes.
CPU Selection – Speed or Cores?
- MongoDB performance is typically not CPU-bound. As MongoDB rarely encounters workloads able to leverage large numbers of cores, it is preferable to have servers with faster clock speeds than numerous cores with slower clock speeds.
Brief: As with any system, measuring CPU
utilization is important. If high utilization is observed without other issues
such as disk saturation or page faults, there may be an unusual issue in the
system. For example, a MapReduce job with an infinite loop, or a query that
sorts and filters a large number of documents from the working set without good
index coverage, might cause a spike in CPU without triggering issues in the
disk system or page faults. Tools for monitoring CPU utilization are discussed
below.
Scaling your Database – When and How?
MongoDB provides horizontal scale-out for
databases using a technique called sharding. Sharding distributes data
across multiple physical partitions called shards. Sharding allows MongoDB
deployments to address the hardware limitations of a single server, such as
bottlenecks in RAM or disk I/O, without adding complexity to the application.
MongoDB Auto-Sharding, with Application Transparency
Users should consider deploying a sharded
MongoDB cluster in the following situations:
- RAM Limitation: The size of the system’s active working set will soon exceed the capacity of the maximum amount of RAM in the system.
- Disk I/O Limitation: The system has a large amount of write activity, and the operating system cannot write data fast enough to meet demand; and/or I/O bandwidth limits how fast the writes can be flushed to disk.
- Storage Limitation: The data set approaches or exceeds the storage capacity of a single node in the system.
It is far easier to implement sharding before
the resources of the system become limited, so capacity planning and proactive
monitoring are important elements in successfully scaling the application.
One of the goals of sharding is to uniformly
distribute data across multiple servers. If the utilization of server resources
is not approximately equal there may be an underlying issue that is problematic
for the deployment. For example, a poorly selected shard key can result in
uneven data distribution. In this case, most if not all of the queries will be
directed to the single mongod that is managing the data.
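The sketch below illustrates one way a shard key can produce uneven distribution. It is a toy model under stated assumptions: four shards each own an equal, fixed range of a 32-bit key space, new documents use a monotonically increasing key (such as a counter), and `hashed` is a hypothetical helper that spreads the same values with MD5. Real clusters split and move ranges over time, so this shows only the underlying tendency.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NUM_SHARDS = 4
KEY_SPACE = 2**32

# Hypothetical split points dividing the key space into one range per shard.
SPLIT_POINTS = [KEY_SPACE * i // NUM_SHARDS for i in range(1, NUM_SHARDS)]


def shard_for(key_value: int) -> int:
    """Return the index of the shard whose range contains `key_value`."""
    return bisect_right(SPLIT_POINTS, key_value)


def hashed(value: int) -> int:
    """Map a value to a pseudo-random position in the 32-bit key space."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def distribution(keys) -> list:
    """Count how many keys land on each shard, in shard order."""
    counts = Counter(shard_for(k) for k in keys)
    return [counts.get(s, 0) for s in range(NUM_SHARDS)]


if __name__ == "__main__":
    # The 10,000 most recent values of an ever-increasing key.
    recent = range(KEY_SPACE - 10_000, KEY_SPACE)
    print("increasing key:", distribution(recent))
    print("hashed key:    ", distribution(hashed(v) for v in recent))
```

With the increasing key, every recent document falls in the last range and therefore on a single shard, while the hashed values spread roughly evenly across all four.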
Furthermore, MongoDB may be attempting to
redistribute the documents to achieve a more ideal balance across the servers.
While redistribution will eventually result in a more desirable distribution of
documents, there is substantial work associated with rebalancing the data and
this activity itself may interfere with achieving the desired performance SLA.
By running db.currentOp() you
will be able to determine what work is currently being performed by the
cluster, including rebalancing of documents across the shards.
High Availability with MongoDB Replica Sets
MongoDB uses its native replication to maintain multiple copies of data across replica sets. Replica sets help prevent downtime by detecting failures (server, network, OS or database) and automatically initiating failover. It is recommended that all MongoDB deployments should be configured with replication.
Self-healing Recovery with MongoDB Replica Sets
Operations that modify a database on the primary
are replicated to the secondaries via a log called the oplog. The oplog contains
an ordered set of idempotent operations that are replayed on the secondaries.
The size of the oplog is configurable and defaults to 5% of the available free
disk space.
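As a rough capacity-planning aid, the sketch below applies the 5% rule and estimates how many hours of writes the oplog can hold. The function names and the example figures are assumptions for illustration, and actual defaults can differ by platform and version, so check the documentation for your release.

```python
def default_oplog_bytes(free_disk_bytes: int, percent: int = 5) -> int:
    """Oplog size under the 'percent of free disk space' rule of thumb."""
    if free_disk_bytes < 0:
        raise ValueError("free_disk_bytes must not be negative")
    return free_disk_bytes * percent // 100


def oplog_window_hours(oplog_bytes: int, oplog_bytes_per_hour: float) -> float:
    """Hours of operations the oplog can hold at a steady write rate."""
    if oplog_bytes_per_hour <= 0:
        raise ValueError("oplog_bytes_per_hour must be positive")
    return oplog_bytes / oplog_bytes_per_hour


if __name__ == "__main__":
    free = 500 * 10**9           # hypothetical: 500 GB of free disk space
    rate = 2 * 10**9             # hypothetical: 2 GB of oplog entries per hour
    size = default_oplog_bytes(free)
    print(f"oplog size: {size / 10**9:.0f} GB, "
          f"window: {oplog_window_hours(size, rate):.1f} hours")
```

A secondary that is offline for longer than this window can no longer catch up from the oplog alone, which is one reason to size it against your expected write rate.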
Replication lag is
something to be monitored as part of normal operations. This is the amount of
time it takes a write operation on the primary to replicate to a secondary.
Some amount of delay is normal, but as replication lag grows, issues may arise.
Typical causes of replication lag include network latency or connectivity
issues, and disk latencies such as the throughput of the secondaries being
inferior to that of the primary.
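One way to approximate replication lag is to compare the time of the last operation applied on each secondary with that of the primary. The sketch below works on plain dictionaries shaped like the `members` entries of a replica set status document (`name`, `stateStr`, `optimeDate`); the host names and timestamps are made up, and fetching the status from a live server is left out.

```python
from datetime import datetime, timedelta


def replication_lag(members: list) -> dict:
    """Map each secondary's name to its lag behind the primary."""
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no primary found in member list")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample data for illustration only.
SAMPLE_MEMBERS = [
    {"name": "db1.example.net:27017", "stateStr": "PRIMARY",
     "optimeDate": datetime(2024, 1, 1, 12, 0, 10)},
    {"name": "db2.example.net:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2024, 1, 1, 12, 0, 8)},
    {"name": "db3.example.net:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2024, 1, 1, 11, 59, 40)},
]

if __name__ == "__main__":
    for name, lag in replication_lag(SAMPLE_MEMBERS).items():
        flag = "  <-- investigate" if lag > timedelta(seconds=10) else ""
        print(f"{name}: {lag.total_seconds():.0f}s behind{flag}")
```

The 10 second alert threshold is an arbitrary choice for the example; an appropriate value depends on how much delay your application can tolerate.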
Configuring MongoDB
Users should store
configuration options in mongod’s configuration file. This allows sysadmins to
implement consistent configurations across entire clusters. The configuration
files support all options provided as command line options for mongod.
Installations and upgrades should be automated through popular tools such as
Chef and Puppet, and the MongoDB community provides and maintains example
scripts for these tools.
A basic MongoDB
configuration file looks like the following:
fork = true
bind_ip = 127.0.0.1
port = 27017
quiet = true
dbpath = /srv/mongodb
logpath = /var/log/mongodb/mongod.log
logappend = true
journal = true
See the MongoDB documentation
to learn more about the available configuration options.
The latest suggestions on specific
configurations for operating systems, file systems, storage devices and other
system-related topics are maintained on the MongoDB
documentation Production Notes page.