MongoDB uses lazy evaluation with
cursors. That means in many cases when you start a MongoDB query which returns
a cursor, the query doesn't get executed yet.
The actual selection happens when you
start requesting data from the cursor.
The main reason is that this allows you to call methods like
sort(by)
, limit(n)
or skip(n)
on the cursor which
can often be processed much more efficiently on the database before selecting
any data.
So
what you measure with the "fetch time" is actually also part of the
selection.
When you want to force the query
to execute without fetching any data yet, you could call
explain()
on
the cursor. The database can't measure the execution time without actually
performing the query. However, in actual real-world use, I would recommend you
to not do thisand use
cursors the way they were intended.
MongoDB Indexing Algorithm: BTree,
Overview[edit]
A B-tree (Bayer & McCreight 1972) of order 5 (Knuth 1998).
In
B-trees, internal (non-leaf) nodes can
have a variable number of child nodes within some pre-defined range. When data
is inserted or removed from a node, its number of child nodes changes. In order
to maintain the pre-defined range, internal nodes may be joined or split.
Because a range of child nodes is permitted, B-trees do not need re-balancing
as frequently as other self-balancing search trees, but may waste some space,
since nodes are not entirely full. The lower and upper bounds on the number of
child nodes are typically fixed for a particular implementation. For example,
in a 2-3 B-tree (often simply referred to as
a 2-3 tree), each internal node may have only 2 or 3 child nodes.
Each
internal node of a B-tree contains a number of keys. The keys act as separation values which
divide its subtrees. For example, if an internal node has
3 child nodes (or subtrees) then it must have 2 keys: a1 and a2. All
values in the leftmost subtree will be less than a1, all
values in the middle subtree will be between a1 and a2, and all
values in the rightmost subtree will be greater than a2.
Usually,
the number of keys is chosen to vary between {\displaystyle d} and {\displaystyle 2d},
where {\displaystyle d} is
the minimum number of keys, and {\displaystyle
d+1} is
the minimum degree or branching factor of the tree. In
practice, the keys take up the most space in a node. The factor of 2 will
guarantee that nodes can be split or combined. If an internal node has {\displaystyle 2d} keys,
then adding a key to that node can be accomplished by splitting the
hypothetical {\displaystyle
2d+1} key
node into two {\displaystyle
d} key
nodes and moving the key that would have been in the middle to the parent node.
Each split node has the required minimum number of keys. Similarly, if an
internal node and its neighbor each have {\displaystyle d} keys,
then a key may be deleted from the internal node by combining it with its
neighbor. Deleting the key would make the internal node have {\displaystyle d-1} keys;
joining the neighbor would add {\displaystyle
d} keys
plus one more key brought down from the neighbor's parent. The result is an
entirely full node of {\displaystyle
2d} keys.
The
number of branches (or child nodes) from a node will be one more than the
number of keys stored in the node. In a 2-3 B-tree, the internal nodes will
store either one key (with two child nodes) or two keys (with three child
nodes). A B-tree is sometimes described with the parameters {\displaystyle (d+1)} — {\displaystyle (2d+1)} or
simply with the highest branching order, {\displaystyle (2d+1)}.
A
B-tree is kept balanced by requiring that all leaf nodes be at the same depth.
This depth will increase slowly as elements are added to the tree, but an
increase in the overall depth is infrequent, and results in all leaf nodes
being one more node farther away from the root.
B-trees
have substantial advantages over alternative implementations when the time to
access the data of a node greatly exceeds the time spent processing that data,
because then the cost of accessing the node may be amortized over multiple
operations within the node. This usually occurs when the node data are in secondary storage such as disk drives. By maximizing the number of keys
within each internal node, the
height of the tree decreases and the number of expensive node accesses is
reduced. In addition, rebalancing of the tree occurs less often. The maximum
number of child nodes depends on the information that must be stored for each
child node and the size of a full disk block or
an analogous size in secondary storage. While 2-3 B-trees are easier to
explain, practical B-trees using secondary storage need a large number of child
nodes to improve performance.
No comments:
Post a Comment