collection – Collection level operations
Collection level utilities for Mongo.
- pymongo.ASCENDING = 1
  Ascending sort order.
- pymongo.DESCENDING = -1
  Descending sort order.
- pymongo.GEO2D = '2d'
  Index specifier for a 2-dimensional geospatial index.
  New in version 1.5.1.
  Note
  Geo-spatial indexing requires server version >= 1.3.3.
- pymongo.GEOHAYSTACK = 'geoHaystack'
  Index specifier for a 2-dimensional haystack index.
  New in version 2.1.
  Note
  Geo-spatial indexing requires server version >= 1.5.6.
- pymongo.GEOSPHERE = '2dsphere'
  Index specifier for a spherical geospatial index.
  New in version 2.5.
  Note
  2dsphere indexing requires server version >= 2.4.0.
- pymongo.HASHED = 'hashed'
  Index specifier for a hashed index.
  New in version 2.5.
  Note
  hashed indexing requires server version >= 2.4.0.
- pymongo.TEXT = 'text'
  Index specifier for a text index.
  New in version 2.7.1.
  Note
  text search requires server version >= 2.4.0.
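As a quick illustration of how the direction constants are used: a compound key specification, as passed to create_index() or a cursor's sort(), is just a list of (field, direction) pairs. The sketch below is standalone (the constant values are restated locally, so neither a pymongo install nor a server is needed) and shows that the same pairs can drive an equivalent in-memory sort.

```python
# The documented constant values, restated locally for a server-free sketch.
ASCENDING = 1      # pymongo.ASCENDING
DESCENDING = -1    # pymongo.DESCENDING

# A compound key specification: sort/index by age descending, then name ascending.
key_spec = [("age", DESCENDING), ("name", ASCENDING)]

docs = [
    {"name": "carol", "age": 25},
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 30},
]

# Apply the keys in reverse order, relying on Python's stable sort:
# the last-applied key (age) dominates, earlier keys break ties.
for field, direction in reversed(key_spec):
    docs.sort(key=lambda d: d[field], reverse=(direction == DESCENDING))
```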
- class pymongo.collection.Collection(database, name[, create=False[, **kwargs]])
  Get / create a Mongo collection.
  Raises TypeError if name is not an instance of basestring (str in python 3). Raises InvalidName if name is not a valid collection name. Any additional keyword arguments will be used as options passed to the create command. See create_collection() for valid options.
  If create is True or additional keyword arguments are present, a create command will be sent. Otherwise, a create command will not be sent and the collection will be created implicitly on first use.
  Parameters:
  - database: the database to get a collection from
  - name: the name of the collection to get
  - create (optional): if True, force collection creation even without options being set
  - **kwargs (optional): additional keyword arguments will be passed as options for the create collection command
  Changed in version 2.2: Removed deprecated argument: options
  New in version 2.1: uuid_subtype attribute
  Changed in version 1.5: deprecating options in favor of kwargs
  New in version 1.5: the create parameter
- c[name] || c.name
  Get the name sub-collection of Collection c.
  Raises InvalidName if an invalid collection name is used.
- full_name
  The full name of this Collection. The full name is of the form database_name.collection_name.
  Changed in version 1.3: full_name is now a property rather than a method.
- name
  The name of this Collection.
  Changed in version 1.3: name is now a property rather than a method.
- database
  The Database that this Collection is a part of.
  Changed in version 1.3: database is now a property rather than a method.
- read_preference
  The read preference mode for this instance. See ReadPreference for available options.
  New in version 2.1.
- tag_sets
  Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means "read from any member that matches the mode, ignoring tags." ReplicaSetConnection tries each set of tags in turn until it finds a set of tags with at least one matching member.
  New in version 2.3.
- secondary_acceptable_latency_ms
  Any replica-set member whose ping time is within secondary_acceptable_latency_ms of the nearest member may accept reads. Defaults to 15 milliseconds. See ReadPreference.
  New in version 2.3.
  Note
  secondary_acceptable_latency_ms is ignored when talking to a replica set through a mongos. The equivalent is the localThreshold command line option.
- write_concern
  The default write concern for this instance.
  Supports dict style access for getting/setting write concern options. Valid options include:
  - w: (integer or string) If this is a replica set, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<int> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). Setting w=0 disables write acknowledgement and all other write concern options.
  - wtimeout: (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised.
  - j: If True block until write operations have been committed to the journal. Cannot be used in combination with fsync. Prior to MongoDB 2.6 this option was ignored if the server was running without journaling. Starting with MongoDB 2.6 write operations will fail with an exception if this option is used when the server is running without journaling.
  - fsync: If True and the server is running without journaling, blocks until the server has synced all data files to disk. If the server is running with journaling, this acts the same as the j option, blocking until write operations have been committed to the journal. Cannot be used in combination with j.

  >>> m = pymongo.MongoClient()
  >>> m.write_concern
  {}
  >>> m.write_concern = {'w': 2, 'wtimeout': 1000}
  >>> m.write_concern
  {'wtimeout': 1000, 'w': 2}
  >>> m.write_concern['j'] = True
  >>> m.write_concern
  {'wtimeout': 1000, 'j': True, 'w': 2}
  >>> m.write_concern = {'j': True}
  >>> m.write_concern
  {'j': True}
  >>> # Disable write acknowledgement and write concern
  ...
  >>> m.write_concern['w'] = 0

  Note
  Accessing write_concern returns its value (a subclass of dict), not a copy.
  Warning
  If you are using Connection or ReplicaSetConnection make sure you explicitly set w to 1 (or a greater value) or safe to True. Unlike calling set_lasterror_options(), setting an option in write_concern does not implicitly set safe to True.
- uuid_subtype
  This attribute specifies which BSON Binary subtype is used when storing UUIDs. Historically UUIDs have been stored as BSON Binary subtype 3. This attribute is used to switch to the newer BSON Binary subtype 4. It can also be used to force legacy byte order and subtype compatibility with the Java and C# drivers. See the bson.binary module for all options.
- insert(doc_or_docs[, manipulate=True[, safe=None[, check_keys=True[, continue_on_error=False[, **kwargs]]]]])
  Insert a document(s) into this collection.
  If manipulate is True, the document(s) are manipulated using any SONManipulator instances that have been added to this Database. In this case an "_id" will be added if the document(s) does not already contain one and the "_id" (or list of "_id" values for more than one document) will be returned. If manipulate is False and the document(s) does not include an "_id" one will be added by the server. The server does not return the "_id" it created so None is returned.
  Write concern options can be passed as keyword arguments, overriding any global defaults. Valid options include w=<int/string>, wtimeout=<int>, j=<bool>, or fsync=<bool>. See the parameter list below for a detailed explanation of these options.
  By default an acknowledgment is requested from the server that the insert was successful, raising OperationFailure if an error occurred. Passing w=0 disables write acknowledgement and all other write concern options.
  Parameters:
  - doc_or_docs: a document or list of documents to be inserted
  - manipulate (optional): If True manipulate the documents before inserting.
  - safe (optional): DEPRECATED - Use w instead.
  - check_keys (optional): If True check if keys start with '$' or contain '.', raising InvalidName in either case.
  - continue_on_error (optional): If True, the database will not stop processing a bulk insert if one fails (e.g. due to duplicate IDs). This makes bulk insert behave similarly to a series of single inserts, except lastError will be set if any insert fails, not just the last one. If multiple errors occur, only the most recent will be reported by error().
  - w (optional): (integer or string) If this is a replica set, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<int> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). Passing w=0 disables write acknowledgement and all other write concern options.
  - wtimeout (optional): (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised.
  - j (optional): If True block until write operations have been committed to the journal. Ignored if the server is running without journaling.
  - fsync (optional): If True force the database to fsync all files before returning. When used with j the server awaits the next group commit before returning.
  Returns:
  - The '_id' value (or list of '_id' values) of doc_or_docs or [None] if manipulate is False and the documents passed as doc_or_docs do not include an '_id' field.
  Note
  continue_on_error requires server version >= 1.9.1
  New in version 2.1: Support for continue_on_error.
  New in version 1.8: Support for passing getLastError options as keyword arguments.
  Changed in version 1.1: Bulk insert works with an iterable sequence of documents.
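The "_id" return semantics above can be sketched without a server. Note that insert_sketch below is an illustrative stand-in, not PyMongo API, and uuid.uuid4() stands in for bson.ObjectId:

```python
import uuid

def insert_sketch(doc_or_docs, manipulate=True):
    """Mimic insert()'s documented return values (illustrative only)."""
    is_list = isinstance(doc_or_docs, list)
    docs = doc_or_docs if is_list else [doc_or_docs]
    if not manipulate:
        # The server assigns "_id" and does not report it back,
        # so the driver can only return None for each document.
        return [None] * len(docs) if is_list else None
    ids = []
    for doc in docs:
        # Add an "_id" only if the document lacks one
        # (uuid stands in for bson.ObjectId here).
        doc.setdefault("_id", uuid.uuid4())
        ids.append(doc["_id"])
    return ids if is_list else ids[0]
```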
- save(to_save[, manipulate=True[, safe=None[, check_keys=True[, **kwargs]]]])
  Save a document in this collection.
  If to_save already has an "_id" then an update() (upsert) operation is performed and any existing document with that "_id" is overwritten. Otherwise an insert() operation is performed. In this case if manipulate is True an "_id" will be added to to_save and this method returns the "_id" of the saved document. If manipulate is False the "_id" will be added by the server but this method will return None.
  Raises TypeError if to_save is not an instance of dict.
  Write concern options can be passed as keyword arguments, overriding any global defaults. Valid options include w=<int/string>, wtimeout=<int>, j=<bool>, or fsync=<bool>. See the parameter list below for a detailed explanation of these options.
  By default an acknowledgment is requested from the server that the save was successful, raising OperationFailure if an error occurred. Passing w=0 disables write acknowledgement and all other write concern options.
  Parameters:
  - to_save: the document to be saved
  - manipulate (optional): manipulate the document before saving it?
  - safe (optional): DEPRECATED - Use w instead.
  - check_keys (optional): check if keys start with '$' or contain '.', raising InvalidName in either case.
  - w (optional): (integer or string) If this is a replica set, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<int> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). Passing w=0 disables write acknowledgement and all other write concern options.
  - wtimeout (optional): (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised.
  - j (optional): If True block until write operations have been committed to the journal. Ignored if the server is running without journaling.
  - fsync (optional): If True force the database to fsync all files before returning. When used with j the server awaits the next group commit before returning.
  Returns:
  - The '_id' value of to_save or [None] if manipulate is False and to_save has no '_id' field.
  New in version 1.8: Support for passing getLastError options as keyword arguments.
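The "insert or overwrite by _id" behavior described above can be modeled with a plain dict as an in-memory stand-in for the collection. Everything here (save_sketch, the integer id counter) is illustrative, not PyMongo API:

```python
# In-memory stand-in for a collection, keyed by "_id".
store = {}
_counter = {"n": 0}

def save_sketch(to_save):
    """Mimic save(): upsert when "_id" is present, insert otherwise."""
    if not isinstance(to_save, dict):
        raise TypeError("to_save must be a dict")
    if "_id" not in to_save:
        # Insert path: assign a fresh id (integer stand-in for ObjectId).
        _counter["n"] += 1
        to_save["_id"] = _counter["n"]
    # Upsert: any existing document with this "_id" is overwritten.
    store[to_save["_id"]] = to_save
    return to_save["_id"]
```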
- update(spec, document[, upsert=False[, manipulate=False[, safe=None[, multi=False[, check_keys=True[, **kwargs]]]]]])
  Update a document(s) in this collection.
  Raises TypeError if either spec or document is not an instance of dict or upsert is not an instance of bool.
  Write concern options can be passed as keyword arguments, overriding any global defaults. Valid options include w=<int/string>, wtimeout=<int>, j=<bool>, or fsync=<bool>. See the parameter list below for a detailed explanation of these options.
  By default an acknowledgment is requested from the server that the update was successful, raising OperationFailure if an error occurred. Passing w=0 disables write acknowledgement and all other write concern options.
  There are many useful update modifiers which can be used when performing updates. For example, here we use the "$set" modifier to modify some fields in a matching document:

  >>> db.test.insert({"x": "y", "a": "b"})
  ObjectId('...')
  >>> list(db.test.find())
  [{u'a': u'b', u'x': u'y', u'_id': ObjectId('...')}]
  >>> db.test.update({"x": "y"}, {"$set": {"a": "c"}})
  {...}
  >>> list(db.test.find())
  [{u'a': u'c', u'x': u'y', u'_id': ObjectId('...')}]

  Parameters:
  - spec: a dict or SON instance specifying elements which must be present for a document to be updated
  - document: a dict or SON instance specifying the document to be used for the update or (in the case of an upsert) insert - see docs on MongoDB update modifiers
  - upsert (optional): perform an upsert if True
  - manipulate (optional): manipulate the document before updating? If True all instances of SONManipulator added to this Database will be applied to the document before performing the update.
  - check_keys (optional): check if keys in document start with '$' or contain '.', raising InvalidName. Only applies to document replacement, not modification through $ operators.
  - safe (optional): DEPRECATED - Use w instead.
  - multi (optional): update all documents that match spec, rather than just the first matching document. The default value for multi is currently False, but this might eventually change to True. It is recommended that you specify this argument explicitly for all update operations in order to prepare your code for that change.
  - w (optional): (integer or string) If this is a replica set, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<int> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). Passing w=0 disables write acknowledgement and all other write concern options.
  - wtimeout (optional): (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised.
  - j (optional): If True block until write operations have been committed to the journal. Ignored if the server is running without journaling.
  - fsync (optional): If True force the database to fsync all files before returning. When used with j the server awaits the next group commit before returning.
  Returns:
  - A document (dict) describing the effect of the update or None if write acknowledgement is disabled.
  New in version 1.8: Support for passing getLastError options as keyword arguments.
  Changed in version 1.4: Return the response to lastError if safe is True.
  New in version 1.1.1: The multi parameter.
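The distinction between modifier updates and full-document replacement can be sketched locally. apply_update below is an illustrative invention that models only the "$set" modifier, not the full server-side update language:

```python
def apply_update(doc, change):
    """Mimic the server-side effect of update() on one matched document."""
    if any(k.startswith("$") for k in change):
        # Modifier form: only the named fields change ($set modeled here).
        updated = dict(doc)
        updated.update(change.get("$set", {}))
        return updated
    # Replacement form: the new body replaces the whole document,
    # but the immutable "_id" is preserved.
    replaced = dict(change)
    replaced["_id"] = doc["_id"]
    return replaced
```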
- remove([spec_or_id=None[, safe=None[, multi=True[, **kwargs]]]])
  Remove a document(s) from this collection.
  Warning
  Calls to remove() should be performed with care, as removed data cannot be restored.
  If spec_or_id is None, all documents in this collection will be removed. This is not equivalent to calling drop_collection(), however, as indexes will not be removed.
  Write concern options can be passed as keyword arguments, overriding any global defaults. Valid options include w=<int/string>, wtimeout=<int>, j=<bool>, or fsync=<bool>. See the parameter list below for a detailed explanation of these options.
  By default an acknowledgment is requested from the server that the remove was successful, raising OperationFailure if an error occurred. Passing w=0 disables write acknowledgement and all other write concern options.
  Parameters:
  - spec_or_id (optional): a dictionary specifying the documents to be removed OR any other type specifying the value of "_id" for the document to be removed
  - safe (optional): DEPRECATED - Use w instead.
  - multi (optional): If True (the default) remove all documents matching spec_or_id, otherwise remove only the first matching document.
  - w (optional): (integer or string) If this is a replica set, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<int> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). Passing w=0 disables write acknowledgement and all other write concern options.
  - wtimeout (optional): (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised.
  - j (optional): If True block until write operations have been committed to the journal. Ignored if the server is running without journaling.
  - fsync (optional): If True force the database to fsync all files before returning. When used with j the server awaits the next group commit before returning.
  Returns:
  - A document (dict) describing the effect of the remove or None if write acknowledgement is disabled.
  New in version 1.8: Support for passing getLastError options as keyword arguments.
  Changed in version 1.7: Accept any type other than a dict instance for removal by "_id", not just ObjectId instances.
  Changed in version 1.4: Return the response to lastError if safe is True.
  Changed in version 1.2: The spec_or_id parameter is now optional. If it is not specified all documents in the collection will be removed.
  New in version 1.1: The safe parameter.
- initialize_unordered_bulk_op()
  Initialize an unordered batch of write operations.
  Operations will be performed on the server in arbitrary order, possibly in parallel. All operations will be attempted.
  Returns a BulkOperationBuilder instance.
  See Unordered Bulk Write Operations for examples.
  New in version 2.7.
- initialize_ordered_bulk_op()
  Initialize an ordered batch of write operations.
  Operations will be performed on the server serially, in the order provided. If an error occurs all remaining operations are aborted.
  Returns a BulkOperationBuilder instance.
  See Ordered Bulk Write Operations for examples.
  New in version 2.7.
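The ordered-versus-unordered error semantics can be modeled with callables standing in for queued write operations. This is an illustrative sketch of the abort/continue behavior only; a real server may also run unordered batches in parallel:

```python
def run_ordered(ops):
    """Ordered batch: stop at the first failure; the rest are aborted."""
    results = []
    for op in ops:
        try:
            results.append(op())
        except Exception as exc:
            results.append(exc)
            break  # remaining operations are not attempted
    return results

def run_unordered(ops):
    """Unordered batch: every operation is attempted; failures collected."""
    results = []
    for op in ops:  # arbitrary (possibly parallel) order on a real server
        try:
            results.append(op())
        except Exception as exc:
            results.append(exc)
    return results
```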
- drop()
  Alias for drop_collection(). The following two calls are equivalent:

  >>> db.foo.drop()
  >>> db.drop_collection("foo")

  New in version 1.8.
- find([spec=None[, fields=None[, skip=0[, limit=0[, timeout=True[, snapshot=False[, tailable=False[, sort=None[, max_scan=None[, as_class=None[, slave_okay=False[, await_data=False[, partial=False[, manipulate=True[, read_preference=ReadPreference.PRIMARY[, exhaust=False[, compile_re=True[, **kwargs]]]]]]]]]]]]]]]]]])
  Query the database.
  The spec argument is a prototype document that all results must match. For example:

  >>> db.test.find({"hello": "world"})

  only matches documents that have a key "hello" with value "world". Matches can have other keys in addition to "hello". The fields argument is used to specify a subset of fields that should be included in the result documents. By limiting results to a certain subset of fields you can cut down on network traffic and decoding time.
  Raises TypeError if any of the arguments are of improper type. Returns an instance of Cursor corresponding to this query.
  Parameters:
  - spec (optional): a SON object specifying elements which must be present for a document to be included in the result set
  - fields (optional): a list of field names that should be returned in the result set or a dict specifying the fields to include or exclude. If fields is a list "_id" will always be returned. Use a dict to exclude fields from the result (e.g. fields={'_id': False}).
  - skip (optional): the number of documents to omit (from the start of the result set) when returning the results
  - limit (optional): the maximum number of results to return
  - timeout (optional): if True (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to False, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with timeout turned off are properly closed.
  - snapshot (optional): if True, snapshot mode will be used for this query. Snapshot mode assures no duplicates are returned, or objects missed, which were present at both the start and end of the query's execution. For details, see the snapshot documentation.
  - tailable (optional): the result of this find call will be a tailable cursor - tailable cursors aren't closed when the last data is retrieved but are kept open and the cursor's location marks the final document's position. If more data is received iteration of the cursor will continue from the last document received. For details, see the tailable cursor documentation.
  - sort (optional): a list of (key, direction) pairs specifying the sort order for this query. See sort() for details.
  - max_scan (optional): limit the number of documents examined when performing the query
  - as_class (optional): class to use for documents in the query result (default is document_class)
  - slave_okay (optional): if True, allows this query to be run against a replica secondary.
  - await_data (optional): if True, the server will block for some extra time before returning, waiting for more data to return. Ignored if tailable is False.
  - partial (optional): if True, mongos will return partial results if some shards are down instead of returning an error.
  - manipulate (optional): If True (the default), apply any outgoing SON manipulators before returning.
  - network_timeout (optional): specify a timeout to use for this query, which will override the MongoClient-level default
  - read_preference (optional): The read preference for this query.
  - tag_sets (optional): The tag sets for this query.
  - secondary_acceptable_latency_ms (optional): Any replica-set member whose ping time is within secondary_acceptable_latency_ms of the nearest member may accept reads. Default 15 milliseconds. Ignored by mongos and must be configured on the command line. See the localThreshold option for more information.
  - compile_re (optional): if False, don't attempt to compile BSON regex objects into Python regexes. Return instances of Regex instead.
  - exhaust (optional): If True create an "exhaust" cursor. MongoDB will stream batched results to the client without waiting for the client to request each batch, reducing latency.
  Note
  There are a number of caveats to using the exhaust parameter:
  1. The exhaust and limit options are incompatible and can not be used together.
  2. The exhaust option is not supported by mongos and can not be used with a sharded cluster.
  3. A Cursor instance created with the exhaust option requires an exclusive socket connection to MongoDB. If the Cursor is discarded without being completely iterated the underlying socket connection will be closed and discarded without being returned to the connection pool.
  4. A Cursor instance created with the exhaust option in a request must be completely iterated before executing any other operation.
  5. The network_timeout option is ignored when using the exhaust option.
  Note
  The manipulate and compile_re parameters may default to False in future releases.
  Note
  The max_scan parameter requires server version >= 1.5.1
  New in version 2.7: The compile_re parameter.
  New in version 2.3: The tag_sets and secondary_acceptable_latency_ms parameters.
  New in version 1.11+: The await_data, partial, and manipulate parameters.
  New in version 1.8: The network_timeout parameter.
  New in version 1.7: The sort, max_scan and as_class parameters.
  Changed in version 1.7: The fields parameter can now be a dict or any iterable in addition to a list.
  New in version 1.1: The tailable parameter.
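The prototype-matching and list-form fields projection described above can be sketched in plain Python. matches and project are illustrative stand-ins (the real server supports a much richer query language), useful only to show the contract:

```python
def matches(doc, spec):
    """A document matches when it contains every key/value pair in spec;
    extra keys in the document are allowed."""
    return all(doc.get(k) == v for k, v in spec.items())

def project(doc, fields):
    """List-form fields: keep only the named fields; "_id" is always kept."""
    keep = set(fields) | {"_id"}
    return {k: v for k, v in doc.items() if k in keep}

docs = [
    {"_id": 1, "hello": "world", "n": 1},
    {"_id": 2, "hello": "mongo", "n": 2},
]
# Roughly analogous to find({"hello": "world"}, fields=["hello"]):
result = [project(d, ["hello"]) for d in docs if matches(d, {"hello": "world"})]
```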
- find_one([spec_or_id=None[, *args[, **kwargs]]])
  Get a single document from the database.
  All arguments to find() are also valid arguments for find_one(), although any limit argument will be ignored. Returns a single document, or None if no matching document is found.
  Parameters:
  - spec_or_id (optional): a dictionary specifying the query to be performed OR any other type to be used as the value for a query for "_id".
  - *args (optional): any additional positional arguments are the same as the arguments to find().
  - **kwargs (optional): any additional keyword arguments are the same as the arguments to find().
  - max_time_ms (optional): a value for max_time_ms may be specified as part of **kwargs, e.g.

  >>> find_one(max_time_ms=100)

  Changed in version 1.7: Allow passing any of the arguments that are valid for find().
  Changed in version 1.7: Accept any type other than a dict instance as an "_id" query, not just ObjectId instances.
- parallel_scan(num_cursors, **kwargs)
  Scan this entire collection in parallel.
  Returns a list of up to num_cursors cursors that can be iterated concurrently. As long as the collection is not modified during scanning, each document appears once in one of the cursors' result sets.
  For example, to process each document in a collection using some thread-safe process_document() function:

  def process_cursor(cursor):
      for document in cursor:
          # Some thread-safe processing function:
          process_document(document)

  # Get up to 4 cursors.
  cursors = collection.parallel_scan(4)
  threads = [
      threading.Thread(target=process_cursor, args=(cursor,))
      for cursor in cursors]

  for thread in threads:
      thread.start()

  for thread in threads:
      thread.join()

  # All documents have now been processed.

  With MongoReplicaSetClient or MasterSlaveConnection, if the read_preference attribute of this instance is not set to pymongo.read_preferences.ReadPreference.PRIMARY or the (deprecated) slave_okay attribute of this instance is set to True the command will be sent to a secondary or slave.
  Parameters:
  - num_cursors: the number of cursors to return
  Note
  Requires server version >= 2.5.5.
- count()
  Get the number of documents in this collection.
  To get the number of documents matching a specific query use pymongo.cursor.Cursor.count().
- create_index(key_or_list, cache_for=300, **kwargs)
  Creates an index on this collection.
  Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of basestring (str in python 3), and the direction(s) should be one of (ASCENDING, DESCENDING, GEO2D, GEOHAYSTACK, GEOSPHERE, HASHED, TEXT).
  To create a simple ascending index on the key 'mike' we just use a string argument:

  >>> my_collection.create_index("mike")

  For a compound index on 'mike' descending and 'eliot' ascending we need to use a list of tuples:

  >>> my_collection.create_index([("mike", pymongo.DESCENDING),
  ...                             ("eliot", pymongo.ASCENDING)])

  All optional index creation parameters should be passed as keyword arguments to this method. For example:

  >>> my_collection.create_index([("mike", pymongo.DESCENDING)],
  ...                            background=True)

  Valid options include:
  - name: custom name to use for this index - if none is given, a name will be generated
  - unique: if True creates a unique constraint on the index
  - dropDups or drop_dups: if True duplicate values are dropped during index creation when creating a unique index
  - background: if True this index should be created in the background
  - sparse: if True, omit from the index any documents that lack the indexed field
  - bucketSize or bucket_size: for use with geoHaystack indexes. Number of documents to group together within a certain proximity to a given longitude and latitude.
  - min: minimum value for keys in a GEO2D index
  - max: maximum value for keys in a GEO2D index
  - expireAfterSeconds: <int> Used to create an expiring (TTL) collection. MongoDB will automatically delete documents from this collection after <int> seconds. The indexed field must be a UTC datetime or the data will not expire.
  Note
  expireAfterSeconds requires server version >= 2.1.2
  Parameters:
  - key_or_list: a single key or a list of (key, direction) pairs specifying the index to create
  - cache_for (optional): time window (in seconds) during which this index will be recognized by subsequent calls to ensure_index() - see documentation for ensure_index() for details
  - **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments
  - ttl (deprecated): Use cache_for instead.
  Changed in version 2.3: The ttl parameter has been deprecated to avoid confusion with TTL collections. Use cache_for instead.
  Changed in version 2.2: Removed deprecated argument: deprecated_unique
  Changed in version 1.5.1: Accept kwargs to support all index creation options.
  New in version 1.5: The name parameter.
- ensure_index(key_or_list, cache_for=300, **kwargs)
  Ensures that an index exists on this collection.
  Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of basestring (str in python 3), and the direction(s) should be one of (ASCENDING, DESCENDING, GEO2D, GEOHAYSTACK, GEOSPHERE, HASHED, TEXT).
  See create_index() for detailed examples.
  Unlike create_index(), which attempts to create an index unconditionally, ensure_index() takes advantage of some caching within the driver such that it only attempts to create indexes that might not already exist. When an index is created (or ensured) by PyMongo it is "remembered" for cache_for seconds. Repeated calls to ensure_index() within that time limit will be lightweight - they will not attempt to actually create the index.
  Care must be taken when the database is being accessed through multiple clients at once. If an index is created using this client and deleted using another, any call to ensure_index() within the cache window will fail to re-create the missing index.
  Returns the specified or generated index name used if ensure_index() attempts to create the index. Returns None if the index is already cached.
  All optional index creation parameters should be passed as keyword arguments to this method. Valid options include:
  - name: custom name to use for this index - if none is given, a name will be generated
  - unique: if True creates a unique constraint on the index
  - dropDups or drop_dups: if True duplicate values are dropped during index creation when creating a unique index
  - background: if True this index should be created in the background
  - sparse: if True, omit from the index any documents that lack the indexed field
  - bucketSize or bucket_size: for use with geoHaystack indexes. Number of documents to group together within a certain proximity to a given longitude and latitude.
  - min: minimum value for keys in a GEO2D index
  - max: maximum value for keys in a GEO2D index
  - expireAfterSeconds: <int> Used to create an expiring (TTL) collection. MongoDB will automatically delete documents from this collection after <int> seconds. The indexed field must be a UTC datetime or the data will not expire.
  Note
  expireAfterSeconds requires server version >= 2.1.2
  Parameters:
  - key_or_list: a single key or a list of (key, direction) pairs specifying the index to create
  - cache_for (optional): time window (in seconds) during which this index will be recognized by subsequent calls to ensure_index()
  - **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments
  - ttl (deprecated): Use cache_for instead.
  Changed in version 2.3: The ttl parameter has been deprecated to avoid confusion with TTL collections. Use cache_for instead.
  Changed in version 2.2: Removed deprecated argument: deprecated_unique
  Changed in version 1.5.1: Accept kwargs to support all index creation options.
  New in version 1.5: The name parameter.
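The driver-side cache behind ensure_index() can be sketched as a name-to-expiry map. IndexCache is an illustrative invention (the real driver's internals differ), modeling only the return-value contract: the index name when a create is attempted, None when the cache suppresses it:

```python
import time

class IndexCache:
    """Driver-side memory of recently ensured indexes (illustrative)."""

    def __init__(self):
        self._expires = {}  # index name -> expiry timestamp

    def ensure(self, name, cache_for=300, now=time.time):
        if self._expires.get(name, 0) > now():
            return None  # still cached: no create attempt is made
        # Not cached (or expired): remember it, then attempt the create.
        self._expires[name] = now() + cache_for
        return name
```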
-
drop_index
(index_or_name)¶ Drops the specified index on this collection.
Can be used on non-existant collections or collections with no indexes. Raises OperationFailure on an error (e.g. trying to drop an index that does not exist). index_or_name can be either an index name (as returned by create_index), or an index specifier (as passed to create_index). An index specifier should be a list of (key, direction) pairs. Raises TypeError if index is not an instance of (str, unicode, list).
Warning
if a custom name was used on index creation (by passing the name parameter to create_index() or ensure_index()) the index must be dropped by name.
Parameters: - index_or_name: index (or name of index) to drop
-
drop_indexes
()¶ Drops all indexes on this collection.
Can be used on non-existent collections or collections with no indexes. Raises OperationFailure on an error.
-
reindex
()¶ Rebuilds all indexes on this collection.
Warning
reindex blocks all other operations (indexes are built in the foreground) and will be slow for large collections.
New in version 1.11+.
-
index_information
()¶ Get information on this collection’s indexes.
Returns a dictionary where the keys are index names (as returned by create_index()) and the values are dictionaries containing information about each index. The dictionary is guaranteed to contain at least a single key, "key", which is a list of (key, direction) pairs specifying the index (as passed to create_index()). It will also contain any other information in system.indexes, except for the "ns" and "name" keys, which are cleaned. Example output might look like this:

>>> db.test.ensure_index("x", unique=True)
u'x_1'
>>> db.test.index_information()
{u'_id_': {u'key': [(u'_id', 1)]},
 u'x_1': {u'unique': True, u'key': [(u'x', 1)]}}
Changed in version 1.7: The values in the resultant dictionary are now dictionaries themselves, whose "key" item contains the list that was the value in previous versions of PyMongo.
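The dictionary shape documented above can be consumed with ordinary dict operations. This sketch works on a hard-coded copy of the example output, so no server connection is involved:

```python
# Hard-coded copy of the example index_information() output;
# no live server is involved in this sketch.
info = {
    "_id_": {"key": [("_id", 1)]},
    "x_1": {"unique": True, "key": [("x", 1)]},
}

# Every value is guaranteed to carry a "key" entry; other fields
# such as "unique" appear only when set on that index.
unique_names = sorted(name for name, spec in info.items() if spec.get("unique"))
print(unique_names)  # ['x_1']
```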
-
options
()¶ Get the options set on this collection.
Returns a dictionary of options and their values - see create_collection() for more information on the possible options. Returns an empty dictionary if the collection has not been created yet.
-
aggregate
(pipeline, **kwargs)¶ Perform an aggregation using the aggregation framework on this collection.
With MongoReplicaSetClient or MasterSlaveConnection, if the read_preference attribute of this instance is not set to pymongo.read_preferences.ReadPreference.PRIMARY, or the (deprecated) slave_okay attribute of this instance is set to True, the aggregate command will be sent to a secondary or slave.
Parameters: - pipeline: a single command or list of aggregation commands
- **kwargs: send arbitrary parameters to the aggregate command
Note
Requires server version >= 2.1.0.
With server version >= 2.5.1, pass cursor={} to retrieve unlimited aggregation results with a CommandCursor:

pipeline = [{'$project': {'name': {'$toUpper': '$name'}}}]
cursor = collection.aggregate(pipeline, cursor={})
for doc in cursor:
    print doc
Changed in version 2.7: When the cursor option is used, return CommandCursor instead of Cursor.
Changed in version 2.6: Added cursor support.
New in version 2.3.
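Pipeline stages are plain Python dictionaries. As a hedged sketch, this emulates in pure Python (no server, no real aggregation framework) what the $project/$toUpper stage shown above computes over a couple of sample documents:

```python
# The stage from the example above, as a plain dict. On the
# server, $project reshapes each document and $toUpper
# upper-cases the referenced field.
pipeline = [{'$project': {'name': {'$toUpper': '$name'}}}]

# Pure-Python emulation of that stage (no server involved):
docs = [{'name': 'alice'}, {'name': 'bob'}]
projected = [{'name': doc['name'].upper()} for doc in docs]
print(projected)  # [{'name': 'ALICE'}, {'name': 'BOB'}]
```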
-
group
(key, condition, initial, reduce, finalize=None, **kwargs)¶ Perform a query similar to an SQL group by operation.
Returns an array of grouped items.
The key parameter can be:
- None to use the entire document as a key.
- A list of keys (each a basestring (str in python 3)) to group by.
- A basestring (str in python 3), or Code instance containing a JavaScript function to be applied to each document, returning the key to group by.
With MongoReplicaSetClient or MasterSlaveConnection, if the read_preference attribute of this instance is not set to pymongo.read_preferences.ReadPreference.PRIMARY or pymongo.read_preferences.ReadPreference.PRIMARY_PREFERRED, or the (deprecated) slave_okay attribute of this instance is set to True, the group command will be sent to a secondary or slave.
Parameters: - key: fields to group by (see above description)
- condition: specification of rows to be considered (as a find() query specification)
- initial: initial value of the aggregation counter object
- reduce: aggregation function as a JavaScript string
- finalize: function to be called on each object in output list.
Changed in version 2.2: Removed deprecated argument: command
Changed in version 1.4: The key argument can now be None or a JavaScript function, in addition to a list of keys.
Changed in version 1.3: The command argument now defaults to True and is deprecated.
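The group semantics can be sketched in pure Python, with no server and with a Python callable standing in for the JavaScript reduce function. The helper name is hypothetical; it only illustrates the documented flow: documents are bucketed by key, and reduce folds each document into a per-key accumulator seeded from initial, much like SQL's GROUP BY.

```python
# Hypothetical pure-Python emulation of the group command (no
# server): bucket documents by `key`, then fold each document
# into a per-key accumulator seeded from a copy of `initial`.
def emulate_group(docs, key, initial, reduce_fn):
    buckets = {}
    for doc in docs:
        acc = buckets.setdefault(doc[key], dict(initial))
        reduce_fn(doc, acc)  # mutates the accumulator in place
    return [dict(acc, **{key: k}) for k, acc in sorted(buckets.items())]

docs = [{"a": "x", "n": 1}, {"a": "y", "n": 2}, {"a": "x", "n": 3}]
result = emulate_group(
    docs, "a", {"total": 0},
    lambda doc, acc: acc.update(total=acc["total"] + doc["n"]))
print(result)  # [{'total': 4, 'a': 'x'}, {'total': 2, 'a': 'y'}]
```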
-
rename
(new_name, **kwargs)¶ Rename this collection.
If operating in auth mode, client must be authorized as an admin to perform this operation. Raises TypeError if new_name is not an instance of basestring (str in python 3). Raises InvalidName if new_name is not a valid collection name.
Parameters: - new_name: new name for this collection
- **kwargs (optional): any additional rename options should be passed as keyword arguments (e.g. dropTarget=True)
New in version 1.7: support for accepting keyword arguments for rename options
-
distinct
(key)¶ Get a list of distinct values for key among all documents in this collection.
Raises TypeError if key is not an instance of basestring (str in python 3).
To get the distinct values for a key in the result set of a query use distinct().
Parameters: - key: name of key for which we want to get the distinct values
Note
Requires server version >= 1.1.0
New in version 1.1.1.
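What distinct computes can be emulated in pure Python without a server: the set of values the key takes across all documents, with documents that lack the key contributing nothing. A small sketch under those assumptions:

```python
# Pure-Python emulation of distinct("tag") over a sample
# document list (no server): unique values of the key, with
# documents lacking the key contributing nothing.
docs = [{"tag": "red"}, {"tag": "blue"}, {"tag": "red"}, {"other": 1}]
distinct_tags = sorted({doc["tag"] for doc in docs if "tag" in doc})
print(distinct_tags)  # ['blue', 'red']
```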
-
map_reduce
(map, reduce, out, full_response=False, **kwargs)¶ Perform a map/reduce operation on this collection.
If full_response is False (default) returns a Collection instance containing the results of the operation. Otherwise, returns the full response from the server to the map reduce command.
Parameters: map: map function (as a JavaScript string)
reduce: reduce function (as a JavaScript string)
out: output collection name or out object (dict). See the map reduce command documentation for available options. Note: out options are order sensitive. SON can be used to specify multiple options, e.g. SON([('replace', <collection name>), ('db', <database name>)])
full_response (optional): if True, return full response to this command - otherwise just return the result collection
**kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:
>>> db.test.map_reduce(map, reduce, "myresults", limit=2)
Note
Requires server version >= 1.1.1
Changed in version 2.2: Removed deprecated arguments: merge_output and reduce_output
Changed in version 1.11+: DEPRECATED The merge_output and reduce_output parameters.
New in version 1.2.
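The map/reduce flow itself can be sketched in pure Python, with no server and no JavaScript: the map phase emits a (key, value) pair per document, and the reduce phase folds the values emitted under each key. This is an illustration of the model, not of the helper's API:

```python
# Pure-Python emulation of map/reduce over sample documents
# (no server, no JavaScript involved).
docs = [{"user": "a", "n": 2}, {"user": "b", "n": 5}, {"user": "a", "n": 3}]

emitted = {}
for doc in docs:  # "map" phase: emit (user, n) per document
    emitted.setdefault(doc["user"], []).append(doc["n"])

# "reduce" phase: fold the values emitted under each key
reduced = {key: sum(vals) for key, vals in emitted.items()}
print(reduced)  # {'a': 5, 'b': 5}
```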
-
inline_map_reduce
(map, reduce, full_response=False, **kwargs)¶ Perform an inline map/reduce operation on this collection.
Perform the map/reduce operation on the server in RAM. A result collection is not created. The result set is returned as a list of documents.
If full_response is False (default) returns the result documents in a list. Otherwise, returns the full response from the server to the map reduce command.
With MongoReplicaSetClient or MasterSlaveConnection, if the read_preference attribute of this instance is not set to pymongo.read_preferences.ReadPreference.PRIMARY or pymongo.read_preferences.ReadPreference.PRIMARY_PREFERRED, or the (deprecated) slave_okay attribute of this instance is set to True, the inline map reduce will be run on a secondary or slave.
Parameters: map: map function (as a JavaScript string)
reduce: reduce function (as a JavaScript string)
full_response (optional): if True, return full response to this command - otherwise just return the result collection
**kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:
>>> db.test.inline_map_reduce(map, reduce, limit=2)
Note
Requires server version >= 1.7.4
New in version 1.10.
-
find_and_modify
(query={}, update=None, upsert=False, sort=None, full_response=False, **kwargs)¶ Update and return an object.
This is a thin wrapper around the findAndModify command. The positional arguments are designed to match the first three arguments to update(); however, most options should be passed as named parameters. Either the update or remove argument is required; all others are optional.
Returns either the object before or after modification, based on the new parameter. If no objects match the query and upsert is false, returns None. If upserting and new is false, returns {}.
If the full_response parameter is True, the return value will be the entire response object from the server, including the 'ok' and 'lastErrorObject' fields, rather than just the modified object. This is useful mainly because the 'lastErrorObject' document holds information about the command's execution.
Parameters: - query: filter for the update (default {})
- update: see second argument to update() (no default)
- upsert: insert if object doesn't exist (default False)
- sort: a list of (key, direction) pairs specifying the sort order for this query. See sort() for details.
- full_response: return the entire response object from the server (default False)
- remove: remove rather than updating (default False)
- new: return updated rather than original object (default False)
- fields: see second argument to find() (default all)
- **kwargs: any other options the findAndModify command supports can be passed here.
Note
Requires server version >= 1.3.0
Changed in version 2.5: Added the optional full_response parameter
Changed in version 2.4: Deprecated the use of mapping types for the sort parameter
New in version 1.10.
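The effect of the new parameter can be sketched in pure Python, with no server. The helper name below is hypothetical; it only mirrors the documented semantics: return the document as it was before the update when new is False, or after it when new is True.

```python
import copy

# Hypothetical sketch of findAndModify's `new` semantics (no
# server): apply an update in place and return either the
# pre-update or post-update state of the document.
def emulate_find_and_modify(doc, update_fields, new=False):
    before = copy.deepcopy(doc)
    doc.update(update_fields)       # apply the "update" in place
    return doc if new else before   # new=True -> modified object

record = {"_id": 1, "count": 0}
old = emulate_find_and_modify(record, {"count": 1}, new=False)
print(old)     # {'_id': 1, 'count': 0}
print(record)  # {'_id': 1, 'count': 1}
```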
-
slave_okay
¶ DEPRECATED. Use read_preference instead.
Changed in version 2.1: Deprecated slave_okay.
New in version 2.0.
-
safe
¶ DEPRECATED: Use the ‘w’ write_concern option instead.
Use getlasterror with every write operation?
New in version 2.0.
-
get_lasterror_options
()¶ DEPRECATED: Use write_concern instead.
Returns a dict of the getlasterror options set on this instance.
Changed in version 2.4: Deprecated get_lasterror_options.
New in version 2.0.
-
set_lasterror_options
(**kwargs)¶ DEPRECATED: Use write_concern instead.
Set getlasterror options for this instance.
Valid options include j=<bool>, w=<int/string>, wtimeout=<int>, and fsync=<bool>. Implies safe=True.
Parameters: - **kwargs: Options should be passed as keyword arguments (e.g. w=2, fsync=True)
Changed in version 2.4: Deprecated set_lasterror_options.
New in version 2.0.
-
unset_lasterror_options
(*options)¶ DEPRECATED: Use write_concern instead.
Unset getlasterror options for this instance.
If no options are passed unsets all getlasterror options. This does not set safe to False.
Parameters: - *options: The list of options to unset.
Changed in version 2.4: Deprecated unset_lasterror_options.
New in version 2.0.