Cloud Firestore is a popular database for mobile and web applications. According to its documentation:
[…] It keeps your data in sync across client apps through realtime listeners and offers offline support for mobile and web so you can build responsive apps that work regardless of network latency or Internet connectivity.
Continuing the “How does X work” series, today we’re gonna explore how Firestore works under the hood. How do listeners work? How does it send/receive data from the backend? How does it keep things stored locally? Those are some of the questions we’ll be exploring today.
FirebaseInitProvider will handle the initialization of Firebase for the default project that it’s set to operate with using the data in the app’s
google-services.json file. When building using Gradle, this
ContentProvider is automatically integrated into the app’s manifest and executed when the app is launched.
If an app needs access to another Firebase project in addition to the default project, use
initializeApp(Context, FirebaseOptions, String) to do that.
Represents a single Cloud Firestore database. It’s probably the most used class as it acts as a facade to other classes. It provides methods such as
Responsible for creating
FirestoreClient. Most functions are wrappers around
FirestoreClient; most of the time
FirebaseFirestore does some parsing and validation before calling the method that does the actual work on
Specifies the configuration for your Firestore instance. The configurable values are
Data bundles are serialized collections of documents.
These data bundles can be saved to a CDN or another object storage provider, and then loaded from your client applications. By doing that, you can avoid making extra calls against the Firestore database.
You can read more about Data Bundles here.
Even though most of us only use one database instance, Firestore supports multiple instances.
FirestoreMultiDbComponent is basically a container that stores references to all available databases.
When you call
FirebaseFirestore.getInstance(), it’s actually calling
FirestoreMultiDbComponent with the default database id.
Here’s the method that returns a database given its id. It’s also responsible for creating a
FirebaseFirestore if there’s none registered.
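The get-or-create behavior described above can be sketched in a few lines. This is only an illustration, not the SDK's code: FakeFirestore and MultiDbRegistry are hypothetical stand-ins, and the "(default)" id is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FirebaseFirestore; only the database id matters here.
final class FakeFirestore {
    final String databaseId;

    FakeFirestore(String databaseId) {
        this.databaseId = databaseId;
    }
}

// Hypothetical stand-in for the container that tracks one instance per database id.
final class MultiDbRegistry {
    // Assumed id of the default database, used only for this illustration.
    static final String DEFAULT_DATABASE_ID = "(default)";

    private final Map<String, FakeFirestore> instances = new HashMap<>();

    // Returns the instance registered for the id, creating it on first use.
    synchronized FakeFirestore get(String databaseId) {
        FakeFirestore firestore = instances.get(databaseId);
        if (firestore == null) {
            firestore = new FakeFirestore(databaseId);
            instances.put(databaseId, firestore);
        }
        return firestore;
    }

    // What a getInstance()-style call boils down to in this model.
    FakeFirestore getDefault() {
        return get(DEFAULT_DATABASE_ID);
    }
}
```

Calling getDefault() twice hands back the same object, while get("other") creates and registers a second, independent instance.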
It looks very similar to
FirebaseFirestore but here you start to see some real work being done.
Even though a
FirestoreClient instance can be reused for multiple users, not all classes can. When
FirestoreClient is created, it calls
authProvider.setChangeListener to be notified when the logged-in user changes, and it notifies
SyncEngine about that.
It’s also responsible for creating the
Datastore, the class that represents the connection to Firebase Firestore’s server.
waitForPendingWrites returns a task that resolves once all the writes that were pending when the method was called have received server acknowledgement.
It also contains
write. According to its documentation:
Writes mutations. The returned task will be notified when it’s written to the backend.
That’s an interesting thing to note:
write will behave differently based on the user’s connectivity, because the returned task only completes once the mutation has been written to the backend. That might be the behavior most people expect, but you need to remember that Cloud Firestore can also be used as an offline database. If your app allows users to use it offline, you’ll have to take that into consideration.
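To make the consequence concrete, here is a small model of a write whose task only completes on a server acknowledgement. It is a sketch under my own assumptions: PendingWriteModel and its methods are hypothetical, and CompletableFuture merely plays the role of the task returned by the SDK.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

// Hypothetical model: a write is queued locally and its future completes only on a server ack.
final class PendingWriteModel {
    private final Deque<CompletableFuture<Void>> unacknowledged = new ArrayDeque<>();

    // Queues the write and returns right away; the returned future is still pending.
    CompletableFuture<Void> write(String mutationDescription) {
        CompletableFuture<Void> ack = new CompletableFuture<>();
        unacknowledged.add(ack);
        return ack;
    }

    // Simulates the backend acknowledging the oldest pending write.
    void acknowledgeNext() {
        CompletableFuture<Void> ack = unacknowledged.poll();
        if (ack != null) {
            ack.complete(null);
        }
    }

    int pendingCount() {
        return unacknowledged.size();
    }
}
```

While the device is offline nothing calls acknowledgeNext(), so any UI that blocks on the returned future would wait indefinitely. In an offline-first screen it is usually safer to react to the local change instead of the acknowledgement.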
Most people have never heard of listen(Query, ListenOptions, EventListener<ViewSnapshot>), but you’ve probably used Query.addSnapshotListener, which calls listen to watch for changes.
listen shares the same data source for equal queries, so calling DocumentReference.addSnapshotListener from multiple places using the same query is not costly.
To accomplish its purpose,
listen relies mostly on
As you may know, Firestore is a reactive database meaning that when somebody updates a document you also receive the update if you’re listening for it.
QueryListener is one of the classes responsible for that.
Before we understand how
QueryListener works, we need to learn what
A view snapshot is an immutable capture of the results of a query and the changes to them.
Whenever you query for something you don’t get the actual values you queried for. Firestore returns a
QuerySnapshot and inside it you can find a
ViewSnapshot that contains the values you queried for. Take a look at
ViewSnapshot is basically a data class: it holds a lot of related data together. It contains a reference to the
Query that generated it, the old documents, the new documents, a list of document changes, whether the values came from the cache, etc.
Now let’s get back to
QueryListener. According to the documentation:
QueryListener takes a series of internal view snapshots and determines when to raise events.
Why does it say “determines when to raise events”? Don’t all changes raise events? Well, not necessarily. When you start listening for a query you can give it a
MetadataChanges value, which defines what kind of changes you’ll be notified of. You have two options:
Currently document snapshots have two metadata properties
If you specify
MetadataChanges.INCLUDE, you’ll also be notified when either of these two fields changes. Let’s suppose you’re listening to collection C and a user who has no connectivity writes document D to collection C. Initially
hasPendingWrites will be true because this data has not been written to the backend yet. When the document is uploaded, the data in your document is probably not going to change, but
hasPendingWrites will become false and you’ll receive an update for that.
QueryListener basically waits for new view snapshots and decides whether or not it should notify you of a change based on these options.
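Here is a stripped-down version of that decision. It is a sketch rather than the real QueryListener, which also deals with things like the very first event. SnapshotInfo, MetadataMode and ListenerDecision are hypothetical names, and I’m assuming the two metadata flags are hasPendingWrites and isFromCache.

```java
// Hypothetical, trimmed-down snapshot: just the bits needed to decide whether to notify.
final class SnapshotInfo {
    final boolean hasDocumentChanges; // documents were added, modified or removed
    final boolean hasPendingWrites;   // some local write is not acknowledged yet
    final boolean isFromCache;        // the data did not come from the backend

    SnapshotInfo(boolean hasDocumentChanges, boolean hasPendingWrites, boolean isFromCache) {
        this.hasDocumentChanges = hasDocumentChanges;
        this.hasPendingWrites = hasPendingWrites;
        this.isFromCache = isFromCache;
    }
}

// Hypothetical mirror of the two listening options.
enum MetadataMode { EXCLUDE, INCLUDE }

final class ListenerDecision {
    // Returns true when a listener configured with `mode` should hear about `next`.
    static boolean shouldRaiseEvent(MetadataMode mode, SnapshotInfo previous, SnapshotInfo next) {
        if (next.hasDocumentChanges) {
            return true; // real data changes are always reported
        }
        boolean metadataChanged = previous.hasPendingWrites != next.hasPendingWrites
                || previous.isFromCache != next.isFromCache;
        // Metadata-only changes are reported only to listeners that asked for them.
        return mode == MetadataMode.INCLUDE && metadataChanged;
    }
}
```

In the offline-write scenario, the snapshot that arrives after the upload has no document changes and only flips hasPendingWrites, so an INCLUDE listener is notified while an EXCLUDE listener stays silent.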
Represents a Mutation of a document.
Mutation is exactly what you think it is: the change of something. A mutation of a document is one or more changes to that document. A change can be setting or removing something. There are three main mutations:
- DeleteMutation – represents that a document was deleted
- SetMutation – represents that a whole document was created or changed
- PatchMutation – represents that some fields in a document were created or changed
Mutations also include the field transformation operations, such as array union, array remove, increment value and server timestamp.
Why are Mutations needed? Can’t Firestore simply change the document and store that? Let’s see a few reasons why it’s not so simple.
First, Firestore works offline. If we’re both offline and I modify one field while you modify another, we expect the document to contain both changes once we’re back online. That is much harder to achieve if only the mutated document is stored.
Second, field transformations such as array union don’t have a reference to the whole document, so Firestore needs a way to represent that transformation for it to be applied on the server.
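The first reason is easier to see with a toy model. In the sketch below a document is just a field map (null meaning “no document”), and DocMutation, SetDoc, PatchDoc and DeleteDoc are hypothetical classes that only loosely mirror the three mutation kinds. Treating a patch on a missing document as a no-op is a simplification of mine.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini-model: a mutation turns one version of a document into the next.
interface DocMutation {
    Map<String, Object> applyTo(Map<String, Object> document);
}

// Replaces the whole document, whatever was there before.
final class SetDoc implements DocMutation {
    private final Map<String, Object> value;

    SetDoc(Map<String, Object> value) {
        this.value = new HashMap<>(value);
    }

    @Override
    public Map<String, Object> applyTo(Map<String, Object> document) {
        return new HashMap<>(value);
    }
}

// Touches only the listed fields and leaves the rest of the document alone.
final class PatchDoc implements DocMutation {
    private final Map<String, Object> fields;

    PatchDoc(Map<String, Object> fields) {
        this.fields = new HashMap<>(fields);
    }

    @Override
    public Map<String, Object> applyTo(Map<String, Object> document) {
        if (document == null) {
            return null; // simplification: patching a missing document does nothing
        }
        Map<String, Object> result = new HashMap<>(document);
        result.putAll(fields);
        return result;
    }
}

// Removes the document.
final class DeleteDoc implements DocMutation {
    @Override
    public Map<String, Object> applyTo(Map<String, Object> document) {
        return null;
    }
}
```

If I patch title and you patch views while we’re both offline, applying the two PatchDoc instances one after the other keeps both edits. Had each of us sent a full SetDoc of our own copy, whichever arrived last would silently drop the other person’s field.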
A queue of mutations to apply to the remote store.
Whenever you create, update or delete a document, a mutation is created. Mutations are submitted individually, or as a group in case you’re using a
MutationQueue. That creates a
MutationBatch, which is simply a collection of mutations that will be sent to the server together.
Mutations remain in the
removeMutationBatch is called.
MutationQueue is an interface and, like many other classes such as
IndexManager, etc., it has two implementations: a memory implementation and a SQLite implementation.
If you set
FirebaseFirestoreSettings.persistenceEnabled to true, the SQLite implementation of these classes will be used to persist the changes locally.
The memory and SQLite implementations are very similar; the main difference is where their data comes from. Let’s see how these classes handle
MutationQueue#isEmpty by looking at a reduced code version:
The SQLite implementation filters by uid (user ID) because the same database is shared among multiple users, whereas the memory implementation is instantiated per user.
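The difference can be modeled without a real database. This is a sketch, not the SDK’s code: BatchQueue, MemoryBatchQueue, SharedMutationsTable and SharedTableBatchQueue are hypothetical, and a plain list stands in for the shared SQLite table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common interface, playing the role of MutationQueue.
interface BatchQueue {
    boolean isEmpty();
}

// In-memory flavor: one queue object per user, so there is nothing to filter.
final class MemoryBatchQueue implements BatchQueue {
    private final List<Integer> batchIds = new ArrayList<>();

    void add(int batchId) {
        batchIds.add(batchId);
    }

    @Override
    public boolean isEmpty() {
        return batchIds.isEmpty();
    }
}

// One row of the shared table: every batch remembers which user owns it.
final class MutationRow {
    final String uid;
    final int batchId;

    MutationRow(String uid, int batchId) {
        this.uid = uid;
        this.batchId = batchId;
    }
}

// Stand-in for a single table that all users on the device write to.
final class SharedMutationsTable {
    final List<MutationRow> rows = new ArrayList<>();

    void insert(String uid, int batchId) {
        rows.add(new MutationRow(uid, batchId));
    }
}

// SQLite-like flavor: the table is shared, so the lookup has to filter by uid.
final class SharedTableBatchQueue implements BatchQueue {
    private final SharedMutationsTable table;
    private final String uid;

    SharedTableBatchQueue(SharedMutationsTable table, String uid) {
        this.table = table;
        this.uid = uid;
    }

    @Override
    public boolean isEmpty() {
        // Conceptually a "WHERE uid = ? LIMIT 1" query that returns no rows.
        for (MutationRow row : table.rows) {
            if (row.uid.equals(uid)) {
                return false;
            }
        }
        return true;
    }
}
```

With one row inserted for alice, a SharedTableBatchQueue created for bob still reports an empty queue even though the underlying table is not empty.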
Persistence is the lowest-level shared interface to persistent storage in Firestore.
What does that mean? “Lowest-level shared interface” means the interface is shared between the memory and SQLite implementations. All the methods below have a memory and a SQLite implementation.
- getMutationQueue returns a different MemoryMutationQueue instance per user when using the memory implementation, and a new SQLiteMutationQueue that shares the database when using the SQLite implementation.
- runTransaction uses the native transaction mechanism provided by SQL databases when using SQLite, and ReferenceDelegate when dealing with the in-memory implementation.
- and so on…
A lot of code is simplified by this interface that can be used to talk to in memory and SQLite implementations without having to use 2 different classes.
LocalStore is a final class; the same implementation is used whether or not persistence is enabled.
Just because you have persistence disabled doesn’t mean Firestore doesn’t keep things locally. When persistence is disabled those things are kept in memory; when it’s enabled they are kept in SQLite.
Imagine you fetched document A from Firestore and then updated it, so now you have A’. Do you think A’ is really stored locally? Take a look at the documentation provided by Firestore:
The local store provides the local version of documents that have been modified locally. It maintains the constraint: LocalDocument = RemoteDocument + Active(LocalMutations)
The only things that are stored locally (either in memory or in SQLite) are
Mutations; the A’ document you have is a combination of those two things. Here’s how Firestore does that:
- You call FirebaseFirestore#document to get a
- You call DocumentReference#get(Source.CACHE) to force Firestore to return the document available locally.
As you can see, Firestore never stores the changed version of a document locally. It only stores what’s on the server (RemoteDocumentCache) and the local mutations (MutationQueue); those two things are enough to create the changed document you’re expecting.
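The constraint LocalDocument = RemoteDocument + Active(LocalMutations) can be sketched as a tiny store that keeps the two pieces apart and only combines them on read. LocalViewModel and PendingPatch are hypothetical names, mutations are reduced to single-field patches, and the acknowledgement step is a simplification of what really happens when the server confirms a write.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pending change: "set this field of this document to this value".
final class PendingPatch {
    final String path;
    final String field;
    final Object value;

    PendingPatch(String path, String field, Object value) {
        this.path = path;
        this.field = field;
        this.value = value;
    }
}

final class LocalViewModel {
    // What the server last told us (the role of RemoteDocumentCache), keyed by path.
    private final Map<String, Map<String, Object>> remoteDocuments = new HashMap<>();
    // Local changes not yet acknowledged (the role of MutationQueue), oldest first.
    private final List<PendingPatch> mutationQueue = new ArrayList<>();

    void saveRemote(String path, Map<String, Object> fields) {
        remoteDocuments.put(path, new HashMap<>(fields));
    }

    void enqueue(String path, String field, Object value) {
        mutationQueue.add(new PendingPatch(path, field, value));
    }

    Map<String, Object> readRemote(String path) {
        return remoteDocuments.get(path);
    }

    // Builds the local view on the fly: remote document plus pending mutations.
    Map<String, Object> readLocal(String path) {
        Map<String, Object> view = new HashMap<>();
        Map<String, Object> base = remoteDocuments.get(path);
        if (base != null) {
            view.putAll(base);
        }
        for (PendingPatch patch : mutationQueue) {
            if (patch.path.equals(path)) {
                view.put(patch.field, patch.value);
            }
        }
        return view;
    }

    // Simplified ack: the server applied the oldest patch, so it moves into the remote copy.
    void acknowledgeOldest() {
        PendingPatch acked = mutationQueue.remove(0);
        Map<String, Object> remote = remoteDocuments.get(acked.path);
        if (remote == null) {
            remote = new HashMap<>();
            remoteDocuments.put(acked.path, remote);
        }
        remote.put(acked.field, acked.value);
    }

    int pendingCount() {
        return mutationQueue.size();
    }
}
```

Before the acknowledgement, readLocal returns A’ while readRemote still returns A. Afterwards the queue is empty and both return the same document, yet A’ itself was never written anywhere as a separate copy.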
LocalStore also contains
configureIndices(List<FieldIndex>). When you query a lot of data, indexes become essential for maintaining good performance. Locally, Firestore creates a simplified version of an index to speed up some queries. To create those indexes it uses
IndexManager, which has two implementations:
- MemoryIndexManager: only supports collection parent indexing, which is used when doing collection group queries.
- SQLiteIndexManager: supports both collection parent and document field indexing.
Even though document field indexing is supported in SQLite, it appears it’s never used 🤷♂️.
Be aware that when a collection query is executed locally, it always iterates through all available documents. If you have a huge number of documents, that can become a problem on some devices.
RemoteStore handles all interaction with the backend through a simple, clean interface.
RemoteStore is the class that handles streams to talk to the backend. It utilizes
WatchStream to observe data and
WriteStream to write data to the backend.
watchQuery is used to tell the backend that the client wants to receive changes related to a given query.
writeMutations is used to write all the changes that happened locally to the backend.
LocalStore to request the next
MutationBatch that should be sent to the backend.
writePipeline is a
Deque that queues
MutationBatches that either were sent and haven’t been acknowledged yet or are still waiting to be sent to the server.
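A rough model of such a pipeline is shown below. Treat it as a sketch under assumptions: WritePipelineModel is hypothetical, batches are reduced to integer ids, and the cap of 10 in-flight batches is a value I picked for the example rather than something stated above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a bounded pipeline fed from a local queue of batches.
final class WritePipelineModel {
    // Assumed cap on how many batches may sit in the pipeline at once.
    static final int MAX_PENDING_WRITES = 10;

    // Batch ids still waiting in local storage, oldest first.
    private final Deque<Integer> localQueue = new ArrayDeque<>();
    // Batch ids that were sent (or are about to be) and are not acknowledged yet.
    private final Deque<Integer> writePipeline = new ArrayDeque<>();

    void addLocalBatch(int batchId) {
        localQueue.add(batchId);
        fillWritePipeline();
    }

    // Pulls the next batches from local storage while the pipeline has room.
    void fillWritePipeline() {
        while (writePipeline.size() < MAX_PENDING_WRITES && !localQueue.isEmpty()) {
            writePipeline.add(localQueue.poll());
        }
    }

    // Acks arrive in order, so the head of the deque is the batch being acknowledged.
    int acknowledgeHead() {
        int acked = writePipeline.removeFirst();
        fillWritePipeline();
        return acked;
    }

    int inFlight() {
        return writePipeline.size();
    }

    int waiting() {
        return localQueue.size();
    }
}
```

Adding twelve batches leaves ten in the pipeline and two waiting. Each acknowledgement frees a slot, which is immediately refilled from the local queue.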
You can manually call
enableNetwork if you want to influence how
RemoteStore works. By default it’ll use a
ConnectivityMonitor to detect the network status and handle streams accordingly.
Determining if a user has connectivity is a problem most of us have encountered throughout our careers. In Firebase Firestore the class responsible for dealing with that is
AndroidConnectivityMonitor. I won’t go over how it’s done in this article, but you can check out the code here.
One interesting thing to note is that every time the app is foregrounded it checks for connectivity and calls all listeners if it’s connected.
EventManager is responsible for mapping queries to query event listeners. It handles “fan-out.” (Identical queries will re-use the same watch on the backend.)
Earlier I said that
FirestoreClient#listen shares the same data source for identical queries; this is the class that handles that.
addQueryListener is used to register a new query listener. From then on, the listener will receive the updates that belong to its query.
onViewSnapshots will be called when the
OnlineState changes or when new data is available. It’ll dispatch the changes to all query listeners related to the a
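The fan-out idea can be sketched like this. FanOutModel is a hypothetical class, queries and snapshots are reduced to plain strings, and the watch counter only exists to show that identical queries share one backend watch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of mapping queries to their listeners.
final class FanOutModel {
    private final Map<String, List<Consumer<String>>> listenersByQuery = new HashMap<>();
    private int backendWatches = 0;

    // Registers a listener; only the first listener of a query opens a backend watch.
    void addQueryListener(String query, Consumer<String> listener) {
        List<Consumer<String>> listeners = listenersByQuery.get(query);
        if (listeners == null) {
            listeners = new ArrayList<>();
            listenersByQuery.put(query, listeners);
            backendWatches++;
        }
        listeners.add(listener);
    }

    // Dispatches one snapshot to every listener registered for its query.
    void onViewSnapshot(String query, String snapshot) {
        List<Consumer<String>> listeners = listenersByQuery.get(query);
        if (listeners == null) {
            return;
        }
        for (Consumer<String> listener : listeners) {
            listener.accept(snapshot);
        }
    }

    int backendWatchCount() {
        return backendWatches;
    }
}
```

Registering two listeners for the same query leaves backendWatchCount() at 1, and a single onViewSnapshot call reaches both of them.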
SyncEngine is the central controller in the client SDK architecture.
SyncEngine is the piece that makes
EventManager work together.
When you call
DocumentReference#update the method that ends up getting called is
handleCredentialChange gets called when the authenticated user changes. When that happens:
- LocalStore is notified that the user changed and a new LocalDocumentsView is created for the new user.
- RemoteStore restarts its streams.
A Watch Change is the internal representation of the watcher API protocol buffers.
WatchChange basically encapsulates what is returned by the backend.
As you can see, the watch stream only receives
WatchChanges, which has three subclasses:
- DocumentChange: Represents a document change.
- ExistenceFilterWatchChange: Used to verify the client has the right number of documents locally. It contains an ExistenceFilter that has only one field:
- WatchTargetChange: Used to update
WatchChangeAggregator is created every time a watch stream is started on
RemoteStore. It receives the
RemoteStore and handles them. The easiest one to understand is the
Whenever a new
WatchChangeAggregator#handleDocumentChange. That causes the updated document to be added to
If the new document version is greater than the version that’s stored locally,
RemoteStore will call
RemoteEvent containing the documents that were added to
pendingDocumentUpdates earlier will be created and dispatched to
SyncEngine will send the
LocalStore, causing it to update
RemoteDocumentCache, which is where the documents stay stored locally.
SyncEngine will also cause the query listeners to update.
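The document-change part of this flow can be approximated as follows. It is a loose sketch: AggregatorModel and CacheModel are hypothetical, documents are reduced to a path and a version number, and where exactly the version comparison happens is my own simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical aggregator: collects document changes until an event is created.
final class AggregatorModel {
    // Path -> version of every document that changed since the last event.
    private Map<String, Long> pendingDocumentUpdates = new HashMap<>();

    void handleDocumentChange(String path, long version) {
        pendingDocumentUpdates.put(path, version);
    }

    // Hands over everything collected so far and starts a fresh batch.
    Map<String, Long> createRemoteEvent() {
        Map<String, Long> event = pendingDocumentUpdates;
        pendingDocumentUpdates = new HashMap<>();
        return event;
    }
}

// Hypothetical local cache of server documents (the role of RemoteDocumentCache).
final class CacheModel {
    final Map<String, Long> versions = new HashMap<>();

    // Stores a document only when the event carries a newer version than the cached one.
    void applyRemoteEvent(Map<String, Long> event) {
        for (Map.Entry<String, Long> entry : event.entrySet()) {
            Long cached = versions.get(entry.getKey());
            if (cached == null || entry.getValue() > cached) {
                versions.put(entry.getKey(), entry.getValue());
            }
        }
    }
}
```

In this model a stale update, one with a lower version than the cached document, is simply ignored, and calling createRemoteEvent() twice in a row returns an empty second event.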
This was by no means a complete exploration of the library; there are dozens of topics I didn’t touch for lack of time. Now that you have a basic understanding of how things work, it should be easier for you to continue exploring. You can find the source code here.
If you enjoy learning how libraries work, take a look at my previous article explaining how Crashlytics works.
I hope you now understand a little more about how this amazing library works. If you have any questions or suggestions, feel free to reach out to me on Twitter. See you in my next article.