How does Cloud Firestore work? by Victor Brandalise

Cloud Firestore is a popular database for mobile and web applications. According to its documentation:

[…] It keeps your data in sync across client apps through realtime listeners and offers offline support for mobile and web so you can build responsive apps that work regardless of network latency or Internet connectivity.

Continuing the “How does X work” series, today we’re gonna explore how Firestore works under the hood. How do listeners work? How does it send/receive data from the backend? How does it keep things stored locally? Those are some of the questions we’ll be exploring today.

FirebaseApp

FirebaseInitProvider handles the initialization of Firebase for the default project, using the data in the app’s google-services.json file. When building with Gradle, this ContentProvider is automatically integrated into the app’s manifest and executed when the app is launched.

If an app needs access to another Firebase project in addition to the default project, use initializeApp(Context, FirebaseOptions, String) to do that.

FirebaseFirestore

Represents a single Cloud Firestore database. It’s probably the most used class as it acts as a facade to other classes. It provides methods such as setFirestoreSettings, collection, document, runTransaction, runBatch, waitForPendingWrites, enableNetwork/disableNetwork, clearPersistence, etc.

It’s responsible for creating FirestoreClient. Most of its functions are wrappers around FirestoreClient: FirebaseFirestore usually does some parsing and validation before calling the method that does the actual work on FirestoreClient.

FirebaseFirestoreSettings

Specifies the configuration for your Firestore instance. The configurable values are host, sslEnabled, persistenceEnabled and cacheSizeBytes.

Data Bundles

Data bundles are serialized collections of documents.

These data bundles can be saved to a CDN or another object storage provider, and then loaded from your client applications. By doing that, you can avoid making extra calls against the Firestore database.

You can read more about Data Bundles here.

FirestoreMultiDbComponent

Even though most of us only use one database instance, Firestore supports multiple instances. FirestoreMultiDbComponent is basically a container that stores references to all available databases.

When you call FirebaseFirestore.getInstance(), it’s actually calling FirestoreMultiDbComponent with the default database id.

Here’s the method that returns a database given its id. It’s also responsible for creating a FirebaseFirestore if there’s none registered.
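A minimal sketch of that lookup-or-create pattern in plain Java. MultiDbSketch and Database are hypothetical stand-ins for FirestoreMultiDbComponent and FirebaseFirestore, and the "(default)" id is an assumption made for illustration; this is not the SDK's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FirestoreMultiDbComponent; not the SDK's actual code.
final class MultiDbSketch {
    // Hypothetical stand-in for a FirebaseFirestore instance.
    static final class Database {
        final String id;

        Database(String id) {
            this.id = id;
        }
    }

    // Assumed id of the default database, used only for this illustration.
    static final String DEFAULT_DATABASE_ID = "(default)";

    private final Map<String, Database> instances = new HashMap<>();

    // Returns the database registered under the id, creating it on first use.
    synchronized Database get(String databaseId) {
        Database database = instances.get(databaseId);
        if (database == null) {
            database = new Database(databaseId);
            instances.put(databaseId, database);
        }
        return database;
    }

    // Conceptually what getInstance() does: ask the container for the default id.
    Database getDefault() {
        return get(DEFAULT_DATABASE_ID);
    }

    public static void main(String[] args) {
        MultiDbSketch component = new MultiDbSketch();
        // Both calls resolve to the same cached instance.
        System.out.println(component.getDefault() == component.getDefault());
    }
}
```

The point of the container is that repeated lookups with the same id never create a second database object.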

FirestoreClient

It looks very similar to FirebaseFirestore but here you start to see some real work being done.

Even though a FirestoreClient instance can be reused for multiple users, not all classes can. When FirestoreClient is created, it calls authProvider.setChangeListener to be notified when the logged-in user changes, and it notifies SyncEngine about that change.

It’s also responsible for creating the Datastore, the class that represents the connection to Firebase Firestore’s server.

It contains waitForPendingWrites, which returns a task that resolves once all the writes that were pending when the method was called have received server acknowledgement.

It also contains write. According to its documentation:

Writes mutations. The returned task will be notified when it’s written to the backend.

That’s an interesting thing to note: write behaves differently depending on the user’s connectivity, because the returned task is only notified once the write reaches the backend. That might be the behavior most people expect, but you need to remember that Cloud Firestore can also be used as an offline database. If your app allows users to use it offline, you’ll have to take that into consideration.

Most people have never heard of listen(Query, ListenOptions, EventListener<ViewSnapshot>), but you’ve probably used DocumentReference.addSnapshotListener or Query.addSnapshotListener. listen is the method they call to listen for changes.

listen shares the same data source for equal queries, so calling DocumentReference.addSnapshotListener from multiple places with the same query is not costly.

To accomplish its purpose, listen relies mostly on QueryListener.

QueryListener

As you may know, Firestore is a reactive database meaning that when somebody updates a document you also receive the update if you’re listening for it.

QueryListener is one of the classes responsible for that.

Before we understand how QueryListener works, we need to learn what ViewSnapshot is.

ViewSnapshot

A view snapshot is an immutable capture of the results of a query and the changes to them.

Whenever you query for something, you don’t get the actual values you queried for. Firestore returns a QuerySnapshot, and inside it you can find a ViewSnapshot that contains the values you queried for. Take a look at QuerySnapshot#getDocuments.

ViewSnapshot is basically a data class: it holds together a lot of related data. It contains a reference to the Query that generated it, the old documents, the new documents, a list of document changes, whether the values are from cache, etc.

Now let’s get back to QueryListener. According to the documentation:

QueryListener takes a series of internal view snapshots and determines when to raise events.

Why does it say “determines when to raise events”? Don’t all changes raise events? Well, not necessarily. When you start listening to a query you can give it a MetadataChanges value, which defines what kind of changes you’ll be notified of. You have two options: INCLUDE and EXCLUDE.

Currently, document snapshots have two metadata properties: hasPendingWrites and isFromCache.

If you specify MetadataChanges.INCLUDE, you’ll also be notified when either of these two fields changes. Let’s suppose you’re listening to collection C and a user who has no connectivity writes document D to collection C. Initially hasPendingWrites will be true because this data has not been written to the backend yet. When the document is uploaded, the data in your document is probably not going to change, but hasPendingWrites will become false and you’ll receive an update for that.

What QueryListener does is basically wait for new view snapshots and decide whether or not it should notify you of a change based on these options.
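To make that decision concrete, here is a minimal sketch in plain Java. Snapshot and shouldRaiseEvent are hypothetical names, the data is reduced to a single string, and the rule that the first snapshot is always delivered is an assumption; the real QueryListener handles more cases than this.

```java
// Hypothetical, simplified model of the decision described above; not SDK code.
final class QueryListenerSketch {
    enum MetadataChanges { INCLUDE, EXCLUDE }

    // Stand-in for a view snapshot: the document data plus the two metadata flags.
    static final class Snapshot {
        final String data;
        final boolean hasPendingWrites;
        final boolean isFromCache;

        Snapshot(String data, boolean hasPendingWrites, boolean isFromCache) {
            this.data = data;
            this.hasPendingWrites = hasPendingWrites;
            this.isFromCache = isFromCache;
        }
    }

    // Returns true when a listener registered with `option` should be notified.
    static boolean shouldRaiseEvent(Snapshot previous, Snapshot next, MetadataChanges option) {
        if (previous == null) {
            return true; // assumption: the first snapshot is always delivered
        }
        if (!previous.data.equals(next.data)) {
            return true; // the data itself changed
        }
        boolean metadataChanged = previous.hasPendingWrites != next.hasPendingWrites
                || previous.isFromCache != next.isFromCache;
        // Metadata-only changes are reported only to listeners that asked for them.
        return metadataChanged && option == MetadataChanges.INCLUDE;
    }

    public static void main(String[] args) {
        Snapshot pending = new Snapshot("D", true, false);
        Snapshot acknowledged = new Snapshot("D", false, false);
        System.out.println(shouldRaiseEvent(pending, acknowledged, MetadataChanges.INCLUDE));
        System.out.println(shouldRaiseEvent(pending, acknowledged, MetadataChanges.EXCLUDE));
    }
}
```

In the scenario with document D, only hasPendingWrites flips, so the INCLUDE listener is notified and the EXCLUDE listener is not.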

Mutation

Represents a Mutation of a document.

Mutation is exactly what you think it is, it’s the change of something. A mutation of a document is one or more changes of a document. A change can be setting or removing something. There are 3 main mutations:

DeleteMutation – represents that a document was deleted
SetMutation – represents that a whole document was created or changed
PatchMutation – represents that some fields in a document were created or changed

Mutations also include the field transformation operations such as array union, array remove, increment value and server timestamp.

Why are Mutations needed? Can’t Firestore simply change the document and store that? Let’s see a few reasons why it’s not so simple.

First, Firestore works offline. If we’re both offline and I modify one field while you modify another, we expect the document to have both changes once we get back online, and that is harder to achieve if we only store the mutated document.

Second, field transformations such as array union don’t have a reference to the whole document, so Firestore needs a way to represent that transformation for it to be applied on the server.
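The first reason can be sketched in plain Java. The classes below reuse the names SetMutation, PatchMutation and DeleteMutation for readability, but they are hypothetical simplifications that treat a document as a Map; returning null for "no document" and ignoring a patch on a missing document are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified mutation types; the SDK's real classes carry more state.
final class MutationSketch {
    interface Mutation {
        // Returns the document after the change; null means "no document".
        Map<String, Object> applyTo(Map<String, Object> document);
    }

    static final class SetMutation implements Mutation {
        private final Map<String, Object> value;

        SetMutation(Map<String, Object> value) {
            this.value = value;
        }

        public Map<String, Object> applyTo(Map<String, Object> document) {
            return new HashMap<>(value); // replaces the whole document
        }
    }

    static final class PatchMutation implements Mutation {
        private final Map<String, Object> fields;

        PatchMutation(Map<String, Object> fields) {
            this.fields = fields;
        }

        public Map<String, Object> applyTo(Map<String, Object> document) {
            if (document == null) {
                return null; // simplification: nothing to patch
            }
            Map<String, Object> result = new HashMap<>(document);
            result.putAll(fields); // only the named fields change
            return result;
        }
    }

    static final class DeleteMutation implements Mutation {
        public Map<String, Object> applyTo(Map<String, Object> document) {
            return null;
        }
    }

    // Applies the mutations in order, feeding each result into the next one.
    static Map<String, Object> applyAll(Map<String, Object> document, List<Mutation> mutations) {
        Map<String, Object> current = document;
        for (Mutation mutation : mutations) {
            current = mutation.applyTo(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> server = Map.of("title", "draft", "likes", 1);
        Mutation mine = new PatchMutation(Map.of("title", "final"));
        Mutation yours = new PatchMutation(Map.of("likes", 2));
        // Both offline edits survive because each patch names only its own field.
        System.out.println(applyAll(server, List.of(mine, yours)));
    }
}
```

Had each of us stored a full copy of the edited document instead, the second copy applied would have overwritten the other person's field.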

MutationQueue

A queue of mutations to apply to the remote store.

Whenever you create, update or delete a document, a mutation is created. Mutations are submitted to MutationQueue individually or, if you’re using a WriteBatch, as a group. Either way that creates a MutationBatch, which is simply a collection of mutations that will be sent to the server together.

Mutations remain in the MutationQueue until removeMutationBatch is called.

MutationQueue is an interface and, like many others such as Persistence, RemoteDocumentCache, IndexManager, etc., it has 2 implementations: a memory implementation and a SQLite implementation.

If you set FirebaseFirestoreSettings.persistenceEnabled to true, the SQLite implementation of these classes will be used to persist the changes locally.

The memory and SQLite implementations are very similar, the main difference is where their data comes from. Let’s see how these classes handle MutationQueue#isEmpty by looking at a reduced code version:

The reason the SQLite implementation filters by uid (user id) is that the same database is shared among multiple users, while the memory implementation is instantiated per user.
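Here is a minimal sketch of that difference in plain Java. To stay self-contained it does not use SQLite at all: a plain list of rows stands in for the shared table, and MemoryQueue, SharedTableQueue and Row are hypothetical names rather than the SDK's classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the "table" is a plain list standing in for a SQLite table.
final class MutationQueueSketch {
    interface MutationQueue {
        boolean isEmpty();
    }

    // Memory flavor: each user gets their own queue object, so no filtering is needed.
    static final class MemoryQueue implements MutationQueue {
        final List<String> batches = new ArrayList<>();

        public boolean isEmpty() {
            return batches.isEmpty();
        }
    }

    // One row of the shared table: the owning user and the batch itself.
    static final class Row {
        final String uid;
        final String batch;

        Row(String uid, String batch) {
            this.uid = uid;
            this.batch = batch;
        }
    }

    // Shared-storage flavor: all users share one table, so every read filters by uid.
    static final class SharedTableQueue implements MutationQueue {
        private final List<Row> table;
        private final String uid;

        SharedTableQueue(List<Row> table, String uid) {
            this.table = table;
            this.uid = uid;
        }

        public boolean isEmpty() {
            for (Row row : table) {
                if (row.uid.equals(uid)) {
                    return false; // this user has at least one pending batch
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("alice", "batch-1"));
        System.out.println(new SharedTableQueue(table, "alice").isEmpty());
        System.out.println(new SharedTableQueue(table, "bob").isEmpty());
    }
}
```

The same table gives different answers for different users, which is exactly why the uid filter is needed there and not in the per-user memory queue.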

Persistence

Persistence is the lowest-level shared interface to persistent storage in Firestore.

What does that mean? By “lowest-level shared interface” it’s talking about shared between memory and SQLite. All the methods below have a memory and a SQLite implementation.

getMutationQueue returns a different MemoryMutationQueue instance per user when using the memory implementation, and a new SQLiteMutationQueue that shares the database when using the SQLite implementation.
runTransaction uses the native transaction mechanism provided by SQL databases when using SQLite, and ReferenceDelegate when dealing with the in-memory implementation.
and so on…

This interface simplifies a lot of code, because callers can talk to the in-memory and SQLite implementations without having to use 2 different classes.

LocalStore

LocalStore is a final class, the same implementation is used whether or not persistence is enabled.

Just because you have persistence disabled doesn’t mean Firestore doesn’t keep things locally. When persistence is disabled those things are kept in memory; when it’s enabled they are kept in SQLite.

Imagine you fetched document A from Firestore, updated it, and now you have A’. Do you think A’ is really stored locally? Take a look at the documentation provided by Firestore:

The local store provides the local version of documents that have been modified locally. It maintains the constraint: LocalDocument = RemoteDocument + Active(LocalMutations)

The only things that are stored locally (either in memory or in SQLite) are RemoteDocuments and Mutations; the A’ document you have is a combination of those 2 things. Here’s how Firestore does that:

You call FirebaseFirestore#document to get a DocumentReference.
You call DocumentReference#get(Source.CACHE) to force Firestore to return the document available locally.
- DocumentReference calls FirestoreClient#getDocumentFromLocalCache, which calls LocalStore#readDocument.
- LocalStore calls LocalDocumentsView#getDocument.

As you can see, Firestore never stores the changed version of a document locally. It only stores what’s on the server (RemoteDocumentCache) and the local mutations (MutationQueue); those two things are enough to create the changed document you’re expecting.
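The constraint LocalDocument = RemoteDocument + Active(LocalMutations) can be sketched in a few lines of plain Java. LocalViewSketch and PendingPatch are hypothetical names, documents are reduced to maps, only patch-style mutations are modeled, and returning null for an unknown key is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a local view computed on read; not the SDK's actual code.
final class LocalViewSketch {
    // A pending local change: which document it targets and which fields it sets.
    static final class PendingPatch {
        final String key;
        final Map<String, Object> fields;

        PendingPatch(String key, Map<String, Object> fields) {
            this.key = key;
            this.fields = fields;
        }
    }

    // The only two things that are stored: server documents and pending mutations.
    final Map<String, Map<String, Object>> remoteDocumentCache = new HashMap<>();
    final List<PendingPatch> mutationQueue = new ArrayList<>();

    // Builds the local document on demand; nothing is written back to the cache.
    Map<String, Object> getDocument(String key) {
        Map<String, Object> remote = remoteDocumentCache.get(key);
        if (remote == null) {
            return null; // simplification: unknown documents have no local view
        }
        Map<String, Object> local = new HashMap<>(remote);
        for (PendingPatch patch : mutationQueue) {
            if (patch.key.equals(key)) {
                local.putAll(patch.fields);
            }
        }
        return local;
    }

    public static void main(String[] args) {
        LocalViewSketch store = new LocalViewSketch();
        Map<String, Object> a = new HashMap<>();
        a.put("title", "old");
        store.remoteDocumentCache.put("notes/A", a);

        Map<String, Object> change = new HashMap<>();
        change.put("title", "new");
        store.mutationQueue.add(new PendingPatch("notes/A", change));

        System.out.println(store.getDocument("notes/A"));           // the A' view
        System.out.println(store.remoteDocumentCache.get("notes/A")); // still the server version
    }
}
```

Reading the document yields A’, yet the cached copy of A is untouched, which mirrors the behavior described above.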

LocalStore also contains configureIndices(List<FieldIndex>). When you query a lot of data, indexes become essential for maintaining good performance. Locally, Firestore creates a simplified version of an index to speed up some queries. To create those indexes it uses IndexManager, which has two implementations:

MemoryIndexManager: only supports collection parent indexing, that’s used when doing collection group queries.
SQLiteIndexManager: supports both collection parent and document field indexing.

Even though document field indexing is supported in SQLite, it appears it’s never used 🤷‍♂️.

Be aware that when a collection query is executed locally, it always iterates through all available documents. If you have a huge number of documents, that can become a problem on some devices.

RemoteStore

RemoteStore handles all interaction with the backend through a simple, clean interface.

RemoteStore is the class that handles streams to talk to the backend. It utilizes WatchStream to observe data and WriteStream to write data to the backend.

WatchStream contains watchQuery that’s used to tell the backend it wants to receive changes related to a given query.
WriteStream contains writeMutations to write all the changes that happened locally to the backend.

RemoteStore polls LocalStore to request the next MutationBatch that should be sent to the backend.

writePipeline is a Deque that queues the MutationBatches that have been sent but not yet acknowledged, as well as those that are waiting to be sent to the server.
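A minimal sketch of such a pipeline in plain Java, with batches reduced to integer ids. The class and method names are hypothetical, the in-flight limit of 3 is an illustrative assumption, and so is the idea that acknowledgements arrive in the order the batches were queued.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded write pipeline; not the SDK's actual code.
final class WritePipelineSketch {
    // Illustrative limit on unacknowledged batches; an assumption of this sketch.
    static final int MAX_IN_FLIGHT = 3;

    private final Deque<Integer> writePipeline = new ArrayDeque<>();

    boolean canAcceptMore() {
        return writePipeline.size() < MAX_IN_FLIGHT;
    }

    // Pulls batch ids from the local queue until the pipeline is full or the queue is empty.
    void fillFrom(Deque<Integer> localQueue) {
        while (canAcceptMore() && !localQueue.isEmpty()) {
            writePipeline.addLast(localQueue.pollFirst());
        }
    }

    // Assumption: the oldest batch is the one acknowledged next. Returns null when empty.
    Integer acknowledgeOldest() {
        return writePipeline.pollFirst();
    }

    int size() {
        return writePipeline.size();
    }

    public static void main(String[] args) {
        Deque<Integer> localQueue = new ArrayDeque<>();
        for (int id = 1; id <= 5; id++) {
            localQueue.addLast(id);
        }
        WritePipelineSketch pipeline = new WritePipelineSketch();
        pipeline.fillFrom(localQueue);
        System.out.println("in flight: " + pipeline.size() + ", waiting locally: " + localQueue.size());
        pipeline.acknowledgeOldest();
        pipeline.fillFrom(localQueue); // an acknowledgement frees a slot for the next batch
        System.out.println("in flight: " + pipeline.size() + ", waiting locally: " + localQueue.size());
    }
}
```

A Deque fits because batches enter at one end and leave at the other once they are acknowledged.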

You can manually call disableNetwork or enableNetwork if you want to influence how RemoteStore works. By default it’ll use a ConnectivityMonitor to detect the network status and handle streams accordingly.

AndroidConnectivityMonitor

Determining if a user has connectivity is a problem most of us have encountered throughout our careers. In Firestore, the class responsible for dealing with that is AndroidConnectivityMonitor. I won’t go over how it’s done in this article, but you can check out the code here.

One interesting thing to note is that every time the app is foregrounded it checks for connectivity and, if it’s connected, calls all listeners.

EventManager

EventManager is responsible for mapping queries to query event listeners. It handles “fan-out.” (Identical queries will re-use the same watch on the backend.)

Earlier I said that FirestoreClient#listen shares the same data source for identical queries. This is the class that handles that.

addQueryListener is used to register a new query listener. From then on, the listener will receive the updates that belong to its query.

onViewSnapshots will be called when the OnlineState changes or when new data is available. It’ll dispatch the changes to all query listeners related to a ViewSnapshot.
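The fan-out idea can be sketched in plain Java as a map from a query to its listeners. Everything here is a hypothetical simplification: queries and snapshots are plain strings, backendWatches is just a counter standing in for watches opened on the backend, and onViewSnapshot handles a single snapshot at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of query-to-listener fan-out; not the SDK's actual code.
final class EventManagerSketch {
    private final Map<String, List<Consumer<String>>> listenersByQuery = new HashMap<>();

    // Counts how many distinct queries would need a watch on the backend.
    int backendWatches = 0;

    // Registers a listener; only the first listener for a query opens a new watch.
    void addQueryListener(String query, Consumer<String> listener) {
        List<Consumer<String>> listeners = listenersByQuery.get(query);
        if (listeners == null) {
            listeners = new ArrayList<>();
            listenersByQuery.put(query, listeners);
            backendWatches++;
        }
        listeners.add(listener);
    }

    // Dispatches one snapshot to every listener registered for its query.
    void onViewSnapshot(String query, String snapshot) {
        List<Consumer<String>> listeners = listenersByQuery.get(query);
        if (listeners == null) {
            return;
        }
        for (Consumer<String> listener : listeners) {
            listener.accept(snapshot);
        }
    }

    public static void main(String[] args) {
        EventManagerSketch manager = new EventManagerSketch();
        manager.addQueryListener("notes", snapshot -> System.out.println("screen 1 got " + snapshot));
        manager.addQueryListener("notes", snapshot -> System.out.println("screen 2 got " + snapshot));
        manager.onViewSnapshot("notes", "3 documents");
        System.out.println("backend watches: " + manager.backendWatches);
    }
}
```

Two listeners on the same query share one watch, which is why registering the same query from several places stays cheap.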

SyncEngine

SyncEngine is the central controller in the client SDK architecture.

SyncEngine is the piece that makes LocalStore, RemoteStore and EventManager work together.

When you call DocumentReference#set or DocumentReference#update, the method that ends up getting called is SyncEngine#writeMutations.

It contains handleCredentialChange, which gets called when the authenticated user changes. When that happens:

LocalStore is notified that the user changed and a new LocalDocumentsView is created for the new user.
RemoteStore restarts its streams.

WatchChange

A Watch Change is the internal representation of the watcher API protocol buffers.

A WatchChange basically encapsulates what is returned by the backend.

As you can see, the watch stream only receives WatchChanges. WatchChange has 3 subclasses:

DocumentChange: Represents a document change.
ExistenceFilterWatchChange: Used to verify the client has the right number of documents locally. It contains an ExistenceFilter that has only one field: count.
WatchTargetChange: Used to update TargetStates.

WatchChangeAggregator

A WatchChangeAggregator is created every time a watch stream is started on RemoteStore. It receives the WatchChanges from RemoteStore and handles them. The easiest one to understand is the DocumentChange.

Whenever a new DocumentChange arrives, RemoteStore calls WatchChangeAggregator#handleDocumentChange. That causes the updated document to be added to pendingDocumentUpdates.

If the new document version is greater than the version that’s stored locally, RemoteStore will call WatchChangeAggregator#createRemoteEvent. A RemoteEvent containing the documents that were added to pendingDocumentUpdates earlier will be created and dispatched to SyncEngine#handleRemoteEvent.

SyncEngine will send the RemoteEvent to LocalStore, causing it to update RemoteDocumentCache, which is where the documents are stored locally. SyncEngine will also cause the query listeners to update.
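The flow from document change to cache update can be sketched in plain Java. The method names mirror the ones mentioned above, but the classes are hypothetical simplifications: a remote event is reduced to a map of documents, and applyRemoteEvent is an assumed stand-in for what LocalStore does with it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of aggregating watch changes; not the SDK's actual code.
final class WatchAggregatorSketch {
    // Simplified document: a key, a version and some data.
    static final class Doc {
        final String key;
        final long version;
        final String data;

        Doc(String key, long version, String data) {
            this.key = key;
            this.version = version;
            this.data = data;
        }
    }

    private final Map<String, Doc> pendingDocumentUpdates = new HashMap<>();

    // Remembers the latest change per document until the next remote event is built.
    void handleDocumentChange(Doc doc) {
        pendingDocumentUpdates.put(doc.key, doc);
    }

    // Drains the pending updates into a read-only "remote event".
    Map<String, Doc> createRemoteEvent() {
        Map<String, Doc> event = new HashMap<>(pendingDocumentUpdates);
        pendingDocumentUpdates.clear();
        return Collections.unmodifiableMap(event);
    }

    // Assumed stand-in for LocalStore: newer versions replace what is cached.
    static void applyRemoteEvent(Map<String, Doc> remoteDocumentCache, Map<String, Doc> event) {
        for (Doc doc : event.values()) {
            Doc cached = remoteDocumentCache.get(doc.key);
            if (cached == null || doc.version > cached.version) {
                remoteDocumentCache.put(doc.key, doc);
            }
        }
    }

    public static void main(String[] args) {
        WatchAggregatorSketch aggregator = new WatchAggregatorSketch();
        aggregator.handleDocumentChange(new Doc("notes/A", 1, "first"));
        aggregator.handleDocumentChange(new Doc("notes/A", 2, "second"));
        Map<String, Doc> cache = new HashMap<>();
        applyRemoteEvent(cache, aggregator.createRemoteEvent());
        System.out.println(cache.get("notes/A").data);
    }
}
```

Batching the changes this way means listeners see one consistent update per remote event rather than one callback per individual change.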

This was by no means a complete exploration of the library; there are dozens of topics I didn’t touch for lack of time. Now that you have a basic understanding of how things work, it should be easier for you to continue exploring. You can find the source code here.

If you enjoy learning how libraries work, take a look at my previous article explaining how Crashlytics works.

I hope you got to understand a little bit more about how this amazing library works. If you have any questions or suggestions, feel free to reach out to me on Twitter. See you in my next article.
