Collaborative Web Apps
Recently I've been thinking about how to build a collaborative web app in 2021.
By collaborative web app, I mean an app with desktop-like interactions and realtime collaboration, such as Notion, Discord, or Figma.
I want to find an approach suitable for small teams: simple but not necessarily easy, fast by default, and scalable as it grows.
So I set out to find one by building a demo todo app, exploring tools and methods along the way. (I welcome all kinds of third-party tools and services, as long as they don't bring in hard-to-replace vendor lock-in.)
Our todo app has these features:
- users can create, edit, delete, and reorder todos, which are stored on the server for persistence
- users can cooperatively edit the same list of todos, and changes are automatically synced between different clients in realtime
- all the operations should be as fast as possible for all the users around the world
The fast part is mainly about latency, because most apps won't have much throughput to deal with in the beginning. More specifically, we want changes made by one client to reach other clients about as quickly as the speed of light allows: under 100ms within the same geographical region and several hundred ms across continents. Hence the name Todo Light.
You can play with the end product here. The client and server code are both hosted on GitHub. (Depending on your location in the world, the result fails to meet our speed goal. More on this later.)
A random list id is created when the app starts. You can share the URL with the list_id parameter to collaboratively edit the same list of todos with others (or yourself in another browser tab).
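Reading or creating that ID can be done with the standard URL API. The sketch below is illustrative and not taken from Todo Light. The helper name getOrCreateListID and the injected generateID function are hypothetical, and the generator stands in for nanoid so the snippet runs without dependencies.

```javascript
// Hypothetical helper: reuse the list_id from a shared URL, or create a new one.
// `generateID` stands in for an ID generator such as nanoid().
function getOrCreateListID(href, generateID) {
  const url = new URL(href)
  let listID = url.searchParams.get('list_id')
  if (!listID) {
    listID = generateID()
    // Put the new ID into the URL so it can be copied and shared.
    url.searchParams.set('list_id', listID)
  }
  return { listID, shareURL: url.toString() }
}

const fresh = getOrCreateListID('https://example.com/', () => 'abc123')
const shared = getOrCreateListID(fresh.shareURL, () => 'unused')
console.log(fresh.shareURL) // https://example.com/?list_id=abc123
console.log(shared.listID) // abc123
```

In a browser, you would pass window.location.href and could then call history.replaceState(null, '', shareURL) so the address bar shows a shareable link.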
Todo Light is simple enough as a demo but nonetheless captures the essence of the SaaS apps mentioned above, albeit in a much-simplified form.
Let's get started building it.
Client
We begin with the client because it contains the core of our app.
The user interface part is easy; we build it with React. Other reactive UI frameworks like Svelte and Vue should work as well. I chose React because I'm familiar with it.
The client-only version of the app is straightforward to write:
```jsx
import React, { useState } from 'react'

export default function TodoApp() {
  const [todos, setTodos] = useState([])
  const [content, setContent] = useState('')

  return (
    <div>
      <form
        onSubmit={(e) => {
          e.preventDefault()
          if (content.length > 0) {
            setTodos((todos) => [...todos, { content, completed: false }])
            setContent('')
          }
        }}
      >
        <input
          autoFocus
          value={content}
          onChange={(e) => setContent(e.target.value)}
        />
      </form>
      <ul>
        {todos.map((todo, index) => (
          <li key={index}>
            <span>{todo.content}</span>
            <div>
              <label>
                <input
                  type="checkbox"
                  checked={todo.completed}
                  onChange={(e) => {
                    // read the checkbox value now, before the updater runs
                    const completed = e.target.checked
                    setTodos((todos) =>
                      todos.map((t, i) => (i === index ? { ...t, completed } : t)),
                    )
                  }}
                />
              </label>
              <button
                onClick={() => {
                  setTodos((todos) => todos.filter((_, i) => i !== index))
                }}
              >
                x
              </button>
            </div>
          </li>
        ))}
      </ul>
    </div>
  )
}
```
With fewer than 100 lines of code, the resulting app already looks and works like the end product, except that its state is volatile. Refresh the browser, and you'd lose all your todo items!
We use React's state to store data, which is okay for the input value because it is temporary by nature, but not quite right for the todos. The todos need to be:
- updated in a local browser cache for maximal speed (this pattern is sometimes called Optimistic UI)
- synced to the server for persistence
- delivered to other clients in correct order and state.
Nowadays, there is a plethora of frontend state management libraries to choose from: Redux, MobX, Recoil, GraphQL clients like Apollo and Relay, etc. Sadly, none of them works for our use case. What we need is a distributed system with realtime syncing and conflict resolution baked in. Although there is good writing on this subject, distributed systems are still too hard for a one-person team to implement correctly. I'd like to bring in some help.
After some search, a promising option shows up - Replicache, of which the homepage says:
Replicache makes it easy to add realtime collaboration, lag-free UI, and offline support to web apps. It works with any backend stack.
Sounds too good to be true (spoiler: it's mostly true). How does Replicache live up to these bold claims? Its doc site has a whole page explaining how it works. To save you time, here is a rough summary.
Replicache implements a persistent store in the browser, using IndexedDB. You can mutate the store locally and subscribe to part of the store in your UI. When data changes, subscriptions re-fire, and the UI refreshes.
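To make "subscribe to part of the store" concrete, here is a toy in-memory key-value store with prefix scans and subscriptions that re-run after every write. It is a hypothetical illustration of the idea only. ToyStore and its methods are invented for this sketch. Replicache's real store is persistent, transactional, and has a different API.

```javascript
// Toy store: string keys, plain JSON-like values, kept in memory only.
class ToyStore {
  constructor() {
    this.data = new Map()
    this.subscriptions = new Set()
  }

  // Return [key, value] pairs whose key starts with `prefix`, sorted by key.
  scan(prefix) {
    return [...this.data.entries()]
      .filter(([key]) => key.startsWith(prefix))
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
  }

  put(key, value) {
    this.data.set(key, value)
    this.notify()
  }

  del(key) {
    this.data.delete(key)
    this.notify()
  }

  // Run `query` now and after every write, passing the result to `onData`.
  // Returns a function that cancels the subscription.
  subscribe(query, onData) {
    const sub = { query, onData }
    this.subscriptions.add(sub)
    onData(query(this))
    return () => this.subscriptions.delete(sub)
  }

  // Naive: re-runs every subscription, even if its result did not change.
  notify() {
    for (const { query, onData } of this.subscriptions) onData(query(this))
  }
}

const store = new ToyStore()
const renders = []
const unsubscribe = store.subscribe(
  (s) => s.scan('todo/').map(([, todo]) => todo.content),
  (contents) => renders.push(contents), // in a UI, this would trigger a re-render
)
store.put('todo/1', { content: 'buy milk' })
store.put('other/x', { content: 'not a todo' })
store.del('todo/1')
unsubscribe()
store.put('todo/2', { content: 'written after unsubscribing' })
// renders is now [[], ['buy milk'], ['buy milk'], []]
```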
You need to provide two backend endpoints for Replicache to talk to: replicache-pull and replicache-push. replicache-pull sends back a subset of your database for the current client. replicache-push updates the database from local mutations. After applying a mutation on the server, you send a WebSocket message hinting to the affected clients that they should pull again. (As the Replicache docs say, managing your own WebSocket backend has a very high operational cost, so we use Ably here.)
That's all you need to do. Replicache orchestrates the whole process to make sure the state stays consistent while being synced in realtime.
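The orchestration can be pictured with a toy, in-memory model of the cycle above. Each client keeps the last state confirmed by the server plus a queue of pending local mutations. Its view is the confirmed state with the pending mutations replayed on top. A pull replaces the confirmed state and drops the mutations the server has already applied. This is a heavily simplified sketch under stated assumptions: a full snapshot instead of a patch, no cookie, no persistence, no network. ToyServer, ToyClient, and their methods are hypothetical names, not Replicache's implementation or API.

```javascript
// Mutators run both on the client (optimistically) and on the server.
// Each receives a Map used as the store, plus the mutation arguments.
const mutators = {
  createTodo(store, { id, content }) {
    store.set(`todo/${id}`, { id, content, completed: false })
  },
  updateTodoCompleted(store, { id, completed }) {
    const todo = store.get(`todo/${id}`)
    if (todo) store.set(`todo/${id}`, { ...todo, completed })
  },
}

class ToyServer {
  constructor() {
    this.data = new Map()
    this.lastMutationIDs = new Map() // clientID -> last applied mutation id
  }
  push(clientID, mutations) {
    let last = this.lastMutationIDs.get(clientID) ?? 0
    for (const m of mutations) {
      if (m.id <= last) continue // already applied: skip
      if (m.id > last + 1) break // gap: wait for the missing mutation
      mutators[m.name](this.data, m.args)
      last = m.id
    }
    this.lastMutationIDs.set(clientID, last)
  }
  pull(clientID) {
    // A full snapshot instead of a patch, to keep the toy small.
    return {
      lastMutationID: this.lastMutationIDs.get(clientID) ?? 0,
      snapshot: new Map(this.data),
    }
  }
}

class ToyClient {
  constructor(clientID) {
    this.clientID = clientID
    this.confirmed = new Map() // last state received from the server
    this.pending = [] // local mutations the server has not confirmed yet
    this.nextMutationID = 1
    this.view = new Map() // what the UI renders
  }
  mutate(name, args) {
    this.pending.push({ id: this.nextMutationID++, name, args })
    this.rebuildView()
  }
  push(server) {
    server.push(this.clientID, this.pending)
  }
  pull(server) {
    const { lastMutationID, snapshot } = server.pull(this.clientID)
    this.confirmed = snapshot
    // Drop mutations the server has applied; keep (and replay) the rest.
    this.pending = this.pending.filter((m) => m.id > lastMutationID)
    this.rebuildView()
  }
  rebuildView() {
    // Optimistic view = confirmed server state + pending local mutations.
    this.view = new Map(this.confirmed)
    for (const m of this.pending) mutators[m.name](this.view, m.args)
  }
}

const server = new ToyServer()
const alice = new ToyClient('alice')
const bob = new ToyClient('bob')

alice.mutate('createTodo', { id: 't1', content: 'buy milk' })
// alice's UI shows the todo immediately; bob knows nothing yet.
alice.push(server)
alice.push(server) // a retried push is harmless: mutation 1 is skipped
// In the real app, an Ably message would now prompt bob to pull.
bob.pull(server)

bob.mutate('updateTodoCompleted', { id: 't1', completed: true })
alice.pull(server) // bob's change is still pending, so alice sees completed: false
bob.push(server)
alice.pull(server) // now alice sees completed: true
bob.pull(server) // bob's mutation is confirmed and dropped from pending
```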
We will dive into the backend integration in the next section of this article. For now, let's rewrite the state-related code utilizing Replicache:
```jsx
// Only the relevant parts are shown
import React, { useState } from 'react'
import { Replicache } from 'replicache'
import { useSubscribe } from 'replicache-react'
import { nanoid } from 'nanoid'

const rep = new Replicache({
  // other replicache options
  mutators: {
    async createTodo(tx, { id, completed, content, order }) {
      await tx.put(`todo/${id}`, { completed, content, order, id })
    },
    async updateTodoCompleted(tx, { id, completed }) {
      const key = `todo/${id}`
      const todo = await tx.get(key)
      // write a copy instead of mutating the value read from the store
      await tx.put(key, { ...todo, completed })
    },
    async deleteTodo(tx, { id }) {
      await tx.del(`todo/${id}`)
    },
  },
})

export default function TodoApp() {
  const [content, setContent] = useState('')
  // re-runs (and re-renders) when data under the todo/ prefix changes
  const todos =
    useSubscribe(rep, async (tx) => {
      const entries = await tx.scan({ prefix: 'todo/' }).entries().toArray()
      return entries.map(([, todo]) => todo) // keep only the values
    }) ?? []

  const onSubmit = (e) => {
    e.preventDefault()
    if (content.length > 0) {
      rep.mutate.createTodo({
        id: nanoid(),
        content,
        completed: false,
        // order (a fractional index string) is omitted here; see the Bonus section
      })
      setContent('')
    }
  }

  // the handlers take the todo they act on, e.g. onChange={(e) => onChangeCompleted(todo, e)}
  const onChangeCompleted = (todo, e) => {
    rep.mutate.updateTodoCompleted({
      id: todo.id,
      completed: e.target.checked,
    })
  }

  const onDelete = (todo) => {
    rep.mutate.deleteTodo({ id: todo.id })
  }

  // render
}
```
We replace React's in-memory state with Replicache's persistent store. The app should work as before, except your carefully written todo items won't disappear when the browser tab closes.
Notice the mutators we register when initializing Replicache. They are the main APIs we use to interact with Replicache's store. When they are executed on the client, the corresponding mutations will be sent to the replicache-push endpoint by Replicache.
With the help of Replicache, you can think about your client state as a giant hashtable. You can read from it and write to it as you like, and Replicache would dutifully keep the state in sync among the server and all the clients.
Server
Now let's move on to the server.
The plan is clear: we will implement the two endpoints Replicache needs, using some backend language (Node.js in our case) and some database. The only requirement Replicache imposes is that the database must support a certain kind of transaction.
Before we set out to write the code, we need to think about the architecture. Remember the third feature of Todo Light? It should be as fast as possible for all users around the world.
Since we have implemented Optimistic UI on the client, most operations already feel instant (zero latency). For changes to sync from one client to others quickly, we still need low latency for requests to the server. Ideally, the latency should be under 100ms for the collaboration to feel realtime.
We can only achieve that by deploying the server and the database globally. If we deploy to only one region, the latency for a user on another continent will be several hundred milliseconds no matter what we do. It's the speed of light, period.
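A back-of-the-envelope calculation shows why. The sketch assumes light in optical fiber covers roughly 200 km per millisecond (about two thirds of its speed in a vacuum) and uses straight-line distances. Real routes are longer and add routing and processing time, so these are floors, not estimates. The function names are illustrative.

```javascript
// Assumption: light in fiber travels roughly 200,000 km/s, i.e. 200 km per ms.
const FIBER_KM_PER_MS = 200

// Lower bound on one round trip over a straight fiber path of `distanceKm`.
function minRoundTripMs(distanceKm) {
  return (2 * distanceKm) / FIBER_KM_PER_MS
}

// A change reaches another client via a push (round trip from the writer),
// a poke (one way, server to reader), and a pull (round trip from the reader).
function minSyncMs(writerKm, readerKm) {
  return (
    minRoundTripMs(writerKm) +
    readerKm / FIBER_KM_PER_MS +
    minRoundTripMs(readerKm)
  )
}

console.log(minRoundTripMs(1000)) // 10  (server in the same region)
console.log(minRoundTripMs(10000)) // 100 (server across an ocean)
console.log(minSyncMs(10000, 10000)) // 250
```

Even this idealized floor puts a cross-ocean sync in the hundreds of milliseconds, while a server about 1,000 km away leaves plenty of headroom under 100ms.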
Globally deploying a stateless server should be easy. At least that's what I initially thought. It turns out I was wrong. In 2021, most cloud providers still only let you deploy your server to a single region, and you need to go through many extra steps to get a global setup. (I'm mostly referring to PaaS like Heroku and Google App Engine. FaaS, function as a service, is much easier to deploy globally but comes with its own gotchas.)
Luckily, I found Fly.io, a cloud service that helps you "deploy app servers close to your users", which is exactly what we need. It comes with an excellent command-line tool and a smooth "push to deploy" flow. Scaling out to multiple regions (in our case, Hong Kong and Los Angeles) takes only a few keystrokes. Even better, they offer a pretty generous free tier.
The only question left is which database to use. Globally distributed databases with strong consistency are a huge and complicated area that big companies have tackled in recent years.
Inspired by Google's Spanner, many open-source solutions have come out. One of the most polished contenders is CockroachDB. Luckily, they offer a managed service with a 30-day trial.
Although I managed to build a version of Todo Light using CockroachDB, the end product in this article is based on a much simpler Postgres setup with distributed read replicas. Dealing with a global database brings in much complexity that is not essential to the subject matter of this article, which will wait for another piece.
We need two tables, one for todos and one for replicache clients.
Replicache needs to track the last_mutation_id of each client to coordinate all mutations, whether confirmed or pending. The deleted column is used for soft deletes. The version column is used to compute changes for Replicache pulls, which we will explain later.
The replicache-push endpoint receives the arguments of the local mutators. Let's persist them to the database. As Replicache requires, we also need to update the client's lastMutationID in the same transaction.
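For reference, here is a hypothetical schema reconstructed from the queries in this article. It is not necessarily the one Todo Light uses. Column names come from the SQL in the endpoints. The types are assumptions, apart from ord being text, which the Bonus section states. A BIGINT last_mutation_id would also explain the parseInt calls, since node-postgres returns 64-bit integers as strings. gen_random_uuid() is built into Postgres 13 and later; older versions need the pgcrypto extension.

```javascript
// Hypothetical schema, inferred from the queries used by the two endpoints.
// Run it once, e.g. with psql or your migration tool of choice.
const schema = `
CREATE TABLE todos (
  id TEXT PRIMARY KEY,                             -- nanoid generated on the client
  list_id TEXT NOT NULL,
  content TEXT NOT NULL,
  completed BOOLEAN NOT NULL DEFAULT false,
  ord TEXT,                                        -- fractional index
  deleted BOOLEAN NOT NULL DEFAULT false,          -- soft delete flag
  version UUID NOT NULL DEFAULT gen_random_uuid()  -- replaced on every write
);
CREATE INDEX todos_list_id_idx ON todos (list_id);

CREATE TABLE replicache_clients (
  id TEXT PRIMARY KEY,                             -- Replicache client ID
  last_mutation_id BIGINT NOT NULL
);
`

module.exports = { schema }
```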
```js
router.post('/replicache-push', async (req, res) => {
  const { list_id: listID } = req.query
  const push = req.body
  try {
    // db is a typical object that represents a database connection
    await db.tx(async (t) => {
      let lastMutationID = await getLastMutationID(t, push.clientID)
      for (const mutation of push.mutations) {
        const expectedMutationID = lastMutationID + 1
        if (mutation.id < expectedMutationID) {
          console.log(
            `Mutation ${mutation.id} has already been processed - skipping`,
          )
          continue
        }
        if (mutation.id > expectedMutationID) {
          console.warn(`Mutation ${mutation.id} is from the future - aborting`)
          break
        }
        // these mutations are automatically sent by Replicache when we execute their counterparts on the client
        switch (mutation.name) {
          case 'createTodo':
            await createTodo(t, mutation.args, listID)
            break
          case 'updateTodoCompleted':
            await updateTodoCompleted(t, mutation.args)
            break
          case 'updateTodoOrder':
            await updateTodoOrder(t, mutation.args)
            break
          case 'deleteTodo':
            await deleteTodo(t, mutation.args)
            break
          default:
            throw new Error(`Unknown mutation: ${mutation.name}`)
        }
        lastMutationID = expectedMutationID
      }
      await t.none(
        'UPDATE replicache_clients SET last_mutation_id = $1 WHERE id = $2',
        [lastMutationID, push.clientID],
      )
    })
    // Notify the clients via Ably only after the transaction has committed,
    // so the pulls it triggers can already see the new data.
    const channel = ably.channels.get(`todos-of-${listID}`)
    channel.publish('change', {})
    res.send('{}')
  } catch (e) {
    console.error(e)
    res.status(500).send(e.toString())
  }
})

async function getLastMutationID(t, clientID) {
  const clientRow = await t.oneOrNone(
    'SELECT last_mutation_id FROM replicache_clients WHERE id = $1',
    clientID,
  )
  if (clientRow) {
    return parseInt(clientRow.last_mutation_id, 10)
  }
  // first push from this client: register it
  await t.none(
    'INSERT INTO replicache_clients (id, last_mutation_id) VALUES ($1, 0)',
    clientID,
  )
  return 0
}

async function createTodo(t, { id, completed, content, order }, listID) {
  await t.none(
    `INSERT INTO todos (id, completed, content, ord, list_id)
     VALUES ($1, $2, $3, $4, $5)`,
    [id, completed, content, order, listID],
  )
}

async function updateTodoCompleted(t, { id, completed }) {
  // every write gets a fresh version so the pull endpoint picks it up
  await t.none(
    `UPDATE todos
     SET completed = $2, version = gen_random_uuid()
     WHERE id = $1`,
    [id, completed],
  )
}

// other similar SQL CRUD functions are omitted
```
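The omitted functions presumably follow the same pattern. Here are hypothetical versions written against the same t.none(query, values) interface. They are guesses consistent with the soft-delete and version columns, not the code from the repository. A fake transaction object records the queries so the sketch runs without a database.

```javascript
// Hypothetical versions of the omitted mutation handlers.
// Each write assigns a fresh version so the pull endpoint notices the change.
async function updateTodoOrder(t, { id, order }) {
  await t.none(
    'UPDATE todos SET ord = $2, version = gen_random_uuid() WHERE id = $1',
    [id, order],
  )
}

async function deleteTodo(t, { id }) {
  // Soft delete: keep the row so later pulls can send a 'del' op for it.
  await t.none(
    'UPDATE todos SET deleted = true, version = gen_random_uuid() WHERE id = $1',
    [id],
  )
}

// A stand-in transaction that records queries instead of running them.
const recorded = []
const fakeTx = {
  none: async (query, values) => {
    recorded.push({ query, values })
  },
}

// Both calls reach t.none synchronously, so `recorded` is filled right away.
deleteTodo(fakeTx, { id: 't1' })
updateTodoOrder(fakeTx, { id: 't2', order: '5' })
```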
The replicache-pull endpoint requires more effort. The general plan: on every request to replicache-pull, we compute a diff of the state and an arbitrary cookie (not to be confused with an HTTP cookie) to send back to the client. The client attaches the cookie to its next request, which we use to compute the next diff. Rinse and repeat.
How to compute the diff may be the most challenging part of integrating Replicache. The team provides several helpful strategies. We will use the most recommended one: the row version strategy.
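The heart of the row version strategy fits in a pure function. It takes the rows of one list and the cookie the client sent back (null on the first pull), and returns the patch and the next cookie. The function name computePull is hypothetical, and the row fields mirror the columns the pull endpoint selects. Keeping this logic out of the route handler makes it easy to test without a database.

```javascript
// Pure sketch of the row version strategy.
// rows: [{ id, completed, content, ord, version, deleted }]
// prevCookie: { [rowID]: version } from the previous pull, or null.
function computePull(rows, prevCookie) {
  // On the first pull, the client starts from an empty store.
  const patch = prevCookie == null ? [{ op: 'clear' }] : []
  const cookie = {}
  for (const { id, completed, content, ord, version, deleted } of rows) {
    cookie[id] = version
    // Same version as last time: the client already has this row.
    if (prevCookie != null && prevCookie[id] === version) continue
    const key = `todo/${id}`
    if (deleted) {
      patch.push({ op: 'del', key })
    } else {
      patch.push({ op: 'put', key, value: { id, completed, content, order: ord } })
    }
  }
  return { patch, cookie }
}

const rows = [
  { id: 'a', completed: false, content: 'buy milk', ord: '5', version: 'v1', deleted: false },
  { id: 'b', completed: true, content: 'walk dog', ord: '8', version: 'v2', deleted: false },
]
const first = computePull(rows, null)
// first.patch: a 'clear', then one 'put' per row

// Later, row b is soft-deleted, which also gives it a new version.
rows[1] = { ...rows[1], deleted: true, version: 'v3' }
const second = computePull(rows, first.cookie)
// second.patch: [{ op: 'del', key: 'todo/b' }]; row a is unchanged and skipped
```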
```js
router.post('/replicache-pull', async (req, res) => {
  const pull = req.body
  const { list_id: listID } = req.query
  try {
    await db.tx(async (t) => {
      const lastMutationID = parseInt(
        (
          await t.oneOrNone(
            'select last_mutation_id from replicache_clients where id = $1',
            pull.clientID,
          )
        )?.last_mutation_id ?? '0',
        10,
      )
      const todosByList = await t.manyOrNone(
        'select id, completed, content, ord, deleted, version from todos where list_id = $1',
        listID,
      )
      // patch is an array of operations that will be applied to the client's store
      const patch = []
      const cookie = {}
      // For the initial call, we just clear the client store.
      if (pull.cookie == null) {
        patch.push({ op: 'clear' })
      }
      todosByList.forEach(
        ({ id, completed, content, ord, version, deleted }) => {
          // The cookie is a map from row id to row version.
          // As the todo count grows, it might become too big to exchange efficiently.
          // By then, we can compute a hash as the cookie and store the actual map on the server.
          cookie[id] = version
          const key = `todo/${id}`
          if (pull.cookie == null || pull.cookie[id] !== version) {
            if (deleted) {
              patch.push({ op: 'del', key })
            } else {
              // additions and updates are both represented as the 'put' op
              patch.push({
                op: 'put',
                key,
                value: { id, completed, content, order: ord },
              })
            }
          }
        },
      )
      res.json({ lastMutationID, cookie, patch })
    })
  } catch (e) {
    console.error(e)
    res.status(500).send(e.toString())
  }
})
```
Because every write sets version to a fresh random UUID from Postgres's gen_random_uuid function, comparing a row's current version with the one stored in the client's cookie tells us efficiently whether that todo item has changed since the client's last pull.
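The comment in the pull endpoint suggests replacing a large cookie with a hash once the list grows. Here is a minimal sketch of that idea using Node's built-in crypto module. The names saveCookie, loadCookie, and cookieStore are hypothetical, and the in-memory Map is an assumption for brevity. With servers in several regions, the full maps would need to live somewhere shared, such as the database.

```javascript
const crypto = require('crypto')

// Hypothetical store of full cookies keyed by their hash. In-memory here;
// a multi-region deployment would keep this in the database instead.
const cookieStore = new Map()

// Store a { rowID: version } map and return a short, stable token for it.
function saveCookie(versionMap) {
  // Sort entries so the same map always produces the same token.
  const entries = Object.entries(versionMap).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0,
  )
  const token = crypto
    .createHash('sha256')
    .update(JSON.stringify(entries))
    .digest('hex')
  cookieStore.set(token, versionMap)
  return token
}

// Return the full map for a token, or null if the token is missing or unknown,
// in which case the endpoint can treat the request like an initial pull.
function loadCookie(token) {
  if (token == null) return null
  return cookieStore.get(token) ?? null
}

// Inside the pull endpoint, roughly:
//   const prevCookie = loadCookie(pull.cookie)  // compare versions against this
//   res.json({ lastMutationID, cookie: saveCookie(cookie), patch })
const token = saveCookie({ b: 'v2', a: 'v1' })
console.log(token === saveCookie({ a: 'v1', b: 'v2' })) // true
```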
That's all for the server code, and we've come to the end of our journey. With the help of many great tools, we've successfully built a fast, collaborative todo app. More importantly, we've worked out a reasonably simple approach to building similar web apps. As the user base and feature set grow, this approach shall scale well in both performance and complexity.
Bonus - Implement Reordering with Fractional Indexing
You may notice that we use the type text for the ord column in the database schema, even though a numeric type might seem better suited. The reason is that we use a technique called Fractional Indexing to implement reordering. Check the source code of Todo Light, or try to implement it yourself. It should be an interesting exercise.
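For anyone who wants a head start, here is a simplified sketch of the technique. It is not the code from Todo Light. It is loosely modeled on the midpoint approach used by the fractional-indexing npm package, reduced to the digits 0 to 9 for readability. A key "d1d2d3..." stands for the fraction 0.d1d2d3..., so plain string order matches numeric order. Keys never end in "0", which guarantees that another key always fits between any two. Moving a todo then means writing one new ord value for that row only. For SQL to sort ord the same way, a byte-wise comparison such as ORDER BY ord COLLATE "C" is the safe choice. That is an assumption about your database setup.

```javascript
// Simplified fractional indexing over the digits 0-9.
const DIGITS = '0123456789'

// Return a key strictly between `a` and `b`.
// `a` is '' for "before everything"; `b` is null for "after everything".
// Assumes a < b and that neither key ends in '0' (true for keys made here).
function midpoint(a, b) {
  if (b !== null && a >= b) throw new Error(`${a} >= ${b}`)
  if (b !== null) {
    // Skip the common prefix, treating missing digits of `a` as '0'.
    let n = 0
    while ((a[n] || '0') === b[n]) n++
    if (n > 0) return b.slice(0, n) + midpoint(a.slice(n), b.slice(n))
  }
  const digitA = a ? DIGITS.indexOf(a[0]) : 0
  const digitB = b !== null ? DIGITS.indexOf(b[0]) : DIGITS.length
  if (digitB - digitA > 1) {
    // There is room for a whole digit in between.
    return DIGITS[Math.round((digitA + digitB) / 2)]
  }
  // The first digits are consecutive.
  if (b !== null && b.length > 1) return b[0] // e.g. between '1' and '25' -> '2'
  // Keep a's first digit and find room after it, e.g. '49'..'5' -> '495'.
  return DIGITS[digitA] + midpoint(a.slice(1), null)
}

const first = midpoint('', null) // '5'
const last = midpoint(first, null) // '8'
const middle = midpoint(first, last) // '7'
const squeezed = midpoint('5', '6') // '55'
```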
At the time of writing, one shortcoming of Replicache is that its local transactions are not fast enough for heavy interactions such as drag and drop. To prevent lag, we turned on the useMemstore: true option, which disables offline support. Hopefully, this will be fixed soon.
Update
- [2022-06-30] It works again!
- [2022-05-25] The Replicache shortcoming mentioned at the end of this article has been fixed.
- [2022-05-25] The demo is not working right now. I will set up Fly.io and deploy it soon.