
This Is How to Build a Collaborative Text Editor Using Rails
It is a painful realization. You just added a beautiful, multi-page description into your bug tracker’s text editor, complete with photos and a short screencast. Then your co-worker, who left their window open when they went to lunch, helpfully fixes a typo… and overwrites everything you just did. Poof — all that work, gone.
One of my favorite things about Rails is how great it is with solving 99 percent of the issues most applications have — you know, take some information, do stuff with it, and then put that information back in front of people. But as your app becomes more popular, there is more information being entered in the same record, often by many people at the same time. And this can be disastrous.
How great would it be if everyone could work on the same record at the same time? If the record could deal with all those updates, instead of you having to negotiate who makes the updates like a bunch of stranded preteens from Lord of the Flies?
As more and more organizations move all workflow and data online, the expectation is that everyone can work together in real time. Technology companies are building entire businesses around collaborative editing or adding collaboration to their existing products.
This is something that we have been thinking about at Aha! for some time. We know that our customers want to be able to work together seamlessly when creating notes or writing feature descriptions in our application. So we have been researching what is possible to deliver the best collaborative text editor experience.
And if you do not want your customers to have to pass the conch shell just to get work done, you need it too.
What does collaborative editing look like? As developers, you will probably think of Git. You make changes to your document, someone else makes changes on their document, you merge, and then one of you fixes conflicts.
Part of that process is great. You can just make changes without having to wait for anyone else. That is called making changes optimistically, in the sense that you can do stuff without having to tell other people first and assume that your changes will come through.
Part of that process is not great. When you and I are editing the same document at the same time, we do not want to be interrupted every few minutes to deal with conflicts. A text editor that worked like that would be unusable. But conflicts do not constantly happen. They only happen when we try to edit the same place at the same time, which is not common.
When a conflict does happen, though, what if we were not bugged about it? The system could make its best guess on what to do, and one of us could fix it if it was wrong. In theory, this seems like a terrible idea that could never work. In practice, it mostly does work.
So how does this work in practice? The system doesn’t need to be right, it just has to be consistent, and it needs to try to keep your intent.
The image below shows this. If I type in the word “hello,” the system should do its very best to make sure “hello” ends up in the document somewhere. That is intent.

And if someone else types “bye” at the same spot at the same time? Our two documents should eventually end up exactly the same, whether that is “hellobye” or “byehello” — the documents need to be consistent.
What about conflicts? Well, people are kind of natural conflict resolution machines. If you are walking down the hallway, and someone is about to walk into you, you will stop. Probably both of you will move to the same side. And then maybe you will both move to the other side and you will laugh. But then eventually one of you will move, and the other will stand still, and everything will work out.
So, you want your editor to quickly respond to the person using it. If you are typing, you do not want to wait for a network request before you can see what you typed. And you want the other people editing your document to see your changes. And you want to do all of this as fast as possible. How could you make that work?
The first thing you might think of is sending diffs, like the image below. “This person 1 changed line 5 from this to this.” But it is hard to see intent in a diff. All diffs do is tell you what changed, not why.

A better way is to think in terms of actions a person could take: “I inserted character ‘a’ at position 5.” “I deleted character ‘b’ after position 8.”

You can make these actions (or operations) pretty much whatever you want. Everything from “insert some text” to “make this section of text bold.” You can apply these to a document, and when you do, the document changes. So, as you can see below, applying this operation changes “Hello” to “Hello, world.”

If we have operations, and we can send operations, and we can change documents by applying operations, then we almost have collaboration.
What if client A sent an “insert ‘world’ at 5” operation to client B? Client B could apply that operation, and you would have the same doc! Bingo — job is finished and it is perfect. Except it is not.
Because remember — you could both be changing the document at the same time. So, let’s say you have two clients. They each have a document with the text “at” as shown below.
