Kyle d’Oliveira · 2023-08-24 · ruby, rails, debugging, race conditions

Off to the races: 3 ways to avoid race conditions

What is a race condition?

I searched for a good definition of a race condition and this is the best I found:

A race condition is unanticipated behavior caused by multiple processes interacting with shared resources in a different order than expected.

This is quite the mouthful and it is still not very clear how race conditions show up in Rails.

Using Rails, we are always working with multiple processes: each request or background job is an individual process that can operate mostly independently of other processes.

We are also always working with shared resources. Does the application use a relational database? That's a shared resource. Does the application use some kind of caching server? Yup, that's a shared resource. Do you use some kind of external API? You guessed it — that's a shared resource.

There are two example categories of race conditions that I would like to talk about and then touch on how to approach addressing them.

Read-modify-write

The read-modify-write category is a type of race condition where one process will read values from a shared resource, modify the value within memory, and then attempt to write it back to the shared resource. This seems very straightforward when we look at it through the lens of a single process. But when a second process comes up, it can result in some unanticipated behavior.

Consider code that looks like this:

class IdeasController < ActionController::Base

  def vote
    @idea = Idea.find(params[:id])
    @idea.votes += 1
    @idea.save!
  end

end

Here we are reading (Idea.find(params[:id])), modifying (@idea.votes += 1), then writing (@idea.save!).

We can see that this would increment the number of votes on an idea by one. If there was an idea with zero votes, it would end with having one vote. However, if a second request came in and read the idea from the database while it still had zero votes and incremented that value in memory, we could have a situation where two votes come in simultaneously — yet the end result is that the number of votes in the database is only one.

This is also referred to as the lost update race condition.
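The interleaving described above can be replayed step by step in plain Ruby. This is only a sketch under stated assumptions: the `database` hash is a hypothetical stand-in for the row in the ideas table, and the two local variables play the role of each request's in-memory copy. No Rails or threads are involved, so the outcome is deterministic.

```ruby
# Replays the lost-update interleaving step by step, without threads.
# `database` is a hypothetical stand-in for the row in the ideas table.
def simulate_lost_update
  database = { votes: 0 }

  # Both requests read the row before either one has written.
  request_a_votes = database[:votes]
  request_b_votes = database[:votes]

  # Each request increments its own in-memory copy and writes it back.
  database[:votes] = request_a_votes + 1
  database[:votes] = request_b_votes + 1 # overwrites request A's update

  database[:votes]
end

puts simulate_lost_update # two votes were cast, but only one is stored
```

Neither request did anything wrong on its own; the vote is lost only because of the order in which the reads and writes happened.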

Check-then-act

The check-then-act category is a type of race condition where data is loaded from a shared resource, and depending on the value present, we determine if an action needs to be performed.

One of the classic examples of how this shows up is in the validates_uniqueness_of validation in Rails, like this:

class User < ActiveRecord::Base
  validates_uniqueness_of :email
end

Consider code that looks like this:

User.create(email: "demo@example.com")

With the validation in place, Rails will check if there is any existing user with that email. If there is no other, it will act by persisting the user into the database. However, what would happen if a second request was executing the same code at the same time? We could end up in a situation where both requests check to determine if there is duplicate data (and there is none) — then they will both act by saving the data, resulting in a duplicate user in the database.
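The same kind of replay works for check-then-act. In this sketch, the `users` array is a hypothetical stand-in for a users table that has no unique index, and `include?` plays the role of the uniqueness check that the validation performs.

```ruby
# Replays the check-then-act interleaving step by step, without threads.
# `users` is a hypothetical stand-in for a users table with no unique index.
def simulate_duplicate_signup(email)
  users = []

  # Both requests run the uniqueness check before either one has saved.
  request_a_found_duplicate = users.include?(email)
  request_b_found_duplicate = users.include?(email)

  # Both checks passed, so both requests go on to save.
  users << email unless request_a_found_duplicate
  users << email unless request_b_found_duplicate

  users
end

p simulate_duplicate_signup("demo@example.com") # the email is stored twice
```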

Addressing race conditions

There is no silver bullet for fixing race conditions, but there are a handful of strategies that can be leveraged for any particular problem. There are three main categories for removing race conditions:

1. Remove the critical section

While this could be viewed as deleting the offending code, sometimes you can refactor the code so that it isn't vulnerable to race conditions. Other times, you can look into atomic operations.

An atomic operation is one where no other process can interrupt the operation so you know it will always execute as a single unit.

For the read-modify-write example, instead of incrementing the idea votes in memory, they could be incremented in the database:

@idea.increment!(:votes)

That will execute SQL that looks like this:

UPDATE "ideas" SET "votes" = COALESCE("votes", 0) + 1 WHERE "ideas"."id" = 123

Because the database performs the read and the write in a single statement, this approach is not subject to the same race condition.
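To see why pushing the increment into the shared resource helps, here is a small plain-Ruby sketch. `FakeIdeasTable` is a hypothetical class, not a Rails API: its internal `Mutex` stands in for whatever the database does to make a single `UPDATE` atomic. The important part is that callers never read the current value; they only ask for "plus one".

```ruby
# Hypothetical stand-in for a database that applies increments itself.
class FakeIdeasTable
  def initialize
    @votes = Hash.new(0)
    @mutex = Mutex.new # plays the role of the database's internal locking
  end

  # Callers never read the current value; they only ask for "+1".
  def increment_votes(id)
    @mutex.synchronize { @votes[id] += 1 }
  end

  def votes(id)
    @mutex.synchronize { @votes[id] }
  end
end

# Casts `count` votes from separate threads and returns the stored total.
def cast_votes_concurrently(table, id, count)
  Array.new(count) { Thread.new { table.increment_votes(id) } }.each(&:join)
  table.votes(id)
end

puts cast_votes_concurrently(FakeIdeasTable.new, 123, 50)
```

Since there is no read-then-write step left in the calling code, there is no window for another process to slip into.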

For the check-then-act example, instead of allowing Rails to validate the model, we could insert the record directly into the database with an upsert:

User.where(email: "demo@example.com").upsert({}, unique_by: :email)

That will insert the record into the database. If there is a conflict on email (which requires a unique index on email), it will simply ignore the insert.

2. Detect and recover

Sometimes you cannot remove the critical section. It is possible there may be an atomic action, but it doesn't quite work in a way that the code requires. In those situations, you can try a detect and recover approach. With this approach, safeguards are set up that will inform you if a race condition happened. You can either gracefully abort or retry the operation.

For the read-modify-write example, this could be done with optimistic locking. Optimistic locking is built into Rails and can allow detection of when multiple processes are operating on the same record at the same time. To enable optimistic locking, you only need to add a lock_version column to your table and Rails will automatically enable it.

change_table :ideas do |t|
  t.integer :lock_version, default: 0
end

Then when you attempt to update a record, Rails will only update it if the lock_version in the database matches the version held in memory. If it doesn't, Rails raises an ActiveRecord::StaleObjectError exception, which can be rescued. Handling it could mean retrying the operation, or simply reporting an error message back to the user.

def vote
  @idea = Idea.find(params[:id])
  @idea.votes += 1
  @idea.save!
rescue ActiveRecord::StaleObjectError
  retry
end
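The version check itself can be illustrated without Rails. The sketch below is a simplified, single-threaded model: `FakeIdeaRow` and `StaleRowError` are hypothetical names that play the roles of the ideas row and ActiveRecord::StaleObjectError. In a real database the comparison and the write happen together inside one UPDATE statement; here they are ordinary Ruby statements.

```ruby
# Hypothetical error, playing the role of ActiveRecord::StaleObjectError.
class StaleRowError < StandardError; end

# Hypothetical in-memory row with a version counter like lock_version.
class FakeIdeaRow
  attr_reader :votes, :lock_version

  def initialize
    @votes = 0
    @lock_version = 0
  end

  # Writes only if the caller read the version that is still current.
  def update_votes(new_votes, expected_version)
    raise StaleRowError if expected_version != @lock_version

    @votes = new_votes
    @lock_version += 1
  end
end

def simulate_optimistic_voting
  row = FakeIdeaRow.new

  # Both requests read the row while it is at lock_version 0.
  a_votes, a_version = row.votes, row.lock_version
  b_votes, b_version = row.votes, row.lock_version

  row.update_votes(a_votes + 1, a_version) # succeeds; version becomes 1

  begin
    row.update_votes(b_votes + 1, b_version) # first attempt is stale
  rescue StaleRowError
    # Recover by re-reading the row, then run the begin block again.
    b_votes, b_version = row.votes, row.lock_version
    retry
  end

  row.votes
end

puts simulate_optimistic_voting
```

The second request's first write is rejected instead of silently overwriting the first, and the retry works from fresh data, so both votes are kept.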

For the check-then-act example, this could be done with a unique index on the column, then rescuing the exception when persisting the data.

add_index :users, [:email], unique: true

With a unique index in place, if data already exists in the database with that email, Rails will raise an ActiveRecord::RecordNotUnique error and that can be rescued and handled appropriately.

begin
  user = User.create(email: "demo@example.com")
rescue ActiveRecord::RecordNotUnique
  user = User.find_by(email: "demo@example.com")
end
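The same detect-and-recover shape can be sketched in plain Ruby. `FakeUsersTable` and `DuplicateEmailError` are hypothetical names: the table enforces uniqueness itself, the way a unique index does, and the caller simply rescues the error rather than checking first.

```ruby
require "set"

# Hypothetical error, playing the role of ActiveRecord::RecordNotUnique.
class DuplicateEmailError < StandardError; end

# Hypothetical table that enforces uniqueness itself, like a unique index.
class FakeUsersTable
  def initialize
    @emails = Set.new
    @mutex = Mutex.new
  end

  def insert(email)
    @mutex.synchronize do
      # Set#add? returns nil when the value is already present.
      raise DuplicateEmailError, email unless @emails.add?(email)
    end
    email
  end

  def count
    @mutex.synchronize { @emails.size }
  end
end

# No check before acting: attempt the insert and recover if it is rejected.
def sign_up(table, email)
  table.insert(email)
  :created
rescue DuplicateEmailError
  :already_existed
end

table = FakeUsersTable.new
results = Array.new(10) { Thread.new { sign_up(table, "demo@example.com") } }.map(&:value)
p results.tally
```

However the threads interleave, exactly one of them creates the user and the rest take the recovery path.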

Idempotency

In order to retry actions, it is important that the entire operation is idempotent. This means that if an operation is performed multiple times, the result is the same as if it was only applied once.

For instance, imagine if a job sent out an email and it was performed whenever an idea's votes were changed. It would be really bad if an email was sent out for each retry. To make the operation idempotent, you could hold off sending the email until the entire voting operation was complete. Alternatively, you could update the implementation of the process that sends the email to only send the email if votes changed from the last time it was sent out. If a race condition occurs and you need to retry, the first attempt at sending an email might result in a no-op and it is safe to trigger it again.

Many operations might not be idempotent — such as enqueueing a background job, sending an email, or calling a third party API.
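One way to picture the "only send if votes changed since the last email" idea is the sketch below. `VoteChangeNotifier` is a hypothetical class written for illustration; it keeps its state in memory and appends to an array instead of delivering mail, whereas a real implementation would need to persist the last-notified count somewhere shared.

```ruby
# Hypothetical notifier that remembers the vote count it last emailed about.
class VoteChangeNotifier
  attr_reader :sent_emails

  def initialize
    @last_notified_votes = {} # idea id => vote count in the last email
    @sent_emails = []         # stand-in for actually delivering mail
  end

  # Safe to call repeatedly: sends only when the count differs from last time.
  def notify(idea_id, current_votes)
    return false if @last_notified_votes[idea_id] == current_votes

    @sent_emails << "Idea #{idea_id} now has #{current_votes} votes"
    @last_notified_votes[idea_id] = current_votes
    true
  end
end

notifier = VoteChangeNotifier.new
notifier.notify(1, 5) # sends an email
notifier.notify(1, 5) # a retry with the same count is a no-op
puts notifier.sent_emails.size
```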

3. Protect the code

If you cannot detect and recover, you can try to protect the code. The goal here is to create a contract where only one process can access the shared resource at a time. Effectively, you are removing concurrency: since only one process can have access to a shared resource, we can avoid most race conditions. The tradeoff is that the more concurrency is removed, the slower the application can become, because other processes must wait until they are allowed access.

This could be handled using pessimistic locking, which is built into Rails. To use pessimistic locking, you can add lock to the queries being built, and Rails will tell the database to hold a row lock on those records. The database will then prevent any other process from obtaining the lock until the first one is done. Be sure to wrap the code in a transaction so the database knows when to release the lock.

Idea.transaction do
  @idea = Idea.lock.find(params[:id])
  @idea.votes += 1
  @idea.save!
end
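The effect of holding a lock around the whole read-modify-write can be sketched with Ruby threads. `LockedIdea` is a hypothetical class in which a `Mutex` plays the role of the database row lock; the short `sleep` is only there to widen the window in which updates would likely be lost if the lock were removed.

```ruby
# Hypothetical stand-in: one Mutex plays the role of the row lock.
class LockedIdea
  attr_reader :votes

  def initialize
    @votes = 0
    @row_lock = Mutex.new
  end

  # Read, modify, and write all happen while the lock is held.
  def vote
    @row_lock.synchronize do
      current = @votes
      sleep 0.001 # widen the window between the read and the write
      @votes = current + 1
    end
  end
end

# Casts `count` votes from separate threads and returns the stored total.
def vote_concurrently(idea, count)
  Array.new(count) { Thread.new { idea.vote } }.each(&:join)
  idea.votes
end

puts vote_concurrently(LockedIdea.new, 20)
```

Every vote is counted, but the threads now run the critical section one at a time, which is exactly the concurrency tradeoff described above.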

If row-level locking isn't possible, there are other tools such as Redlock or with_advisory_lock that could be used. These will allow locking an arbitrary block of code. Using this could be as simple as something like this:

email = "demo@example.com"
User.with_advisory_lock("user_uniqueness_#{email}") do
  User.find_or_create_by(email: email)
end

These strategies cause processes to wait until a lock is obtained, so they should also have some form of timeout to prevent a process from waiting forever, as well as some handling for what to do when the timeout is reached.
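As a rough illustration of waiting with a timeout, here is a plain-Ruby sketch. `with_lock_timeout` and `LockTimeoutError` are hypothetical names written for this example, not part of Rails, Redlock, or with_advisory_lock; those tools have their own timeout options, which are worth checking in their documentation. The helper polls `Mutex#try_lock` until a deadline passes.

```ruby
# Hypothetical error raised when the lock cannot be obtained in time.
class LockTimeoutError < StandardError; end

# Runs the block while holding `mutex`, giving up after `timeout_seconds`.
def with_lock_timeout(mutex, timeout_seconds, poll_interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds

  # Poll for the lock instead of blocking indefinitely.
  until mutex.try_lock
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise LockTimeoutError, "gave up after #{timeout_seconds}s"
    end
    sleep poll_interval
  end

  begin
    yield
  ensure
    mutex.unlock # always release, even if the block raises
  end
end

lock = Mutex.new
holder = Thread.new { lock.synchronize { sleep 0.3 } } # a slow lock holder
sleep 0.05 # give the holder time to take the lock

begin
  with_lock_timeout(lock, 0.05) { puts "got the lock" }
rescue LockTimeoutError => e
  puts "timed out: #{e.message}" # decide here whether to abort or retry later
end
holder.join
```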

While there is no panacea for fixing race conditions, many race conditions can be fixed through these strategies. However, each problem is a little different, so the details of the solutions can vary. You can take a look at my talk from RailsConf 2023 that goes into more detail about race conditions.