We all are Senior Engineers now. Many of us weren’t ready for it. Let’s play catchup and start learning things a senior engineer knows.
In this post, I share lessons about avoiding race conditions by getting to know your ORMs better. I use code examples from Rails but the concepts are applicable to most frameworks and ORMs.
Learn when ORMs actually evaluate queries
Look at the Rails code below and guess how many database queries it runs:
engineers = Employee.where(role: 'engineer').all
if engineers.count > 0
send_welcome_email(employee_ids: engineers.map(&:id))
end
You can skip to the next section if you answered correctly.¹
When you construct a query to retrieve objects, ORMs do not immediately run a query on the database. So, the line of Rails code below does not, by itself, hit the database:
engineers = Employee.where(role: 'engineer').all
This code returns a “Constructed Query” object: an `ActiveRecord::Relation` in Rails, a `QuerySet` in Django, and so on (readers with PHP experience, help me with the Laravel equivalent). ORMs let you chain, filter, and slice these “Constructed Query” objects further without running a database query, until you force the query to be evaluated. Evaluation is forced when you actually try to access the rows. For example, here are some ways to force an evaluation:
# convert the object to a list
engineers.to_a
# print the rows (inspect loads the records)
puts engineers.inspect
# iterate over the rows
engineers.each { |engineer| do_something(engineer) }
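If you want to check whether a relation has already hit the database, or see the SQL it would run, ActiveRecord exposes a few helpers. A small sketch, reusing the engineers relation from above:

engineers.to_sql    # returns the SELECT statement as a string; nothing is executed
engineers.loaded?   # => false, no rows have been fetched yet
engineers.load      # runs the query and caches the rows on the relation
engineers.loaded?   # => true, iterating now reuses the cached rows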
The takeaway:
✨ ORMs are lazy ✨
Most methods that fetch multiple records don’t evaluate until forced. Most methods that fetch a single value/record evaluate immediately. Please read the documentation for your framework to learn more: Rails, Django.
Back to the `send_welcome_email` example: the correct answer is 2 queries.
engineers = Employee.where(role: 'engineer').all
# 1st query: COUNT(*)
if engineers.count > 0
# 2nd query: SELECT *
send_welcome_email(employee_ids: engineers.map(&:id))
end
This knowledge about ORMs being lazy is crucial not because it lets you reduce the number of queries that hit your database, but because you can inadvertently create subtle bugs if you don’t have it. Here are two examples that I have seen in production:
1. No welcome emails
In our `send_welcome_email` example, if a concurrent request updates the role of every engineer AFTER you check the count but BEFORE you fetch the IDs, you will pass an empty list to `send_welcome_email`. This could have unintended effects: maybe the method doesn’t handle empty lists.
engineers = Employee.where(role: 'engineer').all
if engineers.count > 0
# --> concurrent requests updated all role=engineer to role=agent
# --> EXCEPTION! send_welcome_email doesn't handle empty list
send_welcome_email(employee_ids: engineers.map(&:id))
end
I like to remember this class of bugs by the acronym TOCTTOU (time-of-check to time-of-use). More technically, this is the Phantom effect.
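One way to avoid this particular inconsistency (a sketch, not the only fix) is to evaluate the query once and act on the materialised result, so the check and the use see the same data:

# pluck(:id) runs a single SELECT and returns plain ids, not a lazy relation
engineer_ids = Employee.where(role: 'engineer').pluck(:id)
send_welcome_email(employee_ids: engineer_ids) unless engineer_ids.empty?

The roles can still change after we read them, but at least the list we check is the same list we act on.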
2. No farewell emails
Look at this code:
# Get employees that aren't deleted but have exited the company
past_employees = Employee.where(deleted: false).where("exit_ts < ?", Time.now)
# Update them to be deleted
past_employees.update_all(deleted: true)
# Send these employees a farewell email
send_farewell_email(employee_ids: past_employees.map(&:id))
Now that you know ORMs are lazy, you know nobody is receiving any farewell emails. That’s because `past_employees` on the first line is still just an `ActiveRecord::Relation`. When we finally fetch the employee records to send the farewell email, the intended employees are excluded from the query because they have just been marked deleted.
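A sketch of one possible fix: materialise the ids before mutating the rows, then use that concrete list for both the update and the email.

# pluck(:id) evaluates the query right away and returns plain ids
past_employee_ids = Employee.where(deleted: false)
                            .where("exit_ts < ?", Time.now)
                            .pluck(:id)
# update exactly those rows...
Employee.where(id: past_employee_ids).update_all(deleted: true)
# ...and email exactly those employees
send_farewell_email(employee_ids: past_employee_ids)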
Learn what your ORM does on .save()
Consider this scenario and ask yourself what UPDATE statement the final `.save()` sends to the database:
* Fetch an employee record
* Change any column's value, e.g. employee.salary = 50000
* employee.save()
There is no one correct answer, which makes it more dangerous! The answer will depend on your framework/ORM:
* Rails tracks the fields you change and, by default, only updates those.
* Django’s default ORM updates all the fields (you can specify which fields to update if you want).
* SQLAlchemy, however, behaves like Rails and only writes the changed fields.
* Laravel also behaves like Rails.
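You can see Rails’ dirty tracking at work in the console. A small sketch (the attribute and value are illustrative):

employee = Employee.find(1)
employee.salary = 50_000
employee.changed   # => ["salary"], only the dirty attribute is recorded
employee.save      # the UPDATE touches only salary (plus updated_at if you use timestamps)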
My take is that Rails’ default is “less surprising”, and makes for fewer headaches in a large codebase. Imagine your team built an admin UI that allows employees to change their name. This UI is backed by some code like this:
employee = Employee.find(x)
employee.name = new_name
employee.save
Unknown to you, the finance team has built a UI for Human Resources personnel to update an employee’s salary. That code looks like this:
employee = Employee.find(x)
employee.salary = new_salary
employee.save
Now pretend that we serve a request for each of these UIs concurrently. Both requests read the same Employee data, change their respective fields, and call .save().
With Rails, there is no conflict by default: each request updates only the field it changed.
With Django’s default, depending on which update hits the database last, the employee will end up with either the new name (but the old salary) or the new salary (but the old name). This defect is classed as a Lost Update (or, again, TOCTTOU).
The takeaway
✨ Know your ORM’s save behaviour ✨
Django’s default isn’t wrong, it just means that you need to be more careful when updating whole objects. Wherever possible, use update_fields to send only the changed fields to your database. Race conditions will also need to be top-of-mind every time you save. You can protect your important writes with locks, but you might end up with a lot of them in a large codebase.
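If you do want conflicting whole-row writes to fail loudly in Rails, ActiveRecord also has built-in optimistic locking. A rough sketch, assuming the employees table has an integer lock_version column:

begin
  employee = Employee.find(x)
  employee.salary = new_salary
  employee.save!   # raises if another save bumped lock_version in the meantime
rescue ActiveRecord::StaleObjectError
  # the row changed underneath us: reload and retry, or surface the conflict
end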
Learn when you can avoid locks
Locks are convenient but shouldn’t be used willy-nilly. Locking doesn’t come for free. There are other ways to make updates safely without race conditions. Let’s look at an example.
You run a gaming website and maintain a high score for each game. Your website is busy and you get many concurrent requests to update the high score. After a few lost updates, you started using locks to prevent them from happening again:
def update_high_score(game, new_high_score)
game.with_lock("FOR UPDATE") do
return if game.high_score >= new_high_score # nothing to do unless the new score is higher
game.update!(high_score: new_high_score)
end
end
This is called a “Pessimistic Lock”: you prevent any other process from updating the record while you are updating it. Pessimistic locks are simple to understand, but they can cause deadlocks or slow down your application (because requests wait for the lock to be released).
You can flip this to an Optimistic Lock by checking for the state as you commit your change. Optimistic locks are even more useful in cases where we don’t need to retry a failed update: if there is already a higher score than the one you were updating, you don’t need to retry.
Our `update_high_score` method can be changed to use an optimistic locking strategy called a “Conditional Update”. That way, we can remove the lock and move the condition (“update only if the new high score is greater than the current one”) into the database itself:
def update_high_score(game, new_high_score)
# the WHERE clause makes the check-and-set a single atomic statement in the database
game.class.where(id: game.id)
  .where("high_score < ?", new_high_score)
  .update_all(high_score: new_high_score)
end
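A nice side effect: update_all returns the number of rows it changed, so the caller can tell whether the new score actually won. A small usage sketch (announce_new_high_score is a hypothetical follow-up):

if update_high_score(game, new_high_score) == 1
  announce_new_high_score(game)   # hypothetical: only runs when we really set a new record
end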
The takeaway
✨ Embrace Optimism ✨
If you are not used to them, optimistic locks can seem backward. Pessimistic locks are simpler: you lock the row and do what you want. But once you get the hang of optimistic locking, it starts to feel like solving a puzzle, and you will enjoy replacing locks where it makes sense. Most databases support conditional updates. For example: DynamoDB.
Atomic updates
This is not an example of optimistic locking, but I have sometimes seen locks used to increment values when no locking is required at all. Reaching for atomic operations seems obvious in cache-like databases (Redis or Memcached), but it might not be the first thing that comes to mind for non-cache databases, even though most of them support atomic operations too.
In our gaming website example, if you want to increment the number of times a game has been played, you can handle race conditions using a Pessimistic lock:
def increment_played_count(game)
game.with_lock("FOR UPDATE") do
game.played_count += 1
game.save
end
end
But you can also just increment the value in the database directly by running an atomic update:
def increment_played_count(game)
game.class.where(id: game.id).update_all(
"played_count = played_count + 1"
)
end
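If you are on Rails, ActiveRecord also ships a counter helper that issues the same kind of atomic UPDATE for you (assuming the model is called Game):

# a single atomic UPDATE in the database, no read-modify-write in Ruby
Game.increment_counter(:played_count, game.id)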
Fin.
P.S. - If you enjoy reading about race conditions and how to cause subtle bugs, I wrote a post about it that you will like:
Race Conditions/Concurrency Defects in Databases: A Catalogue
¹ Correct answer: 2 queries