Something I see regularly is developers trying to write stored procedures to service multiple update scenarios. They typically accomplish this by adding multiple input parameters to the stored procedure and then using an OR operator in the update statement to match the parameter that a value was provided for. In many scenarios this works okay and no one ever notices the performance hit this type of stored procedure takes. That is, until the table being updated grows to millions of rows or there are joins involved. Suddenly that update that used to take less than one second it now taking 10-20 seconds.
For my examples, I’ll be using the AdventureWorks 2019 OLTP database. This can be downloaded from Microsoft for anyone interested trying this themselves. First, we’re going to give Janice a few extra hours of vacation time. Below is the statement we’ll use.
Now, before we execute that, let’s check what Indexes are on the table.
We can see there is a NONCLUSTERED index on LoginID, and another on NationalIDNumber. Now, how is SQL Server going to use these indexes when we attempt to update Janice’s vacation hours? After all, we’re providing a value for LoginID, right?
Wow, we actually did a CLUSTERED index scan instead of a NONCLUSTERED seek, and read 290 rows before we found the 1 row we were looking for. Again, not significant numbers based on the size the database we’re testing in. But these numbers can escalate quickly in a much larger database.
But why didn’t SQL Server using our NONCLUSTERED index? Consider this, we don’t have an index with a single key that covers both columns in our WHERE clause. Further, SQL Server has to assume that either one of them won’t be a match due to our use of the OR operator. So, best case scenario would be for SQL Server to do an Index Seek on both NONCLUSTERED indexes and join the results. However, that would be pretty costly, so SQL Server decided it would rather do a CLUSTERED index scan in this scenario.
Now, do we think the result will be an different if we pass in NationalIDNumber instead? let’s try.
No, we get the same execution plan. Now, the estimates are still good in this plan. SQL Server expected 1 row and found 1 row. SQL Server only had to read 290 before finding that 1 row.
Now, let’s add another layer of complexity that I’ve seen a few times. Developers think that by splitting the WHERE clause into two AND statements, each with its own OR operator embedded in the middle, that they can avoid this above issues. This code usually looks like this.
So, if we execute this statement, does SQL Server have a better understanding of which index it can use?
No, the plan actually gets worse. Now SQL Server is expecting to find 55 rows and only finding 1. This may not seem like a big deal, but SQL Server clears RAM for the expected result set, before it begins to execute the query. In this case, SQL Server is clearing space in RAM for the expected 55 rows, instead of just enough space for the one row it will actually find.
Now, let’s try this update a different way. We’re going to write an IF statement to separate out the possible combination of parameters that may be populated at execution time. Here is what that update statement would look like.
Now, how will this statement perform?
Wow, this actually produced the Index Seek we’ve been looking for. Better yet, SQL Server expected to read 1 row and actually read only 1 row. It doesn’t get much better than that. By writing our update statement this way, we’ve allowed SQL Server to build an execution for the exact parameters we populated. If we were to populate NationalIDNumber instead of LoginID, SQL Server would build a plan for that update statement separate from the one for LoginID. Now, I’ll avoid the topic for now around having our developers build specific stored procedures for each of these. That topic is out of scope for the points being considered here.
In summary, you can use an OR operator and get acceptable results most of the time, and often no one ever see a performance issue using it. However, it’s best to understand what SQL Server is doing when you decide to use an OR operator. Let’s try to avoid them, and prevent future bottlenecks from popping up in our code.