HOME

Monday, December 17, 2012

SQL Server Index Column Order - Does it Matter?


Problem

There is a delicate balance on performance when it comes to setting up the indexes on a table. Too many indexes and your INSERT / UPDATE / DELETE performance will suffer, but not enough indexing will impact your SELECT performance. This tip will look at the order of the columns in your index and how this order impacts query plans and performance.

Solution

Sample SQL Server Table and Data Population

For this example we will setup two sample tables and populate each of the tables with data. Here is the code:
-- Table creation logic
CREATE TABLE [dbo].[TABLE1] 
([col1] [int] NOT NULL,[col2] [int] NULL,[col3] [int] NULL,[col4] [varchar](50) NULL)
GO
CREATE TABLE [dbo].[TABLE2] 
([col1] [int] NOT NULL,[col2] [int] NULL,[col3] [int] NULL,[col4] [varchar](50) NULL)
GO
ALTER TABLE dbo.TABLE1 ADD CONSTRAINT PK_TABLE1 PRIMARY KEY CLUSTERED (col1) 
GO
ALTER TABLE dbo.TABLE2 ADD CONSTRAINT PK_TABLE2 PRIMARY KEY CLUSTERED (col1)
GO
-- Populate tables
DECLARE @val INT
SELECT @val=1
WHILE @val < 1000
BEGIN  
   INSERT INTO dbo.Table1(col1, col2, col3, col4) VALUES(@val,@val,@val,'TEST')
   INSERT INTO dbo.Table2(col1, col2, col3, col4) VALUES(@val,@val,@val,'TEST')
   SELECT @val=@val+1
END
GO
-- Create multi-column index on table1
CREATE NONCLUSTERED INDEX IX_TABLE1_col2col3 ON dbo.TABLE1 (col2,col3)
  WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
        ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
  ON [PRIMARY]
GO

Before we run the following commands include the actual execution plan (Ctrl + M) and also turn on Statistics IO by running "SET STATISTICS IO ON" in the query window.

Single Table Query Example

For our first example let's take a look at a simple query with one column in the WHERE clause. The first query uses the second column (col3) of the index in the WHERE clause and the second query uses the first column (col2) of the index in the WHERE clause. Note we are using "DBCC DROPCLEANBUFFERS" to ensure the cache is empty before each query is executed. Here is the code:
DBCC DROPCLEANBUFFERS
GO
SELECT * FROM dbo.TABLE1 WHERE col3=88
GO
DBCC DROPCLEANBUFFERS
GO
SELECT * FROM dbo.TABLE1 WHERE col2=88
GO
Before running the command turn on the execution plan.  Looking at the explain plans for these two queries we can see that the query that uses the second column (col3) of the index in the WHERE clause is performing an index scan on PK_TABLE1 and not even using the index we created on the column. The second query which has the first column (col2) of the index in the WHERE clause is performing a seek on the index we created and even on this small table uses fewer resources, 6 reads as compared to 4 reads, to execute this query. You can probably guess that this increase in performance only increases as the table becomes larger.
Single table query using second column of index

Single table query using first column of index

-- query 1
Table 'TABLE1'. Scan count 1, logical reads 6, physical reads 0
-- query 2
Table 'TABLE1'. Scan count 1, logical reads 4, physical reads 0

Two Table Join Query Example

For our next example let's take a look a query that has the same WHERE clause but also adds an inner join to another table. We will again have two queries. The first query has the second column (col3) of the index in the WHERE clause and the first column (col2) of the index is used to join the table. The second query has the first column (col2) of the index in the WHERE clause and second column (col3) of the index is used to join the table. Again we are using "DBCC DROPCLEANBUFFERS" to ensure the cache is empty before each query is executed. Here is the code:
DBCC DROPCLEANBUFFERS
GO
SELECT * 
  FROM dbo.TABLE1 INNER JOIN 
       dbo.TABLE2 ON dbo.TABLE1.col2 = dbo.TABLE2.col1
 WHERE dbo.TABLE1.col3=255       
GO
DBCC DROPCLEANBUFFERS
GO
SELECT * 
  FROM dbo.TABLE1 INNER JOIN 
       dbo.TABLE2 ON dbo.TABLE1.col3 = dbo.TABLE2.col1
 WHERE dbo.TABLE1.col2=255       
GO
Looking at the explain plans for these two queries we can see that when the column that the tables are joined on appears first in the index the query does an index scan of this table (as it did in the first example). The second query which has the first column of the index in the WHERE clause is performing a seek on the index we created. Again this second query uses fewer resources to complete, 6 reads as compared to the 8 reads required by the first query doing the scan. This increase in performance will also become greater as more data is added to these tables, as was the case in the previous example.
Table join using second column of index

Table join using first column of index

-- query 1
Table 'TABLE2'. Scan count 0, logical reads 2, physical reads 0
Table 'TABLE1'. Scan count 1, logical reads 6, physical reads 0

-- query 2
Table 'TABLE2'. Scan count 0, logical reads 2, physical reads 0
Table 'TABLE1'. Scan count 1, logical reads 4, physical reads 0

Summary

You can see from these simple examples above that the ordering of the columns in your index does have an impact on how any queries against the tables will behave. A best practice that I've followed that has been quite helpful when creating indexes is to make sure you are always working with the smallest result set possible. This means that your indexes should start with any columns in your WHERE clause followed by the columns in your JOIN condition. A couple more steps that also help query performance is to include any columns in an ORDER BY clause as well as making the index a covering index by including the columns in your SELECT list, this will remove the lookup step to retrieve data. One thing I mentioned earlier that I want to re-interate here is that by adding more indexes to tables as well as adding more columns to an existing index it will require more resources to update these indexes when inserting, updating and deleting data, so finding a balance is important.

No comments:

Post a Comment