Wednesday, March 7, 2012

Indexes design

If I have a table with Col1 + Col2 as PK. But most of the queries use Col1, Col2, Col3 and Col4 in the queries, should I create another unique index on this combination? If not, my queries do an index scan. Just wondering is there a general rule for this kind of situation as I have a lot of such cases in my database.depends. most of the time multi column indexes are not a good idea. if i remember correctly for multi column indexes to work at all, the query has to search the columns in the table in the order that you defined the columns in the index.

index scans are ok, it is table scans you have to worry about.

are you experiencing a performance issue?|||Thrasymachus- I agree, we have to include all the columns in the index in the where clause to avoid a index scan. But the table is huge and yes, I am experiencing performance issues. I know I could make them run better with a seek.|||what's huge? tens of millions of records?

what's the ddl like? lots of fields? big fields?

And if you go ahead with this index, remember the order inwhich you search has to match the order of index definition.

What's the query look like?|||3 million records. Table has 9 columns with 2 varchar(512) and a UniqueIdentifier.

Not sure what you mean by "And if you go ahead with this index, remember the order inwhich you search has to match the order of index definition."|||3 million is not that big.

If your index is created like so

CREATE INDEX MyIndex On Table1 (col1,col2,col3)

Then your query should go like

SELECT *
FROM Table1
Where col1 = @.param1
AND col2 = @.param2
AND col3 = @.param3

same order, see.

ALSO I JUST REMEMBERED, the most selective column in the index definition should come first. That is col1 in the index definintion should have the highest number of distinct values.|||I am not sure if the order of the columns in the index and the query really matters. QO is smart enough to re-phrase your query to use the index. But if your query's where clause is on col1 and col3 while your index is on (col1, col2 and col3) then there will be a scan versus a seek.|||I develop in sql 2k now, but I just looked at my old design book for sql 7 I have laying around to make sure and this part I did confirm. things might have changed. The other thing I remeber from an ex-colleague of mine who was the sql guru I ever knew.

"ALSO I JUST REMEMBERED, the most selective column in the index definition should come first. That is col1 in the index definintion should have the highest number of distinct values."|||I am not sure if the order of the columns in the index and the query really matters. QO is smart enough to re-phrase your query to use the index. But if your query's where clause is on col1 and col3 while your index is on (col1, col2 and col3) then there will be a scan versus a seek.

The notion that the WHERE clause search arguments have to be in the "same order " as the columns defined in the key of the index is rubbish.

Also, the term "index scan" to me means examining EVERY row of an index because to find a match on a search criteria. In your example, "where clause is on col1 and col3 while your index is on (col1, col2, col3)", given that you don't have any datatype mismatch issues, and given that the index is chosen as the access path, and given that the search argument operator is "=", and given that the index is large enough, then you will KEY POSITION on col1, and scan the rest of the index for col3 matches. So, every row of the index containing col1 value is "scanned", but no more than that.|||KSherlock- I agree with you. I have the same understanding as yours. We got side tracked into this discussion about column order. My original question is a little different. Still looking for some comments/suggestions on it.|||As for this part:

"ALSO I JUST REMEMBERED, the most selective column in the index definition should come first. That is col1 in the index definintion should have the highest number of distinct values."

I will refer you to Database Design on SQL Server 7 by certification insider press page 173.

As for the other thing. This kind of backs me up (taken from http://www.sql-server-performance.com/composite_indexes.asp). I have forgot a little of it:

"A composite index is generally only useful to a query if the WHERE clause of the query matches the column(s) that are leftmost in the index. So if you create a composite index, such as "City, State", then a query such as "WHERE City = 'Springfield'" will use the index, but the query "WHERE STATE = 'MO'" will not use the index. [6.5, 7.0, 2000] Updated 11-15-2004"

So heres mud in your eye Sherlock.|||Check me if I have this straight:

Col1 and Col2 make up the PK

And most queries use all 4 Col1 Col2 Col3 and Col4? Or is that "or"? After all, if you specify col1 and col2, specifying col3 and col4 would be a bit redundant, eh?

If you specify values for Col1 and Col2, you should be seeking on the PK. Are you using functions on the columns? Like

where upper(col1) = upper(some search argument)

The above will cause a table scan every time. Since the PK is usually clustered, you may be seeing this just represented as a clustered index scan. It is really the same thing.|||And if you go ahead with this index, remember the order inwhich you search has to match the order of index definition

...

"A composite index is generally only useful to a query if the WHERE clause of the query matches the column(s) that are leftmost in the index. So if you create a composite index, such as "City, State", then a query such as "WHERE City = 'Springfield'" will use the index, but the query "WHERE STATE = 'MO'" will not use the index. [6.5, 7.0, 2000] Updated 11-15-2004"

So heres mud in your eye Sherlock.

Uhhh, yea. Whatever you say...|||...most of the time multi column indexes are not a good idea...Huh? Please present your theories as such, not as a "proven fact", man. You want the guy to loose his job just because you couldn't resist making a statement?

...ALSO I JUST REMEMBERED, the most selective column in the index definition should come first. That is col1 in the index definintion should have the highest number of distinct values...Good point, but ... how should I say it? It's orthodox concept for a table that is NEVER a part of an action query. Introduction of indexes based on "best practices for SELECT queries" ONLY, - is a recepie for your action queries as a potential bottleneck.

No comments:

Post a Comment