I have a 1.2-million-record (55-column) table. I have an ASP page reading data from this table based on some criteria.
The criteria use anywhere from 1 to 10 columns in the WHERE clause, depending on the user's entry on the front end.
What's the best way to get optimal performance for the query?
Is it advisable to create 10 individual indexes?|||First of all - what kind of database do you have - OLTP or WAREHOUSE?
If it is OLTP - I do not envy you (you will have to split your database into OLTP and WAREHOUSE). If it is a WAREHOUSE - you have fact table(s) and dimensions, and a lot of your problems are gone. In a WAREHOUSE you can have as many indexes as you want - even one for every field. You can find a huge number of tips about indexes on the net. This is just one example:
http://www.sql-server-performance.com/|||Snail asks a good question. Beyond that, you have to balance your inserts against your queries. Each index you have adds overhead to your inserts; however, if you have heavy inserts it is essential to (1) have a clustered index and (2) have it on the right type of column to avoid page splits.|||Originally posted by snail
First of all - what kind of database do you have - OLTP or WAREHOUSE?
If it is OLTP - I do not envy you (you will have to split your database into OLTP and WAREHOUSE). If it is a WAREHOUSE - you have fact table(s) and dimensions, and a lot of your problems are gone. In a WAREHOUSE you can have as many indexes as you want - even one for every field. You can find a huge number of tips about indexes on the net. This is just one example:
http://www.sql-server-performance.com/
There are no transactions (inserts/deletes) happening on the table. It's essentially a static table.|||The table never gets updated? Also, do you join to any other tables? What are the variations of the columns in the where clause - are there consistent matchings of columns, or are they totally independent? For the 10 columns - how unique is the data?|||If there are no inserts, updates or deletes, index the heck out of it, why not?|||Originally posted by rnealejr
The table never gets updated? Also, do you join to any other tables? What are the variations of the columns in the where clause - are there consistent matchings of columns, or are they totally independent? For the 10 columns - how unique is the data?
The data in 10 columns is totally unique|||Originally posted by forXLDB
The data in 10 columns is totally unique
Just create an index for every column - even if you search on two or more columns, only one or two indexes will be used, but it leaves open the possibility of searching by every column.|||Originally posted by snail
Just create an index for every column - even if you search on two or more columns, only one or two indexes will be used, but it leaves open the possibility of searching by every column.
There is every possibility of using all the columns for searching.|||having an index for each field individually will have very little if any positive effect on performance if more than one field is present in the where clause. you'll need composite indexes to improve your performance, along with individual ones.|||Originally posted by ms_sql_dba
having an index for each field individually will have very little if any positive effect on performance if more than one field is present in the where clause. you'll need composite indexes to improve your performance, along with individual ones.
In that case, how do we design an indexing strategy for searches based on anywhere from 1 to 10 columns?|||I follow this rule during indexing:
if your SQL syntax is:
Select * from Table WHERE Column1='a'
AND Column2='b'
AND Column5='c'
then you should create a composite index following your WHERE clause syntax:
Composite index i_table1_1
Column1
Column2
Column5
so it is recommended that you follow a standard way of coding your SQL syntax so that the sequence of your "WHERE" columns is almost uniform and you have less composite indexing to do.
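In T-SQL, that composite index might look like this (a sketch - "Table1" stands in for the actual table name, which the thread doesn't give):

```sql
-- Composite index whose key order mirrors the WHERE clause above:
-- Column1, Column2, Column5.
CREATE INDEX i_table1_1 ON Table1 (Column1, Column2, Column5)
```

A query filtering only on Column1, or on Column1 and Column2, can still seek on this index, since those are the leading columns of the key.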
Alternatively you could also use the Index Tuning Wizard from SQL Query Analyzer. Copy and paste your syntax into QA, press Ctrl-I and follow the instructions from there.|||Patrick, depending on the size and number of columns in the table, SQL may not choose to use that composite index. For example, if you have 10 columns and there is low selectivity on the three indexed columns, it would not make sense to go to the composite index and then fetch back to the data pages where the rest of the data resides (the *).
Best thing would be to use the "set statistics IO on" command to see which indexes give you the lowest number of page reads.
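For example (a sketch reusing the column names from Patrick's post; "Table1" is again a placeholder for the real table):

```sql
-- Report scan counts and logical/physical page reads
-- for every statement that follows.
SET STATISTICS IO ON

SELECT * FROM Table1
WHERE Column1 = 'a' AND Column2 = 'b' AND Column5 = 'c'

SET STATISTICS IO OFF
```

Compare the "logical reads" figure in the Messages pane before and after adding each candidate index.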
HTH|||aaah..ic..thanks for the tip...:)|||"having an index for each field individually will have very little if any positive effect on performance if more than one field is present in the where clause. you'll need composite indexes to improve your performance, along with individual ones."
Not necessarily. SQL Server can use more than one index at a time, and if you have multiple indexes set and keep good statistics then SQL Server should be able to choose the optimum combination of index and table scans. That's why it's called the "optimizer".
Covering (composite) indexes aren't going to help you much here because, according to your initial post, your query could search based upon any combination of columns. You would need a composite index for each potential combination of columns.
blindman|||the meaning of the "optimizer" and the meaning of life
SQL Server DOES use more than one index if there is more than one table involved or if the same table needs to be processed more than once in the same query (when joining to itself, for example).
also, to clear it up for you, check "Index Tuning Recommendations" in BOL for further explanation of index selection. you can find plenty of info there and in other topics that somewhat contradict your "contradiction" :)|||Originally posted by blindman
"having an index for each field individually will have very little if any positive effect on performance if more than one field is present in the where clause. you'll need composite indexes to improve your performance, along with individual ones."
Not necessarily. SQL Server can use more than one index at a time, and if you have multiple indexes set and keep good statistics then SQL Server should be able to choose the optimum combination of index and table scans. That's why it's called the "optimizer".
Covering (composite) indexes aren't going to help you much here because, according to your initial post, your query could search based upon any combination of columns. You would need a composite index for each potential combination of columns.
blindman
So, what does it mean... in terms of which columns to index?|||Put an index on each of the ten columns. SQL Server may not use an index if the cardinality of the values isn't sufficient, but since you aren't doing a lot of inserts or changes on this table, having an unused index won't be a big deal.
Put composite indexes on combinations of two or three columns that you think may be used frequently. (You will not need individual indexes on any columns which are the first field of a composite index.)
Make sure the statistics on this table are kept up-to-date, because the optimal query plan will likely vary from one query to the next.
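As a sketch, that strategy might look like this (the table and column names here are placeholders, since the original post doesn't name them):

```sql
-- Composite index on a pair of columns expected to be searched together.
-- Col1 no longer needs its own index: it leads this composite key.
CREATE INDEX ix_Col1_Col2 ON BigTable (Col1, Col2)

-- Single-column indexes on the remaining searchable columns.
CREATE INDEX ix_Col2 ON BigTable (Col2)
CREATE INDEX ix_Col3 ON BigTable (Col3)
-- ...and so on through Col10...

-- Keep statistics fresh so the optimizer can pick a good plan per query.
UPDATE STATISTICS BigTable
```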
blindman|||that's better, blinds, that's better :)|||Thanks for your approval, ms_sql_dba, but I don't see anything in BOL that contradicts my "contradiction".
The optimizer will frequently use more than one index, even if the table does not need to be processed more than once in the same query. This is simple to demonstrate.
Set up this table:
----------
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[IndexTest]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[IndexTest]
GO
CREATE TABLE [dbo].[IndexTest] ([ColumnA] [char] (1) NULL, [ColumnB] [char] (1) NULL)
GO
declare @Counter int
set @Counter = 0
while @Counter < 100000
begin
insert into IndexTest(ColumnA, ColumnB) Values(left(newid(), 1), left(newid(), 1))
set @Counter = @Counter + 1
end
GO
CREATE INDEX [IndexA] ON [dbo].[IndexTest]([ColumnA]) ON [PRIMARY]
GO
CREATE INDEX [IndexB] ON [dbo].[IndexTest]([ColumnB]) ON [PRIMARY]
GO
----------
Then run this query and check the execution plan:
----------
select * from IndexTest where ColumnA = 'B' and ColumnB = 'B'
----------
You'll see the optimizer seek both indexes.
...but this sort of knowledge comes from experience, not Books Online...
blindman :cool:|||i am sorry, blinds, but i think you're not even following what you said at the beginning:
Not necessarily. SQL Server can use more than one index at a time, and if you have multiple indexes set and keep good statistics then SQL Server should be able to choose the optimum combination of index and table scans. That's why it's called the "optimizer".
Covering (composite) indexes aren't going to help you much here because, according to your initial post, your query could search based upon any combination of columns.
in your next post however, you're dancing the opposite direction (i do "see" it :) ):
Put composite indexes on combinations of two or three columns that you think may be used frequently. (You will not need individual indexes on any columns which are the first field of a composite index.)
and this is what the query plan shows (below). as you can see (ironic, isn't it, blindy?), there is only one index used, INDEXB (can you "see" it? :) )
select * from IndexTest where ColumnA = 'B' and ColumnB = 'B'
|--Filter(WHERE:([IndexTest].[ColumnA]='6'))
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([TEST1].[dbo].[IndexTest]))
|--Index Seek(OBJECT:([TEST1].[dbo].[IndexTest].[IndexB]), SEEK:([IndexTest].[ColumnB]='1') ORDERED FORWARD)|||Try more rows, and refresh your statistics. On my server showplan clearly indicated an index seek of both IndexA and IndexB. Also, you obviously fabricated this rather than copying and pasting it. Otherwise, why would it be seeking values '6' and '1' when you are selecting for values 'B' and 'B'?
You've got something screwy going on, Ms. Sqldba.
blindman :confused:|||OK - let's tone this down a bit. We are entitled to our "opinions" without provoking each other. That being said, what I am going to tell you is the absolute truth and not an "opinion" - so keep your "opinions" to yourself :-).
Anyway, basically, only you can determine which is best - composite indexes or individual indexes (or a mix). Realize that sql server's optimizer will perform an index intersection for multiple indexes - effectively acting like a composite index without the overhead of creating x number of composite indexes based on your combinations (with the potential of performance loss). So in your example, you are probably better off with creating individual indexes and let the optimizer perform the necessary index intersection ... but you will have to determine what combinations do occur and at what rate which may impact your decision.|||Originally posted by rnealejr
OK - let's tone this down a bit. We are entitled to our "opinions" without provoking each other. That being said, what I am going to tell you is the absolute truth and not an "opinion" - so keep your "opinions" to yourself :-).
Anyway, basically, only you can determine which is best - composite indexes or individual indexes (or a mix). Realize that sql server's optimizer will perform an index intersection for multiple indexes - effectively acting like a composite index without the overhead of creating x number of composite indexes based on your combinations (with the potential of performance loss). So in your example, you are probably better off with creating individual indexes and let the optimizer perform the necessary index intersection ... but you will have to determine what combinations do occur and at what rate which may impact your decision.
I agree with Blindman. SQL Server does look at the available indexes for a query and uses them (or, rather, optimizes across them).
I'd create individual and composite indexes depending upon the frequency of each combination of columns. I need a few iterations before I can give out my observations.
Thank you all for your gr8 insights. Keep it going !!|||select * from IndexTest where ColumnA = 'B' and ColumnB = 'B'
StmtText
|--Hash Match(Inner Join, HASH:([Bmk1000])=([Bmk1000]), RESIDUAL:([Bmk1000]=[Bmk1000]))
|--Index Seek(OBJECT:([dbadb].[dbo].[IndexTest].[IndexA]), SEEK:([IndexTest].[ColumnA]='B') ORDERED FORWARD)
|--Index Seek(OBJECT:([dbadb].[dbo].[IndexTest].[IndexB]), SEEK:([IndexTest].[ColumnB]='B') ORDERED FORWARD)|||Originally posted by rnealejr
OK - let's tone this down a bit. We are entitled to our "opinions" without provoking each other. That being said, what I am going to tell you is the absolute truth and not an "opinion" - so keep your "opinions" to yourself :-).
Anyway, basically, only you can determine which is best - composite indexes or individual indexes (or a mix). Realize that sql server's optimizer will perform an index intersection for multiple indexes - effectively acting like a composite index without the overhead of creating x number of composite indexes based on your combinations (with the potential of performance loss). So in your example, you are probably better off with creating individual indexes and let the optimizer perform the necessary index intersection ... but you will have to determine what combinations do occur and at what rate which may impact your decision.
Hold the phone...
Are you saying that index intersection is just as efficient as going to 1 (count 'em, boys and girls) composite index?
is that what you're saying?
Come on, you can tell us...|||Brett,
No - Hence the statement (with the potential of performance loss)|||Originally posted by blindman
... Also, you obviously fabricated this rather than copying and pasting it. Otherwise, why would it be seeking values '6' and '1' when you are selecting for values 'B' and 'B'?
You've got something screwy going on, Ms. Sqldba.
blindman :confused:
OK, you're too quick, blindy, in accusing people of conspiracy. it was a simple paste into your select statement, which i changed because on my pc the combination of 6 and 1 produced the highest number of matches.
but here is what i finally got on a 4-way machine using your exact script:
|--Hash Match(Inner Join, HASH:([Bmk1000])=([Bmk1000]), RESIDUAL:([Bmk1000]=[Bmk1000]))
|--Index Seek(OBJECT:([pubs].[dbo].[IndexTest].[IndexB]), SEEK:([IndexTest].[ColumnB]='B') ORDERED FORWARD)
|--Index Seek(OBJECT:([pubs].[dbo].[IndexTest].[IndexA]), SEEK:([IndexTest].[ColumnA]='B') ORDERED FORWARD)
Table 'IndexTest'. Scan count 2, logical reads 28, physical reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 31 ms.
in addition, the cost for the index seek is equally distributed between IndexA and IndexB, 10% each, while the cost for the Hash Match/Inner Join of the 2 results is 79%. more than that, it truly appears on the surface that the optimizer is using both indexes simultaneously. however, it is not the case. the optimizer does an index seek on the IndexA pages and retrieves the data that matches the criteria (i believe it doesn't even touch the data pages, since we're using covered queries), then it comes back to the index pages for IndexB (remember what i was telling you earlier, blindy? "PROCESSING THE SAME TABLE TWICE!!!!"). both index seeks result in a little over 6K rows each, 12K together. but this is not over yet :) the optimizer "JOINS" the results of both index seeks in the Hash Match/Inner Join operation, which does... well, exactly what it's called - hash-matches and inner joins - producing a result of 408 records (at least on my system; results may vary, since blindy is using newid(), which is guaranteed to be universally unique, thus the experiment cannot be reproduced 100%).
but the observation is not quite ended yet. there is an additional piece of info worth mentioning - IO!!! that bit clearly shows 2 (!!!) scans against the table, which is (AGAIN!!!) proof of what i was posting earlier - it may appear that both indexes are used together, but that appearance is obviously misleading; it all comes with a cost, blindy. and mind you, we're talking here about tables that don't exist (very unlikely) in reality. your initial recommendation of having 1 index per column may result in major performance issues for a real prod, not your 1-character-guid-based experiment.
on the other hand, if the initial poster were to implement a composite index (let's use blindy's IndexTest table for that and create an idx_A_B index that includes both ColumnA and ColumnB in this order), the results would be quite different:
|--Index Seek(OBJECT:([pubs].[dbo].[IndexTest].[idx_A_B]), SEEK:([IndexTest].[ColumnA]=[@1] AND [IndexTest].[ColumnB]=[@2]) ORDERED FORWARD)
Table 'IndexTest'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
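For reference, the composite index used in that last plan would be created like this (a sketch following the naming in the post):

```sql
-- Composite index keyed on (ColumnA, ColumnB), in that order,
-- on blindman's IndexTest table from earlier in the thread.
CREATE INDEX idx_A_B ON IndexTest (ColumnA, ColumnB)
```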
so, are you still insisting on individual indexes vs. composite like you did in your original reply?
P.S.: you guessed ms, but you still can't figure out how to deal with it.|||I said that the optimizer would use both indexes, and when you got around to running the SQL statement I posted you confirmed this. Thanks. Obviously SQL Server thought there would be a performance improvement by using both indexes.
Your example of using a composite index will of course run faster, provided that the columns specified are known. But with ten searchable columns there are over 1000 possible combinations (2^10 - 1 = 1023), so clearly some intermediate solution is required: composite indexes on columns that he knows will be frequently referenced, single indexes on the others. That is what I suggested.
Have you seen a doctor about your blood pressure lately?
blindman
P.M.S. Deal with what?|||did you even see (ooops, that's hypothetical of course) the scan count??
yup, you have experience... in guessing solutions and forcing issues that have an elegant resolution while leaving you unaware of it.
so, what you're saying is that you don't know how to handle a 3-field search other than trying to come up with "some intermediate solution"? in other words, if i have a fairly static set of tables and i search by 10 different fields (also a static number of fields), but i don't know which of the 10 will actually have a value to search by - your recommendation would be to create a composite index on the columns that he knows will be frequently referenced and single indexes on the others?
i feel sorry for your users and admire their patience while waiting for their data using your "experienced" approach :)
PMS - comes and goes, your verbal diarrhea never ends. you have rectal issues, blindy :)|||When is it going to go?|||when you have the curtesy (does it come and go with sight?) to shut up|||Don't lecture me about courtesy when you can't even spell it.
"Curtesy" - The life tenure that by common law is held by a man over the property of his deceased wife if children with rights of inheritance were born during the marriage.
...or is that what you meant?|||so, like a "real man", you retain the right to lecture for yourself :)
"blindy is in the house! no lecturing is allowed!"
get a life, ...and a wife (boy do i feel sorry for her if you have one, which i doubt you do!)|||It would be quite nice to post a short conclusion for forXLDB's problem
(and for others like me who try to understand query optimizing)
so shake hands you both
and tell us the outcome of all of this|||It is not necessarily true that "a composite index is needed here, and/or that one will work faster." Quite possibly it will not.
For a query such as this, assuming an index on each column that could be referenced, I anticipate that SQL Server will use a bitmap approach (what FoxPro called Rushmore). Here's how it works:
Allocate a bitmap in memory, set entirely to ones, with one bit for each record. Now, for each query field, use the index to locate the matching records for that field. Create a second bitmap with a one for each record found. Logic-AND this bitmap with the first, and repeat this process for each field. When you are done, the (first) bitmap will now contain a one for each record that meets all of the criteria, Q.E.D.|||...and I don't even have "Curtesy" in my dictionary
(this word doesn't exist, Mr Blind-hillbilly-talker)
so if I can't even improve my English while reading about indexes,
it's not worth the bother...|||http://dictionary.reference.com/search?q=curtesy|||:))
now waiting for the little "beginners' résumé" for this thread|||Where the hell was I when the techno-geek war broke out?
puuuuuleeze
ding ding ding
End of Round...go to your corners
The answer to everything...
"It Depends"
(and the other answer to everything..."I'll have another")|||Gentlemen! Please!|||Originally posted by joejcheng
Gentlemen! Please!
Thank you. Gentlebeings, always remember that while we think of ourselves as "talking" to one another, what we are actually doing is writing, and the written word carries much more impact than the spoken word. Considerably more!
We're also writing in haste, and tpyos ;) are the inevitable result. Nothing to be slammed there.
Our writing is being heard and read "around the world," by people representing dozens of cultures. We are a Company of Strangers. Friends who have never met and probably never will.
Ditto the fact that there probably does not live nor breathe any programmer-type whose opinions are not strongly felt, and just as strongly worded. (Show of hands, please? Yes, I'm counting my own hand too. Hold 'em up there just a second more... okay, thank you.)
Yup, just as I thought: we're all human. :rolleyes:
Rx: A pound of salt, a minute of reflection, and an ounce of forgiveness.
'Nuff said. :)|||I'll have another....|||me too, as long as blindy stops insulting people with his bar-style wit|||Since this has been going on for a bit long... can you all continue
at my thread :)
http://www.dbforums.com/t974069.html
I'm a bit desperate for some answers since yesterday (always a disadvantage to post a question from a different time zone)
Hmm... the first to hijack a thread...?|||One thing to bear in mind about "indexing large tables," specifically the question about it "taking a long time to rebuild indexes (from scratch)": the very interesting thing is that it really doesn't.
A database management system, when presented with the task of rebuilding an entire index, is able to do things that it cannot do when it is incrementally updating that index (i.e. as you insert, delete, and edit individual records). What's different is that it can make good use of sorting, which is an "unexpectedly efficient" process. You will find that it is usually very much faster to rebuild the index in toto, in appropriate situations and with B-tree based index types.
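On SQL Server (2000-era syntax, to match the rest of the thread), a from-scratch rebuild of all of a table's indexes is a one-liner - a sketch, assuming the IndexTest table used earlier:

```sql
-- Drop and rebuild every index on the table in one pass; the engine
-- can sort the data once instead of maintaining the B-trees row by row.
DBCC DBREINDEX ('IndexTest')
```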
. . . . .
Nonetheless, to recap . . . the most important consideration, in any high-volume situation, is to pay extremely close attention to the algorithm: the approach you are taking; the work you are asking the server to do. Servers are programmed to accept the most cock'n'bull requests ( ;) so to speak...) if you present them ("yes, master" :rolleyes: ), but you pay the ol' Piper.