Monday, March 12, 2012

indexing a table with 80 million records

i have a directory database with approx. 80 million records. i am feeding the database with bulk_insert. Indexing one of the fields took about 8 hrs. After indexing when i run queries with the indexed field the response time is under 1 sec. However if i run select queries with like on non-indexed fields it takes more than 2 mins. So i decided to index 4 other fields in the database and it looks like the indexing process is going to run for 2 days.
i am a novice in SQL database design and i am not sure if this is the best way to index the table. i am just using create index. Any suggestions / advice welcome.However if i run select queries with like on non-indexed fields it takes more than 2 mins.

if you query a table with a leading wildcard % in the like clause you will more than likely receive a table scan.

indexing is just part of the query process.
only create indexes on columns you will be querying
consider a fill factor with each index (depends on the transactional activity)
you can create composite (multi columned) indexes to increase performance and cover certain queries.|||Originally posted by Ruprect
if you query a table with a leading wildcard % in the like clause you will more than likely receive a table scan.

indexing is just part of the query process.
only create indexes on columns you will be querying
consider a fill factor with each index (depends on the transactional activity)
you can create composite (multi columned) indexes to increase performance and cover certain queries.

Try Creating a none_clustered index on all foreign Keys. And remember to create a Clustered index on the Columns that are queried most frequently. Make sure you use a Where clause in your query instead of using select * this will take a longer time in retrieving data|||clustered indexes arent always the best solution.
in many cases non-clustered indexes can give better performance that clustered due to the amount of io required to run the query.

The leaf level of the clustered index contains the actual data pages that make up the table, so the pages contain all of the columns and data from the table.
in a nonclustered index the leaf level contain only the column data that you created the index on and each row has a pointerthe actual data pages that contain the entire row

"covering" a query is when you have the nonclustered indexes created in such a way as to retrieve all your data from the leaf level of a nonclustered index and are not required to continue to the data pages. this avoids possible disk navigation

TABLE
fname Lname Mi Email

INDEX
1 nclustered on Fname and Lname

QUERY
select Fname, Lname from Table
where lname = 'smith'

in this case SQL Server never has to retrieve the rows from the data pages directly. it can stop at the index leaf level because the query is satisfied there.

you can see this in your graphical execution plan displayed in the popup information as "Scanning a non-clustered index entirely or only a range"

I suggest you check some of the indexing writings in Kalen Delaney's Book "Inside SQL Server 2000" or check our some of her indexing articles at www.SQLMag.com
and as usual read [BOL]|||thanks for the inputs. i did go through some of the docs available online on indexing and created a clustered index on one of my fields namely the state. the reason being that it is most frequently used and it has a max of 50 values.
i created a non-clustered index on the zip and phone.
with this setup i am able to get response times under a second most of the times. the application is not going to post any complex queries so we are within acceptable limits.|||http://www.sql-server-performance.com/gv_index_data_structures.asp about non-clustered importance.

The other option is to run PROFILER during the execution of queries and take trace to index tuning wizard for any recommendation on indexes.

No comments:

Post a Comment