Wednesday, March 7, 2012

Indexes and UniqueIdentifier Fields

I have a table that contains articles (as in, newspaper articles, blog
articles, whatever). I need to use a column of type uniqueidentifier
because one of the requirements is that I be able to write the articles
out to XML or import them from XML, and references (as in, "for more
info read this: 2323-232-90934" have to still work after the export and
import).

So as a minimum, the table is going to look like this:

CREATE TABLE Articles (
ArticleID uniqueidentifier,
PublishDate datetime,
Title nvarchar (50)
ArticleContent ntext
)
GO

ALTER TABLE Articles ADD
CONSTRAINT PK_Articles
PRIMARY KEY NONCLUSTERED (ArticleID)
WITH FILLFACTOR = 100
GO

As you can see, I'm not going to use a clustered index on a column of
type UniqueIdentifier. I got that much from this newsgroup and from
websites on sql server performance tuning.

Two questions. 1: I will obviously need to list recent articles. I'll
need to do: select top 10 ArticleID, PublishDate, Title from Articles
order by PublishDate desc

Will there be any problem with an index on a datetime field to make
that query faster?

CREATE UNIQUE CLUSTERED INDEX IX_Articles_PublishDate
ON Articles (PublishDate DESC)
WITH FILLFACTOR = 100
GO

Question 2: Is there anything else that I can do here that I'm missing?
Should I maybe also have a auto-increment field and put the clustered
index on it instead?

Thanks in advance
chrischristopher.secord@.gmail.com (christopher.secord@.gmail.com) writes:
> So as a minimum, the table is going to look like this:
> CREATE TABLE Articles (
> ArticleID uniqueidentifier,
> PublishDate datetime,
> Title nvarchar (50)
> ArticleContent ntext
> )
> GO
> ALTER TABLE Articles ADD
> CONSTRAINT PK_Articles
> PRIMARY KEY NONCLUSTERED (ArticleID)
> WITH FILLFACTOR = 100
> GO
> As you can see, I'm not going to use a clustered index on a column of
> type UniqueIdentifier. I got that much from this newsgroup and from
> websites on sql server performance tuning.

Certainly clustered index on Uniqueidentifier and a fillfactor of 100
is a recipe for disaster. However, SQL Server MVP Greg Linwood pointed
out to me that with a low fill factor, uniqueidentifier very can be a
good choice for a clustered index. New rows will be inserted in existing
gaps. But this requires that you reindex when the gaps are starting
to run out.

> Two questions. 1: I will obviously need to list recent articles. I'll
> need to do: select top 10 ArticleID, PublishDate, Title from Articles
> order by PublishDate desc
> Will there be any problem with an index on a datetime field to make
> that query faster?
> CREATE UNIQUE CLUSTERED INDEX IX_Articles_PublishDate
> ON Articles (PublishDate DESC)
> WITH FILLFACTOR = 100
> GO

PublishDate could indeed be a good choice for the clustered index,
but I don't think you should make it unique. Surely, you must be able
to store two articles published the same day! (I'm assuming that you
will keep the time portion to midnight.)

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Thanks for the advice. I've decided to add a column of type int
(indentity) and make that the clustered index. The result will be the
same as what I was looking for in indexing publishdate, but it's on an
simple integer field.

chris|||christopher.secord@.gmail.com (christopher.secord@.gmail.com) writes:
> Thanks for the advice. I've decided to add a column of type int
> (indentity) and make that the clustered index. The result will be the
> same as what I was looking for in indexing publishdate, but it's on an
> simple integer field.

Say that you have a query:

SELECT * FROM tbl WHERE publishdate BETWEEN '20050601' AND '20050630'

If you have a clustered index on publishdate, SQL Server will do a
clustered index seek in this interval and find the rows quickly. If you
cluster on the identity column instead, SQL Server will have to scan
the entire table. If you add a non-clustered index on publishdate, SQL
Server may use that index, but if the interval is too wide, it will estimate
that nc-index + bookmark lookup is too expensive, and scan the table
nevertheless.

This is because, there is no way that SQL Server can assert that the
date follows the identity column.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Have you thought about getting a textbase (document management tool) to
do this? SQL is not the answer to everything.

No comments:

Post a Comment