Friday, March 9, 2012

Indexing

Hi
Last year I read a lot of articles on indexing and some on
index tuning.
I was often confused because the diffent sources often
contradicted each other. Because of the many different
sources I needed a single paper to consult, so I sat down
and tried to extract the information I thought was correct.
The result of this work is the Microsoft Word 2000
document attached (I have not had the time to update it
lately).
NOTE: Since i cannot attach files to this newsgroup, the
file can be found at:
http://forums.databasejournal.com/showthread.php?
s=&threadid=33908
I have soon to begin a 2 days inspection of indexes in a
database with only a minor knowlegde on the actual use of
the database (I have told my boss, that only little will
come out of this work) but I will try to give my best shot.
I would very much appreciate if any of the visitors on
this website could comment on my understanding, pinpoint
errors and give input to further work.
Kind regards
Jakob PerssonJakob,
Overall a good paper, IMO. Some comments:
"Creating a PRIMARY KEY constraint automatically creates a primary key
index, which is a specific type of unique index. According to some sources
the primary key index offers better performance than the normal unique
indexes."
According to who? A PK index is the very same as an ordinary unique index,
no performance difference.
In Appendix A, you might want to differentiate between an index which covers
a query and a query which is covered by (possibly more than one) index.
In Appendix B
"Try to avoid using NOT EXISTS". I do not agree. NOT EXISTS is just as fine
as EXISTS, and in many cases faster than a IN/OUTER JOIN.
Also, you say avoid NULL. NULL shouldn't be a problem as it is treated as a
value in the index. Do you have an example for this?
Still appendix B, you say:
a.. Microsoft recommends that SQL statements use the ANSI join syntax,
e.g.:
SELECT COL1, COL2, COL3
FROM TABLEA INNER JOIN TABLEB
ON TABLEA.FK = TABLEB.PK
Instead of the T-SQL syntax, e.g.:
SELECT COL1, COL2, COL3
FROM TABLEA, TABLEB
WHERE TABLEA.FK = TABLEB.PK
There is nothing "TSQL" about the second join type. That join type was
defined in ANSI SQL-89. So, it is really not correct to call the modern
syntax "ANSI SQL" as both syntaxes are defined in ANSI SQL - however, many
people do call the modern syntax "ANSI SQL joins". You go on to say:
The recommendation is among others, because the ANSI join syntax better
helps avoiding problems with unexpected Cartesian products. There should be
no difference performance wise between the two syntaxes.
Be aware that there is (in vast vast majority of cases, exceptions are so
rare so they can be ignored) is indeed no perf difference for inner or cross
joins. But for ANSI JOIN, then there is a semantic difference, hence also a
performance difference.
--
Tibor Karaszi, SQL Server MVP
Archive at:
http://groups.google.com/groups?oi=djq&as_ugroup=microsoft.public.sqlserver
"Jakob Persson" <jakobpersson@.yahoo.dk> wrote in message
news:0c5c01c3aacb$0c2ca000$a401280a@.phx.gbl...
> Hi
> Last year I read a lot of articles on indexing and some on
> index tuning.
> I was often confused because the diffent sources often
> contradicted each other. Because of the many different
> sources I needed a single paper to consult, so I sat down
> and tried to extract the information I thought was correct.
> The result of this work is the Microsoft Word 2000
> document attached (I have not had the time to update it
> lately).
> NOTE: Since i cannot attach files to this newsgroup, the
> file can be found at:
> http://forums.databasejournal.com/showthread.php?
> s=&threadid=33908
> I have soon to begin a 2 days inspection of indexes in a
> database with only a minor knowlegde on the actual use of
> the database (I have told my boss, that only little will
> come out of this work) but I will try to give my best shot.
> I would very much appreciate if any of the visitors on
> this website could comment on my understanding, pinpoint
> errors and give input to further work.
> Kind regards
> Jakob Persson|||Indeed, a good article. It is now in my archive :-)
My comments (some may overlap with Tibor's comments):
- From a relational database point of view, each table should have a
Primary Key. Candidate keys should be marked with a Unique constraint.
From a technical point of view, the only difference between a Primary
Key and a Unique constraint is, that a Unique constraint allows NULLs,
whereas the Primary key does not allow any of its columns to be NULL.
- When not to index? Basically there are only two disadvantages to
indexes. They occupy disk space (and when they are used memory space)
and they add overhead when rows are added, changed or deleted. So IMO
one should only not create an index when the table is heavily updated
and/or when remaining disk space is low. All the reasons you mention
(which are valid) are reasons why it is useless to create an index,
simply because SQL-Server will never use those indexes.
- Search Arguments and Good Practices. If the indexed column of a table
has a uniform distribution of values, then using NOT IN, <>, etc. will
almost never use the index. But NOT EXISTS is a completely different
matter.
However, if the value distribution is not uniform (for example, the
value 1 is in 95% of all rows), then using <>, NOT IN, etc. might very
well use the index. In itself, NOT, <> etc. are search arguments. It is
simply the selectivity and data distribution that determine whether an
index is used.
- I missed the recommendation on Foreign Key use. Although this is not
directly about indexes, it is a good practice to define Foreign Key
constraints where applicable. Not just to get a proper data model and to
guard referential integrity, but also because the Query Optimizer can
use this information and in some situations avoid reads on the
referenced table.
- I missed the subject index intersection. SQL-Server can use several
indexes of one table and combine them (join, anti-join or intersect
them). For example, if you have a query with "WHERE A=5 AND B=6", and
both column A and B are indexed, then one option for SQL-Server is to
use these indexes, intersect the results and return the associated rows.
HTH,
Gert-Jan
Jakob Persson wrote:
> Hi
> Last year I read a lot of articles on indexing and some on
> index tuning.
> I was often confused because the diffent sources often
> contradicted each other. Because of the many different
> sources I needed a single paper to consult, so I sat down
> and tried to extract the information I thought was correct.
> The result of this work is the Microsoft Word 2000
> document attached (I have not had the time to update it
> lately).
> NOTE: Since i cannot attach files to this newsgroup, the
> file can be found at:
> http://forums.databasejournal.com/showthread.php?
> s=&threadid=33908
> I have soon to begin a 2 days inspection of indexes in a
> database with only a minor knowlegde on the actual use of
> the database (I have told my boss, that only little will
> come out of this work) but I will try to give my best shot.
> I would very much appreciate if any of the visitors on
> this website could comment on my understanding, pinpoint
> errors and give input to further work.
> Kind regards
> Jakob Persson|||Thank you very much both of you for your kind replys.
I look forward getting more into them as soon as possible.
To Tibor Karaszi
(Yes!!, I got it right the fifth time I think, the "szi"
key sequence is definitely something I should practice
on ;-)
Do you by stating "But for ANSI JOIN, then there is a
semantic difference, hence also a performance difference"
mean: that the "...FROM TABLE1 LEFT/INNER JOIN TABLE2 ..."
semantic is faster than then ".. WHERE TABLE1.FK =TABLE2.PK..." method?
Yours faithfully
Jakob Persson|||Jakob,
> To Tibor Karaszi
> (Yes!!, I got it right the fifth time I think, the "szi"
> key sequence is definitely something I should practice
> on ;-)
:-) Best one I had was a package delivery from IBM to a "Peoborg Khadaffi".
> Do you by stating "But for ANSI JOIN, then there is a
> semantic difference, hence also a performance difference"
> mean: that the "...FROM TABLE1 LEFT/INNER JOIN TABLE2 ..."
> semantic is faster than then ".. WHERE TABLE1.FK => TABLE2.PK..." method?
No, for INNER joins, the perf is the same. What I mean is that below:
FROM tbl1 *= tbl2
WHERE tbl1.c = tbl2.c
does not necessarily give the same as:
FROM tbl1 LEFT JOIN tbl2 ON tbl1.c = tbl2.c
If they don't return the same result, then obviously we can't expect the same cost. The examples can
be pretty convoluted, so search the archives and also for Joe Celko's posts. Bottom line, don't do
old style outer joins, and they are also scheduled to be removed from some future version.
--
Tibor Karaszi, SQL Server MVP
Archive at: http://groups.google.com/groups?oi=djq&as_ugroup=microsoft.public.sqlserver
"Jakob Persson" <anonymous@.discussions.microsoft.com> wrote in message
news:063e01c3adac$25358330$a501280a@.phx.gbl...
> Thank you very much both of you for your kind replys.
> I look forward getting more into them as soon as possible.
> To Tibor Karaszi
> (Yes!!, I got it right the fifth time I think, the "szi"
> key sequence is definitely something I should practice
> on ;-)
> Do you by stating "But for ANSI JOIN, then there is a
> semantic difference, hence also a performance difference"
> mean: that the "...FROM TABLE1 LEFT/INNER JOIN TABLE2 ..."
> semantic is faster than then ".. WHERE TABLE1.FK => TABLE2.PK..." method?
> Yours faithfully
> Jakob Persson

No comments:

Post a Comment