Showing posts with label star. Show all posts
Showing posts with label star. Show all posts

Wednesday, March 21, 2012

Indexing Star Schemas

Hi
Can anyone add to / comment on / improve / criticise this logic for
indexing a fact table in a star schema:
Presuming a fact table with 4 dimensions:
* Brand
* Product
* Time
* Store
1. Clustered index on Time dimension
EVERY query is going to filter based on date, so this is an obvious
choice.
It is NOT a composite index for several reasons:
a) keep the record ID narrow so the index (& thus non-clustered
indexes) load into memory easier/faster.
b) common queries don't always a particular other dimension, which
would mean the index would only be useful to the Time level anyway.
c) common queries on product and store are not very selective
2. Non-clustered indexes on Foreign Keys
Create non-clustered indexes on each of the other 3 foreign keys, for
obvious reasons: Brand, Produce, Store.
3. Composite index on all dimensions
Some common queries use all directions. A complete composite index
would assist Cartesian product lookups. Create the index in order of
selectivity: Brand, Product, Store
(I'm not sure if Time would need to be explicity declared in this
index - as this is a non-clustered index, Time is included
"behind-the-scenes" as part of the RID anyway - but does it need to be
explicity declared?)
4. Only remaining index
The only remaining index would be Product, Store. This is done in
order of selectivity.
This means that any combination of 1, 2, 3 or all 4 of the dimensions
as criteria in the query will be catered for.
Does that make for a good strategy? Any comments / further
suggestions?
Thanks
Sean
It appears that you are indexing for select queries against the data
warehouse. Are you reporting directly from the data warehouse or building
cubes and reporting from them?
If you will be building cubes and reporting from them, then the indexes on
the tables should be geared toward integrity ( like the multicolumn index) ,
and the selecttions related to cube build and ETL interaction...
Wayne Snyder, MCDBA, SQL Server MVP
Mariner, Charlotte, NC
www.mariner-usa.com
(Please respond only to the newsgroups.)
I support the Professional Association of SQL Server (PASS) and it's
community of SQL Server professionals.
www.sqlpass.org
"Sean" <plugwalsh@.yahoo.com> wrote in message
news:3698af3c.0410080749.cff18d3@.posting.google.co m...
> Hi
> Can anyone add to / comment on / improve / criticise this logic for
> indexing a fact table in a star schema:
> Presuming a fact table with 4 dimensions:
> * Brand
> * Product
> * Time
> * Store
> 1. Clustered index on Time dimension
> EVERY query is going to filter based on date, so this is an obvious
> choice.
> It is NOT a composite index for several reasons:
> a) keep the record ID narrow so the index (& thus non-clustered
> indexes) load into memory easier/faster.
> b) common queries don't always a particular other dimension, which
> would mean the index would only be useful to the Time level anyway.
> c) common queries on product and store are not very selective
> 2. Non-clustered indexes on Foreign Keys
> Create non-clustered indexes on each of the other 3 foreign keys, for
> obvious reasons: Brand, Produce, Store.
> 3. Composite index on all dimensions
> Some common queries use all directions. A complete composite index
> would assist Cartesian product lookups. Create the index in order of
> selectivity: Brand, Product, Store
> (I'm not sure if Time would need to be explicity declared in this
> index - as this is a non-clustered index, Time is included
> "behind-the-scenes" as part of the RID anyway - but does it need to be
> explicity declared?)
> 4. Only remaining index
> The only remaining index would be Product, Store. This is done in
> order of selectivity.
> This means that any combination of 1, 2, 3 or all 4 of the dimensions
> as criteria in the query will be catered for.
>
> Does that make for a good strategy? Any comments / further
> suggestions?
> Thanks
> Sean
|||if you use cubes, i personally wouldn't index a single thing
it's just a waste of time.
keep the clustered on time-- it's generally really helpful
what does the index tuning wizard reccomend?
"Wayne Snyder" <wayne.nospam.snyder@.mariner-usa.com> wrote in message
news:OSmxSFSsEHA.2808@.TK2MSFTNGP14.phx.gbl...
> It appears that you are indexing for select queries against the data
> warehouse. Are you reporting directly from the data warehouse or building
> cubes and reporting from them?
> If you will be building cubes and reporting from them, then the indexes on
> the tables should be geared toward integrity ( like the multicolumn index)
,
> and the selecttions related to cube build and ETL interaction...
>
> --
> Wayne Snyder, MCDBA, SQL Server MVP
> Mariner, Charlotte, NC
> www.mariner-usa.com
> (Please respond only to the newsgroups.)
> I support the Professional Association of SQL Server (PASS) and it's
> community of SQL Server professionals.
> www.sqlpass.org
> "Sean" <plugwalsh@.yahoo.com> wrote in message
> news:3698af3c.0410080749.cff18d3@.posting.google.co m...
>

Monday, March 19, 2012

Indexing in SQL Server Star Scheme Data Warehouse

Hi all,

Our star schema design has one fact table and 3 dimensions.

The FK's in the fact do not necessarily make up the primary key. So I have an identifier in the fact table as PK. Here is my index assignment:

Fact Table - Clustered Index on PK
Non Clustered Index 1 on FK1
Non Clustered Index 2 on FK2
Non Clustered Index 3 on FK3

Each Dimension Table - Clustered Index on PK
Non Clustered Index on Attribute. This is the attribute that will be used in reports / cubes.

Is the above design good to start with?

Thanks,

VThe indexing looks fine, but as to whether this is a good design or not you only need to check my sig below to get my opinion...|||Thanks Blindman.

The one issue that we are encountering is, we didnt create a separate time dimension (a mistake in design). We have a smalldatetime field (72 distinct values only, one for each month, so 6 years in total) in the fact table.

We are not able to query this smalldate time field efficiently because we didnt index it (as it was not part of the dimension). We would like to change the design now.

We would like to create a time dimension using the following:

1. Create Time Dimension Table
2. Create new column Time_Key in Fact Table
3. Create Non clustered Index on smalldatetime field in fact table.
4. Join on smalldatetime fields in Time Dimension and Fact table to populate time_key in 2 from Time Dimension table.
5. Drop the index and column of smalldatetime field in fact and reassign non clustered index to Time_Key (FK)

Let me know how the above approach sounds to you guys.

V|||My gut feeling is that creating and dropping the temporary index on the smalldatetime column will take as long as doing a non-indexed join. Generally, indexes are only valuable because they are used more than once, so the investment involved in creating them is saved over each subsequent operation.
An index on a set of 72 discreet values may not even give you much performance boost across millions of records.|||I suppose the question I would pose to you, more than your design, would be, have you chosen the right granularity for your fact table? I haven't seen too many fact tables that stop at a monthly level unless they are being used for forecasting or budgeting purposes and even then they align to pre-existing warehouses, like a sales warehouse. My best recommendation would be to take some time and study warehousing and ensure you are providing a solution that isn't going to have to be reworked a couple of months down the road when the end users want to be able to drill down into details.

Indexing for a Dimensional Model

Hello,
I've created a dimensional model (star schema) with a central fact table wit
h a bunch of Foreign Keys to dimension tables. In Oracle there is a special
way that you index these FKs to improve performance. Is there anything lik
e that with SQL Server or i
s there no need to index the FKs in the fact table.
If you could even just point me to some information on this subject I would
appreciate it.
Thank you!
Mark McCoidTypically, you place indexes on the FK's in your fact table, since your app
will usually access the fact table via the dimension tables.
Tom
---
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinnaclepublishing.com/sql
"markmccoid" <markm@.mactive.com> wrote in message
news:F54A999E-D9F9-4A70-BF27-BE4D29B7565D@.microsoft.com...
Hello,
I've created a dimensional model (star schema) with a central fact table
with a bunch of Foreign Keys to dimension tables. In Oracle there is a
special way that you index these FKs to improve performance. Is there
anything like that with SQL Server or is there no need to index the FKs in
the fact table.
If you could even just point me to some information on this subject I would
appreciate it.
Thank you!
Mark McCoid|||markmccoid
Yes , it is a good idea to create an index on FK column to get a performance
benefit.
For more info lookup CREATE INDEX in the BOL.
"markmccoid" <markm@.mactive.com> wrote in message
news:F54A999E-D9F9-4A70-BF27-BE4D29B7565D@.microsoft.com...
> Hello,
> I've created a dimensional model (star schema) with a central fact table
with a bunch of Foreign Keys to dimension tables. In Oracle there is a
special way that you index these FKs to improve performance. Is there
anything like that with SQL Server or is there no need to index the FKs in
the fact table.
> If you could even just point me to some information on this subject I
would appreciate it.
> Thank you!
> Mark McCoid