Sunday, February 19, 2012

Indexed Views with aggregated awareness

Please correct me if I'm wrong here, but what is the purpose of SQL Server
indexed views? I don't mean to go off on a rant, and please understand this is
not a post to discredit MS SQL Server 2005 at all; I'm just trying to get some
clarification, that's all.
OK, with that said: from what I understand, in a nutshell, when a 2005 indexed
view is persisted, ALL objects within the view are persisted, which is to be
expected. What I found to my surprise is that SQL Server seems to perform
aggregate awareness only when all elements of the query are present in the
view. In other words, if you introduce any other element (like a join to
another dimension table that is at the same level of aggregation as the view),
it seems to break the optimal query plan that I would expect the optimizer to
take.
example:
1. A base detail table called tbl_detail has 1 million records in it.
create table tbl_detail
( client_id int, invoicedate datetime, sales money)
2. An indexed view called mv_summary is created over the tbl_detail table.
create view mv_summary as
select client_id, datepart(yyyy, invoicedate) as invoiceyear, sum(sales) as sales
go
create unique clustered index cidx_yearclient on mv_summary(invoiceyear, client_id)
go
3. Let's say the mv_summary view is at a more summarized level of aggregation,
so mv_summary now has 1,000 aggregated records derived from the detail table.
Any select statement run directly against the mv_summary view produces the plan
I expect. Even if you query tbl_detail at the aggregation level of the view,
the plan returns mv_summary as expected via SQL Server's aggregation awareness.
But once you introduce a dimension table that is at the same level of
aggregation into the query (i.e. a dimensional join), this is where it gets
weird: the plan goes to tbl_detail, not the indexed view as I would expect it
to.
Basically, what I attempted to do was filter on a dimensional descriptive
element via a join and a WHERE clause.
example:
select tbl_detail.client_id, sum(sales) as sales
from tbl_detail
inner join client_dimension
on tbl_detail.client_id = client_dimension.client_id
where client_dimension.client_description = 'test'
group by tbl_detail.client_id
It turns out that, as far as I can tell, ALL elements that you need to filter,
join, or select on HAVE to be in the indexed view... which seems to be an
issue: it means ALL data, including dimensional data, has to be persisted
again.
Any thoughts on how I can use indexed views in an aggregation-aware environment
without having to store ALL possible filterable elements in the view?
Thanks!!!
|||In theory it ought to work fine (i.e. use the indexed view in the query
plan) IFF the optimiser determines that that's the most efficient way to
return the data (although I've had cases where querying the base table
was so quick that the optimiser decided to choose it over a corresponding
indexed view anyway and not waste its time evaluating query plans
against the indexed view).
The T-SQL code you posted looks rather incomplete. Among other things,
shouldn't there be at least a GROUP BY in your view? What I'm trying to
get at is: are the SUM() aggregates in the view usable in your
other query, or are you grouping the rows into different groups when
calculating your SUM() aggregates? I'm guessing if you summed the sales
column from your view for all invoice years for a particular client_id
you'd get the same figure as summing the sales figure for that client_id
in the base table wouldn't you? Have I correctly guessed the grouping
you've used in the view?
If that's true then perhaps the optimiser thinks it's harder to join the
materialised view data to the client_dimension table than it is to join
the base tbl_detail table to that client_dimension table. I notice the
clustered index you create on the view has the invoiceyear column 1st
(and the client_id column 2nd), which would make it rather nasty to join
to client_dimension. I assume tbl_detail & client_dimension both have
nice indexes (maybe even clustered indexes) where client_id is the 1st
column in those indexes; if so, then a join between tbl_detail and
client_dimension would probably be more efficient than between
mv_summary and client_dimension (and so the query optimiser would
probably pick a join with the base table rather than the clustered index
on the view).
I'd check the execution plans, try changing the clustered unique index
on mv_summary so that client_id is the first column in the index and
make sure the grouping in the view & the later query are "compatible".
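Something along these lines (just a sketch, reusing the names from your post
and assuming a GROUP BY) would show which object the optimiser actually reads:
set showplan_text on
go
-- with showplan on, the statement is not executed; the plan text is returned.
-- Check whether it references mv_summary or tbl_detail.
select d.client_id, sum(d.sales) as sales
from tbl_detail d
inner join client_dimension c
on d.client_id = c.client_id
where c.client_description = 'test'
group by d.client_id
go
set showplan_text off
go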
The bottom line is, in theory, it ought to work fine but there's
probably just something a bit off with your scenario/implementation.
*mike hodgson*
http://sqlnerd.blogspot.com
|||Eric,
have you tried changing the order of the columns in the clustered index to
create unique clustered index cidx_yearclient on mv_summary(client_id, invoiceyear)
and seeing if that helps?
|||Hi Mike--
Ya, sorry about that... not only was my code incomplete but my grammar went to
hell as well... LOL! Typing when you're running on 2 hours of sleep in a
72-hour window might do that to ya...
At any rate.
You are correct: in my sample code there should be a GROUP BY (and there is in
my testing); somehow I forgot to include it in the post.
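For the record, the full definition would look roughly like this (a sketch
only; SQL Server requires WITH SCHEMABINDING, a two-part table name and
COUNT_BIG(*) for an indexed view that uses GROUP BY, and the column types are
assumed):
-- sketch; assumes tbl_detail(client_id int, invoicedate datetime, sales money not null)
create view mv_summary with schemabinding as
select client_id,
datepart(yyyy, invoicedate) as invoiceyear,
sum(sales) as sales,
count_big(*) as rc -- required when the indexed view uses GROUP BY
from dbo.tbl_detail
group by client_id, datepart(yyyy, invoicedate)
go
create unique clustered index cidx_yearclient on mv_summary (invoiceyear, client_id)
go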
From what I have been able to devise, it seems like the optimizer needs to have
all of the referenced objects within the indexed view, not just joined to it,
in order to validate the object bindings before using it as a valid source for
the aggregation. In other words, all of the elements that you may want to
filter on HAVE to be stored within the view... right? If that's the case, you
would be better off using the view strictly as a single source and never
counting on the optimizer performing the aggregation awareness.
For example:
Let's say I wanted the optimizer to use the indexed view based on a date range,
where I create the indexed view grouped on the first date of the month.
Example: tbl_detail
client_id    invoice_date    sales
a0000001     1/3/2005        10
a0000001     1/4/2005        30
a0000001     1/5/2005        20
a0000001     1/7/2005        50
gets resolved to: mv_summary
client_id    invoice_date    sales
a0000001     1/1/2005        110
If I were to run a query against the tbl_detail table with an invoice_date
between 1/1/2005 and 1/31/2005, it should be intelligent enough to know that it
would be more efficient to get the pre-aggregated data from the indexed view
versus pulling it from the table and then aggregating it. (Obviously the given
example is small, but extrapolate it to a couple hundred million rows and I
believe it would make a difference.)
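In other words, something along these lines (a sketch, using the column names
from the example above):
-- the hope is that the optimizer answers this from mv_summary instead of scanning tbl_detail
select client_id, sum(sales) as sales
from tbl_detail
where invoice_date between '1/1/2005' and '1/31/2005'
group by client_id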
Any thoughts on what might be the best approach to using indexed views in a
date-aggregation-aware scenario? Basically I want to have a single source
(tbl_detail) that all queries point to, and based on the aggregation level and
the date constraints it would determine which alternate data source would be
used (like an indexed view summary). Oracle has a feature called "date folding"
that resolves a date constraint and rewrites the date portion of the WHERE
clause to conform to the best constraint, based on the range of the date and
the aggregation level of the data.
Example:
An invoice_date of 1/1/2005 through 1/31/2005 at the client level would be
resolved to the month of January, and then (not in the prior example code, but
with invoice_date replaced by a month column instead) it would apply the
month code to the month column in the summary.
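Conceptually, date folding would turn that into something like this against the
summary (purely illustrative of the Oracle behaviour described above; the
column name is assumed from the month example):
-- the January range is recognised as exactly one month, so the predicate is
-- applied to the summary's month column instead of the detail's invoice_date
select client_id, sales
from mv_summary
where month_begin_date = '1/1/2005'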
Please let me know if I am not making sense...
Oh, BTW, I tried switching the date and client columns in the clustered index...
unfortunately to no avail.
Thanks
Eric
"Mike Hodgson" wrote:

> In theory it ought to work fine (ie. use the indexed view in the query
> plan) IFF the optimiser determines that that's the most efficient way to
> return the data (although I've had cases where querying the base table
> was so quick that the optimiser decided to chose it over a corresponding
> indexed view anyway and not waste it's time evaluating query plans
> against the indexed view).
> The T-SQL code you post looks rather incomplete. Among other things,
> shouldn't there be at least a GROUP BY in your view? What I'm trying to
> get at is the SUM() aggregates in the view, are they usable in your
> other query or are you grouping the rows into different groups when
> calculating your SUM() aggregates? I'm guessing if you summed the sales
> column from your view for all invoice years for a particular client_id
> you'd get the same figure as summing the sales figure for that client_id
> in the base table wouldn't you? Have I correctly guessed the grouping
> you've used in the view?
> If that's true then perhaps the optimiser thinks it's harder to join the
> materialised view data to the client_dimension table than it is to join
> the base tbl_detail table to that client_dimension table. I notice the
> clustered index you create on the view has the invoiceyear column 1st
> (and the client_id column 2nd), which would make it rather nasty to join
> to client_dimension. I assume tbl_detail & client_dimension both have
> nice indexes (maybe even clustered indexes) where client_id is the 1st
> column in those indexes; if so, then a join between tbl_detail and
> client_dimension would probably be more efficient than between
> mv_summary and client_dimension (and so the query optimiser would
> probably pick a join with the base table rather than the clustered index
> on the view).
> I'd check the execution plans, try changing the clustered unique index
> on mv_summary so that client_id is the first column in the index and
> make sure the grouping in the view & the later query are "compatible".
> The bottom line is, in theory, it ought to work fine but there's
> probably just something a bit off with your scenario/implementation.
> --
> *mike hodgson*
> http://sqlnerd.blogspot.com
>
> Eric wrote:
>
|||Hi Alexander --
Yep, unfortunately it didn't seem to change the result... It still pulled
from the detail.
Thanks
Eric
"Alexander Kuznetsov" wrote:

> Eric,
> have you tried to change the order of columns in the clustered index to
> create unique clustered index cidx_yearclient on mv_summary(client_id,
> invoiceyear)
> and see if that helps?
|||What's the query that you are using to access the indexed view?... Try
retrieving only the columns invoiceyear and client_id in your query and look at
the execution plan. It will fetch the results directly from the
indexed view rather than from the base table.
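For example, something like this (just a sketch; the grouping is assumed to
match the view definition):
-- only columns the view already carries; the plan should then show mv_summary
select datepart(yyyy, invoicedate) as invoiceyear, client_id, sum(sales) as sales
from tbl_detail
group by datepart(yyyy, invoicedate), client_id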
Jayesh
"Eric" <Eric@.discussions.microsoft.com> wrote in message
news:1B096C2E-A43C-4F32-B12A-BDF09339FC7F@.microsoft.com...
> Please correct me if I'm wrong here but what's the purpose of sql server
> indexed views.. I mean I don't mean to go off on rant here and please
> understand this is not a post to discredited ms sql server 2005 at all,
> I'm
> just trying to get some clarification that's all.
> ok.. with that said. from what I understand in a nut shell about 2005
> indexed views is when the view is persisted ALL objects with in the view
> are
> persisted.. which is to be expected.. what I found to my surprises is how
> sql
> server seems to perform aggregated awareness only when all element with
> the
> view are present. in other words if you present any other elements ( like
> a
> another dimensional table join that is at the same level of aggregation
> that
> the view is at) currently at seems to break the optimal query plan that I
> would expect the optimizer to take.
> example:
> 1. base detail table called table_detail a has 1 million records in it.
>
> create table tbl_detail
> ( client_id into, invoicedate, sales money)
>
> 2. an indexed view called mv_summary is created over tbl_detail table
>
> create view mv_summary as
> select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales)
> as
> sales
> go
> create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
> client_id)
> go
>
> 3. lets say the mv_summary view is at lower level of aggregation, now the
> mv_summary has a 1000 aggregated record from the detail table.
> any select statements ran against the mv_summary view directly the plan
> runs
> as expected. even if you were to query the tbl_detail at the aggregation
> level of the view the plan result returns mv_summary as expected via sql
> server arrogation awareness. though once you introduce an dimensional
> table
> is at the same level of aggregation to the query ( like a dimensional
> join )
> ok this is were this get weird.
> the plan goes to the tbl_detail not the indexed view as I would expect it
> to.
> basically what I attempted to do was filter on an dimensional descriptive
> element via join and where clause.
> example :
>
> select client_id , sum(sales) as sales
> from tbl_detail
> inner join client_dimension
> on
> tbl_detail.client_id = client_dimension.client_id
> where client_dimension.client_description 'test'
>
> turns out what I have been able to come up with is ALL elements that you
> need to filter join select on HAS to been in the indexed view... which
> seems
> to been an issue.. that mean ALL data including dimensional data has to be
> persisted again...
> any thoughts on how I can use indexed views in a aggregation aware
> environment without have to store ALL possible filterable elements in the
> view.
> thanks!!!
|||OK, I have written a script to reproduce the issue I am seeing...
(I was able to get SQL to pull from the indexed view with a constraint on a
joined dimension table by removing that table from the indexed view, though it
does seem odd to me why it won't work otherwise.)
As far as the time-aware aggregation goes, it seems as if the optimizer does
not want to honor the time_dim join against the indexed view.
In a nutshell, I am attempting to get the optimizer to query against the
summary indexed view based on a date constraint.
The thought behind this is that a user would be able to build a query against
the lowest level of aggregation (the detail table "tbl_detail"), and based on
the level of aggregation and the date constraints, the optimizer would choose
the most appropriate table or indexed view to resolve the result set. (Thinking
that if a user asked for a full year's worth of data at the client level, which
is found in the indexed view "mv_summary", it would be more efficient to
process maybe 12 I/Os versus 12 million I/Os.) That would make total sense to
me... in fact it does do that on the 1st query in my example, though once I
introduce time into the equation it goes after the detail table every time.
Hmmm... there has to be a way to do this... maybe I'm just not seeing it... I
just can't accept that Microsoft would release a product that would allow you
to perform aggregation one way but not another... especially on something as
useful as date...
Any thoughts?
Thanks
Eric
/* code start : */
--drop sample objects
drop view mv_summary
drop table tbl_detail
drop table time_dim
drop table client_dim
--create detail table
create table tbl_detail(
ID int identity(1,1),
invoice_date datetime,
invoice_month as CASE
    WHEN cast(datepart(mm, invoice_date) as varchar(2)) <= 9 THEN
        cast(datepart(yyyy, invoice_date) as varchar(4)) + '0' +
        cast(datepart(mm, invoice_date) as varchar(2))
    ELSE cast(datepart(yyyy, invoice_date) as varchar(4)) +
        cast(datepart(mm, invoice_date) as varchar(2))
    END,
month_begin_date as dateadd(month, datediff(month, 0, [invoice_date]), 0),
client_id varchar(10),
sales money not null)
go
--create time dimension
create table time_dim(
date_number datetime,
month_begin_date as dateadd(month, datediff(month, 0, date_number), 0),
month_code as CASE
    WHEN cast(datepart(mm, date_number) as varchar(2)) <= 9 THEN
        cast(datepart(yyyy, date_number) as varchar(4)) + '0' +
        cast(datepart(mm, date_number) as varchar(2))
    ELSE cast(datepart(yyyy, date_number) as varchar(4)) +
        cast(datepart(mm, date_number) as varchar(2))
    END)
go
--create client list
create table client_dim (client_id varchar(10), client_desc varchar(20), usedflag bit null)
insert into client_dim (client_id, client_desc) values ('A0000001', 'A0000001TEST')
insert into client_dim (client_id, client_desc) values ('A0000002', 'A0000002TEST')
insert into client_dim (client_id, client_desc) values ('A0000003', 'A0000003TEST')
insert into client_dim (client_id, client_desc) values ('A0000004', 'A0000004TEST')
insert into client_dim (client_id, client_desc) values ('A0000005', 'A0000005TEST')
insert into client_dim (client_id, client_desc) values ('A0000006', 'A0000006TEST')
insert into client_dim (client_id, client_desc) values ('A0000007', 'A0000007TEST')
insert into client_dim (client_id, client_desc) values ('A0000008', 'A0000008TEST')
insert into client_dim (client_id, client_desc) values ('A0000009', 'A0000009TEST')
insert into client_dim (client_id, client_desc) values ('A0000010', 'A0000010TEST')
go
--create index
create index idx_client_id on client_dim (client_id)
go
--loop through each client and build a random list of data.
declare @loopcnt int
set @loopcnt = 0
declare @clientloopcnt int
set @clientloopcnt = 0
declare @clientid varchar(10)
declare @nextdate datetime
set @nextdate = getdate()
--populate detail table
while @loopcnt < 1 --0000
begin
insert into tbl_detail (invoice_date, client_id, sales)
select
cast(cast(getdate() as int) - 115 * rand(cast(cast(newid() as binary(8)) as int)) as datetime) as invoice_date,
client_id,
cast(cast(100 as int) - 115 * rand(cast(cast(newid() as binary(8)) as int)) as money)
from client_dim --where usedflag is not null
set @loopcnt = @loopcnt + 1
print 'The loop counter is ' + cast(@loopcnt as char)
--update the client as being complete...
set @clientloopcnt = @clientloopcnt + 1
end
--populate time dimension
while @loopcnt < 365
begin
insert into time_dim (date_number)
values (@nextdate)
set @nextdate = @nextdate + 1
set @loopcnt = @loopcnt + 1
print 'The loop counter is ' + cast(@loopcnt as char)
--update the client as being complete...
end
go
--create indexed view
create view mv_summary with schemabinding
as
select a.month_begin_date, a.client_id, sum(a.sales) as sales, count_big(*)
as RC
from dbo.tbl_detail a
group by a.month_begin_date ,a.client_id
go
create unique clustered index cidx_client_invoice_date on mv_summary(
client_id, month_begin_date)
go
--this query invokes the summary as expected...
select a.month_begin_date, b.client_id, b.client_desc, sum(sales) as sales
from dbo.tbl_detail a
inner join dbo.client_dim b
on a.client_id = b.client_id
where b.client_desc = 'A0000001TEST'
group by a.month_begin_date, b.client_id, b.client_desc
--this query does NOT invoke the summary... though not sure why...
select a.month_begin_date, b.client_id, b.client_desc, sum(sales) as sales
from dbo.tbl_detail a
inner join dbo.client_dim b
on a.client_id = b.client_id
inner join time_dim t
on t.month_begin_date = a.month_begin_date
where b.client_desc = 'A0000001TEST' and t.date_number between '6/1/2006' and '6/30/2006'
group by a.month_begin_date, b.client_id, b.client_desc
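-- Not in the original script: for comparison only, the same June 2006
-- restriction expressed directly against the view's grouping column, with no
-- time_dim join. Whether the optimizer matches mv_summary for this form is
-- exactly the open question here.
select a.month_begin_date, b.client_id, b.client_desc, sum(sales) as sales
from dbo.tbl_detail a
inner join dbo.client_dim b
on a.client_id = b.client_id
where b.client_desc = 'A0000001TEST'
and a.month_begin_date = '6/1/2006'
group by a.month_begin_date, b.client_id, b.client_desc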
/* code end: */
"Mike Hodgson" wrote:

> In theory it ought to work fine (ie. use the indexed view in the query
> plan) IFF the optimiser determines that that's the most efficient way to
> return the data (although I've had cases where querying the base table
> was so quick that the optimiser decided to chose it over a corresponding
> indexed view anyway and not waste it's time evaluating query plans
> against the indexed view).
> The T-SQL code you post looks rather incomplete. Among other things,
> shouldn't there be at least a GROUP BY in your view? What I'm trying to
> get at is the SUM() aggregates in the view, are they usable in your
> other query or are you grouping the rows into different groups when
> calculating your SUM() aggregates? I'm guessing if you summed the sales
> column from your view for all invoice years for a particular client_id
> you'd get the same figure as summing the sales figure for that client_id
> in the base table wouldn't you? Have I correctly guessed the grouping
> you've used in the view?
> If that's true then perhaps the optimiser thinks it's harder to join the
> materialised view data to the client_dimension table than it is to join
> the base tbl_detail table to that client_dimension table. I notice the
> clustered index you create on the view has the invoiceyear column 1st
> (and the client_id column 2nd), which would make it rather nasty to join
> to client_dimension. I assume tbl_detail & client_dimension both have
> nice indexes (maybe even clustered indexes) where client_id is the 1st
> column in those indexes; if so, then a join between tbl_detail and
> client_dimension would probably be more efficient than between
> mv_summary and client_dimension (and so the query optimiser would
> probably pick a join with the base table rather than the clustered index
> on the view).
> I'd check the execution plans, try changing the clustered unique index
> on mv_summary so that client_id is the first column in the index and
> make sure the grouping in the view & the later query are "compatible".
> The bottom line is, in theory, it ought to work fine but there's
> probably just something a bit off with your scenario/implementation.
> --
> *mike hodgson*
> http://sqlnerd.blogspot.com
>
> Eric wrote:
>
>|||Ok, I have written a script to reproduce the issue I am seeing...
(Though I was able to get SQL to pull from the indexed view with a
constraint on a joining dimensional table by removing the joining table from
the indexed view...) though that does seem odd to me why that won't work.
as far as the time awareness aggregation
it seems as if the analyzer is not wanting to honor the time_dim join
against the indexed view.
in a nut shell I am attempting to get the analyzer to query against the
summary indexed view based on a date constraint.
The thought behind this is a user would be able to build a query against the
lowest level of aggregation (the detail table "tbl_detail") and based on the
level of aggregation and the date constraints it would choose the most
appropriate table of indexed view to used to resolve the result set.
(Thinking that if a user had asked for a full years worth of data at the
client level (found in the index view "mv_summary" ) it would be more
efficient to pull process maybe 12 I/Os verses 12 million I/O s ) "That woul
d
make total since to me... in fact it does do that on the 1st query in my
example: " though once I introduce time into the equalization it go after th
e
detail table every time.
hmmm... the has to be a way to do this... maybe I just not seeing it... I
just can't accept that Microsoft would release a product that would allow yo
u
to perform aggregation one way but not another... especially on one such a
useful as date...
Any thoughts?
Thanks
Eric
/* code start : */
--drop sample objects
drop view mv_summary
drop table tbl_detail
drop table time_dim
drop table client_dim
--create detail table
create table tbl_detail(
ID int identity(1,1),
invoice_date datetime,
invoice_month as CASE
WHEN cast(datepart(mm,invoice_date) as varchar(2)) < = 9 THEN
cast(datepart(yyyy,invoice_date)as varchar(4)) + '0' +
cast(datepart(mm,invoice_date)as varchar(2))
ELSE cast(datepart(yyyy,invoice_date)as varchar(4)) +
cast(datepart(mm,invoice_date) as varchar(2))
END,
month_begin_date as dateadd(month,datediff(month,0,[invoice_date]),0),
client_id varchar(10),
sales money not null)
go
--create time dimension
create table time_dim(
date_number datetime ,
month_begin_date as dateadd(month,datediff(month,0,date_numb
er),0),
month_code as CASE
WHEN cast(datepart(mm,date_number) as varchar(2)) < = 9 THEN
cast(datepart(yyyy,date_number)as varchar(4)) + '0' +
cast(datepart(mm,date_number)as varchar(2))
ELSE cast(datepart(yyyy,date_number)as varchar(4)) +
cast(datepart(mm,date_number) as varchar(2))
END)
go
--create client list
create table client_dim (client_id varchar(10), client_desc varchar(20),
usedflag bit null)
insert into client_dim(client_id,client_desc)values(
'A0000001','A0000001TEST
')
insert into client_dim(client_id,client_desc)values(
'A0000002','A0000002TEST
')
insert into client_dim(client_id,client_desc)values(
'A0000003','A0000003TEST
')
insert into client_dim(client_id,client_desc)values(
'A0000004','A0000004TEST
')
insert into client_dim(client_id,client_desc)values(
'A0000005','A0000005TEST
')
insert into client_dim(client_id,client_desc)values(
'A0000006','A0000006TEST
')
insert into client_dim(client_id,client_desc)values(
'A0000007','A0000007TEST
')
insert into client_dim(client_id,client_desc)values(
'A0000008','A0000008TEST
')
insert into client_dim(client_id,client_desc)values(
'A0000009','A0000009TEST
')
insert into client_dim(client_id,client_desc)values(
'A0000010','A0000010TEST
')
go
--create index
create index idx_client_id on client_dim (client_id)
go
--loop through each client and build a random list of data.
declare @.loopcnt int
set @.loopcnt = 0
declare @.clientloopcnt int
set @.clientloopcnt = 0
declare @.clientid varchar(10)
declare @.nextdate datetime
set @.nextdate = getdate()
--populate detail table
while @.loopcnt < 1--0000
begin
insert into tbl_detail ( invoice_date, client_id, sales)
select
cast(cast(getdate() as int) -115* rand(cast(cast(newid() as binary(8))
as int))as datetime) as invoice_date,
client_id,
cast(cast(100 as int) -115* rand(cast(cast(newid() as binary(8)) as
int))as money)
from client_dim --where usedflag is not null
set @.loopcnt = @.loopcnt + 1
print 'The loop counter is ' + cast(@.loopcnt as char)
--update the client as being complete...
set @.clientloopcnt = @.clientloopcnt + 1
end
--populate time dimension
while @.loopcnt < 365
begin
insert into time_dim (date_number)
values (@.nextdate)
set @.nextdate = @.nextdate + 1
set @.loopcnt = @.loopcnt + 1
print 'The loop counter is ' + cast(@.loopcnt as char)
--update the client as being complete...
end
go
go
--create indexed view
create view mv_summary with schemabinding
as
select a.month_begin_date, a.client_id, sum(a.sales) as sales, count_big(*)
as RC
from dbo.tbl_detail a
group by a.month_begin_date ,a.client_id
go
create unique clustered index cidx_client_invoice_date on mv_summary(
client_id, month_begin_date)
go
--this query invokes the summary as expected...
select a.month_begin_date , b.client_id , b.client_desc , sum(sales) as sale
s
from dbo.tbl_detail a
inner join dbo.client_dim b
on
a.client_id = b.client_id
where b.client_desc = 'A0000001TEST'
group by a.month_begin_date, b.client_id , b.client_desc
--this query does NOT invokes the summary... though not sure why...
select a.month_begin_date , b.client_id , b.client_desc , sum(sales) as sale
s
from dbo.tbl_detail a
inner join dbo.client_dim b
on
a.client_id = b.client_id
inner join time_dim t
on
t.month_begin_date = a.month_begin_date
where b.client_desc = 'A0000001TEST' and t.date_number between '6/1/2006'
and '6/30/2006'
group by a.month_begin_date, b.client_id , b.client_desc
/* code end: */
"Mike Hodgson" wrote:

> In theory it ought to work fine (ie. use the indexed view in the query
> plan) IFF the optimiser determines that that's the most efficient way to
> return the data (although I've had cases where querying the base table
> was so quick that the optimiser decided to chose it over a corresponding
> indexed view anyway and not waste it's time evaluating query plans
> against the indexed view).
> The T-SQL code you post looks rather incomplete. Among other things,
> shouldn't there be at least a GROUP BY in your view? What I'm trying to
> get at is the SUM() aggregates in the view, are they usable in your
> other query or are you grouping the rows into different groups when
> calculating your SUM() aggregates? I'm guessing if you summed the sales
> column from your view for all invoice years for a particular client_id
> you'd get the same figure as summing the sales figure for that client_id
> in the base table wouldn't you? Have I correctly guessed the grouping
> you've used in the view?
> If that's true then perhaps the optimiser thinks it's harder to join the
> materialised view data to the client_dimension table than it is to join
> the base tbl_detail table to that client_dimension table. I notice the
> clustered index you create on the view has the invoiceyear column 1st
> (and the client_id column 2nd), which would make it rather nasty to join
> to client_dimension. I assume tbl_detail & client_dimension both have
> nice indexes (maybe even clustered indexes) where client_id is the 1st
> column in those indexes; if so, then a join between tbl_detail and
> client_dimension would probably be more efficient than between
> mv_summary and client_dimension (and so the query optimiser would
> probably pick a join with the base table rather than the clustered index
> on the view).
> I'd check the execution plans, try changing the clustered unique index
> on mv_summary so that client_id is the first column in the index and
> make sure the grouping in the view & the later query are "compatible".
> The bottom line is, in theory, it ought to work fine but there's
> probably just something a bit off with your scenario/implementation.
> --
> *mike hodgson*
> http://sqlnerd.blogspot.com
>
> Eric wrote:
>
|||Does anyone have any ideas as to why this might be happening?
"Eric" wrote:

> Please correct me if I'm wrong here but what's the purpose of sql server
> indexed views.. I mean I don't mean to go off on rant here and please
> understand this is not a post to discredited ms sql server 2005 at all, I'
m
> just trying to get some clarification that's all.
> ok.. with that said. from what I understand in a nut shell about 2005
> indexed views is when the view is persisted ALL objects with in the view a
re
> persisted.. which is to be expected.. what I found to my surprises is how
sql
> server seems to perform aggregated awareness only when all element with th
e
> view are present. in other words if you present any other elements ( like
a
> another dimensional table join that is at the same level of aggregation th
at
> the view is at) currently at seems to break the optimal query plan that I
> would expect the optimizer to take.
> example:
> 1. base detail table called table_detail a has 1 million records in it.
>
> create table tbl_detail
> ( client_id into, invoicedate, sales money)
>
> 2. an indexed view called mv_summary is created over tbl_detail table
>
> create view mv_summary as
> select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales) a
s
> sales
> go
> create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
> client_id)
> go
>
> 3. lets say the mv_summary view is at lower level of aggregation, now the
> mv_summary has a 1000 aggregated record from the detail table.
> any select statements ran against the mv_summary view directly the plan ru
ns
> as expected. even if you were to query the tbl_detail at the aggregation
> level of the view the plan result returns mv_summary as expected via sql
> server arrogation awareness. though once you introduce an dimensional tabl
e
> is at the same level of aggregation to the query ( like a dimensional join
)
> ok this is were this get weird.
> the plan goes to the tbl_detail not the indexed view as I would expect it
to.
> basically what I attempted to do was filter on an dimensional descriptive
> element via join and where clause.
> example :
>
> select client_id , sum(sales) as sales
> from tbl_detail
> inner join client_dimension
> on
> tbl_detail.client_id = client_dimension.client_id
> where client_dimension.client_description 'test'
>
> turns out what I have been able to come up with is ALL elements that you
> need to filter join select on HAS to been in the indexed view... which see
ms
> to been an issue.. that mean ALL data including dimensional data has to be
> persisted again...
> any thoughts on how I can use indexed views in a aggregation aware
> environment without have to store ALL possible filterable elements in the
> view.
> thanks!!!
|||I have one possible reason (which is merely a guess), and a speculation.
For a simple example where the cost difference (in absolute time) is not
that big (like the one in your repro script), the optimizer might not do
a full optimization, but stop searching for better query plans when a
so-called 'obvious' plan is found. This mechanism is used to avoid
'wasting' time searching for better plans that may never be found,
and instead to start executing immediately.
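One way to check this is the showplan XML, which in SQL Server 2005 records the
optimisation level and, if the search stopped early, the reason. Just a sketch:
set statistics xml on
go
-- run the problem query here; in the returned showplan XML, look at the
-- StatementOptmLevel attribute (TRIVIAL vs FULL) and, if present,
-- StatementOptmEarlyAbortReason (e.g. GoodEnoughPlanFound) on the statement node.
go
set statistics xml off
go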
The speculation is that Microsoft might not have invested enough effort
in analyzing whether an indexed view might benefit the query. In your case,
the indexed view option is simply missed, because using it would make the
query definitely more efficient. Note that indexed views were introduced as
'recently' as SQL Server 2000. IMO, their introduction was mostly a
commercial statement, claiming that Microsoft was no longer behind
Oracle and IBM (with regard to this feature).
I can't tell if there were any improvements to indexed views (or their
use) in SQL Server 2005, but I can tell that I have not seen any
documentation suggesting there were any improvements. If someone has
seen such documentation, I would be very interested.
HTH,
Gert-Jan