Sunday, February 19, 2012

Indexed Views with aggregated awareness

Please correct me if I'm wrong here but what's the purpose of sql server
indexed views.. I mean I don't mean to go off on rant here and please
understand this is not a post to discredited ms sql server 2005 at all, I'm
just trying to get some clarification that's all.
ok.. with that said. from what I understand in a nut shell about 2005
indexed views is when the view is persisted ALL objects with in the view are
persisted.. which is to be expected.. what I found to my surprises is how sql
server seems to perform aggregated awareness only when all element with the
view are present. in other words if you present any other elements ( like a
another dimensional table join that is at the same level of aggregation that
the view is at) currently at seems to break the optimal query plan that I
would expect the optimizer to take.
example:
1. base detail table called table_detail a has 1 million records in it.
create table tbl_detail
( client_id into, invoicedate, sales money)
2. an indexed view called mv_summary is created over tbl_detail table
create view mv_summary as
select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales) as
sales
go
create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
client_id)
go
3. lets say the mv_summary view is at lower level of aggregation, now the
mv_summary has a 1000 aggregated record from the detail table.
any select statements ran against the mv_summary view directly the plan runs
as expected. even if you were to query the tbl_detail at the aggregation
level of the view the plan result returns mv_summary as expected via sql
server arrogation awareness. though once you introduce an dimensional table
is at the same level of aggregation to the query ( like a dimensional join )
ok this is were this get weird.
the plan goes to the tbl_detail not the indexed view as I would expect it to.
basically what I attempted to do was filter on an dimensional descriptive
element via join and where clause.
example :
select client_id , sum(sales) as sales
from tbl_detail
inner join client_dimension
on
tbl_detail.client_id = client_dimension.client_id
where client_dimension.client_description 'test'
turns out what I have been able to come up with is ALL elements that you
need to filter join select on HAS to been in the indexed view... which seems
to been an issue.. that mean ALL data including dimensional data has to be
persisted again...
any thoughts on how I can use indexed views in a aggregation aware
environment without have to store ALL possible filterable elements in the
view.
thanks!!!This is a multi-part message in MIME format.
--060206050409080602000404
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
In theory it ought to work fine (ie. use the indexed view in the query
plan) IFF the optimiser determines that that's the most efficient way to
return the data (although I've had cases where querying the base table
was so quick that the optimiser decided to chose it over a corresponding
indexed view anyway and not waste it's time evaluating query plans
against the indexed view).
The T-SQL code you post looks rather incomplete. Among other things,
shouldn't there be at least a GROUP BY in your view? What I'm trying to
get at is the SUM() aggregates in the view, are they usable in your
other query or are you grouping the rows into different groups when
calculating your SUM() aggregates? I'm guessing if you summed the sales
column from your view for all invoice years for a particular client_id
you'd get the same figure as summing the sales figure for that client_id
in the base table wouldn't you? Have I correctly guessed the grouping
you've used in the view?
If that's true then perhaps the optimiser thinks it's harder to join the
materialised view data to the client_dimension table than it is to join
the base tbl_detail table to that client_dimension table. I notice the
clustered index you create on the view has the invoiceyear column 1st
(and the client_id column 2nd), which would make it rather nasty to join
to client_dimension. I assume tbl_detail & client_dimension both have
nice indexes (maybe even clustered indexes) where client_id is the 1st
column in those indexes; if so, then a join between tbl_detail and
client_dimension would probably be more efficient than between
mv_summary and client_dimension (and so the query optimiser would
probably pick a join with the base table rather than the clustered index
on the view).
I'd check the execution plans, try changing the clustered unique index
on mv_summary so that client_id is the first column in the index and
make sure the grouping in the view & the later query are "compatible".
The bottom line is, in theory, it ought to work fine but there's
probably just something a bit off with your scenario/implementation.
--
*mike hodgson*
http://sqlnerd.blogspot.com
Eric wrote:
>Please correct me if I'm wrong here but what's the purpose of sql server
>indexed views.. I mean I don't mean to go off on rant here and please
>understand this is not a post to discredited ms sql server 2005 at all, I'm
>just trying to get some clarification that's all.
>ok.. with that said. from what I understand in a nut shell about 2005
>indexed views is when the view is persisted ALL objects with in the view are
>persisted.. which is to be expected.. what I found to my surprises is how sql
>server seems to perform aggregated awareness only when all element with the
>view are present. in other words if you present any other elements ( like a
>another dimensional table join that is at the same level of aggregation that
>the view is at) currently at seems to break the optimal query plan that I
>would expect the optimizer to take.
>example:
>1. base detail table called table_detail a has 1 million records in it.
>
>create table tbl_detail
>( client_id into, invoicedate, sales money)
>
>2. an indexed view called mv_summary is created over tbl_detail table
>
>create view mv_summary as
>select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales) as
>sales
>go
>create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
>client_id)
>go
>
>3. lets say the mv_summary view is at lower level of aggregation, now the
>mv_summary has a 1000 aggregated record from the detail table.
>any select statements ran against the mv_summary view directly the plan runs
>as expected. even if you were to query the tbl_detail at the aggregation
>level of the view the plan result returns mv_summary as expected via sql
>server arrogation awareness. though once you introduce an dimensional table
>is at the same level of aggregation to the query ( like a dimensional join )
>ok this is were this get weird.
>the plan goes to the tbl_detail not the indexed view as I would expect it to.
>basically what I attempted to do was filter on an dimensional descriptive
>element via join and where clause.
>example :
>
>select client_id , sum(sales) as sales
>from tbl_detail
>inner join client_dimension
>on
>tbl_detail.client_id = client_dimension.client_id
>where client_dimension.client_description 'test'
>
>turns out what I have been able to come up with is ALL elements that you
>need to filter join select on HAS to been in the indexed view... which seems
>to been an issue.. that mean ALL data including dimensional data has to be
>persisted again...
>any thoughts on how I can use indexed views in a aggregation aware
>environment without have to store ALL possible filterable elements in the
>view.
>thanks!!!
>
>
--060206050409080602000404
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 8bit
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<tt>In theory it ought to work fine (ie. use the indexed view in the
query plan) IFF the optimiser determines that that's the most efficient
way to return the data (although I've had cases where querying the base
table was so quick that the optimiser decided to chose it over a
corresponding indexed view anyway and not waste it's time evaluating
query plans against the indexed view).<br>
<br>
The T-SQL code you post looks rather incomplete. Among other things,
shouldn't there be at least a GROUP BY in your view? What I'm trying
to get at is the SUM() aggregates in the view, are they usable in your
other query or are you grouping the rows into different groups when
calculating your SUM() aggregates? I'm guessing if you summed the
sales column from your view for all invoice years for a particular
client_id you'd get the same figure as summing the sales figure for
that client_id in the base table wouldn't you? Have I correctly
guessed the grouping you've used in the view?<br>
<br>
If that's true then perhaps the optimiser thinks it's harder to join
the materialised view data to the client_dimension table than it is to
join the base tbl_detail table to that client_dimension table. I
notice the clustered index you create on the view has the invoiceyear
column 1st (and the client_id column 2nd), which would make it rather
nasty to join to client_dimension. I assume tbl_detail &
client_dimension both have nice indexes (maybe even clustered indexes)
where client_id is the 1st column in those indexes; if so, then a join
between tbl_detail and client_dimension would probably be more
efficient than between mv_summary and client_dimension (and so the
query optimiser would probably pick a join with the base table rather
than the clustered index on the view).<br>
<br>
I'd check the execution plans, try changing the clustered unique index
on mv_summary so that client_id is the first column in the index and
make sure the grouping in the view & the later query are
"compatible". The bottom line is, in theory, it ought to work fine but
there's probably just something a bit off with your
scenario/implementation.<br>
</tt>
<div class="moz-signature">
<title></title>
<meta http-equiv="Content-Type" content="text/html; ">
<p><span lang="en-au"><font face="Tahoma" size="2">--<br>
</font></span> <b><span lang="en-au"><font face="Tahoma" size="2">mike
hodgson</font></span></b><span lang="en-au"><br>
<font face="Tahoma" size="2"><a href="http://links.10026.com/?link=http://sqlnerd.blogspot.com</a></font></span>">http://sqlnerd.blogspot.com">http://sqlnerd.blogspot.com</a></font></span>
</p>
</div>
<br>
<br>
Eric wrote:
<blockquote cite="mid1B096C2E-A43C-4F32-B12A-BDF09339FC7F@.microsoft.com"
type="cite">
<pre wrap="">Please correct me if I'm wrong here but what's the purpose of sql server
indexed views.. I mean I don't mean to go off on rant here and please
understand this is not a post to discredited ms sql server 2005 at all, I'm
just trying to get some clarification that's all.
ok.. with that said. from what I understand in a nut shell about 2005
indexed views is when the view is persisted ALL objects with in the view are
persisted.. which is to be expected.. what I found to my surprises is how sql
server seems to perform aggregated awareness only when all element with the
view are present. in other words if you present any other elements ( like a
another dimensional table join that is at the same level of aggregation that
the view is at) currently at seems to break the optimal query plan that I
would expect the optimizer to take.
example:
1. base detail table called table_detail a has 1 million records in it.
create table tbl_detail
( client_id into, invoicedate, sales money)
2. an indexed view called mv_summary is created over tbl_detail table
create view mv_summary as
select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales) as
sales
go
create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
client_id)
go
3. lets say the mv_summary view is at lower level of aggregation, now the
mv_summary has a 1000 aggregated record from the detail table.
any select statements ran against the mv_summary view directly the plan runs
as expected. even if you were to query the tbl_detail at the aggregation
level of the view the plan result returns mv_summary as expected via sql
server arrogation awareness. though once you introduce an dimensional table
is at the same level of aggregation to the query ( like a dimensional join )
ok this is were this get weird.
the plan goes to the tbl_detail not the indexed view as I would expect it to.
basically what I attempted to do was filter on an dimensional descriptive
element via join and where clause.
example :
select client_id , sum(sales) as sales
from tbl_detail
inner join client_dimension
on
tbl_detail.client_id = client_dimension.client_id
where client_dimension.client_description 'test'
turns out what I have been able to come up with is ALL elements that you
need to filter join select on HAS to been in the indexed view... which seems
to been an issue.. that mean ALL data including dimensional data has to be
persisted again...
any thoughts on how I can use indexed views in a aggregation aware
environment without have to store ALL possible filterable elements in the
view.
thanks!!!
</pre>
</blockquote>
</body>
</html>
--060206050409080602000404--|||Eric,
have you tried to change the order of columns in the clustered index to
create unique clustered index cidx_yearclient on mv_summary(client_id,
invoiceyear)
and see if that helps?|||Hi Mike--
Ya sorry bout that... not only was my code incomplete but my grammar went to
hell as well... LOL! Typing when youâ're running on 2 hours of sleep in a 72
hour window might do that to ya...
At any rate.
You are correct in my sample code there should be a group by (and is in my
testing) somehow i forgot to place in the post.
From what I have been able to devise is... it seems like the optimizer needs
to have any of all objects within the indexed view... Not just joined to
validate the objects binding prior to using it as a valid source for
aggregation. in other words all of the element that you may want to filter on
HAVE to be stored within the view...right? if thatâ's the case you would be
better off using the view strictly has a single source and never counting on
the optimizer performing the aggregation awareness.
For example:
Lets say I wanted the optimizer to use the indexed view based on a date
range. When I create the indexed view based on the first date of the month
for each grouping
Example: tbl_detail
client id invoice_date sales
a0000001 1/3/2005 10
a0000001 1/4/2005 30
a0000001 1/5/2005 20
a0000001 1/7/2005 50
gets resolved to : mv_summary
client id invoice_date sales
a0000001 1/1/2005 110
If I were to run a query against the tbl_detail table with the invoice_date
of between 1/1/2005 and 1/31/2005 it should be intelligent enough to know
that it would be more efficient to get the pre-aggregated data from the
indexed view verses pulling it from the table then aggregating it. (
obviously the given example is small but extrapolate it by a couple hundred
million and I believe it would make a difference)
Any thoughts on what might be the best approach to use indexed view in an
date aggregation aware scenario? basically I want to have a single source (
tbl_detail) that all querys are pointed to and based on the aggregation level
and date constraints would determine which alternate data source would be
used ( like an indexed view summary) oracle has a function called " date
folding" that resolves a date constraint and re-writes the date portion of
the where clause to conform to the best element constraint based on the
finite or range of the date and aggregation level of the data.
Example:
invoice_date of 1/1/2005 through 1/31/2005 at the client level would be
resolved to the month of January and then...( not in the prior example code..
but invoice_date would be replaced with month instead) ... it would apply the
month code to the month column in the summary.
Please let me know if am not making since... :)
Oh, BTW I tried switching the date and client in the clustered index â?¦
unfortunately with not avail)
Thanks
Eric
"Mike Hodgson" wrote:
> In theory it ought to work fine (ie. use the indexed view in the query
> plan) IFF the optimiser determines that that's the most efficient way to
> return the data (although I've had cases where querying the base table
> was so quick that the optimiser decided to chose it over a corresponding
> indexed view anyway and not waste it's time evaluating query plans
> against the indexed view).
> The T-SQL code you post looks rather incomplete. Among other things,
> shouldn't there be at least a GROUP BY in your view? What I'm trying to
> get at is the SUM() aggregates in the view, are they usable in your
> other query or are you grouping the rows into different groups when
> calculating your SUM() aggregates? I'm guessing if you summed the sales
> column from your view for all invoice years for a particular client_id
> you'd get the same figure as summing the sales figure for that client_id
> in the base table wouldn't you? Have I correctly guessed the grouping
> you've used in the view?
> If that's true then perhaps the optimiser thinks it's harder to join the
> materialised view data to the client_dimension table than it is to join
> the base tbl_detail table to that client_dimension table. I notice the
> clustered index you create on the view has the invoiceyear column 1st
> (and the client_id column 2nd), which would make it rather nasty to join
> to client_dimension. I assume tbl_detail & client_dimension both have
> nice indexes (maybe even clustered indexes) where client_id is the 1st
> column in those indexes; if so, then a join between tbl_detail and
> client_dimension would probably be more efficient than between
> mv_summary and client_dimension (and so the query optimiser would
> probably pick a join with the base table rather than the clustered index
> on the view).
> I'd check the execution plans, try changing the clustered unique index
> on mv_summary so that client_id is the first column in the index and
> make sure the grouping in the view & the later query are "compatible".
> The bottom line is, in theory, it ought to work fine but there's
> probably just something a bit off with your scenario/implementation.
> --
> *mike hodgson*
> http://sqlnerd.blogspot.com
>
> Eric wrote:
> >Please correct me if I'm wrong here but what's the purpose of sql server
> >indexed views.. I mean I don't mean to go off on rant here and please
> >understand this is not a post to discredited ms sql server 2005 at all, I'm
> >just trying to get some clarification that's all.
> >
> >ok.. with that said. from what I understand in a nut shell about 2005
> >indexed views is when the view is persisted ALL objects with in the view are
> >persisted.. which is to be expected.. what I found to my surprises is how sql
> >server seems to perform aggregated awareness only when all element with the
> >view are present. in other words if you present any other elements ( like a
> >another dimensional table join that is at the same level of aggregation that
> >the view is at) currently at seems to break the optimal query plan that I
> >would expect the optimizer to take.
> >
> >example:
> >
> >1. base detail table called table_detail a has 1 million records in it.
> >
> >
> >create table tbl_detail
> >( client_id into, invoicedate, sales money)
> >
> >
> >
> >2. an indexed view called mv_summary is created over tbl_detail table
> >
> >
> >create view mv_summary as
> >select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales) as
> >sales
> >go
> >create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
> >client_id)
> >go
> >
> >
> >3. lets say the mv_summary view is at lower level of aggregation, now the
> >mv_summary has a 1000 aggregated record from the detail table.
> >any select statements ran against the mv_summary view directly the plan runs
> >as expected. even if you were to query the tbl_detail at the aggregation
> >level of the view the plan result returns mv_summary as expected via sql
> >server arrogation awareness. though once you introduce an dimensional table
> >is at the same level of aggregation to the query ( like a dimensional join )
> >ok this is were this get weird.
> >the plan goes to the tbl_detail not the indexed view as I would expect it to.
> >basically what I attempted to do was filter on an dimensional descriptive
> >element via join and where clause.
> >
> >example :
> >
> >
> >select client_id , sum(sales) as sales
> >from tbl_detail
> >inner join client_dimension
> >on
> >tbl_detail.client_id = client_dimension.client_id
> >where client_dimension.client_description 'test'
> >
> >
> >
> >turns out what I have been able to come up with is ALL elements that you
> >need to filter join select on HAS to been in the indexed view... which seems
> >to been an issue.. that mean ALL data including dimensional data has to be
> >persisted again...
> >
> >any thoughts on how I can use indexed views in a aggregation aware
> >environment without have to store ALL possible filterable elements in the
> >view.
> >
> >thanks!!!
> >
> >
> >
> >
>|||Hi Alexander --
Yep, unfortunately it didn't seem to change the result... It still pulled
from the detail.
Thanks
Eric
"Alexander Kuznetsov" wrote:
> Eric,
> have you tried to change the order of columns in the clustered index to
> create unique clustered index cidx_yearclient on mv_summary(client_id,
> invoiceyear)
> and see if that helps?
>|||Whats the query that you are using to access the index view?... Try
retrieving the only columns - invoiceyear, client_id in your query and see
the execution plan.. It will be fetching the results directly from the
indexed view rather than from the base table..
Jayesh
"Eric" <Eric@.discussions.microsoft.com> wrote in message
news:1B096C2E-A43C-4F32-B12A-BDF09339FC7F@.microsoft.com...
> Please correct me if I'm wrong here but what's the purpose of sql server
> indexed views.. I mean I don't mean to go off on rant here and please
> understand this is not a post to discredited ms sql server 2005 at all,
> I'm
> just trying to get some clarification that's all.
> ok.. with that said. from what I understand in a nut shell about 2005
> indexed views is when the view is persisted ALL objects with in the view
> are
> persisted.. which is to be expected.. what I found to my surprises is how
> sql
> server seems to perform aggregated awareness only when all element with
> the
> view are present. in other words if you present any other elements ( like
> a
> another dimensional table join that is at the same level of aggregation
> that
> the view is at) currently at seems to break the optimal query plan that I
> would expect the optimizer to take.
> example:
> 1. base detail table called table_detail a has 1 million records in it.
>
> create table tbl_detail
> ( client_id into, invoicedate, sales money)
>
> 2. an indexed view called mv_summary is created over tbl_detail table
>
> create view mv_summary as
> select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales)
> as
> sales
> go
> create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
> client_id)
> go
>
> 3. lets say the mv_summary view is at lower level of aggregation, now the
> mv_summary has a 1000 aggregated record from the detail table.
> any select statements ran against the mv_summary view directly the plan
> runs
> as expected. even if you were to query the tbl_detail at the aggregation
> level of the view the plan result returns mv_summary as expected via sql
> server arrogation awareness. though once you introduce an dimensional
> table
> is at the same level of aggregation to the query ( like a dimensional
> join )
> ok this is were this get weird.
> the plan goes to the tbl_detail not the indexed view as I would expect it
> to.
> basically what I attempted to do was filter on an dimensional descriptive
> element via join and where clause.
> example :
>
> select client_id , sum(sales) as sales
> from tbl_detail
> inner join client_dimension
> on
> tbl_detail.client_id = client_dimension.client_id
> where client_dimension.client_description 'test'
>
> turns out what I have been able to come up with is ALL elements that you
> need to filter join select on HAS to been in the indexed view... which
> seems
> to been an issue.. that mean ALL data including dimensional data has to be
> persisted again...
> any thoughts on how I can use indexed views in a aggregation aware
> environment without have to store ALL possible filterable elements in the
> view.
> thanks!!!
>|||Ok, I have written a script to reproduce the issue I am seeing...
(Though I was able to get SQL to pull from the indexed view with a
constraint on a joining dimensional table by removing the joining table from
the indexed view...) though that does seem odd to me why that won't work.
as far as the time awareness aggregation
it seems as if the analyzer is not wanting to honor the time_dim join
against the indexed view.
in a nut shell I am attempting to get the analyzer to query against the
summary indexed view based on a date constraint.
The thought behind this is a user would be able to build a query against the
lowest level of aggregation (the detail table "tbl_detail") and based on the
level of aggregation and the date constraints it would choose the most
appropriate table of indexed view to used to resolve the result set.
(Thinking that if a user had asked for a full years worth of data at the
client level (found in the index view "mv_summary" ) it would be more
efficient to pull process maybe 12 I/Os verses 12 million I/O s ) "That would
make total since to me... in fact it does do that on the 1st query in my
example: " though once I introduce time into the equalization it go after the
detail table every time.
hmmm... the has to be a way to do this... maybe I just not seeing it... I
just can't accept that Microsoft would release a product that would allow you
to perform aggregation one way but not another... especially on one such a
useful as date...
Any thoughts?
Thanks
Eric
/* code start : */
--drop sample objects
drop view mv_summary
drop table tbl_detail
drop table time_dim
drop table client_dim
--create detail table
create table tbl_detail(
ID int identity(1,1),
invoice_date datetime,
invoice_month as CASE
WHEN cast(datepart(mm,invoice_date) as varchar(2)) < = 9 THEN
cast(datepart(yyyy,invoice_date)as varchar(4)) + '0' +
cast(datepart(mm,invoice_date)as varchar(2))
ELSE cast(datepart(yyyy,invoice_date)as varchar(4)) +
cast(datepart(mm,invoice_date) as varchar(2))
END,
month_begin_date as dateadd(month,datediff(month,0,[invoice_date]),0),
client_id varchar(10),
sales money not null)
go
--create time dimension
create table time_dim(
date_number datetime ,
month_begin_date as dateadd(month,datediff(month,0,date_number),0),
month_code as CASE
WHEN cast(datepart(mm,date_number) as varchar(2)) < = 9 THEN
cast(datepart(yyyy,date_number)as varchar(4)) + '0' +
cast(datepart(mm,date_number)as varchar(2))
ELSE cast(datepart(yyyy,date_number)as varchar(4)) +
cast(datepart(mm,date_number) as varchar(2))
END)
go
--create client list
create table client_dim (client_id varchar(10), client_desc varchar(20),
usedflag bit null)
insert into client_dim(client_id,client_desc)values('A0000001','A0000001TEST')
insert into client_dim(client_id,client_desc)values('A0000002','A0000002TEST')
insert into client_dim(client_id,client_desc)values('A0000003','A0000003TEST')
insert into client_dim(client_id,client_desc)values('A0000004','A0000004TEST')
insert into client_dim(client_id,client_desc)values('A0000005','A0000005TEST')
insert into client_dim(client_id,client_desc)values('A0000006','A0000006TEST')
insert into client_dim(client_id,client_desc)values('A0000007','A0000007TEST')
insert into client_dim(client_id,client_desc)values('A0000008','A0000008TEST')
insert into client_dim(client_id,client_desc)values('A0000009','A0000009TEST')
insert into client_dim(client_id,client_desc)values('A0000010','A0000010TEST')
go
--create index
create index idx_client_id on client_dim (client_id)
go
--loop through each client and build a random list of data.
declare @.loopcnt int
set @.loopcnt = 0
declare @.clientloopcnt int
set @.clientloopcnt = 0
declare @.clientid varchar(10)
declare @.nextdate datetime
set @.nextdate = getdate()
--populate detail table
while @.loopcnt < 1--0000
begin
insert into tbl_detail ( invoice_date, client_id, sales)
select
cast(cast(getdate() as int) -115* rand(cast(cast(newid() as binary(8))
as int))as datetime) as invoice_date,
client_id,
cast(cast(100 as int) -115* rand(cast(cast(newid() as binary(8)) as
int))as money)
from client_dim --where usedflag is not null
set @.loopcnt = @.loopcnt + 1
print 'The loop counter is ' + cast(@.loopcnt as char)
--update the client as being complete...
set @.clientloopcnt = @.clientloopcnt + 1
end
--populate time dimension
while @.loopcnt < 365
begin
insert into time_dim (date_number)
values (@.nextdate)
set @.nextdate = @.nextdate + 1
set @.loopcnt = @.loopcnt + 1
print 'The loop counter is ' + cast(@.loopcnt as char)
--update the client as being complete...
end
go
go
--create indexed view
create view mv_summary with schemabinding
as
select a.month_begin_date, a.client_id, sum(a.sales) as sales, count_big(*)
as RC
from dbo.tbl_detail a
group by a.month_begin_date ,a.client_id
go
create unique clustered index cidx_client_invoice_date on mv_summary(
client_id, month_begin_date)
go
--this query invokes the summary as expected...
select a.month_begin_date , b.client_id , b.client_desc , sum(sales) as sales
from dbo.tbl_detail a
inner join dbo.client_dim b
on
a.client_id = b.client_id
where b.client_desc = 'A0000001TEST'
group by a.month_begin_date, b.client_id , b.client_desc
--this query does NOT invokes the summary... though not sure why...
select a.month_begin_date , b.client_id , b.client_desc , sum(sales) as sales
from dbo.tbl_detail a
inner join dbo.client_dim b
on
a.client_id = b.client_id
inner join time_dim t
on
t.month_begin_date = a.month_begin_date
where b.client_desc = 'A0000001TEST' and t.date_number between '6/1/2006'
and '6/30/2006'
group by a.month_begin_date, b.client_id , b.client_desc
/* code end: */
"Mike Hodgson" wrote:
> In theory it ought to work fine (ie. use the indexed view in the query
> plan) IFF the optimiser determines that that's the most efficient way to
> return the data (although I've had cases where querying the base table
> was so quick that the optimiser decided to chose it over a corresponding
> indexed view anyway and not waste it's time evaluating query plans
> against the indexed view).
> The T-SQL code you post looks rather incomplete. Among other things,
> shouldn't there be at least a GROUP BY in your view? What I'm trying to
> get at is the SUM() aggregates in the view, are they usable in your
> other query or are you grouping the rows into different groups when
> calculating your SUM() aggregates? I'm guessing if you summed the sales
> column from your view for all invoice years for a particular client_id
> you'd get the same figure as summing the sales figure for that client_id
> in the base table wouldn't you? Have I correctly guessed the grouping
> you've used in the view?
> If that's true then perhaps the optimiser thinks it's harder to join the
> materialised view data to the client_dimension table than it is to join
> the base tbl_detail table to that client_dimension table. I notice the
> clustered index you create on the view has the invoiceyear column 1st
> (and the client_id column 2nd), which would make it rather nasty to join
> to client_dimension. I assume tbl_detail & client_dimension both have
> nice indexes (maybe even clustered indexes) where client_id is the 1st
> column in those indexes; if so, then a join between tbl_detail and
> client_dimension would probably be more efficient than between
> mv_summary and client_dimension (and so the query optimiser would
> probably pick a join with the base table rather than the clustered index
> on the view).
> I'd check the execution plans, try changing the clustered unique index
> on mv_summary so that client_id is the first column in the index and
> make sure the grouping in the view & the later query are "compatible".
> The bottom line is, in theory, it ought to work fine but there's
> probably just something a bit off with your scenario/implementation.
> --
> *mike hodgson*
> http://sqlnerd.blogspot.com
>
> Eric wrote:
> >Please correct me if I'm wrong here but what's the purpose of sql server
> >indexed views.. I mean I don't mean to go off on rant here and please
> >understand this is not a post to discredited ms sql server 2005 at all, I'm
> >just trying to get some clarification that's all.
> >
> >ok.. with that said. from what I understand in a nut shell about 2005
> >indexed views is when the view is persisted ALL objects with in the view are
> >persisted.. which is to be expected.. what I found to my surprises is how sql
> >server seems to perform aggregated awareness only when all element with the
> >view are present. in other words if you present any other elements ( like a
> >another dimensional table join that is at the same level of aggregation that
> >the view is at) currently at seems to break the optimal query plan that I
> >would expect the optimizer to take.
> >
> >example:
> >
> >1. base detail table called table_detail a has 1 million records in it.
> >
> >
> >create table tbl_detail
> >( client_id into, invoicedate, sales money)
> >
> >
> >
> >2. an indexed view called mv_summary is created over tbl_detail table
> >
> >
> >create view mv_summary as
> >select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales) as
> >sales
> >go
> >create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
> >client_id)
> >go
> >
> >
> >3. lets say the mv_summary view is at lower level of aggregation, now the
> >mv_summary has a 1000 aggregated record from the detail table.
> >any select statements ran against the mv_summary view directly the plan runs
> >as expected. even if you were to query the tbl_detail at the aggregation
> >level of the view the plan result returns mv_summary as expected via sql
> >server arrogation awareness. though once you introduce an dimensional table
> >is at the same level of aggregation to the query ( like a dimensional join )
> >ok this is were this get weird.
> >the plan goes to the tbl_detail not the indexed view as I would expect it to.
> >basically what I attempted to do was filter on an dimensional descriptive
> >element via join and where clause.
> >
> >example :
> >
> >
> >select client_id , sum(sales) as sales
> >from tbl_detail
> >inner join client_dimension
> >on
> >tbl_detail.client_id = client_dimension.client_id
> >where client_dimension.client_description 'test'
> >
> >
> >
> >turns out what I have been able to come up with is ALL elements that you
> >need to filter join select on HAS to been in the indexed view... which seems
> >to been an issue.. that mean ALL data including dimensional data has to be
> >persisted again...
> >
> >any thoughts on how I can use indexed views in a aggregation aware
> >environment without have to store ALL possible filterable elements in the
> >view.
> >
> >thanks!!!
> >
> >
> >
> >
>|||Does anyone have any ideas as to why this might be happening.
"Eric" wrote:
> Please correct me if I'm wrong here but what's the purpose of sql server
> indexed views.. I mean I don't mean to go off on rant here and please
> understand this is not a post to discredited ms sql server 2005 at all, I'm
> just trying to get some clarification that's all.
> ok.. with that said. from what I understand in a nut shell about 2005
> indexed views is when the view is persisted ALL objects with in the view are
> persisted.. which is to be expected.. what I found to my surprises is how sql
> server seems to perform aggregated awareness only when all element with the
> view are present. in other words if you present any other elements ( like a
> another dimensional table join that is at the same level of aggregation that
> the view is at) currently at seems to break the optimal query plan that I
> would expect the optimizer to take.
> example:
> 1. base detail table called table_detail a has 1 million records in it.
>
> create table tbl_detail
> ( client_id into, invoicedate, sales money)
>
> 2. an indexed view called mv_summary is created over tbl_detail table
>
> create view mv_summary as
> select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales) as
> sales
> go
> create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
> client_id)
> go
>
> 3. lets say the mv_summary view is at lower level of aggregation, now the
> mv_summary has a 1000 aggregated record from the detail table.
> any select statements ran against the mv_summary view directly the plan runs
> as expected. even if you were to query the tbl_detail at the aggregation
> level of the view the plan result returns mv_summary as expected via sql
> server arrogation awareness. though once you introduce an dimensional table
> is at the same level of aggregation to the query ( like a dimensional join )
> ok this is were this get weird.
> the plan goes to the tbl_detail not the indexed view as I would expect it to.
> basically what I attempted to do was filter on an dimensional descriptive
> element via join and where clause.
> example :
>
> select client_id , sum(sales) as sales
> from tbl_detail
> inner join client_dimension
> on
> tbl_detail.client_id = client_dimension.client_id
> where client_dimension.client_description 'test'
>
> turns out what I have been able to come up with is ALL elements that you
> need to filter join select on HAS to been in the indexed view... which seems
> to been an issue.. that mean ALL data including dimensional data has to be
> persisted again...
> any thoughts on how I can use indexed views in a aggregation aware
> environment without have to store ALL possible filterable elements in the
> view.
> thanks!!!
>|||I have one possible reason (which is merely a guess), and a speculation.
For a simple example where the cost difference (in absolute time) is not
that big (like the one in your repro script), the optimizer might not do
a full optimize, but stop searching for better query plans when a so
called 'obvious' plan is found. This mechanism is used to avoid
'wasting' time on searching for better plans that may never be found,
and instead to start executing immediately.
The speculation is, that Microsoft might not have invested enough effort
in analyzing if an indexed view might benefit the query. In your case,
the indexed view option is simply missed, because using it makes the
query definitely more efficient. Note that indexed views introduced as
'recently' as SQL Server 2000. IMO, its introduction this was mostly a
commercial statement, claiming that Microsoft was no longer behind
Oracle and IBM (with regard to this feature).
I can't tell if there were any improvements to indexed views (or their
use) in SQL Server 2005, but I can tell that I have not seen any
documentation suggesting there were any improvements. If someone has
seen such documentation, I would be very interested.
HTH,
Gert-Jan
Eric wrote:
> Does anyone have any ideas as to why this might be happening.
> "Eric" wrote:
> > Please correct me if I'm wrong here but what's the purpose of sql server
> > indexed views.. I mean I don't mean to go off on rant here and please
> > understand this is not a post to discredited ms sql server 2005 at all, I'm
> > just trying to get some clarification that's all.
> >
> > ok.. with that said. from what I understand in a nut shell about 2005
> > indexed views is when the view is persisted ALL objects with in the view are
> > persisted.. which is to be expected.. what I found to my surprises is how sql
> > server seems to perform aggregated awareness only when all element with the
> > view are present. in other words if you present any other elements ( like a
> > another dimensional table join that is at the same level of aggregation that
> > the view is at) currently at seems to break the optimal query plan that I
> > would expect the optimizer to take.
> >
> > example:
> >
> > 1. base detail table called table_detail a has 1 million records in it.
> >
> >
> > create table tbl_detail
> > ( client_id into, invoicedate, sales money)
> >
> >
> >
> > 2. an indexed view called mv_summary is created over tbl_detail table
> >
> >
> > create view mv_summary as
> > select client_id , datepart(yyyy,invoicedate) as invoiceyear, sum(sales) as
> > sales
> > go
> > create unique clustered index cidx_yearclient on ,mv_summary(invoiceyear,
> > client_id)
> > go
> >
> >
> > 3. lets say the mv_summary view is at lower level of aggregation, now the
> > mv_summary has a 1000 aggregated record from the detail table.
> > any select statements ran against the mv_summary view directly the plan runs
> > as expected. even if you were to query the tbl_detail at the aggregation
> > level of the view the plan result returns mv_summary as expected via sql
> > server arrogation awareness. though once you introduce an dimensional table
> > is at the same level of aggregation to the query ( like a dimensional join )
> > ok this is were this get weird.
> > the plan goes to the tbl_detail not the indexed view as I would expect it to.
> > basically what I attempted to do was filter on an dimensional descriptive
> > element via join and where clause.
> >
> > example :
> >
> >
> > select client_id , sum(sales) as sales
> > from tbl_detail
> > inner join client_dimension
> > on
> > tbl_detail.client_id = client_dimension.client_id
> > where client_dimension.client_description 'test'
> >
> >
> >
> > turns out what I have been able to come up with is ALL elements that you
> > need to filter join select on HAS to been in the indexed view... which seems
> > to been an issue.. that mean ALL data including dimensional data has to be
> > persisted again...
> >
> > any thoughts on how I can use indexed views in a aggregation aware
> > environment without have to store ALL possible filterable elements in the
> > view.
> >
> > thanks!!!
> >
> >|||Hi Gert-Jan
The optimizer, since version 6.5, rarely does a full optimize, as this would
just be too expensive. It basically starts looking for a 'good' plan,
starting with something simple and getting more and more complex. Checking
to see of there is a match between the query graph of your query and the
query graphs for all existing indexed views is way down on the list. So if
the optimizer thinks it has a good enough plan before it gets that far, it
stops looking and will never even consider the indexed view.
In a case like this, you might really need to test with much bigger tables.
This whitepaper has a section on "What's New in Indexed Views in SQL Server
2005".
http://www.microsoft.com/technet/prodtechnol/sql/2005/ipsql05iv.mspx
--
HTH
Kalen Delaney, SQL Server MVP
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:4496E642.F68C6115@.toomuchspamalready.nl...
>I have one possible reason (which is merely a guess), and a speculation.
> For a simple example where the cost difference (in absolute time) is not
> that big (like the one in your repro script), the optimizer might not do
> a full optimize, but stop searching for better query plans when a so
> called 'obvious' plan is found. This mechanism is used to avoid
> 'wasting' time on searching for better plans that may never be found,
> and instead to start executing immediately.
> The speculation is, that Microsoft might not have invested enough effort
> in analyzing if an indexed view might benefit the query. In your case,
> the indexed view option is simply missed, because using it makes the
> query definitely more efficient. Note that indexed views introduced as
> 'recently' as SQL Server 2000. IMO, its introduction this was mostly a
> commercial statement, claiming that Microsoft was no longer behind
> Oracle and IBM (with regard to this feature).
> I can't tell if there were any improvements to indexed views (or their
> use) in SQL Server 2005, but I can tell that I have not seen any
> documentation suggesting there were any improvements. If someone has
> seen such documentation, I would be very interested.
> HTH,
> Gert-Jan
>
> Eric wrote:
>> Does anyone have any ideas as to why this might be happening.
>> "Eric" wrote:
>> > Please correct me if I'm wrong here but what's the purpose of sql
>> > server
>> > indexed views.. I mean I don't mean to go off on rant here and please
>> > understand this is not a post to discredited ms sql server 2005 at all,
>> > I'm
>> > just trying to get some clarification that's all.
>> >
>> > ok.. with that said. from what I understand in a nut shell about 2005
>> > indexed views is when the view is persisted ALL objects with in the
>> > view are
>> > persisted.. which is to be expected.. what I found to my surprises is
>> > how sql
>> > server seems to perform aggregated awareness only when all element with
>> > the
>> > view are present. in other words if you present any other elements (
>> > like a
>> > another dimensional table join that is at the same level of aggregation
>> > that
>> > the view is at) currently at seems to break the optimal query plan that
>> > I
>> > would expect the optimizer to take.
>> >
>> > example:
>> >
>> > 1. base detail table called table_detail a has 1 million records in it.
>> >
>> >
>> > create table tbl_detail
>> > ( client_id into, invoicedate, sales money)
>> >
>> >
>> >
>> > 2. an indexed view called mv_summary is created over tbl_detail table
>> >
>> >
>> > create view mv_summary as
>> > select client_id , datepart(yyyy,invoicedate) as invoiceyear,
>> > sum(sales) as
>> > sales
>> > go
>> > create unique clustered index cidx_yearclient on
>> > ,mv_summary(invoiceyear,
>> > client_id)
>> > go
>> >
>> >
>> > 3. lets say the mv_summary view is at lower level of aggregation, now
>> > the
>> > mv_summary has a 1000 aggregated record from the detail table.
>> > any select statements ran against the mv_summary view directly the plan
>> > runs
>> > as expected. even if you were to query the tbl_detail at the
>> > aggregation
>> > level of the view the plan result returns mv_summary as expected via
>> > sql
>> > server arrogation awareness. though once you introduce an dimensional
>> > table
>> > is at the same level of aggregation to the query ( like a dimensional
>> > join )
>> > ok this is were this get weird.
>> > the plan goes to the tbl_detail not the indexed view as I would expect
>> > it to.
>> > basically what I attempted to do was filter on an dimensional
>> > descriptive
>> > element via join and where clause.
>> >
>> > example :
>> >
>> >
>> > select client_id , sum(sales) as sales
>> > from tbl_detail
>> > inner join client_dimension
>> > on
>> > tbl_detail.client_id = client_dimension.client_id
>> > where client_dimension.client_description 'test'
>> >
>> >
>> >
>> > turns out what I have been able to come up with is ALL elements that
>> > you
>> > need to filter join select on HAS to been in the indexed view... which
>> > seems
>> > to been an issue.. that mean ALL data including dimensional data has to
>> > be
>> > persisted again...
>> >
>> > any thoughts on how I can use indexed views in a aggregation aware
>> > environment without have to store ALL possible filterable elements in
>> > the
>> > view.
>> >
>> > thanks!!!
>> >
>> >|||Thanks for the link! Interesting article!
I guess this is a good time to eat my words, because the article shows
quite a lot of improvements, not just in expanded functionality but also
in query coverage. So there is great improvement.
Having said that, I guess it is no surprise that it can still be
improved upon.
The repro script Eric posted (on June 13th) inserted 10,000 rows (4MB)
in tbl_detail, which - for performance testing - is rather few and table
size rather small. So I expanded this to 1 millions rows (421MB). Now
the query optimizer really has a reason to thouroughly search for the
optimal plan, because scanning the table will take a considerable amount
of time.
Yet the query
select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
as sales
from dbo.tbl_detail a
inner join dbo.client_dim b on a.client_id = b.client_id
inner join time_dim t on t.month_begin_date = a.month_begin_date
where b.client_desc = 'A0000001TEST'
and t.date_number between '20060601' and '20060630'
group by a.month_begin_date, b.client_id , b.client_desc
still uses the base table, and on my machine completes after 52632
logical reads in 41 seconds.
It looks like the optimizer does not recognize that the view can be
used, because the query
select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
as sales
from mv_summary a
inner join dbo.client_dim b on a.client_id = b.client_id
inner join time_dim t on t.month_begin_date = a.month_begin_date
where b.client_desc = 'A0000001TEST'
and t.date_number between '20060601' and '20060630'
group by a.month_begin_date, b.client_id , b.client_desc
does use the indexed view, returns the same result and completes after 3
logical reads in 5 ms.
Gert-Jan
Kalen Delaney wrote:
> Hi Gert-Jan
> The optimizer, since version 6.5, rarely does a full optimize, as this would
> just be too expensive. It basically starts looking for a 'good' plan,
> starting with something simple and getting more and more complex. Checking
> to see of there is a match between the query graph of your query and the
> query graphs for all existing indexed views is way down on the list. So if
> the optimizer thinks it has a good enough plan before it gets that far, it
> stops looking and will never even consider the indexed view.
> In a case like this, you might really need to test with much bigger tables.
> This whitepaper has a section on "What's New in Indexed Views in SQL Server
> 2005".
> http://www.microsoft.com/technet/prodtechnol/sql/2005/ipsql05iv.mspx
> --
> HTH
> Kalen Delaney, SQL Server MVP
> "Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> news:4496E642.F68C6115@.toomuchspamalready.nl...
> >I have one possible reason (which is merely a guess), and a speculation.
> >
> > For a simple example where the cost difference (in absolute time) is not
> > that big (like the one in your repro script), the optimizer might not do
> > a full optimize, but stop searching for better query plans when a so
> > called 'obvious' plan is found. This mechanism is used to avoid
> > 'wasting' time on searching for better plans that may never be found,
> > and instead to start executing immediately.
> >
> > The speculation is, that Microsoft might not have invested enough effort
> > in analyzing if an indexed view might benefit the query. In your case,
> > the indexed view option is simply missed, because using it makes the
> > query definitely more efficient. Note that indexed views introduced as
> > 'recently' as SQL Server 2000. IMO, its introduction this was mostly a
> > commercial statement, claiming that Microsoft was no longer behind
> > Oracle and IBM (with regard to this feature).
> >
> > I can't tell if there were any improvements to indexed views (or their
> > use) in SQL Server 2005, but I can tell that I have not seen any
> > documentation suggesting there were any improvements. If someone has
> > seen such documentation, I would be very interested.
> >
> > HTH,
> > Gert-Jan
> >
> >
> > Eric wrote:
> >>
> >> Does anyone have any ideas as to why this might be happening.
> >>
> >> "Eric" wrote:
> >>
> >> > Please correct me if I'm wrong here but what's the purpose of sql
> >> > server
> >> > indexed views.. I mean I don't mean to go off on rant here and please
> >> > understand this is not a post to discredited ms sql server 2005 at all,
> >> > I'm
> >> > just trying to get some clarification that's all.
> >> >
> >> > ok.. with that said. from what I understand in a nut shell about 2005
> >> > indexed views is when the view is persisted ALL objects with in the
> >> > view are
> >> > persisted.. which is to be expected.. what I found to my surprises is
> >> > how sql
> >> > server seems to perform aggregated awareness only when all element with
> >> > the
> >> > view are present. in other words if you present any other elements (
> >> > like a
> >> > another dimensional table join that is at the same level of aggregation
> >> > that
> >> > the view is at) currently at seems to break the optimal query plan that
> >> > I
> >> > would expect the optimizer to take.
> >> >
> >> > example:
> >> >
> >> > 1. base detail table called table_detail a has 1 million records in it.
> >> >
> >> >
> >> > create table tbl_detail
> >> > ( client_id into, invoicedate, sales money)
> >> >
> >> >
> >> >
> >> > 2. an indexed view called mv_summary is created over tbl_detail table
> >> >
> >> >
> >> > create view mv_summary as
> >> > select client_id , datepart(yyyy,invoicedate) as invoiceyear,
> >> > sum(sales) as
> >> > sales
> >> > go
> >> > create unique clustered index cidx_yearclient on
> >> > ,mv_summary(invoiceyear,
> >> > client_id)
> >> > go
> >> >
> >> >
> >> > 3. lets say the mv_summary view is at lower level of aggregation, now
> >> > the
> >> > mv_summary has a 1000 aggregated record from the detail table.
> >> > any select statements ran against the mv_summary view directly the plan
> >> > runs
> >> > as expected. even if you were to query the tbl_detail at the
> >> > aggregation
> >> > level of the view the plan result returns mv_summary as expected via
> >> > sql
> >> > server arrogation awareness. though once you introduce an dimensional
> >> > table
> >> > is at the same level of aggregation to the query ( like a dimensional
> >> > join )
> >> > ok this is were this get weird.
> >> > the plan goes to the tbl_detail not the indexed view as I would expect
> >> > it to.
> >> > basically what I attempted to do was filter on an dimensional
> >> > descriptive
> >> > element via join and where clause.
> >> >
> >> > example :
> >> >
> >> >
> >> > select client_id , sum(sales) as sales
> >> > from tbl_detail
> >> > inner join client_dimension
> >> > on
> >> > tbl_detail.client_id = client_dimension.client_id
> >> > where client_dimension.client_description 'test'
> >> >
> >> >
> >> >
> >> > turns out what I have been able to come up with is ALL elements that
> >> > you
> >> > need to filter join select on HAS to been in the indexed view... which
> >> > seems
> >> > to been an issue.. that mean ALL data including dimensional data has to
> >> > be
> >> > persisted again...
> >> >
> >> > any thoughts on how I can use indexed views in a aggregation aware
> >> > environment without have to store ALL possible filterable elements in
> >> > the
> >> > view.
> >> >
> >> > thanks!!!
> >> >
> >> >|||This is a multi-part message in MIME format.
--000801080406050903040801
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
This is an interesting case. I ran the repro and upped the loop counter
that populates the tbl_detail rows to 1,000,000 so there were 10M rows
in that table and I also added 10 years worth of dates to time_dim. It
took over half an hour to populate the tables on my SAN-attached 4-way
3.6GHz 64-bit box running SQL 2005 EE (not the most efficient population
loop) and the query with dbo.time_dim joined in still went back to the
base table. It took about 4 sec to run and did a table scan on
tbl_detail (because tbl_detail has no indexes at all). Whereas when I
simply changed 'dbo.tbl_detail' to 'dbo.mv_summary' in the select
statement it ran in 16 millisec and only did half a dozen or so IOs.
This looks like a case of the optimiser just not automatically matching
the view to the base table when the time_dim table is joined in for some
reason. The indexed view definitely *can* be used, as was proven when I
just changed the table name to the view name. I'm running Enterprise
Edition so it should be considered, and I think the execution plan that
scans the entire base table and returns 10M rows is not a good plan (the
plan involving the indexed view only returns 1 row from the equivalent
branch of the plan) so the optimiser should consider alternatives.
I tried a few things, like rewriting the query slightly and
restructuring the clustered index on the view, to get it to consider the
indexed view automatically all to no avail. To me this looks like a
potential bug. Perhaps the fact that the time_dim table contains mostly
computed columns, and that's what is getting joined to the based
table/indexed view, throws the query optimiser? This may need someone
from MS to look at perhaps (or at least someone who has had more sleep
than me recently, ie. who is not a football fan - go the Aussies!!)
--
*mike hodgson*
http://sqlnerd.blogspot.com
Gert-Jan Strik wrote:
>Thanks for the link! Interesting article!
>I guess this is a good time to eat my words, because the article shows
>quite a lot of improvements, not just in expanded functionality but also
>in query coverage. So there is great improvement.
>Having said that, I guess it is no surprise that it can still be
>improved upon.
>The repro script Eric posted (on June 13th) inserted 10,000 rows (4MB)
>in tbl_detail, which - for performance testing - is rather few and table
>size rather small. So I expanded this to 1 millions rows (421MB). Now
>the query optimizer really has a reason to thouroughly search for the
>optimal plan, because scanning the table will take a considerable amount
>of time.
>Yet the query
> select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
>as sales
> from dbo.tbl_detail a
> inner join dbo.client_dim b on a.client_id = b.client_id
> inner join time_dim t on t.month_begin_date = a.month_begin_date
> where b.client_desc = 'A0000001TEST'
> and t.date_number between '20060601' and '20060630'
> group by a.month_begin_date, b.client_id , b.client_desc
>still uses the base table, and on my machine completes after 52632
>logical reads in 41 seconds.
>It looks like the optimizer does not recognize that the view can be
>used, because the query
> select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
>as sales
> from mv_summary a
> inner join dbo.client_dim b on a.client_id = b.client_id
> inner join time_dim t on t.month_begin_date = a.month_begin_date
> where b.client_desc = 'A0000001TEST'
> and t.date_number between '20060601' and '20060630'
> group by a.month_begin_date, b.client_id , b.client_desc
>does use the indexed view, returns the same result and completes after 3
>logical reads in 5 ms.
>Gert-Jan
>
>Kalen Delaney wrote:
>
>>Hi Gert-Jan
>>The optimizer, since version 6.5, rarely does a full optimize, as this would
>>just be too expensive. It basically starts looking for a 'good' plan,
>>starting with something simple and getting more and more complex. Checking
>>to see of there is a match between the query graph of your query and the
>>query graphs for all existing indexed views is way down on the list. So if
>>the optimizer thinks it has a good enough plan before it gets that far, it
>>stops looking and will never even consider the indexed view.
>>In a case like this, you might really need to test with much bigger tables.
>>This whitepaper has a section on "What's New in Indexed Views in SQL Server
>>2005".
>>http://www.microsoft.com/technet/prodtechnol/sql/2005/ipsql05iv.mspx
>>--
>>HTH
>>Kalen Delaney, SQL Server MVP
>>"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
>>news:4496E642.F68C6115@.toomuchspamalready.nl...
>>
>>I have one possible reason (which is merely a guess), and a speculation.
>>For a simple example where the cost difference (in absolute time) is not
>>that big (like the one in your repro script), the optimizer might not do
>>a full optimize, but stop searching for better query plans when a so
>>called 'obvious' plan is found. This mechanism is used to avoid
>>'wasting' time on searching for better plans that may never be found,
>>and instead to start executing immediately.
>>The speculation is, that Microsoft might not have invested enough effort
>>in analyzing if an indexed view might benefit the query. In your case,
>>the indexed view option is simply missed, because using it makes the
>>query definitely more efficient. Note that indexed views introduced as
>>'recently' as SQL Server 2000. IMO, its introduction this was mostly a
>>commercial statement, claiming that Microsoft was no longer behind
>>Oracle and IBM (with regard to this feature).
>>I can't tell if there were any improvements to indexed views (or their
>>use) in SQL Server 2005, but I can tell that I have not seen any
>>documentation suggesting there were any improvements. If someone has
>>seen such documentation, I would be very interested.
>>HTH,
>>Gert-Jan
>>
>>Eric wrote:
>>
>>Does anyone have any ideas as to why this might be happening.
>>"Eric" wrote:
>>
>>Please correct me if I'm wrong here but what's the purpose of sql
>>server
>>indexed views.. I mean I don't mean to go off on rant here and please
>>understand this is not a post to discredited ms sql server 2005 at all,
>>I'm
>>just trying to get some clarification that's all.
>>ok.. with that said. from what I understand in a nut shell about 2005
>>indexed views is when the view is persisted ALL objects with in the
>>view are
>>persisted.. which is to be expected.. what I found to my surprises is
>>how sql
>>server seems to perform aggregated awareness only when all element with
>>the
>>view are present. in other words if you present any other elements (
>>like a
>>another dimensional table join that is at the same level of aggregation
>>that
>>the view is at) currently at seems to break the optimal query plan that
>>I
>>would expect the optimizer to take.
>>example:
>>1. base detail table called table_detail a has 1 million records in it.
>>
>>create table tbl_detail
>>( client_id into, invoicedate, sales money)
>>
>>2. an indexed view called mv_summary is created over tbl_detail table
>>
>>create view mv_summary as
>>select client_id , datepart(yyyy,invoicedate) as invoiceyear,
>>sum(sales) as
>>sales
>>go
>>create unique clustered index cidx_yearclient on
>>,mv_summary(invoiceyear,
>>client_id)
>>go
>>
>>3. lets say the mv_summary view is at lower level of aggregation, now
>>the
>>mv_summary has a 1000 aggregated record from the detail table.
>>any select statements ran against the mv_summary view directly the plan
>>runs
>>as expected. even if you were to query the tbl_detail at the
>>aggregation
>>level of the view the plan result returns mv_summary as expected via
>>sql
>>server arrogation awareness. though once you introduce an dimensional
>>table
>>is at the same level of aggregation to the query ( like a dimensional
>>join )
>>ok this is were this get weird.
>>the plan goes to the tbl_detail not the indexed view as I would expect
>>it to.
>>basically what I attempted to do was filter on an dimensional
>>descriptive
>>element via join and where clause.
>>example :
>>
>>select client_id , sum(sales) as sales
>>from tbl_detail
>>inner join client_dimension
>>on
>>tbl_detail.client_id = client_dimension.client_id
>>where client_dimension.client_description 'test'
>>
>>turns out what I have been able to come up with is ALL elements that
>>you
>>need to filter join select on HAS to been in the indexed view... which
>>seems
>>to been an issue.. that mean ALL data including dimensional data has to
>>be
>>persisted again...
>>any thoughts on how I can use indexed views in a aggregation aware
>>environment without have to store ALL possible filterable elements in
>>the
>>view.
>>thanks!!!
>>
>>
--000801080406050903040801
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<tt>This is an interesting case. I ran the repro and upped the loop
counter that populates the tbl_detail rows to 1,000,000 so there were
10M rows in that table and I also added 10 years worth of dates to
time_dim. It took over half an hour to populate the tables on my
SAN-attached 4-way 3.6GHz 64-bit box running SQL 2005 EE (not the most
efficient population loop) and the query with dbo.time_dim joined in
still went back to the base table. It took about 4 sec to run and did
a table scan on tbl_detail (because tbl_detail has no indexes at all).
Whereas when I simply changed 'dbo.tbl_detail' to 'dbo.mv_summary' in
the select statement it ran in 16 millisec and only did half a dozen or
so IOs.<br>
<br>
This looks like a case of the optimiser just not automatically matching
the view to the base table when the time_dim table is joined in for
some reason. The indexed view definitely <b>can</b> be used, as was
proven when I just changed the table name to the view name. I'm
running Enterprise Edition so it should be considered, and I think the
execution plan that scans the entire base table and returns 10M rows is
not a good plan (the plan involving the indexed view only returns 1 row
from the equivalent branch of the plan) so the optimiser should
consider alternatives.<br>
<br>
I tried a few things, like rewriting the query slightly and
restructuring the clustered index on the view, to get it to consider
the indexed view automatically all to no avail. To me this looks like
a potential bug. Perhaps the fact that the time_dim table contains
mostly computed columns, and that's what is getting joined to the based
table/indexed view, throws the query optimiser? This may need someone
from MS to look at perhaps (or at least someone who has had more sleep
than me recently, ie. who is not a football fan - go the Aussies!!)<br>
</tt>
<div class="moz-signature">
<p><span lang="en-au"><font face="Tahoma" size="2">--<br>
</font></span> <b><span lang="en-au"><font face="Tahoma" size="2">mike
hodgson</font></span></b><span lang="en-au"><br>
<font face="Tahoma" size="2"><a href="http://links.10026.com/?link=http://sqlnerd.blogspot.com</a></font></span>">http://sqlnerd.blogspot.com">http://sqlnerd.blogspot.com</a></font></span>
</p>
</div>
<br>
<br>
Gert-Jan Strik wrote:
<blockquote cite="mid449709C8.D3B182BE@.toomuchspamalready.nl"
type="cite">
<pre wrap="">Thanks for the link! Interesting article!
I guess this is a good time to eat my words, because the article shows
quite a lot of improvements, not just in expanded functionality but also
in query coverage. So there is great improvement.
Having said that, I guess it is no surprise that it can still be
improved upon.
The repro script Eric posted (on June 13th) inserted 10,000 rows (4MB)
in tbl_detail, which - for performance testing - is rather few and table
size rather small. So I expanded this to 1 millions rows (421MB). Now
the query optimizer really has a reason to thouroughly search for the
optimal plan, because scanning the table will take a considerable amount
of time.
Yet the query
select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
as sales
from dbo.tbl_detail a
inner join dbo.client_dim b on a.client_id = b.client_id
inner join time_dim t on t.month_begin_date = a.month_begin_date
where b.client_desc = 'A0000001TEST'
and t.date_number between '20060601' and '20060630'
group by a.month_begin_date, b.client_id , b.client_desc
still uses the base table, and on my machine completes after 52632
logical reads in 41 seconds.
It looks like the optimizer does not recognize that the view can be
used, because the query
select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
as sales
from mv_summary a
inner join dbo.client_dim b on a.client_id = b.client_id
inner join time_dim t on t.month_begin_date = a.month_begin_date
where b.client_desc = 'A0000001TEST'
and t.date_number between '20060601' and '20060630'
group by a.month_begin_date, b.client_id , b.client_desc
does use the indexed view, returns the same result and completes after 3
logical reads in 5 ms.
Gert-Jan
Kalen Delaney wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi Gert-Jan
The optimizer, since version 6.5, rarely does a full optimize, as this would
just be too expensive. It basically starts looking for a 'good' plan,
starting with something simple and getting more and more complex. Checking
to see of there is a match between the query graph of your query and the
query graphs for all existing indexed views is way down on the list. So if
the optimizer thinks it has a good enough plan before it gets that far, it
stops looking and will never even consider the indexed view.
In a case like this, you might really need to test with much bigger tables.
This whitepaper has a section on "What's New in Indexed Views in SQL Server
2005".
<a class="moz-txt-link-freetext" href="http://links.10026.com/?link=http://www.microsoft.com/technet/prodtechnol/sql/2005/ipsql05iv.mspx</a>">http://www.microsoft.com/technet/prodtechnol/sql/2005/ipsql05iv.mspx">http://www.microsoft.com/technet/prodtechnol/sql/2005/ipsql05iv.mspx</a>
--
HTH
Kalen Delaney, SQL Server MVP
"Gert-Jan Strik" <a class="moz-txt-link-rfc2396E" href="http://links.10026.com/?link=mailto:sorry@.toomuchspamalready.nl"><sorry@.toomuchspamalready.nl></a> wrote in message
<a class="moz-txt-link-freetext" href="http://links.10026.com/?link=news:4496E642.F68C6115@.toomuchspamalready.nl">news:4496E642.F68C6115@.toomuchspamalready.nl</a>...
</pre>
<blockquote type="cite">
<pre wrap="">I have one possible reason (which is merely a guess), and a speculation.
For a simple example where the cost difference (in absolute time) is not
that big (like the one in your repro script), the optimizer might not do
a full optimize, but stop searching for better query plans when a so
called 'obvious' plan is found. This mechanism is used to avoid
'wasting' time on searching for better plans that may never be found,
and instead to start executing immediately.
The speculation is, that Microsoft might not have invested enough effort
in analyzing if an indexed view might benefit the query. In your case,
the indexed view option is simply missed, because using it makes the
query definitely more efficient. Note that indexed views introduced as
'recently' as SQL Server 2000. IMO, its introduction this was mostly a
commercial statement, claiming that Microsoft was no longer behind
Oracle and IBM (with regard to this feature).
I can't tell if there were any improvements to indexed views (or their
use) in SQL Server 2005, but I can tell that I have not seen any
documentation suggesting there were any improvements. If someone has
seen such documentation, I would be very interested.
HTH,
Gert-Jan
Eric wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Does anyone have any ideas as to why this might be happening.
"Eric" wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Please correct me if I'm wrong here but what's the purpose of sql
server
indexed views.. I mean I don't mean to go off on rant here and please
understand this is not a post to discredited ms sql server 2005 at all,
I'm
just trying to get some clarification that's all.
ok.. with that said. from what I understand in a nut shell about 2005
indexed views is when the view is persisted ALL objects with in the
view are
persisted.. which is to be expected.. what I found to my surprises is
how sql
server seems to perform aggregated awareness only when all element with
the
view are present. in other words if you present any other elements (
like a
another dimensional table join that is at the same level of aggregation
that
the view is at) currently at seems to break the optimal query plan that
I
would expect the optimizer to take.
example:
1. base detail table called table_detail a has 1 million records in it.
create table tbl_detail
( client_id into, invoicedate, sales money)
2. an indexed view called mv_summary is created over tbl_detail table
create view mv_summary as
select client_id , datepart(yyyy,invoicedate) as invoiceyear,
sum(sales) as
sales
go
create unique clustered index cidx_yearclient on
,mv_summary(invoiceyear,
client_id)
go
3. lets say the mv_summary view is at lower level of aggregation, now
the
mv_summary has a 1000 aggregated record from the detail table.
any select statements ran against the mv_summary view directly the plan
runs
as expected. even if you were to query the tbl_detail at the
aggregation
level of the view the plan result returns mv_summary as expected via
sql
server arrogation awareness. though once you introduce an dimensional
table
is at the same level of aggregation to the query ( like a dimensional
join )
ok this is were this get weird.
the plan goes to the tbl_detail not the indexed view as I would expect
it to.
basically what I attempted to do was filter on an dimensional
descriptive
element via join and where clause.
example :
select client_id , sum(sales) as sales
from tbl_detail
inner join client_dimension
on
tbl_detail.client_id = client_dimension.client_id
where client_dimension.client_description 'test'
turns out what I have been able to come up with is ALL elements that
you
need to filter join select on HAS to been in the indexed view... which
seems
to been an issue.. that mean ALL data including dimensional data has to
be
persisted again...
any thoughts on how I can use indexed views in a aggregation aware
environment without have to store ALL possible filterable elements in
the
view.
thanks!!!
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>
--000801080406050903040801--|||LOL!!.
Thanks Mike,
My findings where exactly what you and Gert-Jan Strik had experienced, for a
while there I thought I was losing my mindâ?¦
Sorry bout the ill performing loop logic, I was in a hurry and need to make
the example script as simple as possible.
As far as the time_dim having calculated parts I changed them to persisted
data values and attempted to test again... unfortunately with no avail. In
addition I also tested this in SQL 2000 enterprise ... and guess what. It
used the exact same plan that SQL 2005 enterprise did...
Thanks again.
Eric
"Mike Hodgson" wrote:
> This is an interesting case. I ran the repro and upped the loop counter
> that populates the tbl_detail rows to 1,000,000 so there were 10M rows
> in that table and I also added 10 years worth of dates to time_dim. It
> took over half an hour to populate the tables on my SAN-attached 4-way
> 3.6GHz 64-bit box running SQL 2005 EE (not the most efficient population
> loop) and the query with dbo.time_dim joined in still went back to the
> base table. It took about 4 sec to run and did a table scan on
> tbl_detail (because tbl_detail has no indexes at all). Whereas when I
> simply changed 'dbo.tbl_detail' to 'dbo.mv_summary' in the select
> statement it ran in 16 millisec and only did half a dozen or so IOs.
> This looks like a case of the optimiser just not automatically matching
> the view to the base table when the time_dim table is joined in for some
> reason. The indexed view definitely *can* be used, as was proven when I
> just changed the table name to the view name. I'm running Enterprise
> Edition so it should be considered, and I think the execution plan that
> scans the entire base table and returns 10M rows is not a good plan (the
> plan involving the indexed view only returns 1 row from the equivalent
> branch of the plan) so the optimiser should consider alternatives.
> I tried a few things, like rewriting the query slightly and
> restructuring the clustered index on the view, to get it to consider the
> indexed view automatically all to no avail. To me this looks like a
> potential bug. Perhaps the fact that the time_dim table contains mostly
> computed columns, and that's what is getting joined to the based
> table/indexed view, throws the query optimiser? This may need someone
> from MS to look at perhaps (or at least someone who has had more sleep
> than me recently, ie. who is not a football fan - go the Aussies!!)
> --
> *mike hodgson*
> http://sqlnerd.blogspot.com
>
> Gert-Jan Strik wrote:
> >Thanks for the link! Interesting article!
> >
> >I guess this is a good time to eat my words, because the article shows
> >quite a lot of improvements, not just in expanded functionality but also
> >in query coverage. So there is great improvement.
> >
> >Having said that, I guess it is no surprise that it can still be
> >improved upon.
> >
> >The repro script Eric posted (on June 13th) inserted 10,000 rows (4MB)
> >in tbl_detail, which - for performance testing - is rather few and table
> >size rather small. So I expanded this to 1 millions rows (421MB). Now
> >the query optimizer really has a reason to thouroughly search for the
> >optimal plan, because scanning the table will take a considerable amount
> >of time.
> >
> >Yet the query
> >
> > select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
> >as sales
> > from dbo.tbl_detail a
> > inner join dbo.client_dim b on a.client_id = b.client_id
> > inner join time_dim t on t.month_begin_date = a.month_begin_date
> > where b.client_desc = 'A0000001TEST'
> > and t.date_number between '20060601' and '20060630'
> > group by a.month_begin_date, b.client_id , b.client_desc
> >
> >still uses the base table, and on my machine completes after 52632
> >logical reads in 41 seconds.
> >
> >It looks like the optimizer does not recognize that the view can be
> >used, because the query
> >
> > select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
> >as sales
> > from mv_summary a
> > inner join dbo.client_dim b on a.client_id = b.client_id
> > inner join time_dim t on t.month_begin_date = a.month_begin_date
> > where b.client_desc = 'A0000001TEST'
> > and t.date_number between '20060601' and '20060630'
> > group by a.month_begin_date, b.client_id , b.client_desc
> >
> >does use the indexed view, returns the same result and completes after 3
> >logical reads in 5 ms.
> >
> >Gert-Jan
> >
> >
> >Kalen Delaney wrote:
> >
> >
> >>Hi Gert-Jan
> >>
> >>The optimizer, since version 6.5, rarely does a full optimize, as this would
> >>just be too expensive. It basically starts looking for a 'good' plan,
> >>starting with something simple and getting more and more complex. Checking
> >>to see of there is a match between the query graph of your query and the
> >>query graphs for all existing indexed views is way down on the list. So if
> >>the optimizer thinks it has a good enough plan before it gets that far, it
> >>stops looking and will never even consider the indexed view.
> >>
> >>In a case like this, you might really need to test with much bigger tables.
> >>
> >>This whitepaper has a section on "What's New in Indexed Views in SQL Server
> >>2005".
> >>
> >>http://www.microsoft.com/technet/prodtechnol/sql/2005/ipsql05iv.mspx
> >>
> >>--
> >>HTH
> >>Kalen Delaney, SQL Server MVP
> >>
> >>"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> >>news:4496E642.F68C6115@.toomuchspamalready.nl...
> >>
> >>
> >>I have one possible reason (which is merely a guess), and a speculation.
> >>
> >>For a simple example where the cost difference (in absolute time) is not
> >>that big (like the one in your repro script), the optimizer might not do
> >>a full optimize, but stop searching for better query plans when a so
> >>called 'obvious' plan is found. This mechanism is used to avoid
> >>'wasting' time on searching for better plans that may never be found,
> >>and instead to start executing immediately.
> >>
> >>The speculation is, that Microsoft might not have invested enough effort
> >>in analyzing if an indexed view might benefit the query. In your case,
> >>the indexed view option is simply missed, because using it makes the
> >>query definitely more efficient. Note that indexed views introduced as
> >>'recently' as SQL Server 2000. IMO, its introduction this was mostly a
> >>commercial statement, claiming that Microsoft was no longer behind
> >>Oracle and IBM (with regard to this feature).
> >>
> >>I can't tell if there were any improvements to indexed views (or their
> >>use) in SQL Server 2005, but I can tell that I have not seen any
> >>documentation suggesting there were any improvements. If someone has
> >>seen such documentation, I would be very interested.
> >>
> >>HTH,
> >>Gert-Jan
> >>
> >>
> >>Eric wrote:
> >>
> >>
> >>Does anyone have any ideas as to why this might be happening.
> >>
> >>"Eric" wrote:
> >>
> >>
> >>
> >>Please correct me if I'm wrong here but what's the purpose of sql
> >>server
> >>indexed views.. I mean I don't mean to go off on rant here and please
> >>understand this is not a post to discredited ms sql server 2005 at all,
> >>I'm
> >>just trying to get some clarification that's all.
> >>
> >>ok.. with that said. from what I understand in a nut shell about 2005
> >>indexed views is when the view is persisted ALL objects with in the
> >>view are
> >>persisted.. which is to be expected.. what I found to my surprises is
> >>how sql
> >>server seems to perform aggregated awareness only when all element with
> >>the
> >>view are present. in other words if you present any other elements (
> >>like a
> >>another dimensional table join that is at the same level of aggregation
> >>that
> >>the view is at) currently at seems to break the optimal query plan that
> >>I
> >>would expect the optimizer to take.
> >>
> >>example:
> >>
> >>1. base detail table called table_detail a has 1 million records in it.
> >>
> >>
> >>create table tbl_detail
> >>( client_id into, invoicedate, sales money)
> >>
> >>
> >>
> >>2. an indexed view called mv_summary is created over tbl_detail table
> >>
> >>
> >>create view mv_summary as
> >>select client_id , datepart(yyyy,invoicedate) as invoiceyear,
> >>sum(sales) as
> >>sales
> >>go
> >>create unique clustered index cidx_yearclient on
> >>,mv_summary(invoiceyear,
> >>client_id)
> >>go
> >>
> >>
> >>3. lets say the mv_summary view is at lower level of aggregation, now
> >>the
> >>mv_summary has a 1000 aggregated record from the detail table.
> >>any select statements ran against the mv_summary view directly the plan
> >>runs
> >>as expected. even if you were to query the tbl_detail at the
> >>aggregation
> >>level of the view the plan result returns mv_summary as expected via
> >>sql
> >>server arrogation awareness. though once you introduce an dimensional
> >>table
> >>is at the same level of aggregation to the query ( like a dimensional
> >>join )
> >>ok this is were this get weird.
> >>the plan goes to the tbl_detail not the indexed view as I would expect
> >>it to.
> >>basically what I attempted to do was filter on an dimensional
> >>descriptive
> >>element via join and where clause.
> >>
> >>example :
> >>
> >>
> >>select client_id , sum(sales) as sales
> >>from tbl_detail
> >>inner join client_dimension
> >>on
> >>tbl_detail.client_id = client_dimension.client_id
> >>where client_dimension.client_description 'test'
> >>
> >>
> >>
> >>turns out what I have been able to come up with is ALL elements that
> >>you
> >>need to filter join select on HAS to been in the indexed view... which
> >>seems
> >>to been an issue.. that mean ALL data including dimensional data has to
> >>be
> >>persisted again...
> >>
> >>any thoughts on how I can use indexed views in a aggregation aware
> >>environment without have to store ALL possible filterable elements in
> >>the
> >>view.
> >>
> >>thanks!!!
> >>
> >>
> >>
> >>
>|||Its hard for me to believe that MS would release a product like SQL 2005 and
boast about how well their indexed views work without taking in to
consideration that none of there users would want to perform date
aggregations. Not sure if I had mentioned it but i heard oracle has a
function called 'date folding' that performs an analysis on the date
constraints and rewrites the query at run time to pull from the proper unit
of date measure based on the finite date or date range.
Very similar to what the SQL optimizer does when analysis the attribute
grain of the query when deciding to use the indexed view verses the detail
under lining table.
Hmm... Could it be that I am not including the time_dim in the indexed
view...? Does SQL need to know how to create that relationship prior to
considering the indexed view?
Thanks
Eric
"Eric" wrote:
> LOL!!.
> Thanks Mike,
> My findings where exactly what you and Gert-Jan Strik had experienced, for a
> while there I thought I was losing my mindâ?¦
> Sorry bout the ill performing loop logic, I was in a hurry and need to make
> the example script as simple as possible.
> As far as the time_dim having calculated parts I changed them to persisted
> data values and attempted to test again... unfortunately with no avail. In
> addition I also tested this in SQL 2000 enterprise ... and guess what. It
> used the exact same plan that SQL 2005 enterprise did...
> Thanks again.
> Eric
>
>
> "Mike Hodgson" wrote:
> > This is an interesting case. I ran the repro and upped the loop counter
> > that populates the tbl_detail rows to 1,000,000 so there were 10M rows
> > in that table and I also added 10 years worth of dates to time_dim. It
> > took over half an hour to populate the tables on my SAN-attached 4-way
> > 3.6GHz 64-bit box running SQL 2005 EE (not the most efficient population
> > loop) and the query with dbo.time_dim joined in still went back to the
> > base table. It took about 4 sec to run and did a table scan on
> > tbl_detail (because tbl_detail has no indexes at all). Whereas when I
> > simply changed 'dbo.tbl_detail' to 'dbo.mv_summary' in the select
> > statement it ran in 16 millisec and only did half a dozen or so IOs.
> >
> > This looks like a case of the optimiser just not automatically matching
> > the view to the base table when the time_dim table is joined in for some
> > reason. The indexed view definitely *can* be used, as was proven when I
> > just changed the table name to the view name. I'm running Enterprise
> > Edition so it should be considered, and I think the execution plan that
> > scans the entire base table and returns 10M rows is not a good plan (the
> > plan involving the indexed view only returns 1 row from the equivalent
> > branch of the plan) so the optimiser should consider alternatives.
> >
> > I tried a few things, like rewriting the query slightly and
> > restructuring the clustered index on the view, to get it to consider the
> > indexed view automatically all to no avail. To me this looks like a
> > potential bug. Perhaps the fact that the time_dim table contains mostly
> > computed columns, and that's what is getting joined to the based
> > table/indexed view, throws the query optimiser? This may need someone
> > from MS to look at perhaps (or at least someone who has had more sleep
> > than me recently, ie. who is not a football fan - go the Aussies!!)
> >
> > --
> > *mike hodgson*
> > http://sqlnerd.blogspot.com
> >
> >
> >
> > Gert-Jan Strik wrote:
> >
> > >Thanks for the link! Interesting article!
> > >
> > >I guess this is a good time to eat my words, because the article shows
> > >quite a lot of improvements, not just in expanded functionality but also
> > >in query coverage. So there is great improvement.
> > >
> > >Having said that, I guess it is no surprise that it can still be
> > >improved upon.
> > >
> > >The repro script Eric posted (on June 13th) inserted 10,000 rows (4MB)
> > >in tbl_detail, which - for performance testing - is rather few and table
> > >size rather small. So I expanded this to 1 millions rows (421MB). Now
> > >the query optimizer really has a reason to thouroughly search for the
> > >optimal plan, because scanning the table will take a considerable amount
> > >of time.
> > >
> > >Yet the query
> > >
> > > select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
> > >as sales
> > > from dbo.tbl_detail a
> > > inner join dbo.client_dim b on a.client_id = b.client_id
> > > inner join time_dim t on t.month_begin_date = a.month_begin_date
> > > where b.client_desc = 'A0000001TEST'
> > > and t.date_number between '20060601' and '20060630'
> > > group by a.month_begin_date, b.client_id , b.client_desc
> > >
> > >still uses the base table, and on my machine completes after 52632
> > >logical reads in 41 seconds.
> > >
> > >It looks like the optimizer does not recognize that the view can be
> > >used, because the query
> > >
> > > select a.month_begin_date , b.client_id , b.client_desc , sum(sales)
> > >as sales
> > > from mv_summary a
> > > inner join dbo.client_dim b on a.client_id = b.client_id
> > > inner join time_dim t on t.month_begin_date = a.month_begin_date
> > > where b.client_desc = 'A0000001TEST'
> > > and t.date_number between '20060601' and '20060630'
> > > group by a.month_begin_date, b.client_id , b.client_desc
> > >
> > >does use the indexed view, returns the same result and completes after 3
> > >logical reads in 5 ms.
> > >
> > >Gert-Jan
> > >
> > >
> > >Kalen Delaney wrote:
> > >
> > >
> > >>Hi Gert-Jan
> > >>
> > >>The optimizer, since version 6.5, rarely does a full optimize, as this would
> > >>just be too expensive. It basically starts looking for a 'good' plan,
> > >>starting with something simple and getting more and more complex. Checking
> > >>to see of there is a match between the query graph of your query and the
> > >>query graphs for all existing indexed views is way down on the list. So if
> > >>the optimizer thinks it has a good enough plan before it gets that far, it
> > >>stops looking and will never even consider the indexed view.
> > >>
> > >>In a case like this, you might really need to test with much bigger tables.
> > >>
> > >>This whitepaper has a section on "What's New in Indexed Views in SQL Server
> > >>2005".
> > >>
> > >>http://www.microsoft.com/technet/prodtechnol/sql/2005/ipsql05iv.mspx
> > >>
> > >>--
> > >>HTH
> > >>Kalen Delaney, SQL Server MVP
> > >>
> > >>"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> > >>news:4496E642.F68C6115@.toomuchspamalready.nl...
> > >>
> > >>
> > >>I have one possible reason (which is merely a guess), and a speculation.
> > >>
> > >>For a simple example where the cost difference (in absolute time) is not
> > >>that big (like the one in your repro script), the optimizer might not do
> > >>a full optimize, but stop searching for better query plans when a so
> > >>called 'obvious' plan is found. This mechanism is used to avoid
> > >>'wasting' time on searching for better plans that may never be found,
> > >>and instead to start executing immediately.
> > >>
> > >>The speculation is, that Microsoft might not have invested enough effort
> > >>in analyzing if an indexed view might benefit the query. In your case,
> > >>the indexed view option is simply missed, because using it makes the
> > >>query definitely more efficient. Note that indexed views introduced as
> > >>'recently' as SQL Server 2000. IMO, its introduction this was mostly a
> > >>commercial statement, claiming that Microsoft was no longer behind
> > >>Oracle and IBM (with regard to this feature).
> > >>
> > >>I can't tell if there were any improvements to indexed views (or their
> > >>use) in SQL Server 2005, but I can tell that I have not seen any
> > >>documentation suggesting there were any improvements. If someone has
> > >>seen such documentation, I would be very interested.
> > >>
> > >>HTH,
> > >>Gert-Jan
> > >>
> > >>
> > >>Eric wrote:
> > >>
> > >>
> > >>Does anyone have any ideas as to why this might be happening.
> > >>
> > >>"Eric" wrote:
> > >>
> > >>
> > >>
> > >>Please correct me if I'm wrong here but what's the purpose of sql
> > >>server
> > >>indexed views.. I mean I don't mean to go off on rant here and please
> > >>understand this is not a post to discredited ms sql server 2005 at all,
> > >>I'm
> > >>just trying to get some clarification that's all.
> > >>
> > >>ok.. with that said. from what I understand in a nut shell about 2005
> > >>indexed views is when the view is persisted ALL objects with in the
> > >>view are
> > >>persisted.. which is to be expected.. what I found to my surprises is
> > >>how sql
> > >>server seems to perform aggregated awareness only when all element with
> > >>the
> > >>view are present. in other words if you present any other elements (
> > >>like a
> > >>another dimensional table join that is at the same level of aggregation
> > >>that
> > >>the view is at) currently at seems to break the optimal query plan that
> > >>I
> > >>would expect the optimizer to take.
> > >>
> > >>example:
> > >>
> > >>1. base detail table called table_detail a has 1 million records in it.
> > >>
> > >>
> > >>create table tbl_detail
> > >>( client_id into, invoicedate, sales money)
> > >>
> > >>
> > >>
> > >>2. an indexed view called mv_summary is created over tbl_detail table
> > >>
> > >>
> > >>create view mv_summary as
> > >>select client_id , datepart(yyyy,invoicedate) as invoiceyear,
> > >>sum(sales) as
> > >>sales
> > >>go
> > >>create unique clustered index cidx_yearclient on
> > >>,mv_summary(invoiceyear,
> > >>client_id)
> > >>go
> > >>
> > >>
> > >>3. lets say the mv_summary view is at lower level of aggregation, now
> > >>the
> > >>mv_summary has a 1000 aggregated record from the detail table.
> > >>any select statements ran against the mv_summary view directly the plan
> > >>runs
> > >>as expected. even if you were to query the tbl_detail at the
> > >>aggregation
> > >>level of the view the plan result returns mv_summary as expected via
> > >>sql
> > >>server arrogation awareness. though once you introduce an dimensional
> > >>table
> > >>is at the same level of aggregation to the query ( like a dimensional
> > >>join )
> > >>ok this is were this get weird.
> > >>the plan goes to the tbl_detail not the indexed view as I would expect
> > >>it to.
> > >>basically what I attempted to do was filter on an dimensional
> > >>descriptive
> > >>element via join and where clause.
> > >>
> > >>example :
> > >>
> > >>
> > >>select client_id , sum(sales) as sales
> > >>from tbl_detail
> > >>inner join client_dimension
> > >>on
> > >>tbl_detail.client_id = client_dimension.client_id
> > >>where client_dimension.client_description 'test'
> > >>
> > >>
> > >>
> > >>turns out what I have been able to come up with is ALL elements that
> > >>you
> > >>need to filter join select on HAS to been in the indexed view... which
> > >>seems
> > >>to been an issue.. that mean ALL data including dimensional data has to
> > >>be
> > >>persisted again...
> > >>
> > >>any thoughts on how I can use indexed views in a aggregation aware
> > >>environment without have to store ALL possible filterable elements in
> > >>the
> > >>view.
> > >>
> > >>thanks!!!
> > >>
> > >>
> > >>
> > >>
> >

No comments:

Post a Comment