Wednesday, March 7, 2012

Indexes on datasets created froma Union All task

Hi

I have a package that takes data from 3 different sources, unions them all together and then performs a lookup against a tabel on a different server. The 3 sources of data creates a dataset with quite a large volume of rows (approx 100,000) and the look up takes an age to complete.

Is there anyway of adding an index of sorts to the dataset created by the union to speed the process up? I guess i could output the union all to a temp table and look up against this but wondered if there was another way

thanks

scott

100,000 rows is not much in my experience and fully-cached lookups are very fast once the reference data is loaded. I assume you're not running the lookup in cache mode?

If you are running in non-cached mode, I would think you would be focusing your indexing efforts on the reference table, not the rows in the pipeline. And is that really necessary?

I don't follow you on the temp table idea. It sounds like the data from the union all is the reference data. I can't visualize the scenario.

If the lookup is cached, are you sure the slowdown is in the Lookup and not downstream? A component cannot output rows if the next component is not ready to receive them. Try replacing the everything below the Lookup with a Union All and no destination. It should scream.
|||

Hi Jay

thanks for the reply

I made a mistake it is only 10,000 rows so even less!

I have 3 sources going into a union all which takes about a 2 seconds to 'go green'. next the Union and the look up go yellow and it takes about 90 seconds to complete. After this I simply have 2 destinations - success from look up goes to one tabel and error output goes to another table. Once the look up is complete the population of the 2 destination tables takes the blink of an eye.

I am not sure what you mean by caching though?

cheers

scott

|||

On the Advanced tab of the Lookup component, make sure that "Enable Memory Restriction" is not checked. Also, make sure that you are not using the table selection method in the lookup - you should specify a SQL statement that returns the smallest number of columns possible. So ideally you would return two columns, the value to match on, and the value to return.

How many rows in your lookup?

|||

Racsco wrote:

I have 3 sources going into a union all which takes about a 2 seconds to 'go green'. next the Union and the look up go yellow and it takes about 90 seconds to complete. After this I simply have 2 destinations - success from look up goes to one tabel and error output goes to another table. Once the look up is complete the population of the 2 destination tables takes the blink of an eye.

90 seconds for 10,000 rows is 111 rows/second. That could be indicative of not caching. In this situation, the Lookup would issue 10,000 select statements to your reference table. When caching is enabled, the lookup should be almost as fast as the union.

I want to make sure you understand my point about the downstream components. If one of those destinations is slow (i.e. not using fast load), then the lookup and the union will appear slow as they wait for the destination to accept new rows.
|||

Hi John

I checked the advanced tab and enable memory restriction was checked. I unchecked it and ran it again and it was rapid to say the least. However out of the 10000 rows only 770 matched the lookup condition and the rest errored. I tested it again and checked the enable memory restriction box, ran it and all passed the look up condition

any ideas as to why?

If i run the look up as a join in query analyser all 10000 rows pass the look up condition

|||

Hi Jay

both destinations are set to table or view fast load and both are tables. Please see my previous reply above your most recent post as i got it going much faster but now have a problem with the look up

cheers

scott

|||

Racsco wrote:

Hi John

I checked the advanced tab and enable memory restriction was checked. I unchecked it and ran it again and it was rapid to say the least. However out of the 10000 rows only 770 matched the lookup condition and the rest errored. I tested it again and checked the enable memory restriction box, ran it and all passed the look up condition

any ideas as to why?

If i run the look up as a join in query analyser all 10000 rows pass the look up condition

I bet it's the collation. SQL Server performs case-insensitive comparisons by default. The lookup is case-sensitive. The would enable SQL Server to make matches that the lookup can't. You can try converting everything to uppercase.

No comments:

Post a Comment