Showing posts with label fixed. Show all posts
Showing posts with label fixed. Show all posts

Monday, March 26, 2012

influence the length of sub-tokens indexed in fuzzy look-up

can I influence the lenghth of sub-tokens that are indexed in fuzzy lookup? Is it just fixed as 4?

ThanksAt this time the size is fixed. There is a performance trade-off between Error Tolerant Index size and the length and number of sub-tokens indexed per record. Using a fixed length may cause errors in short tokens to be missed if there are no other tokens in common between the input and target records. One approach, if your reference table is small, is to set the Exhaustive property to True. This will make Fuzzy Lookup skip the ETI and compare against each and every record in the reference table. Again, this is an expensive operation for large ref tables, so you might consider only doing it if the input record has only one short token. Likewise, you might also create a view of your reference table that contains only records of short length and do the Exhaustive match on just the view. You could have this as a separate branch in your Data Flow pipeline and use the Conditional Split transform to direct only short input records down it.

Hope this helps,
-Kris

Monday, March 12, 2012

indexing

Hi,
There is a table which I regularly run a select query on.
The select query always has a fixed where clause on only three of the columns with different parameters.

This is a query that runs each time:

select * from tblData
where
PersonNo = 2 and EmployeeType = 4 and DataDate = getdate()

This are the types of indexes the table currently has:
One index for each of these three fields
i.e. index1 for PersonNo
index2 for EmployeeType
index3 for DataDate
In addition to the above, I also have created a covering index as follows
index4 for PersonNo,EmployeeType,DataDate

Is what I have enough for indexes on this table please?
Is there anything else I have to do on indexing this table?
Thanks

Indexing is a big subject. At a base level, you should index those columns that will be queried most often, in the order that they are most often queried. That being said, there's a lot more to learn. Here's some beginning info, which has pointers to more advanced concepts:

http://www.informit.com/guides/content.asp?g=sqlserver&seqNum=200&rl=1

|||I am baffled by the DataDate = getdate(); I am surprised that this works; is this just something you meant generically or are you really searching like this?