Monday, March 19, 2012

indexing image columns

Hi,
I'm building a search engine for a website using SQL Server 2000 (SP3).
My development environment is Windows XP Pro and the production server
is Windows 2003, and I'm getting the same problem on both machines.
I have two tables in the catalog, one containing only text fields
(varchar and ntext). The results come out great for this table, no
problem here. The second table has an image field that can contain just
about any kind of documents: .txt, .doc, .pdf, .xls, .mpg, .zip, etc.
This image column can sometimes be empty for certain records. There's
also a column indicating the file type, and I'm using the file extension
for this. I wasn't sure what to put here: I tried the MIME type and the
extension, with and without the period, but this never changed anything. I
read somewhere that it should be the file extension with the period, so
I've set it back to that.
I've done a lot of searching and reading in the past few days but can't
find the problem. I've tried searches on various types of documents,
including .txt, .doc and .pdf. I'm not absolutely sure, but I think my
image column isn't being indexed at all; this same table also has a
"title" and a "description" field included in the index, and if I search
for text contained in either of those 2 columns they turn up in the results.
Any ideas?
Any help would be much appreciated.
tia
Lucas,
Yes. First of all, could you post the full output of SELECT @@version
as well as the table schema of both your tables (via sp_help <table_name>),
as this will help in understanding your environment. The datatype, size and
nullability of the "file extension" column are very important in getting this
to work correctly. You can include or exclude the "." period when populating
the values in your "file extension" column, but then you will need to define
it as a varchar(4), or you can use the sysname datatype.
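The checks and the type-column setup described above could look roughly
like the sketch below. All object names (dbo.Documents, DocContent, DocType)
are hypothetical placeholders, not taken from Lucas's schema, and the sketch
assumes the table already has a full-text index in a catalog.

```sql
-- Hypothetical names: table dbo.Documents, image column DocContent,
-- "file extension" column DocType. Substitute your own.

-- Environment details requested above
SELECT @@version
EXEC sp_help 'dbo.Documents'

-- List the full-text indexed columns of the table; the result set also
-- reports which column, if any, is registered as the document type column
EXEC sp_help_fulltext_columns 'dbo.Documents'

-- Register the image column together with its type column.
-- If DocContent is already in the index without a type column,
-- drop it first with @action = 'drop'.
EXEC sp_fulltext_column
    @tabname      = 'dbo.Documents',
    @colname      = 'DocContent',
    @action       = 'add',
    @type_colname = 'DocType'

-- Repopulate so the change is picked up
EXEC sp_fulltext_table 'dbo.Documents', 'start_full'
```

If sp_help_fulltext_columns shows no type column for the image column, the
filter has no way to tell which file format each row holds.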
As your second table contains an image column and the "file extension"
column, it can only hold binary file types, such as .doc, .pdf, .xls, .mpg,
.zip, but not .txt. For text (.txt) files, you must store this type of file in
a Text or NText datatype, as the pure text will be FT Indexed without the
file extension. For non-MS Office file types, such as Adobe PDF, MPG and ZIP
files, you will need to install 3rd party IFilters that support these file
types. These can be downloaded from:
Adobe PDF IFilter v6.0:
http://www.adobe.com/support/downloa...11&fileID=2457
Zip IFilter:
http://www.ifiltershop.com/zipfilter.html
Zip IFilter:
http://www.4-share.com/
MP3 (MPEG) IFilter:
http://www.meticulus.com/mp3filter.html
Microsoft Office file types (.doc, .xls, .ppt) and text (.txt) and HTML
(.htm) files are FT Indexed out-of-the-box by SQL Server 2000. See SQL
Server 2000 BOL title "Filtering Supported File Types" for more info.
Finally, you should always review the server's Application event log for
information on the success and/or failure of FT Indexing specific file types.
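Alongside the event log, a quick way to see whether a population has run
and how many items it indexed is to query the catalog properties. This is
only a sketch: the catalog name SiteCatalog is a hypothetical placeholder.

```sql
-- 'SiteCatalog' is a hypothetical catalog name; use your own.
-- PopulateStatus = 0 means the catalog is idle (no population running).
-- ItemCount is the number of items currently indexed in the catalog.
SELECT
    FULLTEXTCATALOGPROPERTY('SiteCatalog', 'PopulateStatus') AS PopulateStatus,
    FULLTEXTCATALOGPROPERTY('SiteCatalog', 'ItemCount')      AS ItemCount
```

An ItemCount that stays unexpectedly low after a full population is a hint
to look in the Application event log for filter errors on specific rows.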
Regards,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
Hi John,
Thanks for your response. I finally got it to work; it was nothing more
than an error in my query: I wasn't actually ever searching my image
column... d'oh!
It's working fine now.
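For anyone who lands here with the same symptom, the kind of query mistake
described above might look like the sketch below. The table and column
names (dbo.Documents, DocId, Title, Description, DocContent) and the search
phrase are hypothetical, since the original query was never posted.

```sql
-- Hypothetical names throughout; the original query was not posted.

-- Only the text columns are searched, so hits inside the stored
-- documents never show up:
SELECT DocId, Title
FROM dbo.Documents
WHERE CONTAINS(Title, '"annual report"')
   OR CONTAINS(Description, '"annual report"')

-- Searching the image column as well:
SELECT DocId, Title
FROM dbo.Documents
WHERE CONTAINS(Title, '"annual report"')
   OR CONTAINS(Description, '"annual report"')
   OR CONTAINS(DocContent, '"annual report"')

-- Or search every full-text indexed column of the table at once:
SELECT DocId, Title
FROM dbo.Documents
WHERE CONTAINS(*, '"annual report"')
```

Using * avoids this class of mistake, at the cost of not being able to tell
which column produced the match.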
