Wednesday, March 21, 2012

Indexing text within pdf content as a text file

Hello everyone,
I have a very strange problem: I've installed SQL Server 2005 and created
a full-text catalog and a full-text index on a table. I've also
installed the Adobe IFilter. Searches work, except for one search term.
Let me explain.
I'm searching the files in the database for "c#", and some of the PDF files
returned by the search do not contain "c#". Opening these PDF files in Notepad
and searching for the string "c#" showed me why these files are returned
by the search... Is that really the problem or not? (It seems to do the same
with Word files too...)
Thanks for your help.
Regards,
Ben.
SQL FTS interprets C# as C#, but c# as c. So you need to capitalize the C in
your searches and content for this to work correctly.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com

Hello Hilary,
Thanks for your quick answer, but the problem remains. I was wondering
whether it has something to do with the full-text index in the database,
as I configured it to be case-insensitive.
Do I have to switch it back to case-sensitive to make it work?
Do I have to change the noise-word files (I've already removed the letter
"C" from the neutral and English files)?

I ran a test of my own on this:
I created a new table with a full-text index in case-sensitive mode
and added documents to it.
I still get files that do not contain C# (for all document types).
I even tried CONTAINSTABLE and FREETEXTTABLE, but the problem is the
same...
Any ideas?

I take it you are Swiss and using the German, French, or Italian word
breaker. In these languages c# and C# are indexed as c and C#, and a search on
c# or C# will match both c and c# (lower and upper case). English,
unfortunately, is the exception: c# is indexed as c and C# is indexed
as C#, so a search on C# matches only C#, while a search on c#
matches c.
Hilary Cotter
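
As an illustration only, the English behavior described in the reply above can be mimicked with a toy model in Python. This is not SQL Server's actual word breaker: the function names index_terms and matches are hypothetical, and the rules are just the two cases stated in the reply (c# is indexed as c, C# is indexed as C#), plus the assumption that every other term is compared case-insensitively.

```python
def index_terms(text):
    """Toy model of the indexing rule described in the thread.

    Assumptions (not taken from SQL Server itself):
    - terms are separated by whitespace,
    - surrounding punctuation is dropped,
    - 'C#' is kept whole, 'c#' is reduced to 'c',
    - every other term is lowercased.
    """
    terms = set()
    for raw in text.split():
        token = raw.strip(".,;:!?()\"'")
        if not token:
            continue
        if token == "C#":
            terms.add("C#")        # upper-case form survives as one term
        elif token == "c#":
            terms.add("c")         # lower-case form loses the '#'
        else:
            terms.add(token.lower())
    return terms


def matches(query, document):
    """A document matches if it shares at least one indexed term with the query."""
    return bool(index_terms(query) & index_terms(document))


if __name__ == "__main__":
    doc_without_csharp = "The driver is written in C and assembly."
    doc_with_csharp = "A small C# project."
    print(matches("c#", doc_without_csharp))  # lower-case query is reduced to 'c'
    print(matches("C#", doc_without_csharp))
    print(matches("C#", doc_with_csharp))
```

Under these assumptions, a lower-case query for c# also hits documents that only contain a standalone letter c, which is the kind of false positive reported in the original question, while an upper-case query for C# only hits documents that contain C#.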

Hello,
I need to implement full-text search within PDF and text files, along with tables, in SQL Server 2000. I think you have done something similar.
Can you give me details on how I can achieve the same?
Thanks in advance for your help.
EggHeadCafe.com - .NET Developer Portal of Choice
http://www.eggheadcafe.com
