Monday, March 19, 2012

Indexing PDFs in SQL Server 2005

Did you run the PDFs through FiltDump to see whether they contain any textual
info? Sometimes PDFs are images only.
Can you send me some of these PDFs offline or post them here?
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"ddaiker" <ddaiker@.gmail.com> wrote in message
news:1167410997.650379.296710@.i12g2000cwa.googlegroups.com...
> I'm having trouble getting SQL Server 2005 FTS to index PDFs stored in
> an image field with a doctype field value of "PDF". The file is being
> processed, as I can query on the filename, but I get no hits when
> querying any of the content. FiltDump returns all the plain text of
> the document.
> I've installed Acrobat Reader 7, so the IFilter being used is
> "C:\Program Files\Adobe\Acrobat 7.0\Reader\AcroRdIF.dll"
> I did some Googling and found some references to allowing SQL Server to
> use system filters and unsigned filters so I ran these two commands on
> my database, restarted the services and reindexed but still no luck.
> exec sp_fulltext_service 'load_os_resources',1
> exec sp_fulltext_service 'verify_signature', 0
> Anybody have any ideas what I'm doing wrong?
>
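For anyone following along, here is a minimal sketch of the kind of setup the question describes, plus a check that SQL Server actually sees a PDF filter. The table, column, and catalog names (dbo.Documents, DocType, DocCatalog) and the search term are assumptions for illustration; they are not taken from the original post.

```sql
-- Hypothetical schema: all object names here are illustrative only.
CREATE TABLE dbo.Documents (
    DocId    INT IDENTITY(1,1) NOT NULL
             CONSTRAINT PK_Documents PRIMARY KEY,
    FileName NVARCHAR(260) NOT NULL,
    DocType  NVARCHAR(10)  NOT NULL,  -- type column that tells FTS which filter to use
    Content  IMAGE         NULL       -- raw bytes of the stored file
);
GO

CREATE FULLTEXT CATALOG DocCatalog;
GO

-- Index the file name as plain text and the binary content via the type column.
CREATE FULLTEXT INDEX ON dbo.Documents
    (FileName, Content TYPE COLUMN DocType)
    KEY INDEX PK_Documents
    ON DocCatalog
    WITH CHANGE_TRACKING AUTO;
GO

-- List the document types the full-text service has a filter for.
-- If no PDF row appears, the Adobe IFilter is not visible to SQL Server.
SELECT document_type, path, manufacturer
FROM sys.fulltext_document_types
WHERE document_type LIKE '%pdf%';

-- A content query of the kind that returned no hits in the question.
SELECT DocId, FileName
FROM dbo.Documents
WHERE CONTAINS(Content, N'"replication"');
```

If the catalog view returns no row for PDF, the content column will likely not be filtered, even though plain-text columns such as the file name remain searchable.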
Hilary, thanks again for your help with my FTS problems.
I've tried several PDFs, FiltDump spits out the expected text for all
of them. I'm emailing you one of the PDFs that I created as a test
using MS Word 2003 and the CutePDF printer driver.

Well, after redoing the following steps, it's now indexing my PDFs (and
the PDF properties too!).
1) Run "exec sp_fulltext_service 'load_os_resources',1"
2) Run "exec sp_fulltext_service 'verify_signature', 0"
3) Restart SQL Server
4) Delete all the rows in my table with PDFs in them and re-insert
them.
I know I did all of these last week, but this time it worked? Now I'll
experiment with whether both settings are needed and what Acrobat 8.0 does to
the equation.
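The steps above can be sketched as a short script. This is only an outline under assumptions: dbo.Documents and DocCatalog are hypothetical names, and the full population at the end is a commonly used alternative to deleting and re-inserting the rows, not what was actually done in this thread, so whether it would have been enough here is unknown.

```sql
-- Steps 1 and 2: allow the full-text service to load OS-registered
-- filters and to skip signature verification for them.
EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'verify_signature', 0;
GO

-- Step 3: restart SQL Server (and its full-text service) before continuing.
-- The restart happens outside of T-SQL, e.g. from the Services console.

-- Step 4 (alternative, assumed): rebuild the index contents instead of
-- deleting and re-inserting rows. dbo.Documents is a hypothetical table name.
ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;
GO

-- Optional: poll the population status of the (hypothetical) catalog.
-- A value of 0 means the catalog is idle.
SELECT FULLTEXTCATALOGPROPERTY('DocCatalog', 'PopulateStatus') AS PopulateStatus;
```

Once the population is idle, re-running a CONTAINS query against the content column shows whether the PDF text was picked up.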
