We currently have an application that OCRs a tif image and places the
recognized text in a SQL table.
The table is then indexed by the FTS service.
The app then allows you to search for any of the text and display the
corresponding tif image in a viewer.
I would also like to be able to search WORD docs for their contents using
the same catalog.
What is the proper manner to have the WORD docs indexed by the FTS service?
Do I need to extract the text from the WORD doc and store it in the table,
much like the recognized text from the OCR process?
Thanks
John,
What is the relationship between FTS and Indexing Service?
It looks like the Indexing Service maintains a catalog much the same as FTS.
We have support for WORD in our app already by storing the WORD doc in our
file warehouse on the file system.
We can display the .doc file in our viewer the same as a .tif image.
We currently don't have functionality to search for data in the WORD docs,
only text from the OCR process.
Since the WORD file is already stored in the file system and referenced by
our application, I was wondering about the feature titled "Full-text
Querying of File Data".
It looks like it uses the Indexing Service to allow searching for data in
files on the file system.
Wouldn't that work for my scenario?
It appears that when we want to search for data contained in a WORD doc, we
would use the SCOPE function in our query. Otherwise, we continue to search
for text from the OCR process.
Can you provide some insight?
Thanks
"John Kane" <jt-kane@.comcast.net> wrote in message
news:O7TiL6BWEHA.2928@.tk2msftngp13.phx.gbl...
> Binder,
> What version of SQL Server (2000 or 7.0) are you using, and on what OS
> platform (NT4.0, Win2K, or Win2003) is it installed? Could you post the
> full output of SELECT @@version? It is helpful in answering your question.
> If you are using SQL Server 2000, you can use its new feature (this
> feature is not present in SQL 7.0), described in the SQL Server 2000 BOL
> topic "Filtering Supported File Types". This feature allows you to store
> the binary version of the MS Word document in your table, define a file
> extension column and populate it with the correct values ("doc" for an MS
> Word document), and then run a Full Population. You can then use CONTAINS
> or FREETEXT queries to full-text search the contents of the files stored
> in the SQL table.
> If you are using SQL Server 7.0, you will need to set up a process to
> extract the MS Word text, store this text in a TEXT column, and then
> full-text index that column, much as you do for your OCR'ed data.
> Regards,
> John
>
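For readers on SQL Server 2000, the "Filtering Supported File Types" approach described in the quoted reply might look roughly like the sketch below. The table, column, index, and catalog names (WordDocs, DocExt, DocBody, PK_WordDocs, ExistingCatalog) are hypothetical placeholders, not names from this thread, and the sketch assumes the database is already full-text enabled and that the catalog already exists.

```sql
-- Hypothetical table: one row per stored Word document.
CREATE TABLE WordDocs
(
    DocId   int        NOT NULL CONSTRAINT PK_WordDocs PRIMARY KEY,
    DocExt  varchar(4) NOT NULL,  -- file extension, e.g. 'doc'
    DocBody image      NULL       -- binary contents of the .doc file
)
GO
-- Register the table in the existing catalog (placeholder name),
-- using the primary key index as the full-text key.
EXEC sp_fulltext_table 'WordDocs', 'create', 'ExistingCatalog', 'PK_WordDocs'
-- Add the image column and name the column that holds the file extension,
-- so the search service knows which filter to apply to each row.
EXEC sp_fulltext_column 'WordDocs', 'DocBody', 'add', @type_colname = 'DocExt'
EXEC sp_fulltext_table 'WordDocs', 'activate'
-- Start a full population of the catalog; it runs asynchronously.
EXEC sp_fulltext_catalog 'ExistingCatalog', 'start_full'
GO
-- Once the population has finished, query the document contents.
SELECT DocId
FROM WordDocs
WHERE CONTAINS(DocBody, '"invoice"')
GO
```

The sketch deliberately does not call sp_fulltext_database 'enable', since Books Online warns that doing so on a database that already has full-text catalogs drops them.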
|||System Parameters:
Windows 2000 Server
Microsoft SQL Server 2000 - 8.00.194 (Intel X86)
Aug 6 2000 00:57:48
Copyright (c) 1988-2000 Microsoft Corporation
Enterprise Edition on Windows NT 5.0 (Build 2195: Service Pack 4)
|||Binder,
Q. What is the relationship between FTS and Indexing Service?
A. While they use the same underlying Microsoft Search technology, they
full-text index different data sources. Indexing Service indexes the files
on the server's local disk drives, while FTS (or really the "Microsoft
Search" service [mssearch.exe]) full-text indexes textual (char, nvarchar,
text, etc.) columns in SQL Server tables. Yes, it seems to me that using the
Indexing Service should work for you.
What is the name of your app? Does it support SQL Server 2000? If so, does
it support the storage of MS Word documents in columns that are defined with
the IMAGE datatype? Is "Full-text Querying of File Data" a feature of your
app, or are you referring to the SQL Server feature (and if so, in which
version)?
In addition to SQL Server's Full-text Search (FTS) component, you can also
define a "Linked Server" to the Indexing Service using MSIDXS, the "OLE
DB Provider for Microsoft Indexing Service". You would define this linked
server via sp_addlinkedserver. Below is an example from SQL Server 2000
Books Online:
G. Use the Microsoft OLE DB Provider for Indexing Service
This example creates a linked server and uses OPENQUERY to retrieve
information from both the linked server and the file system enabled for
Indexing Service.
EXEC sp_addlinkedserver FileSystem,
   'Index Server',
   'MSIDXS',
   'Web'
GO
USE pubs
GO
IF EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
      WHERE TABLE_NAME = 'yEmployees')
   DROP TABLE yEmployees
GO
CREATE TABLE yEmployees
(
   id       int         NOT NULL,
   lname    varchar(30) NOT NULL,
   fname    varchar(30) NOT NULL,
   salary   money,
   hiredate datetime
)
GO
INSERT yEmployees VALUES
(
   10,
   'Fuller',
   'Andrew',
   $60000,
   '9/12/98'
)
GO
IF EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA.VIEWS
      WHERE TABLE_NAME = 'DistribFiles')
   DROP VIEW DistribFiles
GO
CREATE VIEW DistribFiles
AS
SELECT *
FROM OPENQUERY(FileSystem,
   'SELECT Directory,
           FileName,
           DocAuthor,
           Size,
           Create,
           Write
    FROM SCOPE('' "c:\My Documents" '')
    WHERE CONTAINS(''Distributed'') > 0
      AND FileName LIKE ''%.doc%'' ')
WHERE DATEPART(yy, Write) = 1998
GO
SELECT *
FROM DistribFiles
GO
SELECT Directory,
       FileName,
       DocAuthor,
       hiredate
FROM DistribFiles D, yEmployees E
WHERE D.DocAuthor = E.FName + ' ' + E.LName
GO
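Applied to the scenario in this thread, one possible way to search the OCR text and the Word files in a single statement is sketched below. It assumes the FileSystem linked server from the Books Online example, a hypothetical full-text indexed table OcrPages(ImagePath, RecognizedText), and a placeholder folder c:\FileWarehouse; none of these names come from the original posts, and collation or data type differences between the two sources may need explicit handling.

```sql
-- Matches from the OCR text stored in SQL Server (hypothetical table).
SELECT ImagePath AS DocPath, 'OCR' AS Source
FROM OcrPages
WHERE CONTAINS(RecognizedText, '"invoice"')
UNION ALL
-- Matches from Word files indexed by Indexing Service (placeholder folder).
SELECT Path AS DocPath, 'WORD' AS Source
FROM OPENQUERY(FileSystem,
   'SELECT Path
    FROM SCOPE('' "c:\FileWarehouse" '')
    WHERE CONTAINS(''invoice'') > 0
      AND FileName LIKE ''%.doc'' ')
GO
```

The application could then open either kind of result in its viewer by the returned path, keeping the existing OCR search unchanged.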
Regards,
John