According to the Books Online information on full-text indexing, there
are limits on the size of a file that can be indexed in an image
column: 16MB filesize, 256 KB of filtered text. I've exceeded those
limits in my testing (with Word docs), and still appear to be able to
access information in those files with CONTAINS. Is the documentation
out of date? Are there only certain conditions under which those limits
apply? The word I'm searching for appears only at the end of the test
document, so it's not indexing only the first part of the file...
Since it seems to be a common question here, this is my @@version:
Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002
14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Enterprise
Edition on Windows NT 5.2 (Build 3790: )
And, just for clarity, I don't have any problem with SQL Server
indexing more than I had planned on, I just don't want any surprises
down the road.
Thanks for any ideas you have,
Joel
Last time I tested, when the hard limit was exceeded the remaining content
was not indexed.
So if you index a document containing more than 256 KB of text, put the
word "rats" at the end, and then search on the word "rats", you would not
get a hit on this row unless "rats" also occurred in the first 256 KB of
text.
One question for you: did these Word docs contain any images? Images will
not be indexed, and they can swell the document size without pushing you
over the 256 KB limit.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
|||No, the documents that are confusing me did not have any images. They
were just a bunch of text, pasted repeatedly. I ran them through
filtdump, to make sure they really did have more than 256K of text. The
test you describe is exactly what I did--I put words at the very end of
the document that I was sure weren't in the document before, and once
the catalog rebuilt, I searched for them, and found them.
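The test described above can be reproduced with a short script. This is only a sketch, in Python, under stated assumptions: the file name, the filler sentence, and the sentinel word are all made up for illustration, and it writes plain text rather than a Word document, so a different filter would handle it. It builds a file whose text is well past 256 KB and puts a word at the very end that occurs nowhere else.

```python
import os
import tempfile

LIMIT_BYTES = 256 * 1024  # the 256 KB filtered-text figure quoted from Books Online


def build_test_text(sentinel: str, min_bytes: int = 2 * LIMIT_BYTES) -> str:
    """Return filler text of at least min_bytes, with sentinel as the last word."""
    filler = "the quick brown fox jumps over the lazy dog\n"
    if sentinel in filler:
        raise ValueError("sentinel must not occur in the filler text")
    repeats = min_bytes // len(filler) + 1
    return filler * repeats + sentinel + "\n"


def write_test_file(path: str, sentinel: str) -> int:
    """Write the test document and return the byte offset of the sentinel."""
    data = build_test_text(sentinel).encode("ascii")
    with open(path, "wb") as f:
        f.write(data)
    return data.rfind(sentinel.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical file name and sentinel word, chosen only for this sketch.
    path = os.path.join(tempfile.gettempdir(), "ft_limit_test.txt")
    offset = write_test_file(path, "zzqxsentinel")
    print(f"wrote {path}; sentinel starts at byte {offset} (limit is {LIMIT_BYTES})")
```

After loading such a file into the image column and letting the catalog repopulate, a CONTAINS search for the sentinel word shows whether text past the 256 KB mark was indexed.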
Thanks,
Joel
|||Let me try this myself. I last tried this several years ago, so the
behavior may have changed with a recent service pack.
Hilary Cotter
|||Joel,
Q. Is the documentation out of date?
A. Actually, it is wrong: a doc bug has been filed against this limit in the
BOL topic "Filtering Supported File Types", which contains the note "For
full-text indexing, a document must be less than 16 megabytes (MB) in size
and must not contain more than 256 kilobytes (KB) of filtered text". The
limit can be overridden via the FilterProcessMemoryQuota registry key value;
see KB article 308771 (Q308771), "PRB: A Full-Text Search May Not Return Any
Hits If It Fails to Index a File", at
http://support.microsoft.com/default...;en-us;308771. However, be careful
when adjusting this registry key: increase it incrementally, based on your
server's memory and average file sizes.
Q. Are there only certain conditions under which those limits apply?
A. No specific conditions, but you should make sure that the disk holding
your FT Catalog folder always has enough free space (at least 15% free),
because temporary files are written to that same location as needed while
large files are processed.
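The 15% free-space guideline above is easy to check from a script. The following is a minimal sketch in Python, not anything from the KB article: the function names and the placeholder path are assumptions for illustration, and you would substitute the folder that actually holds your FT catalog.

```python
import shutil

MIN_FREE_FRACTION = 0.15  # the "at least 15% free" guideline from the post above


def free_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the volume holding path that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_enough_free_space(path: str, minimum: float = MIN_FREE_FRACTION) -> bool:
    """True if the volume holding path has at least `minimum` of its space free."""
    return free_fraction(path) >= minimum


if __name__ == "__main__":
    catalog_dir = "."  # placeholder; point this at the folder holding your FT catalog
    frac = free_fraction(catalog_dir)
    status = "OK" if has_enough_free_space(catalog_dir) else "LOW"
    print(f"{frac:.1%} free on the volume holding {catalog_dir!r}: {status}")
```

Run on a schedule, a check like this gives early warning before a large population runs out of room for its temporary files.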
Regards,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
|||I'm not entirely sure I'm clear. If I'm reading that article right, it
looks like there is still some point at which indexing a document will
fail due to lack of memory. However, that point cannot be determined by
examining the file size of the document. Is that accurate?
Thanks,
Joel
|||You're welcome, Joel,
Yes, the RESOLUTION section states "Unfortunately, there is no way to
calculate directly from the size of the document to be full-text indexed how
much memory the filter process needs. The memory quota only exists to
protect against badly written filters, and they do spike to large amounts if
some bogus size contains a negative number. The quota itself can be made
larger, as long as it is finite."
While no upper size limit for documents to be FT indexed is documented, you
can increase the amount of text that gets indexed by modifying the
FilterProcessMemoryQuota registry key value. You will need to test your own
documents on your server to get a feel for what the "finite" limit is, and
monitor the server's application event log for "Microsoft Search" source
events that report very large files failing.
Regards,
John
|||I just tried it again. I indexed a 32 MB text file and a 16 MB Word doc
and have confirmed that at least the first 256 KB of extracted text is
indexed, but that tokens at the end of the documents are not. Any textual
data after this 256 KB boundary appears to be ignored.
I have the same version of SQL Server as you, only I am running on Win2k.
Let me try with Win2003.
Hilary Cotter
Monday, March 19, 2012