On Thu, Dec 09, 2004 at 08:29:49AM -0800, Stewart, John wrote:
>
> Well, I've had to revert to not indexing .pdf files, and that seems to work
> all right. Does anyone have any suggestion on limiting memory usage indexing
> pdf files, or getting the -e flag to work?
If the problem shows up only when indexing PDF files (and not simply
because of the number or size of your docs), then it's probably not
swish-e itself that's eating the memory.
Hmm, looking back at your config:
IndexOnly .html .txt .pdf .htm .doc
NoContents .gif .jpg
#
# Don't do these pdf's - crashing
FileRules pathname contains marketing/competition
#
# Index the main internal section
#
#IndexDir /www/internal
ReplaceRules remove /www/internal
#
# Index the internal_web section on titan
#
#IndexDir /home/groups/internal_web
ReplaceRules remove /home/groups/internal_web
#
# Index the manuals section
#
IndexDir /home/manuals
ReplaceRules remove /home
How are you converting PDF to a format that swish-e can parse? Swish-e
can parse text, HTML, and XML. All you are doing above is telling
swish-e to index files that end in .html .txt .pdf .htm and .doc, but
swish-e doesn't know how to index .pdf or .doc files unless a filter
of some type converts them first.
(Also NoContents .gif .jpg has no effect -- they are not included in
your "IndexOnly" list of extensions to index.)
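
As a rough illustration of what "a filter of some type" could look
like, here is an untested sketch using swish-e's FileFilter and
IndexContents directives. It assumes that pdftotext (from xpdf) and
catdoc are installed and on the PATH; neither program is mentioned in
the config above, so treat both names and their options as
assumptions and check them against the SWISH-CONFIG docs.

    # Convert each PDF to plain text on stdout; %p is replaced with
    # the path of the file being indexed (assumes pdftotext exists)
    FileFilter .pdf pdftotext "'%p' -"
    # Convert Word documents to text (assumes catdoc exists)
    FileFilter .doc catdoc "'%p'"
    # The filters emit plain text, so hand it to the text parser
    IndexContents TXT* .pdf .doc

With something like this in place the conversion runs in a separate
program for each file, so if memory still balloons on particular
PDFs, the filter program is the first place to look.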
--
Bill Moseley
moseley@hank.org
Received on Thu Dec 9 10:31:36 2004