On Thu, Feb 24, 2005 at 12:21:07PM -0800, Thomas Angst wrote:
> With your information about the HTML* parser, I changed the
> DefaultContents to TXT. swish-e is now several times faster. But I get
> now for each Image a warning. Do you know how I can suppress these
> warnings and is there any limitation if I'm using TXT for the
> DefaultContent-Scanner, when I will set all other scanable files to
> another scan engine?
I guess I'd use locate(1) to index and search for file names.
I'm not sure, but I think NoContents was meant for indexing only the <title> of
HTML docs, not so much for indexing the names of binary files --
because it doesn't make much sense to read in and try to parse
For that to work right, swish needs to look at the file name before
fetching and, if it's not HTML*, not read the file and just index the
And if I was indexing images I think I'd index a description file and
then use ReplaceRules to change the description file to the actual
image name when indexing -- that might make searching for images more
And if I really only wanted to index the file names, then I would
use DirTree.pl and then create a text document on-the-fly for the
> Warning: Substituted 677 embedded null character(s) in file
> '/var/samba/daten/dokumentationen/mozillamailer/pfeile.bmp' with a newline
Swish indexes text -- so sending it a binary file will confuse it.
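
For the file-names-only case, here is a rough sketch of the "create a text document on-the-fly" idea. It is written in Python rather than as a change to DirTree.pl, and the function names, the extension list, and the choice to use only the base name as the body are my own assumptions for illustration. The header names (Path-Name, Content-Length, Last-Mtime, Document-Type) are the ones I understand swish-e's -S prog input to expect, so check them against the docs for your version before relying on this.

```python
import os

# Hypothetical list of binary file types whose contents should never be read.
IMAGE_EXTENSIONS = (".bmp", ".gif", ".jpg", ".png")


def filename_document(path, mtime=None):
    """Build one record for a prog-style feed: headers, a blank line, a body.

    The body is only the file's base name, so the binary file itself is
    never opened and no embedded null characters reach the indexer.
    """
    body = (os.path.basename(path) + "\n").encode("utf-8")
    headers = [
        "Path-Name: " + path,
        # Content-Length is the size of the body in bytes, not characters.
        "Content-Length: " + str(len(body)),
        # Assumption: TXT* selects the plain-text parser for this record.
        "Document-Type: TXT*",
    ]
    if mtime is not None:
        headers.insert(2, "Last-Mtime: " + str(int(mtime)))
    return ("\n".join(headers) + "\n\n").encode("utf-8") + body


def image_records(root, extensions=IMAGE_EXTENSIONS):
    """Walk root and yield one file-name-only record per matching file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                full = os.path.join(dirpath, name)
                yield filename_document(full, os.path.getmtime(full))


def write_records(root, out):
    """Write every record to a binary stream such as sys.stdout.buffer."""
    for record in image_records(root):
        out.write(record)
```

To try it, a script would call write_records() with the directory to scan and sys.stdout.buffer, and that script would be given to swish-e as the program to run under -S prog. Since only the name is sent, the images stay searchable by file name and the null-character warning should not come up for them.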
Received on Thu Feb 24 16:11:56 2005