Hi,
Now that I have swish-e installed and mostly working, I have run into a
slight problem, more than likely due to my own lack of understanding. I
want swish-e to index most of the files on my site, using the filters
where necessary, but for certain types of files, namely graphics (.gif,
.jpg, etc.), to index only the file path/name. My config files are set
up this way (edited for brevity):
In my swish-e config file:
-----
IndexDir spider.pl
NoContents .gif .jpg .png .cgi .pl .log .jar .ico .js .class .log .sql .csv .dir .idx .dat
IndexContents HTML* .htm .html .shtm .shtml .css
IndexContents TXT* .txt .text
IndexContents XML* .xml .wml .rdf .rss
DefaultContents HTML
SwishProgParameters /home/afana/public_html/swish-e/lib/swish-e/SwishSpiderConfig.pl http://www.afana.com http://www.afana.com/blog/archives/ http://www.afana.com/album/ http://www.afana.com/webbbs/bbs1/ http://www.afana.com/webbbs/bbs0
-----
In my spider.pl config file I have this:
-----
filter_content => \&filter_content,

sub filter_content {
    my ( $uri, $server, $response, $content_ref ) = @_;

    # Uncomment this to enable debugging of SWISH::Filter
    # $ENV{FILTER_DEBUG} = 1;

    my $content_type = $response->content_type;
    my $uri_ext      = $uri->path;

    # Ignore text/* content type -- no need to filter
    return 1 if !$content_type || (($content_type =~ m!^text/!) ||
        ($uri_ext =~ /\.(gif|jpg|jpeg|png)?$/));

    # Load the module - returns FALSE if cannot load module.
    unless ( $filter ) {
        eval { require SWISH::Filter };
        if ( $@ ) {
            $server->{abort} = $@;
            return;
        }
        $filter = SWISH::Filter->new;
        unless ( $filter ) {
            $server->{abort} = "Failed to create filter object";
            return;
        }
    }

    # If not filtered return false and doc will be ignored (not indexed)
    my $doc = $filter->convert(
        document     => $content_ref,
        name         => $response->base,
        content_type => $content_type,
    );
    return unless $doc;

    # return unless $doc->was_filtered # could do this since checking for text/* above
    return if $doc->is_binary;

    $$content_ref = ${$doc->fetch_doc};

    # let's see if we can set the parser.
    $server->{parser_type} = $doc->swish_parser_type || '';

    return 1;
}
-----
When the indexing runs, swish-e attempts to read and interpret the JPEG
files rather than simply adding the file path and name to the index, as
the NoContents directive should tell it to.
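In case it helps narrow things down, here is a small standalone sketch of the extension test I added to filter_content above. Only the pattern is copied from my config; the skips_filter helper name and the sample paths are made up for illustration. It simply reports which paths return early from filter_content (and so are passed on unfiltered) and which go on to SWISH::Filter.

```perl
use strict;
use warnings;

# Pattern copied from filter_content; the trailing "?" makes the
# extension group optional.
my $image_re = qr/\.(gif|jpg|jpeg|png)?$/;

# Hypothetical helper: returns 1 when filter_content would return early
# for this path because of the extension test, 0 otherwise.
sub skips_filter {
    my ($path) = @_;
    return $path =~ $image_re ? 1 : 0;
}

# Made-up sample paths, not taken from the real site.
for my $path ( '/album/photo.jpg', '/album/photo.JPG', '/docs/report.pdf', '/odd.' ) {
    printf "%-18s %s\n", $path,
        skips_filter($path) ? 'returned unfiltered' : 'sent to SWISH::Filter';
}
```

Two things stand out to me from this: the pattern is case-sensitive, so an upper-case .JPG does not match, and because the extension group is optional a path ending in a bare dot matches as well.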
So what am I doing wrong? (Or at least... where do I start? :-) )
Regards,
-Rob de Santos
-Columbus, Ohio USA
Chairman of the Board,
Australian Football Association of North America (AFANA)
ph: 1-888-4AFANA1 (North America) (1-888-423-2621)
ph: 1-614-338-0002 (outside NA)
e-mail: rdesantos(at)not-real.afana.com web: <http://www.afana.com>
Received on Tue Jan 27 23:31:23 2004