
Re: [swish-e] First time Swish-e user with some thoughts/feedback

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Mon Feb 26 2007 - 19:06:52 GMT
Jason Purdy scribbled on 2/21/07 8:51 AM:
> I just got up & running with Swish-e and I hit a few speedbumps along 
> the way, so I thought I'd share them
> 
> 1) Spidering a site (-S http vs. -S prog + spider.pl)

> my %opts = ( 'raise_error' => 1 );
> $content = $request->decoded_content( %opts );


Maybe set raise_error and then wrap decoded_content() in an eval(), so that one 
failed doc doesn't make you lose all your work?
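A minimal Perl sketch of that suggestion, assuming the fetched page is an HTTP::Response (decoded_content() comes from HTTP::Message, and `raise_error => 1` makes it die instead of quietly returning undef). `safe_decoded_content` and its `$label` argument are hypothetical names for illustration, not part of spider.pl, and the `x-bogus` Content-Encoding is made up just to force a decoding failure:

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: decode one fetched document, but turn a decoding
# failure into a warning plus undef so the caller can skip this doc and
# keep spidering instead of dying.
sub safe_decoded_content {
    my ( $response, $label ) = @_;    # $label: name used in the warning
    my %opts = ( raise_error => 1 );  # make decoded_content() die on failure
    my $content;
    my $ok = eval { $content = $response->decoded_content(%opts); 1 };
    unless ($ok) {
        my $err = $@ || 'unknown error';
        warn "Skipping $label: $err";
        return undef;
    }
    return $content;
}

# Usage: a response with a Content-Encoding that cannot be decoded.
my $bad = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'x-bogus' ],
    '<html><body>broken</body></html>',
);
my $content = safe_decoded_content( $bad, 'http://example.com/broken.html' );
print defined $content ? "decoded\n" : "skipped\n";
```

The caller only has to check for undef and move on to the next URL; the warning still names the document and carries the original error text.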


> The HTML Validator is a great tool to figure out where your source is 
> messing up.  Come to find out, it was an included database value that 
> was everywhere.  What a mess. :)
> 

In general, swish-e plows ahead with indexing despite parsing failures at every 
level: in spider.pl, in the libxml2 parser, etc. If the indexer fails on one 
doc, it carps (to one degree or another) and plunges on to the next doc. That 
design decision does seem a little reckless. OTOH, I suspect the reasoning is 
that, since swish-e lacks truly incremental indexing, aborting on a single doc 
that failed to parse late in the process could throw away hours of indexing work.

But you're right: a more helpful error message would be appropriate here, imo.
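To make the trade-off concrete, here is a small Perl sketch of a "keep going, but say exactly what failed" loop. It is only an illustration of the policy, not swish-e's indexer (which is written in C); `index_documents` and the per-document callback are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical illustration: process every document inside eval, record
# each failure with the document name and error text, and report them
# instead of aborting the whole run.
sub index_documents {
    my ( $docs, $index_one ) = @_;    # arrayref of doc names, coderef per doc
    my $ok = 0;
    my @failed;
    for my $doc (@$docs) {
        if ( eval { $index_one->($doc); 1 } ) {
            $ok++;
        }
        else {
            my $err = $@ || 'unknown error';
            chomp $err;
            push @failed, [ $doc, $err ];
            warn "Skipped $doc: $err\n";    # name the doc and the reason
        }
    }
    return ( $ok, \@failed );
}

# Usage with a fake parser that chokes on one document.
my ( $ok, $failed ) = index_documents(
    [ 'a.html', 'bad.html', 'c.html' ],
    sub { my ($doc) = @_; die "parse error\n" if $doc eq 'bad.html' },
);
printf "%d indexed, %d failed\n", $ok, scalar @$failed;
```

The run still completes, but the summary tells you which doc to go fix, which is the "more helpful error message" part.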


> 2) Using a template system
> 
> I was excited to see that you could use HTML::Template w/ the search 
> results, as that's our template language of choice, but I couldn't find 
> really good documentation on how to configure .swishcgi.conf accordingly 
> until I dove into the source code for swish.cgi.  Here is my .swishcgi.conf:
> 
> return {
>      title        => 'QSR magazine search results',
>      swish_binary => '/usr/local/bin/swish-e',
>      swish_index  => '/var/www/qsr/web/search/index.swish-e',
>      template     => {
>              package         => 'SWISH::TemplateHTMLTemplate',
>              options         => {
>                  filename            => 'swish.tmpl',
>                  path                => '/var/www/qsr/web/search',
>                  die_on_bad_params   => 0,
>                  loop_context_vars   => 1,
>                  cache               => 1,
>              },
>          },
> }
> 
> I got stuck b/c I thought the file parameter was named 'file' and was 
> its own key/value vs. being nested in 'options'.
> 

So the file parameter is called 'filename' in your example above? Or 'path'?
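For reference, the keys under `options` match arguments to HTML::Template->new(), where `filename` names the template file and `path` lists the directories searched when that filename is relative. Assuming SWISH::TemplateHTMLTemplate hands the `options` hash straight to HTML::Template->new() (the option names suggest so, but check swish.cgi to confirm), the config above resolves roughly like this sketch. A temp directory stands in for /var/www/qsr/web/search, and `path` is given as an arrayref, the list form HTML::Template documents:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use HTML::Template;

# Stand-in for /var/www/qsr/web/search: a temp dir holding a tiny swish.tmpl.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'swish.tmpl' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "<TMPL_VAR NAME=title>\n";
close $fh or die "Cannot close $file: $!";

# Same keys as the 'options' hash in the .swishcgi.conf above.
my %options = (
    filename          => 'swish.tmpl',  # template file name, looked up in 'path'
    path              => [$dir],        # directories searched for a relative filename
    die_on_bad_params => 0,             # ignore param() names the template lacks
    loop_context_vars => 1,             # enable __first__, __last__, __odd__ in loops
    cache             => 1,             # keep the parsed template cached in memory
);

my $tmpl = HTML::Template->new(%options);
$tmpl->param( title => 'QSR magazine search results' );
print $tmpl->output;
```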


-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
_______________________________________________
Users mailing list
Users@lists.swish-e.org
http://lists.swish-e.org/listinfo/users
Received on Mon Feb 26 14:04:16 2007