Re: Configuration

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Dec 29 2005 - 15:32:51 GMT
On Thu, Dec 29, 2005 at 07:34:57AM -0500, Lars D. Noodén wrote:
> >2) Have swish read from the program
> >	swish-e -S prog -i ./DirTree.pl
> >   or set the program in IndexDir in a swish-config file
> >
> >       swish.conf:
> >       IndexDir DirTree.pl
> >
> >       swish-e -S prog -c swish.conf
> 
> I've tried variations of that.  It looks like 'SwishProgParameters' may be 
> needed.

Yes, that was just an example.  The point was that swish either has to
read from stdin or be told to run an external program via -i (or the
equivalent config option IndexDir).  And yes, when running an external
program you might need to pass it some parameters, which is what
SwishProgParameters is for.

man SWISH-RUN
http://swish-e.org/docs/swish-run.html#indexing_command_line_arguments

> B) This one does not find the directory, though it is specified.
> 
> $ swish-e -S prog -i /usr/local/lib/swish-e/DirTree.pl /Library/WebServer/ODF/

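Look at that one.  You are passing two values to the -i option, so
you are telling swish that you have *two* programs to run.  The
directory is a parameter for DirTree.pl, and that belongs in
SwishProgParameters, not in -i.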

And then swish is telling you that it isn't a program:

>  Indexing "/Library/WebServer/ODF/"
> 
>  Warning: Found '/Library/WebServer/ODF' in PATH but is not executable
>  err: Failed to find program '/Library/WebServer/ODF' in PATH:
>  /Developer/qt/bin:/sw/bin:/sw/sbin:/bin:/sbin:/usr/bin:/usr/sbin:
>  /sw/bin:/usr/local/bin:/usr/X11R6/bin:/usr/local/lib/swish-e


Now this one is correct:

> 
> C) This one finds the directory and the files in it, but not the filters, 
> not even the PDF filter:
> 
> $ /usr/local/lib/swish-e/DirTree.pl  /Library/WebServer/ODF/ | swish-e -S prog -i stdin

>  Wide character in print at /usr/local/lib/swish-e/DirTree.pl line 195.
> 
>  Warning: Unknown header line: 'th-Name: /Library/WebServer/ODF/
> 	bio100_2.odt' from program stdin
>  err: External program failed to return required headers Path-Name:
>  .

Now that's a problem with character encoding.  Here's the perldiag
comment on that error:

       Wide character in %s
           (W utf8) Perl met a wide character (>255) when it wasn't expecting
           one.  This warning is by default on for I/O (like print).  The
           easiest way to quiet this warning is simply to add the ":utf8" layer
           to the output, e.g. "binmode STDOUT, ':utf8'".  Another way to turn
           off the warning is to add "no warnings 'utf8';" but that is often
           closer to cheating.  In general, you are supposed to explicitly mark
           the filehandle with an encoding, see open and "binmode" in perlfunc.

What does "locale" show in the shell where you are running swish?
If it reports UTF-8, you might have some luck fixing this by setting
LANG to 'en_US'.

Otherwise, I'm not clear on the problem.


To understand the "Unknown header line" warning above, you need to
know how -S prog works.  The format is a stream of bytes.  Each
document starts with a set of variable-length headers, each terminated
by a newline.  There's a content-length header that tells swish how
many *bytes* long the document is.  The headers end at the first blank
line.  Then the content begins and runs for content-length bytes.  The
headers for the next document follow immediately after that last byte.
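
As a rough illustration of that framing, here is a minimal sketch in
Python (not Perl, and not taken from DirTree.pl).  The function name
make_record and the file name are made up for the example, and the
exact header spelling "Content-Length:" is an assumption; only
"Path-Name:" appears in the output quoted above.

```python
def make_record(path: str, text: str) -> bytes:
    """Frame one document: headers, a blank line, then the content bytes."""
    body = text.encode("utf-8")           # encode first, so we can count bytes
    headers = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # length in bytes, not characters
        "\n"                              # blank line ends the headers
    )
    return headers.encode("utf-8") + body


# Hypothetical document: 7 characters, but 8 bytes once encoded as UTF-8.
record = make_record("a.txt", "Hello\u0100\n")
print(record)
```

The point of the sketch is the ordering: the content is encoded to
bytes before its length is measured, so the number in the header
matches what is actually written to the stream.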

When you have:

    Warning: Unknown header line: 'th-Name: /Library/WebServer/ODF/

you can see that swish is trying to read the next set of headers at
the wrong place.  That should be "Path-Name:", so swish was (I guess)
told to read two more bytes than it should have.
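
To see how a length that is two bytes too large produces exactly this
kind of message, here is a small Python sketch of a reader for such a
stream.  It is only a toy model, not swish's actual parser: the names
read_records and frame, the file names, and the restriction to two
known headers are all assumptions made for the example.

```python
import io

KNOWN_HEADERS = (b"Path-Name", b"Content-Length")  # simplification for the sketch


def read_records(stream):
    """Yield (headers, body) pairs from a binary stream of framed documents."""
    while True:
        line = stream.readline()
        if not line:
            return                          # clean end of stream
        headers = {}
        while line not in (b"\n", b""):     # headers run until the blank line
            name, sep, value = line.partition(b":")
            if not sep or name not in KNOWN_HEADERS:
                raise ValueError(f"Unknown header line: {line!r}")
            headers[name] = value.strip()
            line = stream.readline()
        # Trust the declared length: read exactly that many bytes of content.
        body = stream.read(int(headers[b"Content-Length"]))
        yield headers, body


def frame(path: bytes, body: bytes, declared_length: int) -> bytes:
    """Build one record, letting the caller lie about the length."""
    return b"Path-Name: %s\nContent-Length: %d\n\n" % (path, declared_length) + body


good = frame(b"a.odt", b"12345", 5) + frame(b"b.odt", b"xyz", 3)
bad = frame(b"a.odt", b"12345", 7) + frame(b"b.odt", b"xyz", 3)  # 2 bytes too many

print([body for _, body in read_records(io.BytesIO(good))])

try:
    list(read_records(io.BytesIO(bad)))
except ValueError as err:
    print(err)   # the reader swallowed "Pa" and now sees "th-Name: ..."
```

With the overstated length, the reader consumes the first two bytes of
the next "Path-Name:" header as content, and the next header line it
sees starts with "th-Name:".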

Here's the other hint:

    Wide character in print at /usr/local/lib/swish-e/DirTree.pl line 195.

So the basic problem is that DirTree.pl (or Perl) is reading in
wide-character data and not reporting the correct number of bytes to
swish.  Unfortunately, Perl's Unicode support has changed quite a bit
over the years, so DirTree.pl tries to use a method that works across
quite a few versions of Perl:

    # Get the length of the content - have to worry about multi-byte content
    # ugly and maybe expensive, but perhaps more portable than "use bytes"
    my $bytecount = length pack 'C0a*', $$content_ref;

But maybe that isn't working on your system.

You might try replacing that with:

    my $bytecount = do { use bytes; length( $$content_ref ) };

But I get the same thing with both:

moseley@bumby:~$ cat b.pl
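my $x = "Hello\x{0100}\n";               # 7 characters, one of them wide
print $x;

print length $x;                         # length in characters

my $b = length pack 'C0a*', $x;          # DirTree.pl's method
print $b;

my $l = do { use bytes; length( $x ) };  # the "use bytes" alternative
print $l;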

moseley@bumby:~$ perl -wl b.pl
Wide character in print at b.pl line 2.
HelloĀ

7
8
8
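
For comparison, the same character-versus-byte distinction can be
sketched in Python (used here only as a neutral illustration, not as a
replacement for the Perl in DirTree.pl).  The second half shows one
way a declared length and the bytes actually written could disagree;
that is an assumption about the kind of mismatch involved, not a
diagnosis of this particular failure.

```python
text = "Hello\u0100\n"                   # same string as in b.pl

char_count = len(text)                   # counts characters (code points)
byte_count = len(text.encode("utf-8"))   # counts bytes as written in UTF-8

print(char_count)
print(byte_count)

# The same character can occupy a different number of bytes depending on
# the output encoding, so the count must use the encoding that is written.
latin1_len = len("\u00e9".encode("latin-1"))   # e-acute as one Latin-1 byte
utf8_len = len("\u00e9".encode("utf-8"))       # e-acute as two UTF-8 bytes
print(latin1_len, utf8_len)
```

If the length is measured under one encoding and the content is
written under another, the content-length header will be off by one
byte for every such character.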

-- 
Bill Moseley
moseley@hank.org

Received on Thu Dec 29 07:32:59 2005