Re: dirTree.pl

From: Lee Thompson <lee(at)not-real.kaim.com>
Date: Mon Nov 17 2003 - 00:45:20 GMT
Bill,

Thanks, using the file stat was indeed giving me the wrong content
length. On Windows I can instead read the <FH> filehandle into a $variable
and use length($variable) for the Content-Length header.
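A minimal sketch of that fix, not the actual DirTree.pl code. The helper name build_record is hypothetical, and the header layout is copied from the sample output quoted further down. One plausible reason stat was off: stat reports the on-disk size, which counts the CR of every CRLF pair, while a text-mode read on Windows hands Perl only the LF.

```perl
use strict;
use warnings;

# Hypothetical helper: build one swish-e "prog" record for a file.
# The file is read in the default (text) mode, so the length matches
# what is actually written out after the headers.
sub build_record {
    my ($path) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    $content = '' unless defined $content;  # empty file

    my $mtime = (stat $path)[9];

    return "Path-Name: $path\n"
         . "Content-Length: " . length($content) . "\n"
         . "Last-Mtime: $mtime\n"
         . "\n"
         . $content;
}

# Emit one record per file named on the command line.
print build_record($_) for @ARGV;
```

Slurping every file costs memory in proportion to file size, which may matter for very large documents.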

 

Thanks for the quick and helpful answer!

Lee Thompson

 

 

-----Original Message-----

From: swish-e@sunsite.berkeley.edu
[mailto:swish-e@sunsite.berkeley.edu] On Behalf Of moseley@hank.org

Sent: Saturday, November 15, 2003 10:53 AM

To: Multiple recipients of list

Subject: [SWISH-E] Re: dirTree.pl

 

On Sat, Nov 15, 2003 at 07:17:14AM -0800, Lee Thompson wrote:

> Hi,

> 

> Has anyone tried modifying dirTree.pl for use on Windows? It does 

> find all files, but swish-e doesn't seem to be able to tell where one 

> file ends and the next file starts.

Make sure the version you are using does NOT use binmode (swish-e reads
in text mode). Does Windows have a standard tool like "od" or "file" to
look at the output from DirTree.pl and see what kind of line endings it has?
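If no such tool is at hand, a few lines of Perl can do the same job. This is only a sketch: it assumes the output of DirTree.pl was first redirected to a file, and the function name count_line_endings is made up for illustration. The inspection script opens that file with the :raw layer so it sees the bytes exactly as stored; that is separate from the advice above about DirTree.pl itself not using binmode.

```perl
use strict;
use warnings;

# Count CRLF, bare LF, and bare CR sequences in a string of raw bytes.
sub count_line_endings {
    my ($data) = @_;
    my $crlf = () = $data =~ /\r\n/g;        # Windows-style endings
    my $lf   = () = $data =~ /(?<!\r)\n/g;   # LF not preceded by CR
    my $cr   = () = $data =~ /\r(?!\n)/g;    # CR not followed by LF
    return ($crlf, $lf, $cr);
}

# Usage: perl count_endings.pl saved_output.txt
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    $data = '' unless defined $data;

    my ($crlf, $lf, $cr) = count_line_endings($data);
    print "CRLF: $crlf  LF: $lf  CR: $cr\n";
}
```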

If that's not it, maybe you are using UTF-8 and the content length is
wrong. To check, output one file with DirTree.pl, open it in an editor,
and note the Content-Length. Then cut all the header lines, including the
blank line between the header and content, and save the file. The
resulting file size should be what the Content-Length header said. That's
assuming you have an editor that won't screw things up and add a line
ending at the end if there isn't already one there.
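The same check can be scripted, which avoids the editor problem. A rough sketch, assuming a single record from DirTree.pl has been saved to a file; check_record is a hypothetical name. The file is opened in the default text mode here because that is how swish-e is said to read its input.

```perl
use strict;
use warnings;

# Split one saved record at the first blank line and return
# (declared Content-Length, actual length of the body).
sub check_record {
    my ($raw) = @_;
    my ($headers, $body) = split /\n\n/, $raw, 2;
    $body = '' unless defined $body;
    my ($declared) = $headers =~ /^Content-Length:\s*(\d+)/mi;
    return ($declared, length $body);
}

# Usage: perl check_record.pl one_record.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $raw = do { local $/; <$fh> };
    close $fh;
    $raw = '' unless defined $raw;

    my ($declared, $actual) = check_record($raw);
    $declared = 'missing' unless defined $declared;
    print "declared: $declared  actual: $actual\n";
}
```

If the two numbers differ, the difference itself can be a hint: a gap equal to the number of lines in the file would point at CRLF translation.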

 

> Should the data from dirTree.pl have

> something specific that indicates where one file ends and the next 

> starts?

No. It knows the end by the content-length.
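To illustrate why a wrong length produces warnings like the ones quoted below, here is a toy reader that splits a stream the same general way: headers, a blank line, then exactly Content-Length characters of body. It is not swish-e's parser; split_records is a hypothetical name and the warning text merely imitates the log.

```perl
use strict;
use warnings;

# Split a stream of records, trusting Content-Length for each body.
# Returns (array ref of records, array ref of warnings).
sub split_records {
    my ($stream) = @_;
    my (@records, @warnings);
    my $pos = 0;
    my $end = length $stream;

    while ($pos < $end) {
        my %header;

        # Read header lines until a blank line.
        while ($pos < $end) {
            my $eol = index($stream, "\n", $pos);
            $eol = $end if $eol < 0;               # last line, no newline
            my $line = substr($stream, $pos, $eol - $pos);
            $pos = $eol < $end ? $eol + 1 : $end;
            last if $line eq '';
            if ($line =~ /^([\w-]+):\s*(.*)$/) {
                $header{$1} = $2;
            } else {
                push @warnings, "Unknown header line: '$line'";
            }
        }

        last unless defined $header{'Content-Length'};
        my $len = $header{'Content-Length'};

        # The body is whatever the header says, right or wrong.
        push @records, { %header, body => substr($stream, $pos, $len) };
        $pos += $len;
    }
    return (\@records, \@warnings);
}
```

With a Content-Length of 3 in front of the body "hello\n", this reader takes "hel" as the body and then tries to parse "lo" as a header line. A length that is too large misaligns the stream in the other direction, swallowing the start of the next record's headers.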

 

> It does put in the same headers as spider.pl, spider.pl works fine 

> here.

spider.pl uses this to determine the content length (in the event that
the content ends up in utf-8 with multi-byte chars):

# ugly and maybe expensive, but perhaps more portable than "use bytes"
my $bytecount = length pack 'C0a*', $$content;

But DirTree.pl uses the length from the stat command. Hard to imagine
that would be wrong.
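For anyone unsure why a byte count and a character count can differ, here is a small illustration. It uses the core Encode module rather than the pack idiom from spider.pl, so treat it as an alternative sketch and not as what either script does; byte_length is a hypothetical name.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Number of bytes the text occupies once encoded as UTF-8.
sub byte_length {
    my ($text) = @_;
    return length encode_utf8($text);
}

# "cafe" with an accented e, a space, and a smiley: six characters.
my $sample = "caf\x{e9} \x{263a}";

printf "characters: %d, UTF-8 bytes: %d\n",
    length($sample), byte_length($sample);
```

The accented e takes two bytes and the smiley three, so the six characters occupy nine bytes. A Content-Length header built from the character count would be three short for this string.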

 

 

 

 

> The errors shown are:

> 

> ----------------------------------------

> C:\KaTS\SWISH-E>swish-e -S prog -c conf/filetree.config -i 

> ./prog-bin/DirTree.pl -f i:\Data\Taxonomy\mydrive.swish-e Indexing 

> Data Source: "External-Program" Indexing "./prog-bin/DirTree.pl"

> External Program found: ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'Logging information for IE6Setup.exe 

> ...' from program ./prog-bin/DirTree.pl /WINNT/Active Setup Log.txt - 

> Using TXT2 parser - (2819 words)

> 

> Warning: Unknown header line: 'b.dll' from program 

> ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'Search fixed drives = FALSE' from 

> program ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'Search remote drives = FALSE' from 

> program ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'Search removable drives = FALSE' from 

> program ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'Search CD-ROM drives = FALSE' from 

> program ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'Search specific directories = TRUE' 

> from program ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'Custom directories = 

> C:\WINNT\Microsoft.NET\Framework' from program ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'Recurse custom dirs = TRUE' from 

> program ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'Result = 0' from program 

> ./prog-bin/DirTree.pl

> 

> Warning: Unknown header line: 'END: Perform action: Search for File' 

> from program ./prog-bin/DirTree.pl

> err: External program failed to return required headers Path-Name:

> .----------------------------------------

> 

> If I run dirTree.pl on its own I do get all the correct swish-e 

> header lines, for example:

> 

> Path-Name: /WINNT/Active Setup Log.txt

> Content-Length: 21382

> Last-Mtime: 1062197811

> Document-Type: TXT*

> 

> 

> 

> 

> Lee Thompson

> 

> 

> 

> 

> 

> 

> 

> *********************************************************************

> Due to deletion of content types excluded from this list by policy, 

> this multipart message was reduced to a single part, and from there to


> a plain text message.

> *********************************************************************

> 

-- 

Bill Moseley

moseley@hank.org

 




Received on Mon Nov 17 00:45:31 2003