RE: Swish-e Filtering on Win2003

From: Philippus, Brian <BPhilippus(at)not-real.nevp.com>
Date: Thu Mar 18 2004 - 16:57:28 GMT
I'm sorry, my earlier email apparently got mangled, so I'll try again.

I'm running Windows Server 2003, ActivePerl 5.8.3.809, and swish-e 2.4.1.
I'm using spider.pl to spider my websites.
 
I'm having a lot of difficulty filtering PDF and DOC files.  From what I can
tell, the most relevant entries in the log files are:
****************************************************************************
++Checking filter [SWISH::Filters::Pdf2HTML=HASH(0x1ea2d60)] for
application/pdf
5592 Warning - http://staging.nevp.com/fp/PDF/2004_dibetes_wllnss_day.pdf:
Use of uninitialized value in pattern match (m//) at
D:/Perl/lib/IO/Handle.pm line 348.
5592 Warning - http://staging.nevp.com/fp/PDF/2004_dibetes_wllnss_day.pdf:
Use of uninitialized value in concatenation (.) or string at
D:/Perl/lib/IO/Handle.pm line 358.
Problems with filter 'SWISH::Filters::Pdf2HTML=HASH(0x1ea2d60)'.  Filter
disabled:
 -> open2: Can't call method "close" on an undefined value at
D:/Perl/lib/IPC/Open3.pm line 338.
>> Starting to process new document: application/msword
 ++Checking filter [SWISH::Filters::Doc2txt=HASH(0x1e784a4)] for
application/msword
5592 Warning - http://staging.nevp.com/FORMS/DOC/Fax-Handwritten.doc: Use of
uninitialized value in pattern match (m//) at D:/Perl/lib/IO/Handle.pm line
348.
5592 Warning - http://staging.nevp.com/FORMS/DOC/Fax-Handwritten.doc: Use of
uninitialized value in concatenation (.) or string at
D:/Perl/lib/IO/Handle.pm line 358.
Problems with filter 'SWISH::Filters::Doc2txt=HASH(0x1e784a4)'.  Filter
disabled:
 -> open2: Can't call method "close" on an undefined value at
D:/Perl/lib/IPC/Open3.pm line 338.
****************************************************************************
All of my HTML files are indexed fine, and XLS files are filtered without a
problem:
****************************************************************************
>> Starting to process new document: application/vnd.ms-excel
 ++Checking filter [SWISH::Filters::XLtoHTML=HASH(0x1e99054)] for
application/vnd.ms-excel
 ++ application/vnd.ms-excel *WAS* filtered by
SWISH::Filters::XLtoHTML=HASH(0x1e99054)
 

Final Content type for http://staging.nevp.com/DIR/RGdirectory.xls is
text/html
  >Filter SWISH::Filters::XLtoHTML=HASH(0x1e99054) converted from
[application/vnd.ms-excel] to [text/html]
****************************************************************************

My Perl skills are quite limited, but it looks as though the filter can't
find the file, and I don't know what to do about it.  The error occurs on
the very first PDF file and the very first DOC file encountered; after
that, the filter is disabled.
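
One way to test that guess: the "open2" errors suggest the PDF and DOC
filters launch external programs, so it may be worth checking whether
those programs can be found on the PATH.  Below is a minimal sketch, not
a definitive diagnosis.  The program names (pdftotext, pdfinfo, catdoc)
are assumptions, since the log does not name them, and find_helpers is a
hypothetical helper written only for this check.

```python
import shutil

# Assumed helper programs; the log output above does not name them.
HELPERS = ["pdftotext", "pdfinfo", "catdoc"]


def find_helpers(names, path=None):
    """Map each program name to its full path, or None if it is not found.

    path=None means "search the PATH of the current environment".
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_helpers(HELPERS).items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

If a program is reported as not found, the account that runs spider.pl
probably cannot start it either, which would be consistent with the
filter being disabled on the first PDF or DOC file.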
 
Any ideas would be greatly appreciated.
 
Thanks
Brian Philippus 

Received on Thu Mar 18 08:57:29 2004