
RE: I want to contrbute my problem of swish

From: Revillini, James <JRevillini(at)not-real.txcc.commnet.edu>
Date: Mon Jun 27 2005 - 14:03:53 GMT
Munga,

I was having the same (or similar) problems on a Windows 2003 computer.
It is not very easy to set up on Windows.

Catdoc, the default filter for creating text output from Word documents,
is not really the best filter out there. I'm told future versions of
swish-e will use wvware. I highly recommend downloading and installing
wvware; just google for "wvware for windows". Install it, then add the
path of its bin directory to your Windows environment variables > Path.
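If you want to confirm from a script that the PATH change took effect, here is a minimal Python sketch (not part of swish-e). It assumes the executable is named wvWare (wvWare.exe on Windows, which shutil.which resolves through PATHEXT); the install directory in the example is hypothetical, and changing os.environ only affects the running process and its children, not the permanent Windows setting.

```python
import os
import shutil
from typing import Optional

# Assumption: the wvware executable is called "wvWare" (wvWare.exe on
# Windows). Adjust WVWARE_NAME if your build names it differently.
WVWARE_NAME = "wvWare"


def prepend_to_path(bin_dir: str) -> None:
    """Put bin_dir first on PATH for this process and any child processes.

    This does not change the permanent Windows setting; that is still done
    through the environment variables dialog.
    """
    current = os.environ.get("PATH", "")
    os.environ["PATH"] = bin_dir + os.pathsep + current if current else bin_dir


def find_wvware(name: str = WVWARE_NAME) -> Optional[str]:
    """Return the full path to the wvWare executable, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    # Hypothetical install directory; substitute wherever your wvware bin lives.
    prepend_to_path(r"C:\Program Files\wvware\bin")
    print(find_wvware() or "wvWare not found on PATH")
```

If find_wvware() still returns None after you edit the system Path, open a new command prompt: windows that were already running keep the old value.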

You will also need to put the Doc2html.pm module in your filters
directory under swish-e, something like:
e:\swish-e\lib\swish-e\perl\SWISH\Filters\.
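A quick way to double-check that the module landed where swish-e will look is sketched below in Python. The directory layout mirrors the example path above and is an assumption (your install may differ); swish_root and the helper names are hypothetical.

```python
from pathlib import Path
from typing import List

# Filter directory relative to the swish-e install root, mirroring the
# example e:\swish-e\lib\swish-e\perl\SWISH\Filters\ (an assumption).
FILTERS_SUBDIR = Path("lib", "swish-e", "perl", "SWISH", "Filters")


def filter_module_path(swish_root: str, module: str = "Doc2html.pm") -> Path:
    """Expected location of a filter module under the swish-e install root."""
    return Path(swish_root) / FILTERS_SUBDIR / module


def missing_filters(swish_root: str, modules: List[str]) -> List[str]:
    """Return the module file names that are not present in the Filters directory."""
    return [m for m in modules if not filter_module_path(swish_root, m).is_file()]


if __name__ == "__main__":
    # Hypothetical install root taken from the example path above.
    print(missing_filters(r"e:\swish-e", ["Doc2html.pm"]))
```

An empty list means every module you asked about was found.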

You can obtain filter modules (for doc, xls, ppt) in a variety of
places, but I got mine directly from CVS:
http://cvs.sourceforge.net/viewcvs.py/swishe/swish-e/filters/SWISH/Filters/
(Note: you MAY have to install several Perl modules to make these work,
for example MIME::Types and Spreadsheet::ParseExcel.)
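To see which of those Perl modules are missing, you can ask perl to load each one with perl -MModule::Name -e 1, which exits non-zero if the module cannot be loaded. The Python sketch below automates that check; the module list only covers the two named above, and the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List

# Only the modules named in the post; other filters may need more.
REQUIRED_MODULES = ["MIME::Types", "Spreadsheet::ParseExcel"]


def perl_module_check_cmd(module: str, perl: str = "perl") -> List[str]:
    """Command that exits 0 only if perl can load the given module."""
    return [perl, "-M" + module, "-e", "1"]


def missing_perl_modules(modules: List[str], perl: str = "perl") -> List[str]:
    """Return the modules perl fails to load (all of them if perl is not on PATH)."""
    if shutil.which(perl) is None:
        return list(modules)
    missing = []
    for module in modules:
        result = subprocess.run(
            perl_module_check_cmd(module, perl),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(module)
    return missing


if __name__ == "__main__":
    print(missing_perl_modules(REQUIRED_MODULES))
```

Anything it reports can then be installed with your usual Perl package tool (for example CPAN or, on ActivePerl, PPM).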

Now run swish-filter-test and make sure the Doc2html module gets loaded
and that it finds wvware as the filter program. swish-filter-test can be
run from the command line by passing it to perl as an argument:
e:\>perl swish-filter-test -verbose -content "e:/file.doc"
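The same check can be scripted. This Python sketch builds the command shown above as an argument list (so a path containing spaces needs no extra quoting) and captures the output. The text search in mentions_doc2html_and_wvware is only a heuristic: it assumes the verbose output names both Doc2html and wvware, which you should confirm against your own run. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def filter_test_cmd(doc_path: str, script: str = "swish-filter-test") -> List[str]:
    """Argument list for: perl swish-filter-test -verbose -content <doc_path>."""
    return ["perl", script, "-verbose", "-content", doc_path]


def run_filter_test(doc_path: str, script: str = "swish-filter-test") -> str:
    """Run swish-filter-test and return stdout and stderr combined as text."""
    result = subprocess.run(
        filter_test_cmd(doc_path, script),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def mentions_doc2html_and_wvware(output: str) -> bool:
    """Heuristic only: assumes the verbose output names both the filter and the program."""
    text = output.lower()
    return "doc2html" in text and "wvware" in text


if __name__ == "__main__":
    if shutil.which("perl") is None:
        print("perl is not on PATH")
    else:
        out = run_filter_test("e:/file.doc")
        print(out)
        print("Doc2html/wvware seen:", mentions_doc2html_and_wvware(out))
```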

If you are indexing a file system, please read this entire thread, as it
covers many of the issues you may run into:
http://swish-e.org/archive/2005-06/9727.html

-Jim


> -----Original Message-----
> From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
> On Behalf Of Munga Lal Shaw
> Sent: Saturday, June 25, 2005 10:48 AM
> To: Multiple recipients of list
> Subject: [SWISH-E] I want to contrbute my problem of swish
> 
> Hi All,
> 
> I compiled swish-e with libxml2 on Windows XP using MinGW. It compiled
> successfully with some modifications, and I also removed the Perl
> support from the source. After installing swish-e, I found it was not
> parsing MS Office files (doc, xls, ppt, etc.) with the HTML2 parser.
> I then used swish-e with the catdoc module, but it generates errors
> during indexing. Here is the output:
> 
> $ swish-e -c swish.conf -v 11
> Parsing config file 'swish.conf'
> Indexing Data Source: "File-System"
> Indexing "e:/docs/"
> 
> Checking dir "e:/docs"...
>   1.docThe filename, directory name, or volume label syntax is incorrect.
>  - Using DEFAULT (HTML2) parser -  (no words indexed)
>   application.docThe filename, directory name, or volume label syntax is incorrect.
>  - Using DEFAULT (HTML2) parser -  (no words indexed)
>   Document.docThe filename, directory name, or volume label syntax is incorrect.
>  - Using DEFAULT (HTML2) parser -  (no words indexed)
>   M.docThe filename, directory name, or volume label syntax is incorrect.
>  - Using DEFAULT (HTML2) parser -  (no words indexed)
>   qualifications.docThe filename, directory name, or volume label syntax is incorrect.
>  - Using DEFAULT (HTML2) parser -  (no words indexed)
>   winhttp.dll - Using DEFAULT (HTML2) parser -  (89 words)
> 
> Removing very common words...
> no words removed.
> Writing main index...
> Sorting words ...
> And the config file is:
> 
> IndexDir e:/docs/
> FileFilter .doc       /e:/catdoc "-s8859-1 -d8859-1 '%p'"
> 
> And the swish-e version is 2.4.3.
> But when I removed the perl folder from /usr/local/lib/swish-e/perl on
> Linux, it made no difference to swish-e: it worked fine and parsed all
> MS Office documents. It does not work on Windows, though. Can you help
> me with how to parse or index MS Office documents on Windows using
> swish-e? Any help will be appreciated.
> 
> 
> Munga.
> --
> Munga Lal Shaw  <munga@neolinuxsolutions.com>
> Systems Programmer, NeoLinux Solutions.
> http://www.neolinuxsolutions.com.
> Blog: http://blogs.munganiitian.5gigs.com
> 
> Ph: +91-651-2532265
Received on Mon Jun 27 07:03:55 2005