Hi All,
I have been successfully parsing doc, xls and Excel files with swish-e using
the following software, compiled from source:
SWISH-E 2.4.3 with libxml2 2.6.11 on a Debian Woody system. The files are
parsed without any glitches even though I removed the Perl directories of
swish-e and have neither catdoc nor wvWare installed; pcre and zlib are
present on the system.
The same thing also happens on Fedora Core 2 with the same libxml2 version.
I am unable to figure out what is happening. Is libxml2 taking care of
the parsing of the files, or is something else going on?
Below is sample output from Debian. Notice that the default HTML2 parser is
used and that no filters are involved:
root@laptop:/tmp# /usr/local/swish-e/bin/swish-e -i /opt/work_data/Neolinux.doc -v 20
Indexing Data Source: "File-System"
Indexing "/opt/work_data/Neolinux.doc"
Checking file "/opt/work_data/Neolinux.doc"...
Neolinux.doc - Using DEFAULT (HTML2) parser - (290 words)
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 193 words alphabetically
Writing header ...
Writing index entries ...
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
193 unique words indexed.
4 properties sorted.
1 file indexed. 11,264 total bytes. 290 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
root@laptop:/tmp#
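One guess I have not confirmed: maybe nothing is actually decoding the Word format, and the HTML2 parser is simply reading the binary file as badly formed markup and indexing whatever runs of printable characters it finds. The Python sketch below is a hypothetical illustration of that idea, not swish-e's or libxml2's actual code. It pulls printable ASCII runs out of a binary file, much like the Unix `strings` tool. The 3-character minimum and the alphanumeric word rule are my own assumptions, so its counts will not match swish-e's tokenizer exactly.

```python
import re

# Runs of 3+ printable ASCII bytes (space through '~'). The minimum
# length is an arbitrary assumption for this sketch.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{3,}")
# Hypothetical word rule: runs of ASCII letters and digits.
WORD = re.compile(r"[A-Za-z0-9]+")


def text_runs(data: bytes) -> list[str]:
    """Return the printable ASCII runs found in arbitrary binary data."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def words_in(data: bytes) -> list[str]:
    """Split the printable runs into alphanumeric 'words'."""
    return [w for run in text_runs(data) for w in WORD.findall(run)]


def summarize(path: str) -> str:
    """Report how many words (and unique words) a naive scan finds in a file."""
    with open(path, "rb") as fh:
        words = words_in(fh.read())
    return f"{len(words)} words, {len(set(words))} unique"
```

For example, `print(summarize("/opt/work_data/Neolinux.doc"))` would show whether a plain byte scan already finds a comparable number of readable words in that file, which would point to lenient text extraction rather than real Word parsing.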
But here is the problem: I have been trying to do the exact same thing on
Windows but have failed so far. On Windows there was no libpcre, and I had
to do a lot of ugly hacks to get everything compiled under MinGW and MSYS.
Could somebody try this in a similar environment (a barebones install of
swish-e with the latest libxml2 libraries) and explain what is going on?
Thanks in Advance,
Regards,
Animesh
Received on Mon Jun 27 12:05:21 2005