forgot to cc: the list...
I did a test, just to prove to myself that I understood it. I agree that
the docs need to be updated. The big thing: you can't do both -r and -u
at the same time.
Here's my experience.
To create the initial index:
swish-e -f index.idx -c swish-e.config
To update files that have changed:
swish-e -u -f index.idx -c swish-e.config
To remove specific files:
swish-e -r -i filetoremove -f index.idx -c swish-e.config
NOTE: I think the IndexDir directive in the config file will probably
get in the way when trying to remove files with -r, since -r requires a
specific input file name. So you might want to specify your input dir
with the -i option rather than in the config file.
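
To make the "one mode at a time" point concrete, here is a minimal sketch of a wrapper that only ever builds one of the three command lines above, so -u and -r can never end up together. The function name swish_cmd and the INDEX/CONFIG variables are my own invention for illustration; only the swish-e flags come from the commands above. It prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper (not part of swish-e): prints the swish-e command
# for exactly one mode, so -u and -r are never combined.
# INDEX and CONFIG default to the file names used in this message.
INDEX=${INDEX:-index.idx}
CONFIG=${CONFIG:-swish-e.config}

swish_cmd() {
    mode=$1
    shift
    case "$mode" in
        create)
            echo "swish-e -f $INDEX -c $CONFIG" ;;
        update)
            echo "swish-e -u -f $INDEX -c $CONFIG" ;;
        remove)
            # -r needs explicit input file names, passed with -i
            if [ "$#" -eq 0 ]; then
                echo "remove: need at least one file name" >&2
                return 1
            fi
            echo "swish-e -r -i $* -f $INDEX -c $CONFIG" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}
```

For example, "swish_cmd remove filetoremove" prints the same removal command shown above. File names containing spaces are not handled; this is only meant to show the shape of the three invocations.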
I did discover that the -T index_all dump option gives confusing results
on the ReadAllDocProperties vs ReadSingleDocPropertiesFromDisk routines.
You can see the difference if you do this:
swish-e -i file1.xml file2.xml
swish-e -T index_all
(now alter the contents of file1.xml)
swish-e -i file1.xml -u
swish-e -T index_all
The file numbers get incremented when a doc changes, even though the
filename is the same. Also, the two read routines seem to be getting
their info in different ways.
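
If anyone wants to reproduce the -T index_all difference, the steps above can be scripted roughly like this. It is a sketch under assumptions: the function name repro, the scratch directory, the XML file contents, and the before.txt/after.txt names are all made up for illustration, and it assumes an incremental-enabled swish-e is on the PATH (it skips otherwise). The swish-e invocations themselves are the ones listed above:

```shell
#!/bin/sh
# Hypothetical repro script: dumps the index before and after updating
# one changed file, then diffs the two dumps.
repro() {
    if ! command -v swish-e >/dev/null 2>&1; then
        echo "swish-e not found; skipping"
        return 0
    fi
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        # Placeholder documents; any two small files should do.
        printf '<doc>one</doc>\n' > file1.xml
        printf '<doc>two</doc>\n' > file2.xml
        swish-e -i file1.xml file2.xml
        swish-e -T index_all > before.txt
        # Alter the contents of file1.xml, then update the index.
        printf '<doc>one, changed</doc>\n' > file1.xml
        swish-e -i file1.xml -u
        swish-e -T index_all > after.txt
        # Compare the two dumps by hand or with diff.
        diff before.txt after.txt
    )
    echo "output kept in $dir"
}
```

Running "repro" leaves both dumps in the scratch directory so the file numbers can be compared side by side.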
tmuetze@alanti.net wrote on 12/4/04 6:45 PM:
> Hi all,
> does anyone have real-world knowledge about using the -r and -u switches
> on a build which was done with "configure --enable-incremental"? I really
> don't know how those switches actually affect the work of swish-e.
>
> My stripped down test configfile:
> IndexDir /some_dir/
> IndexOnly .txt .htm .html .doc .xls .pdf
> FileFilter .doc /usr/bin/catdoc "-s8859-1 -dcp1252 '%p'"
> FileFilter .pdf /usr/bin/pdftotext "-htmlmeta -nopgbrk '%p' -"
> IndexContents HTML .pdf
> IndexContents TXT .doc
>
> This is the command we issue:
> swish-e -u -r -f index.idx -c swish-e.config
>
> Whenever we issue the command the index is rebuilt from scratch. Maybe
> I have just misunderstood what -u and -r should actually do?
>
> When I now try the following:
> swish-e -u -r -N index.idx -f index.idx -c swish-e.config
>
> As expected, the index isn't rebuilt because of the timestamp check.
> When I now manually remove a file and reindex, this file isn't removed
> from the index.
>
> So please give me a hint about what we have done wrong because I'm lost
> right now ;-)
>
> Best Regards,
> Tilo
>
> Hi Peter,
>
> Peter Karman wrote:
>
>>I think if you do use it, you might consider yourself a "pilot
>>tester" and let us know what you discover. :)
>
>
> OK, we already have installed the latest build on one of our
> test-machines, so let's see how it really works in practice.
>
>
>>Even though the incremental feature has been available for a couple
>>release cycles (including the soon-to-be-announced 2.4.3), it really
>>needs more real-world exposure before the 'experimental' label is removed.
>>
>>So try it out, stress it, see if it breaks. The more people who do that,
>>the closer we can collectively come to calling it 'stable'.
>
>
> I will report my findings back to the list.
>
> Regards,
> Tilo
--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Sat Dec 4 18:41:53 2004