
RFC -> Re: patch for filesystem indexing

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Wed May 26 2004 - 20:21:46 GMT
This is a change in how swish-e selects files for indexing.

Currently swish-e marks a file as "already indexed" even when it is
skipped via FileRules.  The problem is that if the same file is also
available under a different name (as a symbolic link), it will not be
indexed under that name either, because the file is already flagged as
"already indexed."

The new code will still only index a file once, but you can use
FileRules to set which file name will be selected for indexing.

The issue is basically this:  Say you have symlinks pointing to
something real.

dir $HOME/foo

    link1 -> ../real
    link2 -> ../real
    link3 -> ../real

With the existing code you basically cannot decide which link* file is
used to index the contents of ../real (it will be link1).

With the new code you can do this:

    FileRules filename is link1
    FileRules filename is link2

and force it to index the contents of ../real under the file name link3,
since link1 and link2 are skipped.
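A small Python model makes the ordering change concrete. It is not swish-e's actual code (swish-e is written in C); `select_links`, the `real_of` mapping and the `skipped` set are hypothetical stand-ins for the directory scan, symlink resolution and the FileRules test. Here `skipped` models FileRules entries that skip link1 and link2, leaving link3 as the only name that can select ../real. The only difference between the two modes is whether the "already indexed" flag is set before or after FileRules is consulted.

```python
from typing import Dict, List, Set


def select_links(real_of: Dict[str, str], skipped: Set[str],
                 old_behavior: bool) -> List[str]:
    """Return the names under which each real file gets indexed.

    real_of maps each name (in scan order) to the real file it resolves to.
    skipped is a hypothetical stand-in for names matched by FileRules.
    """
    seen: Set[str] = set()    # real files flagged as "already indexed"
    indexed: List[str] = []
    for name, real in real_of.items():
        if old_behavior:
            # Old order: check and set the flag first, so a name that
            # FileRules skips still uses up the real file.
            if real in seen:
                continue
            seen.add(real)
            if name in skipped:
                continue
        else:
            # New order: consult FileRules first; a skipped name leaves
            # the real file available to later names.
            if name in skipped:
                continue
            if real in seen:
                continue
            seen.add(real)
        indexed.append(name)
    return indexed


links = {"link1": "real", "link2": "real", "link3": "real"}
rules = {"link1", "link2"}

print(select_links(links, set(), old_behavior=True))   # ['link1']
print(select_links(links, rules, old_behavior=True))   # []
print(select_links(links, rules, old_behavior=False))  # ['link3']
```

In this model the old order indexes nothing at all once link1 and link2 are skipped, because link1 flags ../real before FileRules rejects it.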

Same thing with directories.  Say you have this structure:

    April
         april_doc           (report for April)
         next_month -> ../May      (link to the next month)
    May
         may_doc
         next_month -> ../June
    June
         june_doc

The old swish-e way would index these docs:

    April/april_doc
    April/next_month/may_doc
    April/next_month/next_month/june_doc

And the May and June directories are not processed on their own because
they have already been visited by following the "next_month" symlinks.


If you tried to exclude "next_month" with

    FileRules dirname contains next_month

it would skip the "May" documents entirely: "April/next_month" is
skipped by the FileRules setting, yet "May" is still flagged as already
indexed because swish-e visited the symlink April/next_month, which
points to May.

So, with the updated code and the above FileRules to
skip "next_month", it will index:

    April/april_doc
    May/may_doc
    June/june_doc
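The directory case can be modeled the same way. This is a hypothetical Python sketch, not swish-e's C implementation: `TREE` stands in for the April/May/June layout, a symlink is represented as `("link", target)` and resolved to its real path, `seen` plays the role of the "already indexed" flag, and `next_month_rule` is a stand-in for `FileRules dirname contains next_month` applied to each entry's own name.

```python
from typing import Callable, Dict, List, Set, Tuple, Union

# Hypothetical model: each real directory maps child names, in scan order,
# to "file", "dir", or ("link", real_target).
Entry = Union[str, Tuple[str, str]]
TREE: Dict[str, Dict[str, Entry]] = {
    "": {"April": "dir", "May": "dir", "June": "dir"},
    "April": {"april_doc": "file", "next_month": ("link", "May")},
    "May": {"may_doc": "file", "next_month": ("link", "June")},
    "June": {"june_doc": "file"},
}


def crawl(tree: Dict[str, Dict[str, Entry]], skip: Callable[[str], bool],
          old_behavior: bool) -> List[str]:
    seen: Set[str] = set()    # real paths flagged as "already indexed"
    indexed: List[str] = []   # names under which files were indexed

    def visit(real_dir: str, logical_dir: str) -> None:
        for name, entry in tree[real_dir].items():
            logical = f"{logical_dir}/{name}" if logical_dir else name
            own = f"{real_dir}/{name}" if real_dir else name
            # A symlink resolves to its target; anything else is itself.
            real = entry[1] if isinstance(entry, tuple) else own
            if old_behavior:
                if real in seen:
                    continue
                seen.add(real)       # flagged even if skipped just below
                if skip(name):
                    continue
            else:
                if skip(name):
                    continue         # skipped names leave the target unflagged
                if real in seen:
                    continue
                seen.add(real)
            if real in tree:         # a directory, possibly reached via a link
                visit(real, logical)
            else:
                indexed.append(logical)

    visit("", "")
    return indexed


def no_rules(name: str) -> bool:
    return False


def next_month_rule(name: str) -> bool:
    # Hypothetical stand-in for: FileRules dirname contains next_month
    return "next_month" in name


print(crawl(TREE, no_rules, old_behavior=True))
print(crawl(TREE, next_month_rule, old_behavior=True))
print(crawl(TREE, next_month_rule, old_behavior=False))
```

The three calls print, in order, the old listing (April/april_doc, April/next_month/may_doc, April/next_month/next_month/june_doc), the old code with the rule (April/april_doc and June/june_doc only, so May is lost), and the new code with the rule (April/april_doc, May/may_doc, June/june_doc).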

Now, there's currently no way to force swish-e to index the same file
twice from two different symlinks.  I don't really see that as a
problem.


-- 
Bill Moseley
moseley@hank.org
Received on Wed May 26 13:21:47 2004