Re: question about the SWISH-E index

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Aug 12 2002 - 21:43:11 GMT
At 01:54 PM 08/12/02 -0700, Spencer, Tammy wrote:
>My understanding is that Swish-e, as currently configured, does not meet
>this criterion.  In other words, if one starts with just the Swish-e index,
>it is feasible to re-create the intellectual content of the original
>document.

That's basically true.  As I have recommended, if you don't want someone to
have access to your content, restrict access to it.  And the same thing
goes for the index -- if you don't want it reverse engineered then don't
allow access to that index.  Provide a secure front end.
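
To make the point concrete, here is a rough sketch of why stored word positions give the content away. The struct posting and rebuild_text() names are made up for illustration and are not Swish-e's index format or API; the only assumption is that each indexed word carries its position in the document. What comes back is the indexed form of the words (whatever case folding or stopword removal was applied at index time), not the original markup.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical posting: one occurrence of one word in one document.
   NOT Swish-e's on-disk format, only an illustration. */
struct posting {
    const char *word;
    int position;               /* 1-based word position in the document */
};

/* qsort comparator: ascending by position. */
static int by_position(const void *a, const void *b)
{
    const struct posting *pa = a, *pb = b;
    return (pa->position > pb->position) - (pa->position < pb->position);
}

/* Sort the postings by position and join the words with single spaces.
   Returns 0 on success, -1 if the output buffer is too small. */
int rebuild_text(struct posting *p, size_t n, char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    qsort(p, n, sizeof *p, by_position);

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, outsz - used, "%s%s",
                         i ? " " : "", p[i].word);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;          /* output would have been truncated */
        used += (size_t)w;
    }
    return 0;
}
```

Feed it the postings for a document in any order (an index normally groups them by word, not by position) and the original word sequence falls out of the sort.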

>Thus, I am interested in discussing whether there might be a
>modification that could be introduced to Swish-e and/or the index.  I
>realize that such a modification might, to a degree, compromise the search
>capabilities against the index.
>
>For example, might it be possible to obscure the word count of the SWISH
>index (like dividing the word count number by two and rounding up) so that
>someone viewing the index couldn't determine the correct word sequence in
>the original document?  My understanding is that such a modification would
>compromise the capability to search literal strings like phrases, but one
>might still be able to design the search feature to allow "near" searches.

Right, that would remove the phrase searching ability of swish.  If you can
live with fewer features then that would be one solution, but I think
that's the wrong approach.  I find phrase searching very useful when I'm
looking for something specific, but don't know where it is.

Can you modify the positions to just do NEAR searches?  Perhaps.  But you
will still end up with relative positions -- perhaps not exact word
positions, but you will still be able to reverse engineer to some degree.
Can the (or some) content be gleaned from that?  If you knew these 10 words
belonged to the same sentence could you put that sentence back together?
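
A back-of-the-envelope sketch of the "divide by two and round up" idea, to show how little it hides. The function names are hypothetical and the arithmetic assumes 1-based positions; nothing here comes from the Swish-e source.

```c
/* Halve a 1-based word position and round up, as proposed in the
   question.  Positions 1,2 -> 1; 3,4 -> 2; 5,6 -> 3; and so on. */
int obscure_position(int pos)
{
    return (pos + 1) / 2;
}

/* Count the word orders that are consistent with the obscured
   positions of an n-word text.  Every obscured value is shared by at
   most two words, so each full pair contributes two possible orders
   and the order between pairs is still known exactly. */
unsigned long candidate_orders(int n)
{
    unsigned long count = 1;

    for (int pos = 1; pos + 1 <= n; pos += 2)
        count *= 2;             /* one full pair: two possible orders */
    return count;
}
```

For a 10-word sentence that leaves 2^5 = 32 candidate orderings, against 10! = 3,628,800 if no position information were stored at all. Picking the one ordering that reads as a sentence is not much of a hurdle.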

>Please comment on the technological feasibility of modifying Swish-e and/or
>the index so that the intellectual content of the original document is not
>re-creatable from the index.

Not tested, but I imagine you can just open index.c in a text editor and
look for the addentry() function.  Then just add a line like:

    position = 1;

so all words end up with the same position value.
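
Just as untested, here is a sketch of what that does to phrase searching. phrase_match() is a made-up stand-in for an adjacency test on two words' position lists within one document; it is not how Swish-e's phrase code is actually written.

```c
#include <stddef.h>

/* Returns 1 if some occurrence of the second word directly follows
   some occurrence of the first word (position + 1), otherwise 0.
   Illustrative only; not Swish-e's phrase-search implementation. */
int phrase_match(const int *first, size_t nfirst,
                 const int *second, size_t nsecond)
{
    for (size_t i = 0; i < nfirst; i++)
        for (size_t j = 0; j < nsecond; j++)
            if (second[j] == first[i] + 1)
                return 1;
    return 0;
}
```

With real positions, say 4 and 5, the test succeeds.  Once every word is stored at position 1 it can never succeed, so phrase queries stop matching.  Plain AND/OR searches only need to know which documents contain a word, so they would keep working.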

Obscuring word positions is not a great way to go, I feel -- security by
obscurity rarely is.  I would think it better to find some client/server
setup where you can better control what the clients can see.


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Mon Aug 12 21:46:39 2002