
Re: Proximity for Swish-e/SwishCtl

From: Herman Knoops <hk.sw(at)>
Date: Sat May 13 2006 - 20:32:18 GMT
> > We've added some required features:
> > - wildcard support for "?" (exactly one character)
> > - proximity search support with variable distance
> >   (e.g. bike near50 car)
> Excellent.

Remember, the current proximity search is implemented as
a special case of boolean "and". We have not yet dealt
with complex searches (parentheses). It works perfectly
well for indexing PDF files that are converted to XML:
    <title>content</title>
    <author>content</author>
      [text version of pdf filter]
The need for proximity was high, because a simple "and"
is not really useful if you have many 100-page documents.
A simple "bike" and "car" query would give a hit even if,
e.g., "bike" is on page 1 and "car" is on page 100. Now we
can limit the number of returned documents drastically.
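To make the "special case of boolean and" idea concrete, here is a minimal Python sketch. It assumes each document has already been tokenized into a list of lowercase words and that "nearN" means both terms occur within N word positions of each other. The names `positions` and `near` are hypothetical and do not come from the Swish-e source; the real implementation may measure distance differently.

```python
def positions(words, term):
    """Return the sorted word positions at which `term` occurs."""
    return [i for i, w in enumerate(words) if w == term]


def near(words, a, b, distance):
    """True if terms a and b both occur at most `distance` words apart.

    Hypothetical sketch: a plain boolean "and" is the special case
    distance >= len(words).
    """
    pa, pb = positions(words, a), positions(words, b)
    i = j = 0
    # Merge-style walk over both sorted position lists.
    while i < len(pa) and j < len(pb):
        if abs(pa[i] - pb[j]) <= distance:
            return True
        # Advance whichever position is smaller, moving the pair closer.
        if pa[i] < pb[j]:
            i += 1
        else:
            j += 1
    return False


doc = "the bike was parked next to a red car".split()
print(near(doc, "bike", "car", 50))
```

With `distance` set to the document length the check degenerates to a plain "and", which is why a "bike" on page 1 and a "car" on page 100 still match under "and" but not under `near50`.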

> > We've also modified SwishCtl.dll (Windows) slightly to:
> > - let it work in an ASP-environment on IIS (4.0 or higher);
> > - let it run smoothly from CD/DVD, without msgboxes
> >   and without registry access
> Can SwishCtl be run by a normal user instead of an Administrator?  One
> of the registry problems was that the DLL had to be registered by an
> Administrator before it could be run.

SwishCtl is an ATL/COM component, which indeed must be
registered. In an ASP/IIS environment this is not a problem,
since the administrator has to register it only once.
In a CD environment, we use a tiny but very powerful ASP web server
(third party, licensed), which takes care of so-called
"volatile registration" of COM components, so the user does
not have to install anything. Just insert the CD and all
components are "registered" on the fly (even on very locked-down
systems). The user then enters some search criteria, runs
the search, selects the desired (PDF) document and opens it,
and the search terms are highlighted in the native PDF viewer.

We have modified the way SwishCtl determines the location of
the index, and the index itself, so registry access is no
longer required for this (it was not practical for CD users
in a locked-down environment). The call has two formats:
1. swishctl.Init('@@KM@@h:/test.kmt/kmt/idx/+docs.idx');
2. swishctl.Init('IndexFiles');
Option 1 is the new way, where the actual path is assembled
at run time, so you can run multiple search applications
from different locations on a single machine.
The pattern "@@KM@@" is a hard-coded marker in the current
source and binary of SwishCtl.dll, which lets us
differentiate between the "registry" way and the more
flexible "full path" way.
> > - removed several dependencies (zlib.dll and atl.dll),
> >   so now just SwishCtl is sufficient, for a search-only
> >   solution
> Adding ASP support and dropping the ATL requirement sound like great
> improvements.  I'll see about getting those changes integrated.
> I think the ATL dependency is why I couldn't build using the MinGW32
> compiler.  Perhaps I'll be able to provide SwishCtl along with the
> Windows builds again.

SwishCtl.dll has only minimal changes. As far as ATL is concerned,
we use the macro /D "ATL_STATIC_REGISTRY" in the MSVC6 configuration,
which makes sure some ATL code is statically incorporated (the
dependency on ATL.dll is gone). For zlib we use zlibstat.lib, which is
also statically linked. The final result is a 276 KB SwishCtl.dll that
depends only on the regular Windows system DLLs.

SwishCtl.dll itself does not contain any ASP code. We created a new
COM component, which is instantiated in an ASP file (CreateObject).
This component contains all the ASP-specific code, and it creates and
calls the SwishCtl component. This outer COM component takes care of
cleaning up and validating the user input before the search criteria
and boolean operators are handed over to SwishCtl.
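A rough Python sketch of this layering, where an outer component cleans and validates the query before delegating. Everything here is an assumption for illustration: the `SearchFacade` class, the character whitelist and the length limit are hypothetical, and `backend` merely stands in for the SwishCtl component.

```python
import re

MAX_QUERY_LEN = 200  # hypothetical limit, not taken from the post
# Hypothetical whitelist: word characters, whitespace and the wildcards * and ?.
_DISALLOWED = re.compile(r"[^\w\s*?]")


class SearchFacade:
    """Hypothetical outer component: validates input, then delegates.

    `backend` stands in for the SwishCtl component; any callable that
    takes a cleaned query string will do.
    """

    def __init__(self, backend):
        self.backend = backend

    @staticmethod
    def clean(raw):
        text = _DISALLOWED.sub(" ", raw)  # replace characters outside the whitelist
        text = " ".join(text.split())     # collapse runs of whitespace
        if not text:
            raise ValueError("empty query")
        if len(text) > MAX_QUERY_LEN:
            raise ValueError("query too long")
        return text

    def search(self, raw):
        # Only cleaned input ever reaches the backend.
        return self.backend(self.clean(raw))


print(SearchFacade.clean("  bike <script> near50   car; "))
```

The whitelist keeps `*` and `?` so wildcard queries survive the cleanup; a real filter would have to match whatever query syntax SwishCtl actually accepts.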

Hope this clarifies things a bit.

Herman Knoops
KnoMan b.v.
The Netherlands

Received on Sat May 13 13:32:24 2006