Re: contract work for a site search utility

From: Bill Moseley <moseley(at)>
Date: Wed Mar 03 2004 - 21:05:23 GMT

On Wed, Mar 03, 2004 at 12:31:14PM -0800, Gil Vidals wrote:
> Well, if it will only take you a minute, could you do it cheaper than by the
> day ;-) If it's as easy as you say, can you just show me how this is done?

How what is done?  Indexing?

moseley@bumby:~$ cat c
HTMLLinksMetaName links

moseley@bumby:~$ cat 1.html

text <a href="">abc site</a>


moseley@bumby:~$ swish-e -c c -i 1.html -T indexed_words -v0
    Adding:[1:swishdefault(1)]   'title'   Pos:2  Stuct:0x7 ( HEAD TITLE FILE )
    Adding:[1:links(10)]   'http'   Pos:5  Stuct:0x9 ( BODY FILE )
    Adding:[1:links(10)]   'www'   Pos:6  Stuct:0x9 ( BODY FILE )
    Adding:[1:links(10)]   'abc'   Pos:7  Stuct:0x9 ( BODY FILE )
    Adding:[1:links(10)]   'com'   Pos:8  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'text'   Pos:9  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'abc'   Pos:10  Stuct:0x9 ( BODY FILE )
    Adding:[1:swishdefault(1)]   'site'   Pos:11  Stuct:0x9 ( BODY FILE )

OK, so the link was indexed as separate words (http, www, abc, com). That
can be changed with WordCharacters, but I like being able to search for
"" and still find it.

So to search:

moseley@bumby:~$ swish-e -w 'links=("")' -H0
1000 1.html "Title" 108
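If you would rather drive that search from a script than from the shell, here is a minimal Python sketch. It only uses the -w and -H0 options shown above and assumes the index sits in the default location, as in the transcript. The function names are made up for illustration, and taking the second field as the path assumes file names without spaces.

```python
import subprocess


def link_search_command(url):
    """Build the swish-e argument list for a search of the 'links' metaname."""
    # Double quotes are dropped from the URL so they cannot end the phrase early.
    query = 'links=("{}")'.format(url.replace('"', ""))
    return ["swish-e", "-w", query, "-H0"]


def pages_linking_to(url):
    """Run the search and return the path field of each result line."""
    proc = subprocess.run(link_search_command(url),
                          capture_output=True, text=True)
    paths = []
    for line in proc.stdout.splitlines():
        fields = line.split()
        # Result lines look like: 1000 1.html "Title" 108
        if len(fields) >= 2 and fields[0].isdigit():
            paths.append(fields[1])
    return paths
```

Passing the arguments as a list avoids a second layer of shell quoting around the query.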

> It should search <a href> tags; however, javascript links should be searched
> as well.

All bets are off with javascript.  You need a javascript interpreter to
figure that out.  If the links are simple you could filter the files and
convert the javascript links into something that swish-e can index (for
example, convert them to a meta tag).
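A rough sketch of such a filter in Python, for the simple case only. It assumes the javascript links appear as absolute http(s) URLs inside quoted string literals, either in script blocks or in href="javascript:..." attributes. The function name and the regular expressions are illustrative, and the meta name "links" is an assumption: check that your config indexes a meta tag with that name under the metaname you search.

```python
import html
import re

# Illustrative patterns: they cover only the simple cases described above.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>", re.I | re.S)
JS_HREF = re.compile(r'href\s*=\s*"javascript:([^"]*)"', re.I)
QUOTED_URL = re.compile(r"""['"](https?://[^'"\s]+)['"]""")


def js_links_to_meta(page, meta_name="links"):
    """Return the page with one meta tag added per URL found in javascript."""
    chunks = SCRIPT_BLOCK.findall(page) + JS_HREF.findall(page)
    urls = []
    for chunk in chunks:
        for url in QUOTED_URL.findall(chunk):
            if url not in urls:  # keep first-seen order, skip duplicates
                urls.append(url)
    if not urls:
        return page
    tags = "".join(
        '<meta name="{}" content="{}">\n'.format(meta_name, html.escape(url))
        for url in urls
    )
    # Put the tags just before </head> when there is one, else at the top.
    head_end = re.search(r"</head>", page, re.I)
    if head_end:
        return page[:head_end.start()] + tags + page[head_end.start():]
    return tags + page
```

Anything that builds its URLs at run time (string concatenation, lookups, and so on) will not be caught by this and still needs a real interpreter.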

You can use the included swish.cgi or search.cgi examples for creating a
search interface.  Look at -- it has a way to
search "HTML Links".

> The code should search the entire site up to N pages deep.

Filter results by number of path segments.
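Something like the following Python sketch would do that filtering on the search output. It assumes result lines in the format shown earlier (rank, path, quoted title, size) and treats the number of non-empty path segments as the depth, which is only a stand-in for how many clicks deep a page is. The function names are made up for illustration.

```python
import shlex
from urllib.parse import urlsplit


def path_depth(location):
    """Count the non-empty path segments of a URL or relative file path."""
    path = urlsplit(location).path
    return len([segment for segment in path.split("/") if segment])


def filter_by_depth(result_lines, max_depth):
    """Keep the result lines whose path has at most max_depth segments."""
    kept = []
    for line in result_lines:
        # shlex keeps a quoted title together as a single field.
        fields = shlex.split(line)
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # not a result line (header, end marker, blank)
        if path_depth(fields[1]) <= max_depth:
            kept.append(line)
    return kept
```

With this definition 1.html has depth 1 and a page at /a/b/c.html has depth 3, whether or not the path is given as a full URL.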

> -----Original Message-----
> From: Bill Moseley []
> Sent: Wednesday, March 03, 2004 12:28 PM
> To: Gil Vidals
> Cc: Multiple recipients of list
> Subject: Re: contract work for a site search utility
> On Wed, Mar 03, 2004 at 12:12:27PM -0800, Gil Vidals wrote:
> > I've downloaded and studied Swish-e. My company, Position Research, has a
> > small project which involves locating a given URL on a given website. For
> > example, use Swish-e to see if the url is anywhere on the site
> > If it is, then return the page from where the link
> > to was found.
> You mean search href tags?
> > Let me know if you are interested and approximately how many hours of work
> > is required to produce the perl code.
> HTMLLinksMetaName links
> Less than a minute.  But I charge by the day.  Invoice to follow.
> Or do you mean something more custom than that?
> --
> Bill Moseley

Bill Moseley
Received on Wed Mar 3 13:05:24 2004