Re: swish.cgi results no path in title

From: <moseley(at)>
Date: Sun Sep 14 2003 - 14:30:30 GMT
On Sat, Sep 13, 2003 at 10:59:31AM -0700, Aaron Bazar wrote:
> I am not quite sure what you mean. Perhaps I was not clear.

Right.  What I mean is that you need to provide details so I can 
reproduce the problem.  It's not very efficient otherwise: you have sent 
two emails and your problem still isn't solved, and I just spent 45 
minutes trying various things and still couldn't reproduce it.

> I have an index with thousands of documents that I use swish.cgi to search.
> When results are returned, most show up fine. However, if the original
> HTML document did not have a title, then it shows up in the results
> list without a title... so there is nothing to "click-on"
> Here is an example:

Yes, I see that.  It's odd.  (And a 2.8M web page is a bit long, I'll 

Since you didn't send a way to reproduce it easily, I tried it myself:
I used "view source" and I could see the URL of the original page.  I 
fetched it with:

moseley@laptop:~/apache$ wget

Then indexed it:

moseley@laptop:~/apache$ cat c
Defaultcontents HTML*
StoreDescription HTML* <body> 100000
SwishProgParameters default http://localhost/apache/export.html

moseley@laptop:~/apache$ swish-e -S prog -i -c c
(geez, it takes a minute and a half to index that one page on my laptop!)

Now search:

moseley@laptop:~/apache$ GET http://localhost/apache/swish.cgi?query=word | grep rank:
        <dt>1 <a href="http://localhost/apache/export.html">export.html</a> <small>-- rank: <b>1000</b></small></dt>
And there's the path name used as the title -------------------^

So maybe there is something odd about spidering directly from that site.  
Just to be sure, I then used this config:

moseley@laptop:~/apache$ cat c
Defaultcontents HTML*
StoreDescription HTML* <body> 100000
#SwishProgParameters default http://localhost/apache/export.html
SwishProgParameters default

And started indexing.  After a few minutes I sent a SIGHUP to 
tell it to quit spidering:

moseley@laptop:~/apache$ kill -HUP 6556

And then searched as above and the title was there.

So what's different?  I have no idea.

Did you test to see which program is not returning the title (swish-e or 

Are you using some other configuration than I'm using?

Are you using something other than the default swish.cgi template
setting?  I tried all the templates that come with swish.cgi and they
all worked.

Again, if you want help you need to provide an easy way for me to see 
the problem and, hopefully, reproduce it on my machine.

Or better, since I provided all my steps above, try that, and if that 
works then see how your configuration is different.
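
One way to narrow down which layer drops the title is to look at the raw 
swish-e result lines before swish.cgi formats them.  The following is only 
a sketch, not part of swish-e or swish.cgi: empty_titles is a hypothetical 
helper name, and it assumes the default result format (rank, path, quoted 
title, size, as in the quoted 2.4.0 output at the end of this message), 
that an empty title prints as "", and that paths contain no spaces.

```shell
# Hypothetical helper (not part of swish-e): read swish-e result lines on
# stdin and print the path of every hit whose quoted title is empty.
# Assumes the default format: rank path "title" size, paths without spaces.
empty_titles() {
    # Skip "#" header lines and the closing "." line; keep hits whose
    # third field is exactly "" (an empty quoted title).
    awk '!/^#/ && $0 != "." && $3 == "\"\"" { print $2 }'
}
```

It would be used as "swish-e -w word | empty_titles".  If that prints 
nothing, swish-e is probably supplying a title for every hit, which would 
point toward the swish.cgi or template side instead.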

> The second result is what I am talking about.
> Thanks!
> Aaron Bazar
> > Hi,
> >
> > I have run into an issue with the swish.cgi in version 2.4... Some html
> > pages that I index do not have a <title> tag .. as far as I know, if there
> > is no title then swish is supposed to use the docpath as the title. However,
> > this is not happening. I end up with nothing in the title... consequently
> > there is no link- just the rank and description. I have been trying to find
> > where in the perl code this is, with no luck. Basically, if there is no
> > swishtitle, I would like to put in a default like "Untitled" (or even the
> > docpath like it is supposed to work)
> Try and support what you are saying with examples.  Like this:
> moseley@laptop:~$ cat 1.html
> <html>
> <head>
> <title></title>
> </head>
> <body>
> bodyword
> </body>
> moseley@laptop:~$ swish-e -i 1.html -v0
> moseley@laptop:~$ swish-e -w bodyword
> # SWISH format: 2.4.0-pr1
> # Search words: bodyword
> # Removed stopwords:
> # Number of hits: 1
> # Search time: 0.003 seconds
> # Run time: 0.087 seconds
> 1000 1.html "1.html" 63
> .
