Re: swish.cgi results no path in title

From: <aaronb(at)not-real.spamcop.net>
Date: Sun Sep 14 2003 - 22:11:34 GMT
Thank you very much! I really appreciate the time you have spent on this and I 
can see what I need to do to investigate further. If I learn anything of value 
I will send it in. 

Thanks again,

Aaron Bazar



Quoting moseley@hank.org:

> On Sat, Sep 13, 2003 at 10:59:31AM -0700, Aaron Bazar wrote:
> > I am not quite sure what you mean. Perhaps I was not clear.
> 
> Right.  What I mean is that you provide details so I can reproduce the
> problem.  It's not very efficient otherwise -- you have sent two emails
> and your problem still isn't solved, and I just spent 45 minutes trying
> various things and still couldn't reproduce your problem.
> 
> > I have an index with thousands of documents that I use swish.cgi to search.
> >
> > When results are returned, most show up fine. However, if the original
> > HTML document did not have a title, then it shows up in the results
> > list without a title... so there is nothing to "click-on"
> >
> > Here is an example:
> >
> > http://www.healthfind.org/health/weight+loss
> 
> Yes, I see that.  It's odd.  (And a 2.8M web page is a bit long, I'll
> note.)
> 
> Since you didn't send a way to reproduce it easily, I tried it myself:
> I used "view source" and I could see the URL of the original page.  I
> fetched it with:
> 
> moseley@laptop:~/apache$ wget www.megafitness.com/export.html
> 
> Then indexed it:
> 
> moseley@laptop:~/apache$ cat c
> Defaultcontents HTML*
> StoreDescription HTML* <body> 100000
> SwishProgParameters default http://localhost/apache/export.html
> 
> moseley@laptop:~/apache$ swish-e -S prog -i spider.pl -c c
> (geeze, takes a minute and a half to index that one page on my laptop!)
> 
> Now search:
> 
> moseley@laptop:~/apache$ GET http://localhost/apache/swish.cgi?query=word | grep rank:
>         <dt>1 <a href="http://localhost/apache/export.html">export.html</a> <small>-- rank: <b>1000</b></small></dt>
>                                                            ^^^^^^^^^^^^
> And there's the path name used as the title -------------------^
> 
> 
> So maybe something weird with spidering directly from that site.  So
> just to be sure I then used this config:
> 
> moseley@laptop:~/apache$ cat c
> Defaultcontents HTML*
> StoreDescription HTML* <body> 100000
> #SwishProgParameters default http://localhost/apache/export.html
> SwishProgParameters default http://www.megafitness.com/export.html
> 
> And started indexing.  After a few minutes I sent spider.pl a SIGHUP to
> tell it to quit spidering:
> 
> moseley@laptop:~/apache$ kill -HUP 6556
> 
> And then searched as above and the title was there.
> 
> 
> So what's different?  I have no idea.
> 
> Did you test to see which program is not returning the title (swish-e or
> swish.cgi)?
> 
> Are you using some other configuration than I'm using?
> 
> Are you using something other than the default swish.cgi template
> setting?  I tried all the templates that come with swish.cgi and they
> all worked.
> 
> Again, if you want help you need to provide an easy way for me to see
> the problem and, hopefully, reproduce it on my machine.
> 
> Or better, since I provided all my steps above, try that, and if that
> works then see how your configuration is different.
> 
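One way to check which program drops the title is to run the same query through swish-e directly and scan its default output for results whose quoted title field is empty. The sketch below is Python rather than anything shipped with swish-e; `untitled_hits` is a hypothetical helper, it assumes the `rank path "title" size` line layout shown in this thread, and the second hit in the sample is made up for illustration.

```python
import re

# Assumed default swish-e result line: rank, path, quoted title, size.
HIT = re.compile(r'^(\d+) (.+?) "(.*)" (\d+)$')


def untitled_hits(output):
    """Return the docpaths whose title field is empty (hypothetical helper)."""
    missing = []
    for line in output.splitlines():
        match = HIT.match(line.strip())
        if match is None:
            # Header lines ("# ..."), the closing ".", or anything unexpected.
            continue
        _rank, path, title, _size = match.groups()
        if not title.strip():
            missing.append(path)
    return missing


# Sample output; the second hit is invented to show an empty title.
sample = """\
# SWISH format: 2.4.0-pr1
# Search words: bodyword
# Number of hits: 2
1000 1.html "1.html" 63
870 2.html "" 70
.
"""
print(untitled_hits(sample))
```

If this reports nothing for a query that shows blank links in the browser, the title is probably being lost after swish-e, in swish.cgi or its template.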
> 
> 
> 
> >
> > The second result is what I am talking about.
> >
> > Thanks!
> >
> > Aaron Bazar
> >
> >
> >
> > -----Original Message-----
> > From: swish-e@sunsite.berkeley.edu
> > [mailto:swish-e@sunsite.berkeley.edu]On Behalf Of moseley@hank.org
> > Sent: Saturday, September 13, 2003 1:23 PM
> > To: Multiple recipients of list
> > Subject: [SWISH-E] Re: swish.cgi results no path in title
> >
> >
> > On Sat, Sep 13, 2003 at 06:39:34AM -0700, Aaron Bazar wrote:
> > > Hi,
> > >
> > > I have run into an issue with the swish.cgi in version 2.4... Some html
> > > pages that I index do not have a <title> tag .. as far as I know, if there
> > > is no title then swish is supposed to use the docpath as the title. However,
> > > this is not happening. I end up with nothing in the title... consequently
> > > there is no link- just the rank and description. I have been trying to find
> > > where in the perl code this is, with no luck. Basically, if there is no
> > > swishtitle, I would like to put in a default like "Untitled" (or even the
> > > docpath like it is supposed to work)
> >
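The fallback being asked for can be sketched roughly as follows. This is Python for illustration only, not the Perl in swish.cgi; `display_title` is a hypothetical helper, and it assumes each result carries its `swishtitle` and `swishdocpath` values as plain strings.

```python
import posixpath
from urllib.parse import urlsplit


def display_title(swishtitle, swishdocpath, default="Untitled"):
    """Pick link text for one search result (hypothetical helper)."""
    title = (swishtitle or "").strip()
    if title:
        return title
    # No usable title: fall back to the last component of the docpath.
    path = urlsplit(swishdocpath or "").path
    name = posixpath.basename(path.rstrip("/"))
    # If the docpath has no file name either, use the fixed default.
    return name or default


print(display_title("", "http://localhost/apache/export.html"))
print(display_title("   ", ""))
```

Treating a whitespace-only title the same as a missing one is a design choice here; a page with `<title> </title>` would otherwise still produce a link with nothing to click on.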
> > Try and support what you are saying with examples.  Like this:
> >
> > moseley@laptop:~$ cat 1.html
> > <html>
> > <head>
> > <title></title>
> > </head>
> > <body>
> > bodyword
> > </body>
> >
> > moseley@laptop:~$ swish-e -i 1.html -v0
> > moseley@laptop:~$ swish-e -w bodyword
> > # SWISH format: 2.4.0-pr1
> > # Search words: bodyword
> > # Removed stopwords:
> > # Number of hits: 1
> > # Search time: 0.003 seconds
> > # Run time: 0.087 seconds
> > 1000 1.html "1.html" 63
> > .
> >
> >
> > --
> > Bill Moseley
> > moseley@hank.org
> >
> 
> --
> Bill Moseley
> moseley@hank.org
> 
Received on Sun Sep 14 22:11:46 2003