Re: Sorting issues

From: Zambra - Michael <michael(at)not-real.zambra.com>
Date: Thu Jan 03 2002 - 18:11:09 GMT
> I guess I'm not following where you are now.
>
> Before you said:
>
> >Section 1 Header
> >  Results in subdirectory sectionname_a/a/
> >    Results ordered by rank
> >  Results in subdirectory sectionname_a/a/
> >    Results ordered by rank

Sorry, Bill. I meant "sectionname_a/a/", "sectionname_a/b/", etc., that is,
all the subdirectories below the first-level subdirectory, indexed with the
same "section" property.

So far I have added the following to my config file:

ExtractPath seccion regex !^.*/htdocs/([^/]+)/.*$!$1!
PropertyNames seccion

I ran the indexer with regex tracing turned on. The final part of the output is:
=====================================
Original String:
'/opt2/zambra/httpd/htdocs/revista/i/temas/week7/index.html'
replace /opt2/zambra/httpd/htdocs/revista/i/temas/week7/index.html =~
m[/opt2/zambra/httpd/htdocs][http://www.zambra.com]: Matched
replace /revista/i/temas/week7/index.html =~
m[/opt2/zambra/httpd/htdocs][http://www.zambra.com]: No Match
  Result String: 'http://www.zambra.com/revista/i/temas/week7/index.html'

Original String:
'/opt2/zambra/httpd/htdocs/revista/i/temas/week7/index.html'
replace /opt2/zambra/httpd/htdocs/revista/i/temas/week7/index.html =~
m[^.*/htdocs/([^/]+)/.*$][$1]: Matched
  Result String: 'revista'
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 7841 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
7841 unique words indexed.
6 properties sorted.
287 files indexed.  4249504 total bytes.
Elapsed time: 00:00:05 CPU time: 00:00:05
Indexing done!
==================================
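
The ExtractPath substitution in the trace can be reproduced outside swish-e as a sanity check. This is a minimal sketch in Python, assuming the pattern behaves the same under Python's `re` module (with `$1` written as `\1`); the function name `extract_seccion` is made up for illustration. Note that for a path with no directory below htdocs the Python substitution returns the path unchanged; what swish-e stores in that case is not shown by the trace.

```python
import re

# Pattern from the ExtractPath line, rewritten in Python syntax.
PATTERN = re.compile(r"^.*/htdocs/([^/]+)/.*$")


def extract_seccion(path: str) -> str:
    """Return the first directory below htdocs (hypothetical helper).

    If the pattern does not match, re.sub leaves the path unchanged.
    """
    return PATTERN.sub(r"\1", path)


path = "/opt2/zambra/httpd/htdocs/revista/i/temas/week7/index.html"
print(extract_seccion(path))  # revista, as in the trace above

# A file directly under htdocs has no section directory, so no match.
top_level = "/opt2/zambra/httpd/htdocs/index.html.en"
print(extract_seccion(top_level) == top_level)
```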

When I do a search like:

swish-e -f ../idx/i1 -w guitarra -s seccion swishrank desc -p seccion -H 9

I get the following:

===================================
180 http://www.zambra.com/libreria/i/fiebre/cante/joselete.html "Joselete de Linares en Zambra.com - La casa del flamenco en la web" 14583 ""
180 http://www.zambra.com/index.html.en "Zambra.com - The home of flamenco on the web" 19974 ""
180 http://www.zambra.com/libreria/i/fiebre/cante/indio.html "Indio Gitano en Zambra.com - La casa del flamenco en la web" 14706 ""
180 http://www.zambra.com/libreria/i/fiebre/cante/duquende.html "Duquende en Zambra.com - La casa del flamenco en la web" 14181 ""

The "seccion" property seems to be empty?!

What am I doing wrong?
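
For comparison, this is the ordering I expect from `-s seccion swishrank desc` once the property is populated: sections in ascending order, and rank descending within each section. A minimal sketch in Python; the ranks and two of the paths are hypothetical, made up only to show the ordering.

```python
# Hypothetical hits as (seccion, rank, path); values are illustrative only.
hits = [
    ("revista", 120, "/revista/i/temas/week7/index.html"),
    ("libreria", 180, "/libreria/i/fiebre/cante/indio.html"),
    ("revista", 300, "/revista/i/index.html"),    # hypothetical path
    ("libreria", 90, "/libreria/i/index.html"),   # hypothetical path
]

# Primary key: seccion ascending. Secondary key: rank descending.
ordered = sorted(hits, key=lambda h: (h[0], -h[1]))

for seccion, rank, path in ordered:
    print(seccion, rank, path)
```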

Miguel



> >
> >I would like to have all the results in subdirectory "sectionname_a"
> >and its subdirectories ordered by rank.
>
> So, use ExtractPath to extract out "sectionname_a" to property, say,
> "section".
>
> ExtractPath section regex /^(section_.).+$/$1
> PropertyNames section
>
> Then you sort by -s section rank desc
>
> Then you will get all docs in section_a first, by rank, then all in
> section_b, by rank, and so on.
>
> If that's not what you mean then post some example URLs and ranks and how
> you want them sorted.
>
>
> At 01:08 PM 01/02/02 -0800, Zambra - Michael wrote:
> >
> >Hello again, Bill.
> >
> >BTW I forgot to mention that I'm already using a metaname with the
> >extracted path for limiting the search to a certain section of the
> >indexed site. Here is an example results page:
> >
> >
> >http://www.zambra.com/cgi-bin/tda/sr.pl?base=zx&id=en&termino=Puebla&sort=swishdocpath&sbm=all&start=0
> >
> >In the results page the problem is apparent. Results in each section
> >are not being sorted by rank (because of the swishdocpath "precedence").
> >
> >Miguel
> >
> >
> --
> Bill Moseley
> mailto:moseley@hank.org
>
Received on Thu Jan 3 18:11:22 2002