Re:

From: Peter Karman <peter(at)not-real.peknet.com>
Date: Thu May 18 2006 - 13:48:43 GMT
If you are trying to limit ALL searches to docs with specified paths,
it is better not to put anything else in your index in the first place.
See the filter_content() callback in the spider.pl docs.
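
Something along these lines in the spider config might do it. This is an
untested sketch: the base_url is the one from your message, while the
email value and the @wanted patterns are placeholders of mine, and the
callback arguments are as I recall them from the spider.pl docs, so
check them there before relying on it.

```perl
# Spider config sketch (e.g. SwishSpiderConfig.pl) -- not a tested setup.
# Hypothetical: @wanted and the email address are illustrative only.
my @wanted = ( qr{/courses/}i, qr{/news/}i );

@servers = (
    {
        base_url => 'http://nymphswww.covcollege.ac.uk/index.php',
        email    => 'webmaster@example.com',    # placeholder

        # Called just before the content is handed to swish-e.
        # Returning false skips indexing this document.
        filter_content => sub {
            my ( $uri, $server, $response, $content_ref ) = @_;

            # Look at the path only, so query strings such as
            # ?title=...Courses on email.php do not count as a match.
            my $path = $uri->path;
            return scalar grep { $path =~ $_ } @wanted;
        },
    },
);

1;
```

Matching on the path rather than the whole URL is what keeps the
email.php links out while letting /courses/ and /sections/News/ in.
Whether links on skipped pages still get followed is worth confirming
in the docs.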

SMALL, SHERIDAN scribbled on 5/18/06 5:59 AM:
> Hi,
> 
> I am having some trouble using the select_by_meta directive in the
> swishcgi.conf file. I have also tried using variations of ExtractPath
> with no success.
> 
> The problem is that I want to limit the search to paths with the
> values "/courses/" and "/news/", not "courses" or "news".
> 
> The reason is that I have URLs like the following:
> 
> http://nymphswww.covcollege.ac.uk/email.php?title=A/AS%20levels%20-%20Part%20Time%20Courses
> 
> http://nymphswww.covcollege.ac.uk/courses/template.php?debug=1&cat=null&code=IDAAD1F1&idx=null&title=BTEC%20Introductory%20Diploma%20in%20Art%20%20and%20Design
> 
> I do not want the first included, only the second, which includes /courses/.
> 
> And
> 
> http://nymphswww.covcollege.ac.uk/email.php?title=News%20item%20-%20A%20Slice%20of%20Success
> 
> http://nymphswww.covcollege.ac.uk/sections/News/index.php?id=227&title=A%20Slice%20of%20Success
> 
> Note that the /News/ I want is preceded by /sections/.
> 
> I have tried using
> 
> select_by_meta  => {
>             ***
>             metaname    => 'swishdocpath',     # Can't be a metaname used elsewhere!
>             values      => [qw '/courses/', '/news/' ],
>             ***
> 
> But the "/"s are ignored.
> 
> I am indexing over http using
> base_url    => 'http://nymphswww.covcollege.ac.uk/index.php',
> 
> I have tried
> 
> ExtractPath News regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
> ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!
> 
> (Both work one at a time.)
> 
> With both
> 
> metaname    => 'site', 'News',     # Can't be a metaname used elsewhere!
> 
> Or else with both of these together.
> 
> select_by_meta  => {
>             ***
>             metaname    => 'site',     # Can't be a metaname used elsewhere!
>             values      => [qw/ courses /],
>             ***
> 
> select_by_meta  => {
>             ****
>             metaname    => 'News',     # Can't be a metaname used elsewhere!
>             values      => [qw/ News /],
>             ****
> 
> But only the second gets displayed.
> 
> And
> 
> MetaNames swishdocpath swishtitle site
> 
> ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
> ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!
> 
> With
> 
> select_by_meta  => {
>             ***
>             metaname    => 'site',     # Can't be a metaname used elsewhere!
>             values      => [qw/ courses News /],
>             ***
> 
> Only the first regex gets used.
> 
> Is there a way to get the search I want?
> 
> Thanks in advance,
> 
> Sheridan Small
> 
> ______________________________
> 
> Website Co-ordinator
> 
> City College Coventry
> Butts Centre
> The Butts
> COVENTRY
> CV1 3GD
> 
> 024 7679 1540
> 
> ______________________________
> 

-- 
Peter Karman  .  http://peknet.com/  .  peter(at)not-real.peknet.com
Received on Thu May 18 06:48:44 2006