Hi,

I must not have explained the problem clearly.

I want to be able to search by swishdefault and swishtitle and then use
select_by_meta to limit the results to files in my /courses/ directory
or /news/ directory, without including other files that merely have
"courses" or "news" in their URLs.

I have also tried

meta_groups => {
    site => [qw/courses News/],
},

but this will not work as my select_by_meta metaname.

Here are the swish.conf and swishcgi.conf files, which give me everything
except the ability to limit a search to the News directory.

swish.conf

# Use the "spider.pl" program included with Swish-e
IndexDir spider.pl
IndexFile /var/www/search/index.swish-e
# Define what site to index
SwishProgParameters /var/www/search/swishspider.conf
IndexContents HTML* .htm .html .shtml .php
IndexContents TXT* .pdf .doc .ppt .xls
UndefinedMetaTags index
MetaNames swishdocpath swishtitle News courses
ExtractPath News regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
ExtractPath courses regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!
StoreDescription HTML* <body> 10000
StoreDescription TXT* 10000

swishcgi.conf

return {
    title        => 'Search Website',
    page_size    => 10,
    sorts        => [qw/swishrank swishtitle /],
    swish_binary => '/usr/local/bin/swish-e',
    swish_index  => '/var/www/search/index.swish-e',
    metanames    => [qw/ swishdefault swishtitle /],
    name_labels  => {
        swishdefault      => 'Search All',
        swishtitle        => 'Page Title',
        swishrank         => 'Rank',
        swishlastmodified => 'Modified',
    },
    select_by_meta => {
        #method => 'radio_group', # pick: radio_group, popup_menu, or checkbox_group
        method => 'checkbox_group',
        #method => 'popup_menu',
        columns => 1,
        metaname => 'courses', # Can't be a metaname used elsewhere!
        values => [qw/ courses /],
        labels => {
            courses => 'Only search courses: ',
            # News => 'News',
        },
        description => '',
    },
    template => {
        package => 'SWISH::TemplateToolkit',
        file => 'swish.tt',
        options => {
            INCLUDE_PATH => '/var/www/search',
            #PRE_PROCESS => 'config',
        },
    },
}

Regards,
Sheridan

-----Original Message-----
From: Peter Karman [mailto:peter@peknet.com]
Sent: 18 May 2006 14:48
To: SMALL, SHERIDAN
Cc: Multiple recipients of list
Subject: Re: [SWISH-E]

If you are trying to limit ALL searches to docs with specified paths,
better to not put anything else in your index in the first place. See
the filter_content() callback in spider.pl docs.

SMALL, SHERIDAN scribbled on 5/18/06 5:59 AM:
Hi,

I am having some trouble using the select_by_meta directive in the
swishcgi.conf file. I have also tried using variations of ExtractPath,
with no success.

The problem is that I want to limit the search to paths containing
"/courses/" and "/news/", not merely "courses" or "news".

The reason is that I have URLs like the following:

http://nymphswww.covcollege.ac.uk/email.php?title=A/AS%20levels%20-%20Part%20Time%20Courses

http://nymphswww.covcollege.ac.uk/courses/template.php?debug=1&cat=null&code=IDAAD1F1&idx=null&title=BTEC%20Introductory%20Diploma%20in%20Art%20%20and%20Design

I do not want the first included, only the second, which includes /courses/.

And

http://nymphswww.covcollege.ac.uk/email.php?title=News%20item%20-%20A%20Slice%20of%20Success

http://nymphswww.covcollege.ac.uk/sections/News/index.php?id=227&title=A%20Slice%20of%20Success

Note that the /News/ which I want is preceded by /sections/.
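
As a rough way to see what the two ExtractPath patterns from swish.conf would capture for the four URLs above, they can be tried outside Swish-e. The sketch below uses Python's re module, so it only shows how the regular expressions themselves behave; it assumes nothing about how Swish-e applies or combines ExtractPath rules. The names NEWS_PATTERN, SITE_PATTERN, URLS and extract are made up for this illustration.

```python
import re

# The two ExtractPath patterns from swish.conf, written for Python's re module.
# (Hypothetical names; Swish-e itself is not involved in this check.)
NEWS_PATTERN = re.compile(r"^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$")
SITE_PATTERN = re.compile(r"^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$")

# The four example URLs from the message, with wrapped lines rejoined.
URLS = {
    "email_courses": "http://nymphswww.covcollege.ac.uk/email.php"
                     "?title=A/AS%20levels%20-%20Part%20Time%20Courses",
    "courses_page": "http://nymphswww.covcollege.ac.uk/courses/template.php"
                    "?debug=1&cat=null&code=IDAAD1F1&idx=null"
                    "&title=BTEC%20Introductory%20Diploma%20in%20Art%20%20and%20Design",
    "email_news": "http://nymphswww.covcollege.ac.uk/email.php"
                  "?title=News%20item%20-%20A%20Slice%20of%20Success",
    "news_page": "http://nymphswww.covcollege.ac.uk/sections/News/index.php"
                 "?id=227&title=A%20Slice%20of%20Success",
}


def extract(pattern, url):
    """Return the first capture group if the pattern matches the URL, else None."""
    match = pattern.match(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name, url in URLS.items():
        print(name,
              "News pattern:", extract(NEWS_PATTERN, url),
              "site pattern:", extract(SITE_PATTERN, url))
```

Taken purely as regular expressions, the general pattern captures "courses" for the courses page but "sections" for the News page, and it also matches the first email.php URL (capturing "email.php?title=A") because that query string contains a "/". Only the /sections/ pattern yields "News".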
I have tried using

select_by_meta => {
    ***
    metaname => 'swishdocpath', # Can't be a metaname used elsewhere!
    values => [qw '/courses/', '/news/' ],
    ***

But the "/"s are ignored.
I am indexing over HTTP using
base_url => 'http://nymphswww.covcollege.ac.uk/index.php',

I have tried

ExtractPath News regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!

(Both work one at a time.)

With both

metaname => 'site', 'News', # Can't be a metaname used elsewhere!

Or else with both of these together.

select_by_meta => {
    ***
    metaname => 'site', # Can't be a metaname used elsewhere!
    values => [qw/ courses /],
    ***

select_by_meta => {
    ****
    metaname => 'News', # Can't be a metaname used elsewhere!
    values => [qw/ News /],
    ****

But only the second gets displayed.

And

MetaNames swishdocpath swishtitle site

ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/sections/([^/]+)/.*$!$1!
ExtractPath site regex !^http://nymphswww.covcollege.ac.uk/([^/]+)/.*$!$1!

With

select_by_meta => {
    ***
    metaname => 'site', # Can't be a metaname used elsewhere!
    values => [qw/ courses News /],
    ***

Only the first regex gets used.

Is there a way to get the search I want?

Thanks in advance,

Sheridan Small
______________________________

Website Co-ordinator

City College Coventry
Butts Centre
The Butts
COVENTRY
CV1 3GD

024 7679 1540
______________________________

> *********************************************************************
> Due to deletion of content types excluded from this list by policy,
> this multipart message was reduced to a single part, and from there
> to a plain text message.
> *********************************************************************

--
Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com

Received on Thu May 18 07:17:45 2006