Re: MetaNames - XML

From: Antonio Barrera <abarrera(at)not-real.Princeton.EDU>
Date: Fri Oct 01 2004 - 18:58:45 GMT
I probably made a newbie error, but this is the result of adding the
MetaNamesAlias lines:
MetaNames maintitle alttitle description longdescription keywords
MetaNamesAlias swishdefault keywords
MetaNamesAlias swishdefault description
MetaNamesAlias swishdefault longdescription
#UndefinedMetaTags index
PropertyNames maintitle title description longdescription link
ConvertHTMLEntities yes

"az.config" [converted] 141L, 5332C written
[antonio@libserv4 antonio]$ !sw
swish-e -c az.config
Bad directive on line #34 of file az.config: MetaNamesAlias swishdefault keywords
Bad directive on line #35 of file az.config: MetaNamesAlias swishdefault description
Bad directive on line #36 of file az.config: MetaNamesAlias swishdefault longdescription

-----Original Message-----
From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
On Behalf Of Peter Karman
Sent: Friday, October 01, 2004 2:40 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: MetaNames - XML

You probably want to alias some of your metanames into swishdefault.
Otherwise you have to specify the metaname to search in. Try this in your
config:

MetaNamesAlias swishdefault  keywords

That way content in <keywords> should get indexed under both metanames.
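
For reference, a hedged sketch of what the aliasing advice above might look like in a config file. The swish-e configuration docs spell the directive MetaNameAlias (singular "Name"), taking a metaname followed by a list of aliases. Which tags to alias, and whether they must be left out of MetaNames, are assumptions here and should be checked against the docs for the installed version.

```
# Sketch only, not a tested config.
# Directive spelling per the swish-e config docs: MetaNameAlias (singular "Name").
MetaNames maintitle alttitle

# Assumption: the aliased tags are omitted from MetaNames so the alias
# names do not collide with already-defined metanames.
MetaNameAlias swishdefault keywords description longdescription
```

Without an alias, the words stay under their own metaname and the query has to name it, for example: swish-e -f az.xml.index -w keywords=xerox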



Antonio Barrera wrote on 10/1/04 1:34 PM:

> I'm using 2.4.2, here's my full test.
> 
> [antonio@libserv4 antonio]$ swish-e -c az.config -T indexed_words 
> Indexing Data Source: "File-System"
> Indexing "/home/antonio/azs"
> 
> Checking dir "/home/antonio/azs"...
>   143.xml - Using XML2 parser -     Adding:[1:swishdefault(1)]   'http'   Pos:3  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'libweb.princeton.edu'   Pos:4  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'depart'   Pos:5  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'fiscal'   Pos:6  Stuct:0x1 ( FILE )
>     Adding:[1:swishdefault(1)]   'photoservices.php'   Pos:7  Stuct:0x1 ( FILE )
>     Adding:[1:alttitle(11)]   'copi'   Pos:11  Stuct:0x1 ( FILE )
>     Adding:[1:alttitle(11)]   'servic'   Pos:12  Stuct:0x1 ( FILE )
>     Adding:[1:maintitle(10)]   'photoservic'   Pos:15  Stuct:0x1 ( FILE )
>     Adding:[1:keywords(14)]   'copi'   Pos:23  Stuct:0x1 ( FILE )
>     Adding:[1:keywords(14)]   'photocopi'   Pos:24  Stuct:0x1 ( FILE )
>     Adding:[1:keywords(14)]   'photodupl'   Pos:25  Stuct:0x1 ( FILE )
>     Adding:[1:keywords(14)]   'photocopi'   Pos:26  Stuct:0x1 ( FILE )
>     Adding:[1:keywords(14)]   'reproduct'   Pos:27  Stuct:0x1 ( FILE )
>     Adding:[1:keywords(14)]   'xerox'   Pos:28  Stuct:0x1 ( FILE )
>     Adding:[1:keywords(14)]   'copier'   Pos:29  Stuct:0x1 ( FILE )
>  (15 words)
> 
> Removing very common words...
> no words removed.
> Writing main index...
> Sorting words ...
> Sorting 13 words alphabetically
> Writing header ...
> Writing index entries ...
>   Writing word text: Complete
>   Writing word hash: Complete
>   Writing word data: Complete
> 13 unique words indexed.
> 10 properties sorted.                                              
> 1 file indexed.  408 total bytes.  15 total words.
> Elapsed time: 00:00:00 CPU time: 00:00:00 Indexing done!
> [antonio@libserv4 antonio]$ cat azs/143.xml
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <record id='162'>
> <link>http://libweb.princeton.edu/departments/fiscal/photoservices.php</link>
> <title>
> <alttitle>Copying services</alttitle>
> <maintitle>Photoservices</maintitle>
> </title>
> <description></description>
> <longdescription></longdescription>
> <keywords>copy, photocopying, photoduplication, photocopiers, reproduction, xerox, copiers</keywords>
> </record>
> [antonio@libserv4 antonio]$ swish-e -f az.xml.index -w "xerox"
> # SWISH format: 2.4.2
> # Search words: xerox
> # Removed stopwords: 
> err: no results
> .
> [antonio@libserv4 antonio]$
> 
> -----Original Message-----
> From: swish-e@sunsite3.berkeley.edu 
> [mailto:swish-e@sunsite3.berkeley.edu]
> On Behalf Of Peter Karman
> Sent: Friday, October 01, 2004 1:09 PM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: MetaNames - XML
> 
> 
> 
> Antonio Barrera wrote on 10/1/04 10:18 AM:
> 
> 
>>Bill,
>>
>>Here are the search results using different MetaNames treatments.
>>
>>Using specified MetaTags:
>>- MetaNames maintitle alttitle brief_description long_description keywords
>>
>>[antonio@libserv4 antonio]$ swish-e -f az.xml.index -w "photoservices" -p maintitle link description
>># SWISH format: 2.4.2
>># Search words: photoservices
>># Removed stopwords:
>>err: no results
>>.
> 
> 
> hmm. works for me with the latest CVS version (2.5.2):
> 
> karpet@cartermac 6% swish-e -i xml -c c -T indexed_words
> Indexing Data Source: "File-System"
> Indexing "xml"
>      Adding:[1:swishdefault(1)]   'http'   Pos:7  Stuct:0x9 ( BODY FILE )
>      Adding:[1:swishdefault(1)]   'libweb'   Pos:8  Stuct:0x9 ( BODY FILE )
>      Adding:[1:swishdefault(1)]   'princeton'   Pos:9  Stuct:0x9 ( BODY FILE )
>      Adding:[1:swishdefault(1)]   'edu'   Pos:10  Stuct:0x9 ( BODY FILE )
>      Adding:[1:swishdefault(1)]   'departments'   Pos:11  Stuct:0x9 ( BODY FILE )
>      Adding:[1:swishdefault(1)]   'fiscal'   Pos:12  Stuct:0x9 ( BODY FILE )
>      Adding:[1:swishdefault(1)]   'photoservices'   Pos:13  Stuct:0x9 ( BODY FILE )
>      Adding:[1:swishdefault(1)]   'php'   Pos:14  Stuct:0x9 ( BODY FILE )
>      Adding:[1:alttitle(11)]   'copying'   Pos:18  Stuct:0x8B ( META BODY TITLE FILE )
>      Adding:[1:alttitle(11)]   'services'   Pos:19  Stuct:0x8B ( META BODY TITLE FILE )
>      Adding:[1:maintitle(10)]   'photoservices'   Pos:22  Stuct:0x8B ( META BODY TITLE FILE )
>      Adding:[1:keywords(14)]   'copy'   Pos:31  Stuct:0x89 ( META BODY FILE )
>      Adding:[1:keywords(14)]   'photocopying'   Pos:32  Stuct:0x89 ( META BODY FILE )
>      Adding:[1:keywords(14)]   'photoduplication'   Pos:33  Stuct:0x89 ( META BODY FILE )
>      Adding:[1:keywords(14)]   'photocopiers'   Pos:34  Stuct:0x89 ( META BODY FILE )
>      Adding:[1:keywords(14)]   'reproduction'   Pos:35  Stuct:0x89 ( META BODY FILE )
>      Adding:[1:keywords(14)]   'xerox'   Pos:36  Stuct:0x89 ( META BODY FILE )
>      Adding:[1:keywords(14)]   'copiers'   Pos:37  Stuct:0x89 ( META BODY FILE )
> Removing very common words...
> no words removed.
> Writing main index...
> Sorting words ...
> Sorting 17 words alphabetically
> Writing header ...
> Writing index entries ...
>    Writing word text: Complete
>    Writing word hash: Complete
>    Writing word data: Complete
> 17 unique words indexed.
> 4 properties sorted.
> 1 file indexed.  408 total bytes.  18 total words.
> Elapsed time: 00:00:00 CPU time: 00:00:00 Indexing done!
> karpet@cartermac 7% cat xml
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <record id='162'>
> <link>http://libweb.princeton.edu/departments/fiscal/photoservices.php</link>
> <title>
> <alttitle>Copying services</alttitle>
> <maintitle>Photoservices</maintitle>
> </title>
> <description></description>
> <longdescription></longdescription>
> <keywords>copy, photocopying, photoduplication, photocopiers, reproduction, xerox, copiers</keywords>
> </record>
> karpet@cartermac 8% swish-e -w photoservices
> # SWISH format: 2.5.2
> # Search words: photoservices
> # Removed stopwords:
> # Number of hits: 1
> # Search time: 0.006 seconds
> # Run time: 0.037 seconds
> 1000 xml "Copying services Photoservices" 408
> 
> 
> 
> 
>>Using unspecified MetaTags:
>>UndefinedMetaTags index
>>
>># SWISH format: 2.4.2
>># Search words: photoservices
>># Removed stopwords: 
>># Number of hits: 1
>># Search time: 0.000 seconds
>># Run time: 0.025 seconds
>>1000 /home/antonio/az/143.xml "143.xml" 408 "Photoservices"
>>"http://libweb.princeton.edu/departments/fiscal/photoservices.php" ""
>> 
>>
>>
>>Antonio
>>
>>-----Original Message-----
>>From: swish-e@sunsite3.berkeley.edu
>>[mailto:swish-e@sunsite3.berkeley.edu]
>>On Behalf Of Bill Moseley
>>Sent: Friday, October 01, 2004 9:57 AM
>>To: Multiple recipients of list
>>Subject: [SWISH-E] Re: MetaNames - XML
>>
>>On Fri, Oct 01, 2004 at 06:40:30AM -0700, Antonio Barrera wrote:
>>
>>
>>>Problem occurs with the MetaNames, some of them are not being indexed.
>>
>>
>>I guess I'm not following what's not working.  Can you index using -T 
>>indexed_words and point out what's missing?
>>
>>I'm not that happy with how indexing XML works -- for example if you 
>>tell swish to ignore a tag it ignores everything inside that tag even 
>>if you specify a metaname or property.  Plus, should be able to ignore 
>>metatags and properties separately.
>>
>>
>>--
>>Bill Moseley
>>moseley@hank.org
>>
>>Unsubscribe from or help with the swish-e list: 
>>   http://swish-e.org/Discussion/
>>
>>Help with Swish-e:
>>   http://swish-e.org/current/docs
>>   swish-e@sunsite.berkeley.edu
>>
> 
> 
> --
> Peter Karman  -  http://www.cray.com/craydoc/ -  karman(at)not-real.cray.com

--
Peter Karman  -  http://www.cray.com/craydoc/ -  karman(at)not-real.cray.com
Received on Fri Oct 1 11:58:54 2004