Re: MetaNames - XML

From: Antonio Barrera <abarrera(at)not-real.Princeton.EDU>
Date: Fri Oct 01 2004 - 18:35:28 GMT
I'm using 2.4.2. Here's my full test.

[antonio@libserv4 antonio]$ swish-e -c az.config -T indexed_words
Indexing Data Source: "File-System"
Indexing "/home/antonio/azs"

Checking dir "/home/antonio/azs"...
  143.xml - Using XML2 parser -     Adding:[1:swishdefault(1)]   'http'   Pos:3  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'libweb.princeton.edu'   Pos:4  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'depart'   Pos:5  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'fiscal'   Pos:6  Stuct:0x1 ( FILE )
    Adding:[1:swishdefault(1)]   'photoservices.php'   Pos:7  Stuct:0x1 ( FILE )
    Adding:[1:alttitle(11)]   'copi'   Pos:11  Stuct:0x1 ( FILE )
    Adding:[1:alttitle(11)]   'servic'   Pos:12  Stuct:0x1 ( FILE )
    Adding:[1:maintitle(10)]   'photoservic'   Pos:15  Stuct:0x1 ( FILE )
    Adding:[1:keywords(14)]   'copi'   Pos:23  Stuct:0x1 ( FILE )
    Adding:[1:keywords(14)]   'photocopi'   Pos:24  Stuct:0x1 ( FILE )
    Adding:[1:keywords(14)]   'photodupl'   Pos:25  Stuct:0x1 ( FILE )
    Adding:[1:keywords(14)]   'photocopi'   Pos:26  Stuct:0x1 ( FILE )
    Adding:[1:keywords(14)]   'reproduct'   Pos:27  Stuct:0x1 ( FILE )
    Adding:[1:keywords(14)]   'xerox'   Pos:28  Stuct:0x1 ( FILE )
    Adding:[1:keywords(14)]   'copier'   Pos:29  Stuct:0x1 ( FILE )
 (15 words)

Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 13 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
13 unique words indexed.
10 properties sorted.                                              
1 file indexed.  408 total bytes.  15 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
[antonio@libserv4 antonio]$ cat azs/143.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<record id='162'>
<link>http://libweb.princeton.edu/departments/fiscal/photoservices.php</link
>
<title>
<alttitle>Copying services</alttitle>
<maintitle>Photoservices</maintitle>
</title>
<description></description>
<longdescription></longdescription>
<keywords>copy, photocopying, photoduplication, photocopiers, reproduction,
xerox, copiers</keywords>
</record>
[antonio@libserv4 antonio]$ swish-e -f az.xml.index -w "xerox"
# SWISH format: 2.4.2
# Search words: xerox
# Removed stopwords: 
err: no results
.
[antonio@libserv4 antonio]$ 
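
One possible reading of the trace above: 'xerox' is added only under the keywords metaname and never under swishdefault, and a query with no metaname prefix searches swishdefault, which may be why -w "xerox" returns no results. The following is a minimal Python sketch for checking that kind of thing in a -T indexed_words trace. The function name words_by_metaname and the regular expression are illustrative assumptions, not part of swish-e.

```python
import re
from collections import defaultdict

# Matches trace lines such as:
#     Adding:[1:keywords(14)]   'xerox'   Pos:28  Stuct:0x1 ( FILE )
# Group 1 is the metaname, group 2 is the indexed word.
ADDING = re.compile(r"Adding:\[\d+:(\w+)\(\d+\)\]\s+'([^']*)'")


def words_by_metaname(trace):
    """Group the words in a -T indexed_words trace by metaname."""
    groups = defaultdict(list)
    for metaname, word in ADDING.findall(trace):
        groups[metaname].append(word)
    return dict(groups)


# A few lines copied from the 2.4.2 trace (mail line wraps removed).
SAMPLE = """\
    Adding:[1:swishdefault(1)]   'photoservices.php'   Pos:7  Stuct:0x1 ( FILE )
    Adding:[1:maintitle(10)]   'photoservic'   Pos:15  Stuct:0x1 ( FILE )
    Adding:[1:keywords(14)]   'xerox'   Pos:28  Stuct:0x1 ( FILE )
"""

if __name__ == "__main__":
    for metaname, words in words_by_metaname(SAMPLE).items():
        print(metaname, words)
```

If that reading is right, a metaname-qualified query such as -w "keywords=xerox" would be the next thing to try against this index.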

-----Original Message-----
From: swish-e@sunsite3.berkeley.edu [mailto:swish-e@sunsite3.berkeley.edu]
On Behalf Of Peter Karman
Sent: Friday, October 01, 2004 1:09 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: MetaNames - XML



Antonio Barrera wrote on 10/1/04 10:18 AM:

> Bill,
> 
> Here are the search results using different MetaNames treatments.
> 
> Using specified MetaTags:
> - MetaNames maintitle alttitle brief_description long_description keywords
> 
> [antonio@libserv4 antonio]$ swish-e -f az.xml.index -w "photoservices" -p maintitle link description
> # SWISH format: 2.4.2
> # Search words: photoservices
> # Removed stopwords:
> err: no results
> .

Hmm, it works for me with the latest CVS version (2.5.2):

karpet@cartermac 6% swish-e -i xml -c c -T indexed_words
Indexing Data Source: "File-System"
Indexing "xml"
     Adding:[1:swishdefault(1)]   'http'   Pos:7  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'libweb'   Pos:8  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'princeton'   Pos:9  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'edu'   Pos:10  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'departments'   Pos:11  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'fiscal'   Pos:12  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'photoservices'   Pos:13  Stuct:0x9 ( BODY FILE )
     Adding:[1:swishdefault(1)]   'php'   Pos:14  Stuct:0x9 ( BODY FILE )
     Adding:[1:alttitle(11)]   'copying'   Pos:18  Stuct:0x8B ( META BODY TITLE FILE )
     Adding:[1:alttitle(11)]   'services'   Pos:19  Stuct:0x8B ( META BODY TITLE FILE )
     Adding:[1:maintitle(10)]   'photoservices'   Pos:22  Stuct:0x8B ( META BODY TITLE FILE )
     Adding:[1:keywords(14)]   'copy'   Pos:31  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'photocopying'   Pos:32  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'photoduplication'   Pos:33  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'photocopiers'   Pos:34  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'reproduction'   Pos:35  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'xerox'   Pos:36  Stuct:0x89 ( META BODY FILE )
     Adding:[1:keywords(14)]   'copiers'   Pos:37  Stuct:0x89 ( META BODY FILE )
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 17 words alphabetically
Writing header ...
Writing index entries ...
   Writing word text: Complete
   Writing word hash: Complete
   Writing word data: Complete
17 unique words indexed.
4 properties sorted.
1 file indexed.  408 total bytes.  18 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
karpet@cartermac 7% cat xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<record id='162'>
<link>http://libweb.princeton.edu/departments/fiscal/photoservices.php</link
>
<title>
<alttitle>Copying services</alttitle>
<maintitle>Photoservices</maintitle>
</title>
<description></description>
<longdescription></longdescription>
<keywords>copy, photocopying, photoduplication, photocopiers, reproduction,
xerox, copiers</keywords>
</record>
karpet@cartermac 8% swish-e -w photoservices
# SWISH format: 2.5.2
# Search words: photoservices
# Removed stopwords:
# Number of hits: 1
# Search time: 0.006 seconds
# Run time: 0.037 seconds
1000 xml "Copying services Photoservices" 408
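
For scripting around results like the line above, here is a minimal Python sketch that splits one result line into fields. It assumes the default output layout of rank, path, quoted title and size; the function name parse_result and the dictionary keys are illustrative, not swish-e names.

```python
import shlex


def parse_result(line):
    """Split one default-format result line into its four fields.

    shlex.split keeps the double-quoted title together as one token.
    """
    rank, path, title, size = shlex.split(line)
    return {"rank": int(rank), "path": path, "title": title, "size": int(size)}


if __name__ == "__main__":
    print(parse_result('1000 xml "Copying services Photoservices" 408'))
```

Result lines that carry extra -p properties have more than four fields and would need a different unpacking step than this sketch uses.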



> 
> Using unspecified MetaTags:
> UndefinedMetaTags index
> 
> # SWISH format: 2.4.2
> # Search words: photoservices
> # Removed stopwords: 
> # Number of hits: 1
> # Search time: 0.000 seconds
> # Run time: 0.025 seconds
> 1000 /home/antonio/az/143.xml "143.xml" 408 "Photoservices"
> "http://libweb.princeton.edu/departments/fiscal/photoservices.php" ""
>  
> 
> 
> Antonio
> 
> -----Original Message-----
> From: swish-e@sunsite3.berkeley.edu 
> [mailto:swish-e@sunsite3.berkeley.edu]
> On Behalf Of Bill Moseley
> Sent: Friday, October 01, 2004 9:57 AM
> To: Multiple recipients of list
> Subject: [SWISH-E] Re: MetaNames - XML
> 
> On Fri, Oct 01, 2004 at 06:40:30AM -0700, Antonio Barrera wrote:
> 
>>Problem occurs with the MetaNames, some of them are not being indexed.
> 
> 
> I guess I'm not following what's not working.  Can you index using -T 
> indexed_words and point out what's missing?
> 
> I'm not that happy with how indexing XML works -- for example if you 
> tell swish to ignore a tag it ignores everything inside that tag even 
> if you specify a metaname or property.  Plus, you should be able to ignore 
> metatags and properties separately.
> 
> 
> --
> Bill Moseley
> moseley@hank.org
> 

--
Peter Karman  -  http://www.cray.com/craydoc/ -  karman(at)not-real.cray.com
Received on Fri Oct 1 11:35:54 2004