On Monday 10 June 2002 23:01, you wrote:
> At 05:38 AM 06/10/02 -0700, Andrew Lord wrote:
> >In the meantime, I've been working on getting swishdev to index my .php
> >files via the http method. Creating the index works, but a search of the
> >index only returns a result for words contained in the title of the .php
> >file. Words contained in any MetaNames are not found when searching.
> Best way to get help is to provide some samples:
Hi Bill,
No worries. An example follows:
bz23.php generates the following HTML:
-------------------------------------------------------
<html>
<body>
<a href="/bz23.php?id=1">testdoc orderin system</a><br>
<a href="/bz23.php?id=2">nodoc diorder anarch</a><br>
</body>
</html>
---------------------------------------------
bz23.php?id=1 generates the following content from a MySQL database:
---------------------------------------------
<html>
<head>
<title>Crazy Thang</title>
<meta NAME="Meta1" VALUE="testword, aardvark">
<meta NAME="Meta2" VALUE="nothing at all">
</head>
<body BACKGROUND="bkgnd.gif" BGCOLOR="#FFFFFF">
<p><a HREF="#Anchor"><font SIZE="2" FACE="Arial">
etc..
bz23.php?id=2 generates the following content from a MySQL database:
---------------------------------------------
<html>
<head>
<title>Sane Yang</title>
<meta NAME="Meta1" VALUE="">
<meta NAME="Meta2" VALUE="">
</head>
<body BACKGROUND="bkgnd.gif" BGCOLOR="#FFFFFF">
<p><a HREF="#Anchor"><font SIZE="2" FACE="Arial">
etc..
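To confirm what meta data the generated pages actually carry, a minimal Python sketch can pull the NAME/value pairs out of the HTML using only the standard library. The class and function names (MetaCollector, collect_metas) are hypothetical. It accepts both CONTENT, the attribute HTML defines for <meta>, and VALUE, which the samples above use, and records which of the two it found.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects NAME -> (attribute, value) pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so NAME/VALUE
        # arrive here as "name"/"value"; attribute values keep their case.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name is None:
            return
        # Prefer CONTENT (the HTML attribute for <meta>), fall back to VALUE.
        for key in ("content", "value"):
            if key in attrs:
                self.metas[name] = (key, attrs[key])
                break


def collect_metas(html):
    """Return {meta name: (attribute used, value)} for one HTML page."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.metas
```

Fed the bz23.php?id=1 page, this should report Meta1 and Meta2, each tagged as coming from a "value" attribute.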
Swish version: swish-e-2.1-dev-10-06-02
---------swishdev.conf---------------------------------------
IndexComments yes
ReplaceRules replace "/home/httpd/html/" "//localhost.localdomain/"
MinWordLimit 3
WordCharacters abcdefghijklmnopqrstuvwxyz&SŲ0123456789_\|/-+=?!@$%^'
IgnoreLimit 50 1000
IndexComments 0
IndexReport 1
IndexName swishdev
IndexFile swishdev.swish
IndexDir http://localhost.localdomain/bz23.php
EquivalentServer http://localhost.localdomain/
MaxDepth 10
Delay 1
SpiderDirectory /home/httpd/html/swishdev/src
------------------------------------------------------------------
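The configuration above sets IndexComments twice (yes, then 0) and contains no MetaNames line. A small Python sketch can list every directive with its values so repeats like this stand out. It assumes one directive per line, name first, with # starting a comment; the function names are hypothetical and this is not swish-e's own parser.

```python
from collections import defaultdict


def directive_values(conf_text):
    """Map each directive name to the list of values it is given, in order."""
    seen = defaultdict(list)
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and (assumed) # comments.
        if not line or line.startswith("#"):
            continue
        # Directive name first, the rest of the line is its value.
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        seen[parts[0]].append(value)
    return dict(seen)


def repeated_directives(conf_text):
    """Return only the directives that appear more than once."""
    return {name: values
            for name, values in directive_values(conf_text).items()
            if len(values) > 1}
```

Run against swishdev.conf, repeated_directives should return {"IndexComments": ["yes", "0"]}, and checking "MetaNames" in directive_values(...) shows whether any meta names were declared.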
Indexing is performed at the command line as follows:
/home/httpd/html/swishdev/src/swish-e -S http \
    -c /home/httpd/html/indexes/swishdev.conf \
    -f /home/httpd/html/indexes/swishdev.swish -T indexed_words
Indexing Data Source: "HTTP-Crawler"
Indexing "http://localhost.localdomain/bz23.php"
Adding:[1:swishdefault(1)] 'testdoc' Pos:1 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'orderin' Pos:2 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'system' Pos:3 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'nodoc' Pos:4 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'diorder' Pos:5 Stuct:0x1 ( FILE )
Adding:[1:swishdefault(1)] 'anarch' Pos:6 Stuct:0x1 ( FILE )
Adding:[2:swishdefault(1)] 'crazy' Pos:1 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[2:swishdefault(1)] 'thang' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[3:swishdefault(1)] 'sane' Pos:1 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[3:swishdefault(1)] 'yang' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Removing very common words. . .
Getting IgnoreLimit stopwords: Complete
no words removed.
Writing main index. . .
Sorting words . . .
Sorting 10 words alphabetically
Writing header . . .
Writing index entries . . .
Writing word text: Complete
Writing word hash: Complete
Writing word data: Complete
10 unique words indexed.
2 properties sorted.
3 files indexed. 3844 totalbytes. 10 total words.
Elapsed time: 00:00:03 CPU time: 00:00:00
Indexing done!
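Every Adding: line in the trace above names swishdefault; none mentions Meta1 or Meta2, and words such as testword and aardvark never appear. For larger crawls, a Python sketch can group the -T indexed_words trace by the name inside the brackets. The regular expression is an assumption based only on the lines shown here, and words_by_meta is a hypothetical name.

```python
import re
from collections import defaultdict

# Assumed trace layout, taken from lines such as:
#   Adding:[2:swishdefault(1)] 'crazy' Pos:1 Stuct:0x7 ( HEAD TITLE FILE )
ADDING = re.compile(r"Adding:\[(\d+):(\w+)\((\d+)\)\]\s+'([^']*)'")


def words_by_meta(trace):
    """Group (file number, word) pairs by the name shown in brackets."""
    groups = defaultdict(list)
    for line in trace.splitlines():
        m = ADDING.search(line)
        if m:
            # The number in parentheses is captured but not used here.
            file_num, meta_name, _paren_num, word = m.groups()
            groups[meta_name].append((int(file_num), word))
    return dict(groups)
```

On this run the result has a single key, "swishdefault", holding all ten words.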
The search for the word "testword" was run as follows:
/home/httpd/html/swishdev/src/swish-e \
    -f /home/httpd/html/indexes/swishdev.swish -w testword
Result was:
# SWISH format: 2.1-dev-25
# Search words: testword
err: no results
The search for the word "crazy" was run as follows:
/home/httpd/html/swishdev/src/swish-e \
    -f /home/httpd/html/indexes/swishdev.swish -w crazy
Result was:
# SWISH format: 2.1-dev-25
# Search words: crazy
# Number of hits: 1
# Search time: 0.001 seconds
# Run time: 0.043 seconds
1000 http://localhost.localdomain/bz23.php?id=1 "testdoc orderin system"
________________________________________________________________
Please let me know if you need any further information to help pinpoint the
problem with indexing MetaNames.
Cheers,
Andrew Lord
Received on Mon Jun 10 15:25:47 2002