Re: [swish-e] Change the indexed 'title'

From: <josh(at)not-real.relativelysane.com>
Date: Thu Oct 25 2007 - 15:28:58 GMT
>On 10/25/2007 10:07 AM, josh@relativelysane.com wrote:
>
>> 
>> The weird thing is that it's grabbing and populating flavor, and I know that's
>> from the PropertyNames string because when I remove it from there, flavor isn't
>> in the dump like the one above.
>> 
>
>can you copy/paste what your config and example docs look like so we can try
>and duplicate what you are seeing?
>
>-- 
>Peter Karman  .  peter(at)not-real.peknet.com  .  http://peknet.com/
>

Sure. They are identical to what you used in your example, except for the IndexDir line in my config. Full dumps of the files, the indexing output, the -T INDEX_ALL dump, and the search query are below.

[josh@josh]# cat index.cfg
IndexDir test
ExtractPath flavor regex !test/doc-(normal|strong|href)/.*$!$1!
PropertyNames flavor strong a
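
For reference, here is a rough Python sketch of what the ExtractPath line above appears to do to each document path. It is an illustration only, not swish-e's own code: the helper name `extract_flavor` is made up, and returning None for a non-matching path is an assumption about how an unmatched path should be treated.

```python
import re

# Python equivalent of the pattern between the first pair of "!" delimiters.
PATTERN = re.compile(r"test/doc-(normal|strong|href)/.*$")

def extract_flavor(path):
    """Return the substituted value, or None if the pattern does not match.

    Hypothetical helper; the None-on-no-match behavior is an assumption.
    """
    if PATTERN.search(path) is None:
        return None
    # "$1" in the config corresponds to "\1" in Python's replacement syntax.
    return PATTERN.sub(r"\1", path)

paths = [
    "test/doc-href/docswith-ahref.html",
    "test/doc-normal/docsthatarenormal.html",
    "test/doc-strong/docswith-strong.html",
]
flavors = {p: extract_flavor(p) for p in paths}
for p, f in flavors.items():
    print(p, "->", f)
```

The three values this produces (href, normal, strong) are the same ones that show up as the flavor property in the dump further down.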

[josh@josh test]# ls
doc-href  doc-normal  doc-strong

[josh@josh doc-href]# cat docswith-ahref.html
<html>
<head><title>real title</title></head>
<body><a href="bar">title i want</a></body>
</html>

[josh@josh doc-normal]# cat docsthatarenormal.html
<html>
<head><title>real title is the title i want</title></head>
<body><a href="bar">link text</a><strong>strong text</strong> blah </body>
</html>

[josh@josh doc-strong]# cat docswith-strong.html
<html>
<head><title>real title</title></head>
<body><strong>title I want</strong></body>
</html>


[josh@josh]# swish-e -c index.cfg
Indexing Data Source: "File-System"
Indexing "test"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 12 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
12 unique words indexed.
7 properties sorted.
3 files indexed.  344 total bytes.  25 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!


[josh@josh]# swish-e -T INDEX_ALL
# Name:
# Saved as: index.swish-e
# Total Words: 12
# Total Files: 3
# Removed Files: 0
# Total Word Pos: 25
# Removed Word Pos: 0
# Indexed on: 2007-10-25 11:24:19 EDT
# Description:
# Pointer:
# Maintained by:
# MinWordLimit: 1
# MaxWordLimit: 40
# WordCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
# BeginCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
# EndCharacters: 0123456789abcdefghijklmnopqrstuvwxyzªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
# IgnoreFirstChar:
# IgnoreLastChar:
# StopWords:
# BuzzWords:
# Stemming Applied: 0
# Soundex Applied: 0
# Fuzzy Mode: None
# IgnoreTotalWordCountWhenRanking: 1


-----> METANAMES for index.swish-e <-----
        swishdefault : id= 1 type= 1  META_INDEX  Rank Bias=  0
       swishreccount : id= 2 type=42  META_INTERNAL META_PROP:NUMBER
           swishrank : id= 3 type=42  META_INTERNAL META_PROP:NUMBER
        swishfilenum : id= 4 type=42  META_INTERNAL META_PROP:NUMBER
         swishdbfile : id= 5 type=38  META_INTERNAL META_PROP:STRING(case:compare) SortKeyLen: 100
        swishdocpath : id= 6 type= 6  META_PROP:STRING(case:compare) SortKeyLen: 100  *presorted*
          swishtitle : id= 7 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
        swishdocsize : id= 8 type=10  META_PROP:NUMBER *presorted*
   swishlastmodified : id= 9 type=18  META_PROP:DATE *presorted*
              flavor : id=10 type= 1  META_INDEX  Rank Bias=  0
              flavor : id=11 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
              strong : id=12 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*
                   a : id=13 type=70  META_PROP:STRING(case:ignore) SortKeyLen: 100  *presorted*


-----> WORD INFO in index index.swish-e <-----

blah
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:12/9

href
 Meta:10 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/1

i
 Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:4/9
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:6/7
 Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:4/49

is
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:3/7

link
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:8/9

normal
 Meta:10 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/1

real
 Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:1/7
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:1/7
 Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/7

strong
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:10/49
 Meta:10 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:1/1

text
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:9/9,11/49

the
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:4/7

title
 Meta:1 test/doc-href/docswith-ahref.html Freq:2 Pos/Struct:2/7,3/9
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:2 Pos/Struct:2/7,5/7
 Meta:1 test/doc-strong/docswith-strong.html Freq:2 Pos/Struct:2/7,3/49

want
 Meta:1 test/doc-href/docswith-ahref.html Freq:1 Pos/Struct:5/9
 Meta:1 test/doc-normal/docsthatarenormal.html Freq:1 Pos/Struct:7/7
 Meta:1 test/doc-strong/docswith-strong.html Freq:1 Pos/Struct:5/49


-----> FILES in index index.swish-e <-----
Dumping File Properties for File Number: 1
 (No Properties)

ReadAllDocProperties:
          swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
            swishtitle: 7 ( 10) S: "real title"
          swishdocsize: 8 (  4) N: "98"
     swishlastmodified: 9 (  4) D: "2007-10-25 11:21:41 EDT"
                flavor:11 (  4) S: "href"

ReadSingleDocPropertiesFromDisk:
          swishdocpath: 6 ( 33) S: "test/doc-href/docswith-ahref.html"
            swishtitle: 7 ( 10) S: "real title"
          swishdocsize: 8 (  4) N: "98"
     swishlastmodified: 9 (  4) D: "2007-10-25 11:21:41 EDT"
                flavor:11 (  4) S: "href"

Dumping File Properties for File Number: 2
 (No Properties)

ReadAllDocProperties:
          swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
            swishtitle: 7 ( 30) S: "real title is the title i want"
          swishdocsize: 8 (  4) N: "149"
     swishlastmodified: 9 (  4) D: "2007-10-25 11:22:43 EDT"
                flavor:11 (  6) S: "normal"

ReadSingleDocPropertiesFromDisk:
          swishdocpath: 6 ( 38) S: "test/doc-normal/docsthatarenormal.html"
            swishtitle: 7 ( 30) S: "real title is the title i want"
          swishdocsize: 8 (  4) N: "149"
     swishlastmodified: 9 (  4) D: "2007-10-25 11:22:43 EDT"
                flavor:11 (  6) S: "normal"

Dumping File Properties for File Number: 3
 (No Properties)

ReadAllDocProperties:
          swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
            swishtitle: 7 ( 10) S: "real title"
          swishdocsize: 8 (  4) N: "97"
     swishlastmodified: 9 (  4) D: "2007-10-25 11:23:35 EDT"
                flavor:11 (  6) S: "strong"

ReadSingleDocPropertiesFromDisk:
          swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
            swishtitle: 7 ( 10) S: "real title"
          swishdocsize: 8 (  4) N: "97"
     swishlastmodified: 9 (  4) D: "2007-10-25 11:23:35 EDT"
                flavor:11 (  6) S: "strong"
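
To make the gap easier to see, here is a small Python sketch that compares the names on the PropertyNames line with the properties actually stored for file 3. The dump excerpt is copied from the output above; the regular expression is only a guess at the line layout based on these few lines, not a documented format.

```python
import re

declared = ["flavor", "strong", "a"]  # from the PropertyNames line in index.cfg

# Excerpt of the ReadAllDocProperties output for file 3, copied from the dump.
dump = '''\
          swishdocpath: 6 ( 36) S: "test/doc-strong/docswith-strong.html"
            swishtitle: 7 ( 10) S: "real title"
          swishdocsize: 8 (  4) N: "97"
     swishlastmodified: 9 (  4) D: "2007-10-25 11:23:35 EDT"
                flavor:11 (  6) S: "strong"
'''

# Assumed layout of each line: <name>:<id> (<length>) <type>: "<value>"
LINE = re.compile(r'^\s*(\w+):\s*(\d+)\s*\(\s*\d+\)\s*\w:\s*"(.*)"$')

stored = {}
for line in dump.splitlines():
    m = LINE.match(line)
    if m:
        stored[m.group(1)] = m.group(3)

# Names that were declared as properties but have no stored value.
missing = [name for name in declared if name not in stored]
print("stored:", sorted(stored))
print("declared but not stored:", missing)
```

For this excerpt, flavor is stored while strong and a are declared but absent, which matches the METANAMES table listing all three and the file dumps showing only flavor.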


[josh@josh]# swish-e -w title AND flavor=strong -x '"<strong>" "<swishtitle>" "<flavor>"\n'
# SWISH format: 2.4.5
# Search words: title AND flavor=strong
# Removed stopwords:
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.009 seconds
"" "real title" "strong"
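
As a quick way to read that result line, the following Python sketch pairs each quoted value with the property name from the -x format string. The field order is taken from the command above; the parsing approach (shlex) is just a convenient choice for this quoted output and is not something swish-e provides.

```python
import shlex

# Property names in the order they appear in the -x format string.
fields = ["strong", "swishtitle", "flavor"]

# The single result line from the search above.
result_line = '"" "real title" "strong"'

# shlex.split keeps empty quoted strings as empty values.
values = shlex.split(result_line)
row = dict(zip(fields, values))

# Properties that came back empty for this hit.
empty = [name for name, value in row.items() if value == ""]
print(row)
print("empty properties:", empty)
```

Read this way, the hit has a swishtitle and a flavor, but the strong property is empty even though the document contains a strong element.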




josh
Received on Thu Oct 25 11:29:01 2007