AW: Re: Getting a description out of the html <body>

From: Markus Strickler <markus(at)not-real.braindump.ms>
Date: Wed Mar 27 2002 - 15:31:06 GMT
Hi-

Yes, swish is returning everything after the "<body" as the description.
See attached config and HTML files.

--------------
Z:\>D:\swish-e\swish-e.exe -c D:\swish-e\conf\test.config -f D:\swish-e\tmp_test.index
Indexing Data Source: "File-System"
Indexing "D:/temp/indextest"
Removing very common words...
no words removed.
Writing main index...
Sorting words ...
Sorting 20 words alphabetically
Writing header ...
Writing index entries ...
  Writing word text: Complete
  Writing word hash: Complete
  Writing word data: Complete
20 unique words indexed.
5 properties sorted.
1 file indexed.  289 total bytes.  28 total words.
Elapsed time: 00:00:00 CPU time: 00:00:00
Indexing done!
---------------
D:\swish-e>swish-e -w This -f test.index -p swishdescription
# SWISH format: 2.1-dev-25
# Search words: This
# Number of hits: 1
# Search time: 0.000 seconds
# Run time: 0.030 seconds
1000 D:/temp/indextest/index2.html "index2.html" 289 "class=&quot;portal&quot; bgcolor=&quot;#F0F0F0&quot; text=&quot;#000000&quot; link=&quot;#000000&quot; vlink=&quot;#000000&quot; alink=&quot;#000000&quot; leftmargin=&quot;0&quot; rightmargin=&quot;0&quot; topmargin=&quot;0&quot; bottommargin=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot;> This is a test..."
.
-----------------

This is on Win2000 SP2 with the swish-e-2.1-dev-25-2002-03-22-win32.exe
binary distro.

-markus

-----Original Message-----
From: swish-e@sunsite.berkeley.edu [mailto:swish-e@sunsite.berkeley.edu]
On Behalf Of Bill Moseley
Sent: Wednesday, 27 March 2002 15:43
To: Multiple recipients of list
Subject: [SWISH-E] Re: Getting a description out of the html <body>


At 04:30 AM 3/27/2002 -0800, Markus Strickler wrote:

>For example, if my html contains:
><body class="portal" bgcolor="#F0F0F0" text="#000000" link="#000000"
>vlink="#000000" alink="#000000" leftmargin="0" rightmargin="0"
>topmargin="0" bottommargin="0" marginwidth="0" marginheight="0">
>
>Swishdescription will start with:
>class="portal" bgcolor="#F0F0F0" text="#000000" link="#000000"
>vlink="#000000" alink="#000000" leftmargin="0" rightmargin="0"
>topmargin="0" bottommargin="0" marginwidth="0" marginheight="0">

Are you saying that swish is returning that for the description?

>Is this a bug? Or did I do something wrong in the config?

Yes, look at line 23 of your config file.  I hope my ESP powers are
working! Post your config AND the HTML file, as you shouldn't see the
contents of the tag.
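
To make "you shouldn't see the contents of the tag" concrete, here is a
minimal sketch of the expected behaviour. It is not swish-e's code: it uses
Python's standard html.parser, and the names BodyText and body_description,
the 200-character limit, and the sample document are all assumptions made
for illustration (the sample is reconstructed from the thread, not taken
from the attachment).

```python
from html.parser import HTMLParser


class BodyText(HTMLParser):
    """Collect only the text inside <body>, ignoring tag attributes."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Attributes such as class= or bgcolor= arrive here, never as text.
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.chunks.append(data)


def body_description(html, size=200):
    """Return up to `size` characters of whitespace-normalized body text."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())[:size]


# Hypothetical sample reconstructed from the thread, not the original file.
sample = (
    '<html><head><title>index2.html</title></head>\n'
    '<body class="portal" bgcolor="#F0F0F0" text="#000000">\n'
    'This is a test...\n'
    '</body></html>'
)
print(body_description(sample))  # prints: This is a test...
```

A description built this way starts with "This is a test..." and contains
none of the <body> attributes, which is the result the report above was
expecting from swishdescription.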

Oh regarding <span>, you might look at XMLClassAttributes in the 2.1
docs. I thought I had added a config option to let you define what
attributes to use (instead of hard-coding "class"), and I can't imagine
(at this moment) why that couldn't be extended to HTML parsing.



Bill Moseley
mailto:moseley@hank.org



*********************************************************************
Due to deletion of content types excluded from this list by policy,
this multipart message was reduced to a single part, and from there
to a plain text message.
*********************************************************************
Received on Wed Mar 27 15:31:09 2002