Re: metaname limit?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Mon Oct 11 2004 - 00:14:06 GMT
On Sun, Oct 10, 2004 at 04:46:03PM -0700, Mark Greenaway wrote:
> I thought I did show everything. You must mean the output:

I mean everything.  Cat all your input files, show the commands you
run for indexing and their output and the command you use for testing.



Ok, so in your last email you had (my reformatting):

File: nacl.config 

MetaNames outputs organisation strategy domain mission hqcountry countries web email 
PropertyNames outputs organisation strategy domain mission hqcountry countries web email 
SwishProgParameters nacl.pl
IndexDir spider.pl
IndexFile index.nacl

> /swish-e -c nacl.config -i site4.html -v0 -T properties

Ok, but that's not indexing using spider.pl.  That's why I want to see
everything you type.

>            swishdocpath: 6 ( 10) S: "site4.html"
>            swishdocsize: 8 (  4) N: "724"
>       swishlastmodified: 9 (  4) D: "2004-10-09 09:51:16 EST"
>                 outputs:19 ( 43) S: "Papers Journals Newsletters Policy Research"
>            organisation:20 (  5) S: "Site4"
>                strategy:21 ( 18) S: "research education"
>                  domain:22 ( 23) S: "government politics law"
>                 mission:23 ( 28) S: "To influence decision makers"
>               hqcountry:24 (  9) S: "Australia"
>               countries:25 (  9) S: "Australia"
>                     web:26 ( 23) S: "http://www.site4.org.au"
>                   email:27 ( 16) S: "jim@site4.org.au"

Well, that's very odd, isn't it.

moseley@bumby:~$ wget --quiet http://incres.anu.edu.au/site4.html

moseley@bumby:~$ ls -l site4.html
-rw-r--r--  1 moseley moseley 724 2004-10-07 20:50 site4.html

moseley@bumby:~$ grep hqcountry site4.html
<meta name="hqcountry" content="Australia">

moseley@bumby:~$ cat nacl.config 
MetaNames outputs organisation strategy domain mission hqcountry countries web email 
PropertyNames outputs organisation strategy domain mission hqcountry countries web email 
SwishProgParameters nacl.pl
IndexDir spider.pl 
IndexFile index.nacl

moseley@bumby:~$ swish-e -c nacl.config -i site4.html -v0 -T properties
          swishdocpath: 6 ( 10) S: "site4.html"
            swishtitle: 7 ( 24) S: "Site4 - confusion reigns"
          swishdocsize: 8 (  4) N: "724"
     swishlastmodified: 9 (  4) D: "2004-10-07 20:50:38 PDT"
               outputs:19 ( 43) S: "Papers Journals Newsletters Policy Research"
          organisation:20 (  5) S: "Site4"
              strategy:21 ( 18) S: "research education"
                domain:22 ( 23) S: "government politics law"
               mission:23 ( 28) S: "To influence decision makers"
             hqcountry:24 (  9) S: "Australia"
             countries:25 (  9) S: "Australia"
                   web:26 ( 23) S: "http://www.site4.org.au"
                 email:27 ( 16) S: "jim@site4.org.au"

Also note the difference in the file's last-modified time between your
output and mine.  Why the difference?

moseley@bumby:~$ HEAD http://incres.anu.edu.au/site4.html | grep Last-Mod
Last-Modified: Fri, 08 Oct 2004 03:50:38 GMT

moseley@bumby:~$ TZ=UT swish-e -c nacl.config -i site4.html -v0 -T properties | grep swishlast
     swishlastmodified: 9 (  4) D: "2004-10-08 03:50:38 "

But your example has a different time.  So now I'm wondering if I'm
indexing the same file you are indexing.  I imagine it is because the
file sizes match up, but it's odd the dates are different.
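
The timestamps are easier to compare once they are all in UTC.  Here is
a minimal Python sketch of that comparison.  The offsets are
assumptions: PDT is taken as UTC-7, and the "EST" in the quoted output
is assumed to be Australian Eastern Standard Time (UTC+10).  The helper
name to_utc is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets (not stated in the thread): PDT = UTC-7, and the
# quoted "EST" = Australian Eastern Standard Time = UTC+10.
PDT = timezone(timedelta(hours=-7))
AEST = timezone(timedelta(hours=10))


def to_utc(stamp, tz):
    """Parse 'YYYY-MM-DD HH:MM:SS' as local time in tz and convert to UTC."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


# swishlastmodified from the two runs shown in this message.
mine = to_utc("2004-10-07 20:50:38", PDT)
yours = to_utc("2004-10-09 09:51:16", AEST)

# Last-Modified header reported by the web server (already GMT).
server = datetime(2004, 10, 8, 3, 50, 38, tzinfo=timezone.utc)

print("mine matches server:", mine == server)
print("yours is later than server by:", yours - server)
```

Under those assumptions the copy fetched with wget carries the server's
timestamp, while the quoted output is for a file modified some hours
later, so the gap is more than a time zone offset.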

Anyway, you can see how it works for me.  If I were you I'd try from
another machine.  Your description of installation seems a bit more
complex than needed -- make install should be all you need.  No need
to copy files from the src directory any place.

moseley@bumby:~$ md5sum site4.html
9595829207c6a4c5890206becaf8ad68  site4.html

moseley@bumby:~$ cat site4.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<meta name="organisation" content="Site4">
<meta name="strategy" content="research education">
<meta name="domain" content="government politics law">
<meta name="outputs" content="Papers Journals Newsletters Policy Research">
<meta name="countries" content="Australia">
<meta name="hqcountry" content="Australia">
<meta name="mission" content="To influence decision makers">
<meta name="web" content="http://www.site4.org.au">
<meta name="email" content="jim@site4.org.au">
<TITLE>Site4 - confusion reigns</TITLE>
</HEAD>
<BODY>
<H1>Site4 - NACL Matrix test site</H1>
<hr>
<a href="http://incres.anu.edu.au/nacl/index.html">link</a>
<hr>
</BODY>
</HTML>
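
To double-check that every name on the MetaNames line really appears as
a meta tag in the file, something like the following Python sketch
could be used.  It is only a sanity check outside of swish-e, not how
swish-e parses anything.  The names MetaCollector and missing_metanames
are made up for illustration, and lower-casing the names is a
convenience of this sketch.

```python
from html.parser import HTMLParser

# The MetaNames line from nacl.config and the meta tags from site4.html.
CONFIG = ("MetaNames outputs organisation strategy domain mission "
          "hqcountry countries web email\n")
HTML = """<HEAD>
<meta name="organisation" content="Site4">
<meta name="strategy" content="research education">
<meta name="domain" content="government politics law">
<meta name="outputs" content="Papers Journals Newsletters Policy Research">
<meta name="countries" content="Australia">
<meta name="hqcountry" content="Australia">
<meta name="mission" content="To influence decision makers">
<meta name="web" content="http://www.site4.org.au">
<meta name="email" content="jim@site4.org.au">
</HEAD>"""


class MetaCollector(HTMLParser):
    """Collect the name attribute of every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name.lower())


def missing_metanames(config_text, html_text):
    """Return configured MetaNames that have no matching meta tag."""
    wanted = []
    for line in config_text.splitlines():
        parts = line.split()
        if parts and parts[0].lower() == "metanames":
            wanted.extend(p.lower() for p in parts[1:])
    collector = MetaCollector()
    collector.feed(html_text)
    return [name for name in wanted if name not in collector.names]


print(missing_metanames(CONFIG, HTML))
```

An empty list means all nine configured names are present in the
document, which agrees with the property listing above.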


-- 
Bill Moseley
moseley@hank.org

Received on Sun Oct 10 17:14:20 2004