Besides the frame issue, which I'll solve later, I think some of the problems
lie in the HTTP spider.
For example,
I used http://www.glamm.com/latosx.htm as the start page
(you can check it just to see how it is formatted).
This is what I got when indexing:
IndexFile /online/www/home/glamm/myindex1
"glammhttp.config" 192 lines, 7399 characters
# /usr/local/bin/swish-e -c glammhttp.config -S http
Indexing Data Source: "HTTP-Crawler"
retrieving http://www.glamm.com/latosx.htm (0)...
(138 words)
Removing very common words... no words removed.
Writing main index... 96 unique words indexed.
Writing file index... 1 file indexed.
Running time: 1 minute, 1 second.
Indexing done!
In fact, it indexed just that page,
even though the config file contains
MaxDepth 5
and that file (latosx.htm) contains some links.
Why doesn't it follow those links?
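To narrow this down it can help to check which links a spider can actually see in a page like latosx.htm. Below is a minimal sketch in Python, not swish-e's own spider code. It assumes a crawler that only follows plain <a href>, <frame src> and <iframe src> links on the same host as the start page, and treats MaxDepth as the number of link hops from the start page (both are assumptions). Links built by JavaScript or image maps would not be seen by a parser like this one. The example.com pages are hypothetical and served from memory, so no network access is needed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect targets of <a href>, <frame src> and <iframe src> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            target = attrs["href"]
        elif tag in ("frame", "iframe") and attrs.get("src"):
            target = attrs["src"]
        else:
            return
        # Resolve relative links against the page URL and drop any #fragment.
        self.links.append(urljoin(self.base_url, target).split("#")[0])


def crawl(start_url, fetch, max_depth):
    """Breadth-first crawl limited to max_depth hops and to the start host.

    fetch(url) returns the page's HTML as a string, or None if the page
    cannot be retrieved. Returns the URLs visited, in visiting order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: index the page, follow nothing
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            # Off-site, mailto: and javascript: links have a different netloc.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical site: a frameset whose menu frame links to one more page.
pages = {
    "http://www.example.com/index.htm":
        '<frameset cols="200,*"><frame src="menu.htm"><frame src="main.htm"></frameset>',
    "http://www.example.com/menu.htm":
        '<a href="a.htm">A</a> <a href="http://other.example.org/">external</a>',
    "http://www.example.com/main.htm": "<p>no links here</p>",
    "http://www.example.com/a.htm": "<p>leaf page</p>",
}

# Visits index.htm, menu.htm, main.htm and a.htm; the off-site link is skipped.
print(crawl("http://www.example.com/index.htm", pages.get, max_depth=5))
```

With max_depth=0 a crawler like this retrieves only the start page, which is the behaviour the log above shows. If a check like this finds links in latosx.htm but the spider still stops, the cause is more likely in how the spider is invoked or configured than in the page itself.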
Thanks
Matteo
*********** REPLY SEPARATOR ***********
On 28/06/99 at 8.06 Roy Tennant wrote:
>The way we have handled this is to use the regular expressions capability
>to replace the indexed file name with the frameset. That is, if
>"mypage.html" is the page that sets up the frames and calls the other page
>fragments, then name the page fragments uniquely and rename them in the
>index using "ReplaceRules" in your configuration file.
>
>mypage.html indexed under its own name
>frag1.html indexed as "mypage.html"
>frag2.html indexed as "mypage.html"
>
>Thus all of the pieces point to the frameset.
>Roy
>
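The renaming Roy describes can be illustrated outside swish-e with a small Python sketch. It assumes a hypothetical naming convention in which each fragment of mypage.html is named mypage_fragN.html, and only shows the regular-expression rewrite that a ReplaceRules entry would perform; the actual ReplaceRules syntax should be taken from the swish-e documentation, not from this sketch.

```python
import re

# Hypothetical convention: fragments of "mypage.html" are named
# "mypage_frag1.html", "mypage_frag2.html", and so on.
FRAGMENT_RE = re.compile(r"^(?P<base>.+)_frag\d+(?P<ext>\.html?)$")


def index_name(path):
    """Return the name a page should be recorded under in the index.

    Fragment pages are mapped to their frameset page; any other path
    is returned unchanged.
    """
    return FRAGMENT_RE.sub(r"\g<base>\g<ext>", path)


for name in ("mypage.html", "mypage_frag1.html", "mypage_frag2.html"):
    print(name, "->", index_name(name))
```

Giving fragments a unique, predictable suffix is what makes a single pattern enough; without such a convention each fragment would need its own rule.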
>On Mon, 28 Jun 1999, Dan Brickley wrote:
>
>> On Mon, 28 Jun 1999, Matteo Barbieri wrote:
>>
>> > I successfully created my first index file in filesystem mode.
>> > In HTTP mode I found that the robot doesn't traverse the site
>> > but stops at the first HTML page.
>> > I don't get any error back, so I am wondering whether the spider is
>> > frame-aware.
>>
>> As an aside, it's difficult in the general case to build a
>> frameset-aware robot and search tool, since the composite frameset
>> doesn't have its own URL, so you'd need to auto-generate the appropriate
>> frameset and populate it with the two or three appropriate URLs if you
>> wanted to present users with the pages they'd found. (Otherwise you can
>> show them the page, but they'd lose all navigational context from the
>> surrounding frame parts.)
>>
>> Dan
>>
>> --
>> Daniel.Brickley@bristol.ac.uk
>> Institute for Learning and Research Technology http://www.ilrt.bris.ac.uk/
>> University of Bristol, Bristol BS8 1TN, UK. phone:+44(0)117-9287096
>>
>>
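Dan's point about auto-generating a frameset can be sketched in Python as follows. Given the URL of a matching page and the URL of the navigation frame it belongs to (both hypothetical inputs here, since the index itself does not record which frameset a page came from), a search front end could emit a throwaway two-column frameset that restores the navigation context. The function name, frame names and default column width are assumptions for illustration.

```python
from html import escape


def build_frameset(title, nav_url, content_url, nav_width=200):
    """Return a minimal two-column frameset page for one search hit.

    nav_url is loaded in the left frame and content_url (the page that
    matched the search) in the right frame. Inputs are HTML-escaped so
    that characters such as & and " are safe inside attributes.
    """
    return (
        "<html><head><title>{title}</title></head>\n"
        '<frameset cols="{width},*">\n'
        '  <frame name="nav" src="{nav}">\n'
        '  <frame name="main" src="{content}">\n'
        "</frameset></html>\n"
    ).format(
        title=escape(title),
        width=int(nav_width),
        nav=escape(nav_url, quote=True),
        content=escape(content_url, quote=True),
    )


print(build_frameset("Search result",
                     "http://www.example.com/menu.htm",
                     "http://www.example.com/page.htm?a=1&b=2"))
```

The hard part Dan describes remains: the search tool still has to know which navigation page belongs with each hit, for example by keeping that mapping alongside the index.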
==================================================================
Dott. Matteo Barbieri (matteo@glamm.com)
http://www.glamm.com
GLAMM Interactive
V.le Corsica 7, 20133 Milano
Tel. +39 - 2 - 74.81.171 Fax. +39 - 2 - 74.81.1726
===================================================================
Received on Mon Jun 28 08:37:38 1999