At 02:39 PM 04/11/02 -0700, Michael wrote:
>Is it possible to index dynamic sites that use URLs of the form
>
>http://somewhere.com/page.cfm?object=1234
>
>where a new page is represented by
>
>http://somewhere.com/page.cfm?object=1235
>
>I believe that swish-e thinks
>
>http://somewhere.com/page.cfm
>
>has already been indexed and moves on without looking at the new page.
>Is this what happens?
No, it will index each URL separately, query string included.[1]
>While
>
>User-agent *
>Disallowed: \
See http://www.robotstxt.org/wc/exclusion-admin.html for the syntax.
To exclude all robots from the whole site:

User-agent: *
Disallow: /
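
For anyone checking a robots.txt by hand, the difference between the two forms can be sketched with Python's standard-library parser. This is only an illustration: urllib.robotparser stands in for swish-e's own robots.txt handling, which this thread does not describe, and the user-agent name "swish-e-test" and the helper name allowed() are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="swish-e-test"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# As quoted in the question: no colon after User-agent, unknown field name.
malformed = "User-agent *\nDisallowed: \\\n"
# Standard form: excludes every path for every robot.
standard = "User-agent: *\nDisallow: /\n"

malformed_result = allowed(malformed, "/page.cfm?object=1234")
standard_result = allowed(standard, "/page.cfm?object=1234")
print(malformed_result, standard_result)
```

With this parser the malformed file is ignored line by line, so it blocks nothing, while the standard form blocks the dynamic URL. Other parsers may treat malformed lines differently.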
[1] You can try it yourself:
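#!/usr/local/bin/perl -w
# Save as p.cgi; the links on the main page point back to this script.
use strict;
use CGI;

my $cgi = CGI->new;
print $cgi->header, $cgi->start_html;

if ( $cgi->param('object') ) {
    # Called with ?object=...: print the value so each URL has its own content.
    # scalar() avoids the list-context warning in newer CGI.pm releases.
    print scalar $cgi->param('object');
} else {
    # Called without a query string: print the main page with two links.
    print <<EOF;
Hello main page!
<a href="p.cgi?object=1234">word1234</a>
<a href="p.cgi?object=ABCD">wordABCD</a>
EOF
}

print $cgi->end_html;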
> ./swish-e -w not dkdk
# SWISH format: 2.1-dev-25
# Search words: not dkdk
# Number of hits: 3
# Search time: 0.001 seconds
# Run time: 0.040 seconds
1000 /p.cgi?object=ABCD "Untitled Document" 277
1000 /p.cgi?object=1234 "Untitled Document" 277
1000 /p.cgi "Untitled Document" 373
I tried with 2.0.5 and current CVS using both -S prog and -S http.
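
The three hits are what you would expect if each document is keyed on its full URL, query string included. A minimal sketch of that bookkeeping, in Python. This is not swish-e's spider code; the host http://localhost and the function name crawl_keys() are assumptions for the example.

```python
from urllib.parse import urljoin, urldefrag


def crawl_keys(base, hrefs):
    """Return the distinct absolute URLs for `base` plus the links found on it."""
    seen = []
    for href in [base] + hrefs:
        # Resolve relative links and drop any #fragment; keep the query string.
        url, _fragment = urldefrag(urljoin(base, href))
        if url not in seen:
            seen.append(url)
    return seen


urls = crawl_keys(
    "http://localhost/p.cgi",
    ["p.cgi?object=1234", "p.cgi?object=ABCD", "p.cgi?object=1234"],
)
for url in urls:
    print(url)
```

The repeated link collapses to one entry, but the two query strings stay distinct from each other and from the bare /p.cgi, matching the three paths in the search results.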
--
Bill Moseley
mailto:moseley@hank.org
Received on Thu Apr 11 22:48:55 2002