When I use this command to spider my site,
swish-e -S http -i http://www.youngcomposers.com
it takes a while to spider. At that rate, I think I would have to wait about a month
for it to finish everything. It does seem to produce a neater
temp file, but there seems to be no way to configure this method
(for example, I can't seem to use a swish.conf file).
Yet when I use this command:
swish-e -S prog -c swish.conf
where swish.conf contains:
SwishProgParameters default http://www.youngcomposers.com
MetaNames swishtitle swishdocpath
StoreDescription TXT* 10000
StoreDescription HTML* <body> 10000
I can configure it, but it seems to write garbage into the temp files,
and the temp files seem to blow up in size. It also seems to take a while to
Now, you mentioned that swish-e -S http -i http://www.mysite.com is
deprecated, but that it is better to use than the following method. I'm
not quite sure I follow. What is the common way to spider a site? I'm
confused about which method to use. By the way, I was confused when I said I
wanted to spider a database. Both of the methods I mentioned seem to spider
my whole site.
How long does it typically take to spider a site that has about 90,000
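For a rough sense of scale, a spider that fetches pages one at a time and pauses between requests needs about pages × (delay + fetch time) of wall-clock time. The sketch below assumes the 90,000 figure counts pages and uses a hypothetical 5-second pause and 0.5-second fetch time; these are illustrative numbers, not swish-e defaults, so substitute your spider's actual delay setting.

```python
def estimate_crawl_seconds(pages: int, delay_sec: float, fetch_sec: float = 0.0) -> float:
    """Rough wall-clock time for a polite, one-request-at-a-time crawl.

    Assumes the spider waits `delay_sec` between requests and that each
    request itself takes about `fetch_sec` seconds (both hypothetical inputs).
    """
    if pages < 0 or delay_sec < 0 or fetch_sec < 0:
        raise ValueError("inputs must be non-negative")
    return pages * (delay_sec + fetch_sec)


def as_days(seconds: float) -> float:
    """Convert seconds to days."""
    return seconds / 86_400


if __name__ == "__main__":
    # Assumed numbers: 90,000 pages, a 5 s pause, roughly 0.5 s per request.
    total = estimate_crawl_seconds(90_000, delay_sec=5.0, fetch_sec=0.5)
    print(f"{total:,.0f} s = {total / 3600:,.1f} h = {as_days(total):.1f} days")
```

With those assumptions it prints 495,000 s, about 137.5 hours or 5.7 days. In this example the delay term dominates the total, far more than the fetch time does.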
[mailto:email@example.com] On Behalf Of Bill Moseley
Sent: Friday, November 04, 2005 3:28 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: spider a database
On Fri, Nov 04, 2005 at 03:16:27PM -0500, Michael Porcaro wrote:
> Please bear with me here, and thank you for your patience. I looked at
> your link and searched around. From searching, I gather that swish-e can
> spider databases; I wasn't really sure about this before. I came across
> this document. Is this the right thing to read in order to figure out
> how to spider my dynamic pages?
Sorry, I was confused; I thought you wanted to index docs in a
database without using HTTP. Which is it?
If you want to index stuff in a database, then search for the MySQL.pl
file in the swish-e distribution.
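If the content really does live in a database, the idea behind MySQL.pl (as I understand it) is a program that swish-e runs with `-S prog` and that prints each record to stdout as a small header block followed by the document body. The sketch below follows that shape in Python, with SQLite standing in for MySQL. The `posts` table, its columns, the sample rows, and the `posts/<id>` paths are all hypothetical, and the header names (`Path-Name`, `Content-Length`, `Last-Mtime`, `Document-Type`) are how I recall swish-e's prog input format; check them against the docs shipped with your version.

```python
import sqlite3
import sys
from html import escape
from typing import Iterator, Optional


def prog_record(path: str, content: str, mtime: Optional[int] = None,
                doc_type: Optional[str] = None) -> bytes:
    """Format one document as header lines, a blank line, then the body.

    Content-Length must be the body's length in bytes, not characters.
    """
    body = content.encode("utf-8")
    headers = [f"Path-Name: {path}", f"Content-Length: {len(body)}"]
    if mtime is not None:
        headers.append(f"Last-Mtime: {mtime}")
    if doc_type is not None:
        headers.append(f"Document-Type: {doc_type}")
    return ("\n".join(headers) + "\n\n").encode("utf-8") + body


def iter_records(conn: sqlite3.Connection) -> Iterator[bytes]:
    """Yield one prog-format record per row of the hypothetical `posts` table."""
    query = "SELECT id, title, body, mtime FROM posts ORDER BY id"
    for post_id, title, body, mtime in conn.execute(query):
        html = (f"<html><head><title>{escape(title)}</title></head>"
                f"<body>{escape(body)}</body></html>")
        yield prog_record(f"posts/{post_id}", html, mtime, "HTML*")


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the real forum database (schema is hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, "
               "body TEXT, mtime INTEGER)")
    db.executemany("INSERT INTO posts (title, body, mtime) VALUES (?, ?, ?)",
                   [("Piano Music", "Sonatas & études", 1131130000),
                    ("Composition", "Counterpoint tips", 1131130500)])
    return db


if __name__ == "__main__":
    # Write every record to stdout as raw bytes, where swish-e would read them.
    for record in iter_records(demo_db()):
        sys.stdout.buffer.write(record)
```

Because `Content-Length` counts bytes, the `é` in the sample data counts as two; a character count would likely misalign every record that follows it.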
> Also, I am confused as to where I should direct the config file to
> spider the dynamic links. Let's say I want to spider this particular
How does the spider, or anyone for that matter, know if that's a static
file or a dynamically generated file?
> Piano-Music-f50.html is actually a php generated file with an html
> alias, but I don't know where to direct swish-e to spider this file.
I have no idea what an "html alias" is in that context, but you point
the spider to the same place you would point anyone else: to its URL.
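The same holds for any HTTP client. In the sketch below, a tiny server (Python's standard-library `http.server`, standing in for PHP) builds `/Piano-Music-f50.html` in code on every request; no such file exists on disk. The client that fetches it sees only a status code, headers, and HTML, just as it would for a static file. The handler, the loopback address, and the random port are illustrative assumptions.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DynamicHandler(BaseHTTPRequestHandler):
    """Builds every page in code at request time; nothing is read from disk."""

    def do_GET(self):
        body = f"<html><body>Generated for {self.path}</body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_like_a_spider(url: str):
    """Return what any web client gets back: status, content type, body."""
    with urllib.request.urlopen(url) as resp:
        return resp.status, resp.headers.get("Content-Type"), resp.read().decode("utf-8")


def demo(path: str = "/Piano-Music-f50.html"):
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), DynamicHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        return fetch_like_a_spider(f"http://{host}:{port}{path}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the status, the `Content-Type`, and the generated HTML; nothing in that response reveals whether a file or a script produced it.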
> When I spider the files under /home/yc/www/forum (my local site for
> www.youngcomposers.com), all it does is spider the files that run the
> forum, not the actual content dynamic pages, such as
> "Piano-Music-f50.html" or equivalently
The term "spider" implies you are spidering your web site, most likely
with the oddly named program "spider.pl". That would be spidering
like Google does: by accessing your documents via the web.
Please go back and look at the docs again.
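Spidering "like Google" boils down to a loop: fetch a page over HTTP, pull out its links, and queue the ones on the same host that have not been seen yet. Below is a toy breadth-first version of that loop. It is not spider.pl and ignores robots.txt, politeness delays, and non-HTML content; the `fetch` function is passed in so the logic can be tried without a network, and the example.com URLs are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url: str, html: str) -> List[str]:
    """Resolve relative links against base_url and drop #fragments."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl(start_url: str, fetch: Callable[[str], str],
          max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for link in extract_links(url, html):
            # Same host only; mailto: and similar links have no host and are skipped.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


if __name__ == "__main__":
    # A fake two-page site, so the crawl runs offline.
    site = {
        "http://example.com/": '<a href="/forum/Piano-Music-f50.html">Piano</a>'
                               '<a href="http://other.example.org/">elsewhere</a>',
        "http://example.com/forum/Piano-Music-f50.html":
            '<a href="../">home</a><a href="#top">top</a>',
    }
    for url in crawl("http://example.com/", lambda u: site.get(u, "")):
        print(url)
```

Against a real site you would pass a `fetch` that downloads the URL (for example with `urllib.request.urlopen`). The crawl only ever sees URLs and the HTML they return, which is why the forum's PHP source files on disk never enter the picture.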
> So I guess my basic question would be, what is the address of my
> files? A very poor guess is, my database files are located here:
> But is this the address to spider? Or do I spider /home/yc/www/forum
Maybe it's better if someone else answers that one.
Received on Fri Nov 4 20:07:54 2005