
Thread: Webcrawler with yahoo.net

  1. #1
    Client Beery_Asst's Avatar
    Join Date
    Feb 2006
    Location
    Houston, TX
    Posts
    189
    Squirrelcart version
    v3.4.0

    Webcrawler with yahoo.net

    I just looked at my Downloads log file.

    My website is http://www.USWingNuts.com .

    Is there any way to prevent webcrawlers from downloading files in the public area I have set up in my webstore?

  2. #2
    Client coastalrugs's Avatar
    Join Date
    Dec 2007
    Posts
    171
    Squirrelcart version
    v3.2.0
    If it is publicly accessible, there is not much you can do to absolutely prevent download. But any proper spider should respect robots.txt directives.

    You could likely add a robots.txt file with something like:
    User-agent: *
    Disallow: /*.zip$

    This should keep compliant spiders from fetching *.zip files. A similar line could be added for PDFs or any other file type you want ignored.
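
    To see which URLs a rule like this would cover, here is a minimal Python sketch of the wildcard matching that major crawlers apply to robots.txt paths (`*` matches any run of characters, a trailing `$` anchors the end of the URL). The helper names and the sample paths are made up for illustration, and the `/*.pdf$` rule is an assumed companion to the zip rule above. This is not how any particular crawler is implemented.

```python
import re

# The zip rule from the post above, plus an assumed PDF rule of the same shape.
RULES = ["/*.zip$", "/*.pdf$"]

def rule_to_regex(rule):
    """Translate a Disallow path using '*' and a trailing '$' into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    # Without a trailing '$' the rule is a prefix match, so leave the end open.
    return re.compile(body + (r"\Z" if anchored else ""))

def is_disallowed(path, rules=RULES):
    """True if the URL path (including any query string) matches a rule."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

if __name__ == "__main__":
    # Hypothetical paths, not taken from the site in this thread.
    for path in ["/files/catalog.zip", "/files/manual.pdf", "/index.html"]:
        print(path, "->", "blocked" if is_disallowed(path) else "allowed")
```

    Note that the trailing `$` means a URL with anything after the extension, such as a query string, is not covered by these rules.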

    For more info, see:
    http://www.google.com/support/webmas...&answer=156449

    I haven't looked into this, but there may be a mechanism in SC that lets modules inject their own headers into the main page's output. If so, the download page could add a robots meta tag with "nofollow" so that spiders would not follow the actual download links (though they would also not follow any links from that page back to the rest of your site).


    Also, there could be solutions that are actually built into SC. I have seen suggestions such as converting download links into POST requests instead of GET requests, or requiring a session for the user before a download is allowed. Anything that a normal browser does or has, but a spider ignores, would work.

  3. #3
    Client Beery_Asst's Avatar
    Join Date
    Feb 2006
    Location
    Houston, TX
    Posts
    189
    Squirrelcart version
    v3.4.0
    Thanks. I already had a robots.txt file that blocked a specific directory, but not this particular one, and not the PDF files I had.

    Thanks again.

    Beery

  4. #4
    Client coastalrugs's Avatar
    Join Date
    Dec 2007
    Posts
    171
    Squirrelcart version
    v3.2.0
    Just a note of clarification:
    Even if you had a directive preventing access to sc_data, the spiders will very likely see the file as coming from:
    ...../store/store.php?downloads=1&id=4
    or some derivative thereof.

    A narrower directive targeting this page might be in order. Hopefully the zip/pdf/etc. route is enough to stop the unnecessary downloads.
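
    The point above can be checked with a short Python sketch. It assumes the same wildcard semantics that major crawlers use (`*` for any run of characters, trailing `$` for end of URL) and treats the query string as part of the matched path. The `/store/` prefix stands in for the elided part of the URL, and the narrower rule `/*store.php?downloads=` is only a hypothetical example of what such a directive might look like, not a tested recommendation for SC.

```python
import re

def blocked(path, rule):
    """True if `path` (URL path plus query string) matches a Disallow rule."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # '*' becomes "any characters"; everything else, including '?', is literal.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(body + (r"\Z" if anchored else ""), path) is not None

# Assumed shape of the download link as a spider would see it.
download_url = "/store/store.php?downloads=1&id=4"

if __name__ == "__main__":
    # The extension rule does not match, because the URL does not end in .zip.
    print(blocked(download_url, "/*.zip$"))
    # A hypothetical rule aimed at the download page itself does match.
    print(blocked(download_url, "/*store.php?downloads="))
```

    A rule of this kind leaves other store.php pages crawlable, since it only matches URLs whose query string starts with the downloads parameter.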

  5. #5
    Client Beery_Asst's Avatar
    Join Date
    Feb 2006
    Location
    Houston, TX
    Posts
    189
    Squirrelcart version
    v3.4.0
    I'm going to continue following my logs for a few days and see if I have stopped the majority of the webcrawling. Thanks for the feedback.

    Beery
