sample robots.txt
Added by Sarah Schmiechen, last edited by Sarah Schmiechen on Apr 18, 2007

System:

ACM

#Yahoo Search
User-agent: Slurp
Crawl-delay: 80
Disallow: /results.cfm
Disallow: /biblist.cfm
Disallow: /authors.cfm
Disallow: /reviewers.cfm
Disallow: /ccs.cfm
Disallow: /subjects.cfm
Disallow: /nouns.cfm

#Don't allow indexing
User-agent: *
Disallow: /
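
In the ACM file, Yahoo's Slurp matches its own User-agent group, so it follows only that group: it may crawl everything except the seven listed scripts and is asked to wait 80 seconds between requests. Every other robot falls through to the * group and is shut out entirely. Crawl-delay is a nonstandard extension that not every crawler honors. A minimal sketch that checks this with Python's standard urllib.robotparser; the agent name ExampleBot and the path /citation.cfm are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The ACM rules quoted above.
ACM_ROBOTS = """\
#Yahoo Search
User-agent: Slurp
Crawl-delay: 80
Disallow: /results.cfm
Disallow: /biblist.cfm
Disallow: /authors.cfm
Disallow: /reviewers.cfm
Disallow: /ccs.cfm
Disallow: /subjects.cfm
Disallow: /nouns.cfm

#Don't allow indexing
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ACM_ROBOTS.splitlines())

# ExampleBot and /citation.cfm are hypothetical names used for illustration.
for agent in ("Slurp", "ExampleBot"):
    for path in ("/citation.cfm", "/results.cfm?query=robots"):
        print(agent, path, parser.can_fetch(agent, path))

# Crawl-delay is stored per group; the "*" group sets none, so ExampleBot gets None.
print(parser.crawl_delay("Slurp"), parser.crawl_delay("ExampleBot"))
```

Slurp may fetch /citation.cfm but not /results.cfm?query=robots; ExampleBot is refused both.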

UofC DL

User-agent: *
Disallow: /TestInfo/
Disallow: /Test/
Disallow: /StaffInfo/
Disallow: /staffweb/
Disallow: /dldc/
Disallow: /~chas/
Disallow: /cgi-bin/
Disallow: /e/keith/
Disallow: /keith/staffweb/
Disallow: /~keith/staffweb/
Disallow: /archives/
Disallow: /e/chas/
Disallow: /bus/
Disallow: /e/busecon/macroecon/
Disallow: /e/qinchen/
Disallow: /e/nross/
Disallow: /phplib/
Disallow: /staffweb/depts/ils/reports/
Disallow: /staffweb/depts/ils/projects/
Disallow: /staffweb/depts/ils/projects/corinthian/
Disallow: /staffweb/depts/ils/projects/faceted-browsing/
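
Because Disallow values are matched as plain path prefixes, the four /staffweb/depts/ils/... lines in the UofC file are already covered by Disallow: /staffweb/. A minimal sketch that flags such redundant lines, assuming prefix matching without the later * and $ wildcard extensions; disallow_paths and redundant_rules are hypothetical helper names:

```python
# The UofC DL rules quoted above.
UOFC_ROBOTS = """\
User-agent: *
Disallow: /TestInfo/
Disallow: /Test/
Disallow: /StaffInfo/
Disallow: /staffweb/
Disallow: /dldc/
Disallow: /~chas/
Disallow: /cgi-bin/
Disallow: /e/keith/
Disallow: /keith/staffweb/
Disallow: /~keith/staffweb/
Disallow: /archives/
Disallow: /e/chas/
Disallow: /bus/
Disallow: /e/busecon/macroecon/
Disallow: /e/qinchen/
Disallow: /e/nross/
Disallow: /phplib/
Disallow: /staffweb/depts/ils/reports/
Disallow: /staffweb/depts/ils/projects/
Disallow: /staffweb/depts/ils/projects/corinthian/
Disallow: /staffweb/depts/ils/projects/faceted-browsing/
"""


def disallow_paths(robots_txt):
    """Collect the non-empty Disallow values, ignoring comments and other fields."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0]  # drop any comment
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def redundant_rules(paths):
    """Return paths already covered by a different, shorter prefix in the list.

    Assumes plain prefix matching with no * or $ wildcards.
    """
    return [p for p in paths if any(q != p and p.startswith(q) for q in paths)]


redundant = redundant_rules(disallow_paths(UOFC_ROBOTS))
for path in redundant:
    print(path)
```

Run against the UofC list, this reports the four /staffweb/depts/ils/... lines; since the group has no Allow lines, deleting them would not change what a prefix-matching robot may fetch.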

UPenn

User-Agent: *
Disallow: /digital/oup
Disallow: /verity
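
The UPenn values have no trailing slash, so each one blocks every path that merely begins with those characters: Disallow: /digital/oup covers /digital/oup/ and also a sibling such as /digital/oup-archive/, and Disallow: /verity covers /verity.html. A short check with urllib.robotparser; ExampleBot and the sample paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The UPenn rules quoted above.
UPENN_ROBOTS = """\
User-Agent: *
Disallow: /digital/oup
Disallow: /verity
"""

parser = RobotFileParser()
parser.parse(UPENN_ROBOTS.splitlines())

# Hypothetical paths: the first three start with a disallowed prefix.
for path in ("/digital/oup/", "/digital/oup-archive/", "/verity.html", "/digital/", "/"):
    print(path, parser.can_fetch("ExampleBot", path))
```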

Berkeley

User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi/
Disallow: /webstats/berkmapper/
Disallow: /ftp/
Disallow: /imgs/512x768/
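
Berkeley's values end in a slash, which confines each rule to a directory's contents: /cgi-bin/search is blocked, but the bare path /cgi-bin is not, and only the 512x768 image directory is hidden while other image paths stay crawlable. A sketch with urllib.robotparser; ExampleBot and the image paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The Berkeley rules quoted above.
BERKELEY_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi/
Disallow: /webstats/berkmapper/
Disallow: /ftp/
Disallow: /imgs/512x768/
"""

parser = RobotFileParser()
parser.parse(BERKELEY_ROBOTS.splitlines())

# Hypothetical paths: a rule ending in "/" does not match the name without the slash.
for path in ("/cgi-bin/search", "/cgi-bin", "/imgs/512x768/photo1.jpg", "/imgs/thumbs/photo1.jpg"):
    print(path, parser.can_fetch("ExampleBot", path))
```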

Amazon

# Disallow all crawlers access to certain pages.

User-agent: *
Disallow: /exec/obidos/account-access-login
Disallow: /exec/obidos/change-style
Disallow: /exec/obidos/flex-sign-in
Disallow: /exec/obidos/handle-buy-box
Disallow: /exec/obidos/tg/cm/member
Disallow: /gp/cart
Disallow: /gp/flex
Disallow: /gp/product/e-mail-friend
Disallow: /gp/product/product-availability
Disallow: /gp/product/rate-this-item
Disallow: /gp/sign-in
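
A crawler applies rules like these to each link before queuing it. A minimal sketch of that filtering step with urllib.robotparser, whose can_fetch accepts absolute URLs and ignores their scheme and host when matching; the helper crawlable, the agent ExampleBot, and the product identifier B00EXAMPLE are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# The Amazon rules quoted above.
AMAZON_ROBOTS = """\
# Disallow all crawlers access to certain pages.
User-agent: *
Disallow: /exec/obidos/account-access-login
Disallow: /exec/obidos/change-style
Disallow: /exec/obidos/flex-sign-in
Disallow: /exec/obidos/handle-buy-box
Disallow: /exec/obidos/tg/cm/member
Disallow: /gp/cart
Disallow: /gp/flex
Disallow: /gp/product/e-mail-friend
Disallow: /gp/product/product-availability
Disallow: /gp/product/rate-this-item
Disallow: /gp/sign-in
"""

parser = RobotFileParser()
parser.parse(AMAZON_ROBOTS.splitlines())


def crawlable(urls, agent="ExampleBot"):
    """Keep only the URLs that the parsed rules allow `agent` to fetch."""
    return [url for url in urls if parser.can_fetch(agent, url)]


# Hypothetical links a crawler might have extracted from a product page.
candidates = [
    "https://www.amazon.com/gp/product/B00EXAMPLE",
    "https://www.amazon.com/gp/product/rate-this-item?asin=B00EXAMPLE",
    "https://www.amazon.com/gp/cart/view.html",
    "https://www.amazon.com/gp/sign-in.html",
]
print(crawlable(candidates))
```

Only the first URL survives: the product page itself is open, while the rating, cart, and sign-in actions fall under Disallow prefixes.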

Google

User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Disallow: /nwshp
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /relcontent
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
Disallow: /univ/
Disallow: /cobrand
Disallow: /custom
Disallow: /advanced_group_search
Disallow: /advanced_search
Disallow: /googlesite
Disallow: /preferences
Disallow: /setprefs
Disallow: /swr
Disallow: /url
Disallow: /m?
Disallow: /m/search?
Disallow: /wml?
Disallow: /wml/search?
Disallow: /xhtml?
Disallow: /xhtml/search?
Disallow: /xml?
Disallow: /imode?
Disallow: /imode/search?
Disallow: /jsky?
Disallow: /jsky/search?
Disallow: /pda?
Disallow: /pda/search?
Disallow: /sprint_xhtml
Disallow: /sprint_wml
Disallow: /pqa
Disallow: /palm
Disallow: /gwt/
Disallow: /purchases
Disallow: /hws
Disallow: /bsd?
Disallow: /linux?
Disallow: /mac?
Disallow: /microsoft?
Disallow: /unclesam?
Disallow: /answers/search?q=
Disallow: /local?
Disallow: /local_url
Disallow: /froogle?
Disallow: /froogle_
Disallow: /print
Disallow: /books
Disallow: /patents?
Disallow: /scholar?
Disallow: /complete
Disallow: /sponsoredlinks
Disallow: /videosearch?
Disallow: /videopreview?
Disallow: /videoprograminfo?
Disallow: /maps?
Disallow: /translate?
Disallow: /ie?
Disallow: /sms/demo?
Disallow: /katrina?
Disallow: /blogsearch?
Disallow: /blogsearch/
Disallow: /blogsearch_feeds
Disallow: /advanced_blog_search
Disallow: /reader/
Disallow: /uds/
Disallow: /chart?
Disallow: /transit?
Disallow: /mbd?
Disallow: /extern_js/
Disallow: /calendar/feeds/
Disallow: /calendar/ical/
Disallow: /cl2/feeds/
Disallow: /cl2/ical/
Disallow: /coop/directory
Disallow: /coop/manage
Disallow: /trends?
Disallow: /trends/music?
Disallow: /notebook/search?
Disallow: /music
Disallow: /browsersync
Disallow: /call
Disallow: /archivesearch?
Disallow: /archivesearch/url
Disallow: /archivesearch/advanced_search
Disallow: /base/search?
Disallow: /base/reportbadoffer
Disallow: /base/s2
Disallow: /urchin_test/
Disallow: /movies?
Disallow: /codesearch?
Disallow: /codesearch/feeds/search?
Disallow: /wapsearch?
Disallow: /safebrowsing
Disallow: /reviews/search?
Disallow: /orkut/albums
Disallow: /jsapi
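
Google's file layers Allow lines over broader Disallow lines: Allow: /searchhistory/ sits ahead of Disallow: /search, and the /news?output=xhtml pair sits ahead of Disallow: /news. Crawlers have resolved such overlaps in two ways. The older convention (used by Python's urllib.robotparser, for example) takes the first matching line in file order; RFC 9309 and Google's own documentation take the longest matching path, with Allow winning a tie. A minimal sketch comparing both on a subset of the rules above, using hand-rolled prefix matchers that ignore wildcards and percent-encoding; first_match, longest_match, and the sample paths are hypothetical:

```python
# A subset of the Google rules quoted above, in file order.
# Each entry is (directive, path prefix).
GOOGLE_RULES = [
    ("allow", "/searchhistory/"),
    ("disallow", "/news?output=xhtml&"),
    ("allow", "/news?output=xhtml"),
    ("disallow", "/search"),
    ("disallow", "/news"),
    ("disallow", "/maps?"),
]


def first_match(rules, path):
    """Older convention: the first rule, in file order, whose prefix matches wins."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched, so the path is allowed


def longest_match(rules, path):
    """RFC 9309 convention: the longest matching prefix wins; Allow wins a tie."""
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if path.startswith(prefix):
            is_allow = directive == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


# Hypothetical request paths exercising each overlap.
PATHS = [
    "/searchhistory/",
    "/search?q=robots",
    "/news?output=xhtml",
    "/news?output=xhtml&q=robots",
    "/news?hl=en",
    "/maps",
    "/maps?q=berkeley",
]

for path in PATHS:
    print(f"{path:30} first={first_match(GOOGLE_RULES, path)} "
          f"longest={longest_match(GOOGLE_RULES, path)}")
```

On these paths the two conventions agree, because each narrow Allow line precedes the broader Disallow it carves out of. The /maps? case also shows the effect of the many values ending in ?: under prefix matching they block query-string URLs such as /maps?q=berkeley but leave the bare /maps path unblocked by that rule.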
