Thursday, May 6, 2010

Access Websites As Google Bot


Google Bot is the general term for Google's automated web crawler, which fetches pages for the Google search engine. Its requests identify themselves with a Google Bot user agent string, which websites use for several purposes, including identifying the crawler and restricting what it can access.

Webmasters can, for instance, filter Google Bot out of their website statistics to get a more accurate count of how many real users visit the site in a given period.
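A minimal sketch of that kind of filtering, assuming a web server access log in the common "combined" format, where the user agent is the last quoted field on each line. The marker list and function names are hypothetical; real statistics packages use far longer crawler lists ("Slurp" is the name of Yahoo's crawler).

```python
import re

# Hypothetical list of crawler markers; real statistics tools use much longer lists.
BOT_MARKERS = ("Googlebot", "Slurp")

# In the "combined" log format the user agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def is_bot(user_agent: str) -> bool:
    """Return True if the user agent contains a known crawler marker."""
    return any(marker in user_agent for marker in BOT_MARKERS)


def count_human_hits(log_lines) -> int:
    """Count log lines whose user agent is not a known crawler."""
    human = 0
    for line in log_lines:
        match = UA_PATTERN.search(line)
        user_agent = match.group(1) if match else ""
        if not is_bot(user_agent):
            human += 1
    return human


if __name__ == "__main__":
    # Two made-up log lines: one regular browser, one Google Bot request.
    sample = [
        '1.2.3.4 - - [06/May/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 Firefox/3.6"',
        '66.249.66.1 - - [06/May/2010:10:00:05 +0000] "GET / HTTP/1.1" 200 512 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    ]
    print(count_human_hits(sample))  # prints 1
```

Matching on the user agent alone is the simplest approach, but it only catches crawlers that announce themselves honestly.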

Some webmasters and services, on the other hand, try to cheat by giving Google Bot access to all of their content while showing a registration or purchase page to users who want to read the same information.

That is not allowed under Google's terms of use, but some webmasters do it nevertheless.

Some users have now had the idea of posing as Google Bot to access the information without buying or registering first.

Be The Bot is a website that simplifies the process. It provides a form in which a web address can be entered, and the user can choose to pose as either Google Bot or Yahoo Bot. The requested URL is then displayed on the same screen.
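Under the hood, posing as a crawler simply means sending the crawler's user agent string with the request. The Python sketch below illustrates the idea; it is not how Be The Bot itself is implemented, the helper names are hypothetical, and the two user agent strings are the ones Google and Yahoo have published for their crawlers, which may change over time. Sites that verify crawlers by something other than the user agent will not be fooled.

```python
import urllib.request

# Crawler user agent strings (assumed; crawlers update these over time).
USER_AGENTS = {
    "google": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "yahoo": "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
}


def build_bot_request(url: str, bot: str = "google") -> urllib.request.Request:
    """Create a request for url that identifies itself as the chosen crawler."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENTS[bot]})


def fetch_as_bot(url: str, bot: str = "google", timeout: float = 10.0) -> str:
    """Download url while posing as a crawler and return the body as text."""
    with urllib.request.urlopen(build_bot_request(url, bot), timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `fetch_as_bot("http://www.washingtonpost.com/")` would return the page as the site serves it to a visitor claiming to be Google Bot.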


Have you ever been googling something and seen exactly what you need in the preview, but when you click the link, the page doesn't show you what you wanted to see?
This is because the owners of the site are trying to trick you into buying something or registering. It's a common tactic on the internet. When Google visits the site, it sends something called a "header". This header, the user agent, tells the site who the visitor is. Google's header contains the word "Googlebot". The site's programmers check whether the header says "Googlebot" and, if it does, open up all of the content for Google's eyes only.
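The check described above can be sketched as a tiny Python WSGI application. The page contents and function names are hypothetical, and the sketch only shows the naive user-agent test itself: because any client can send a "Googlebot" header, this is exactly the kind of check that posing as Google Bot gets past.

```python
# Hypothetical page content, for illustration only.
FULL_ARTICLE = "<html><body><h1>Article</h1><p>Full text of the article.</p></body></html>"
REGISTRATION_PAGE = "<html><body><p>Please register to read this article.</p></body></html>"


def choose_page(user_agent: str) -> str:
    """Naive check: show everything to Googlebot and a sign-up wall to everyone else."""
    if "Googlebot" in user_agent:
        return FULL_ARTICLE
    return REGISTRATION_PAGE


def app(environ, start_response):
    """WSGI application that picks the page based only on the User-Agent header."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    body = choose_page(user_agent).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Serving different content to the crawler than to human visitors in this way is the practice the article says Google does not allow.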

This works on any site that gives Google Bot or Yahoo Bot complete access based on the user agent but blocks regular visitors by asking them to register or buy first.

It works, for instance, on the Washington Post website, which asks visitors to register before they can read its articles. Pasting an article URL from the Post, or simply washingtonpost.com, into the URL form at Be The Bot provides immediate, unrestricted access to the content. (via Online Tech Tips)


© Martin for gHacks technology news, 2010.

