How can robots.txt define your domain to search engines?
robots.txt is a plain text file placed in the root (base) directory of your web server, for example http://www.example.com/robots.txt, where www.example.com is your domain. It is mainly used to advise search engines and crawler robots which parts of the website may be visited and indexed and which should be skipped.
robots.txt works only in the root (base) directory, that is, where your main (index) page is, and a site uses only one such file.
Once in place, the file is reachable at http://www.yourwebsite.com/robots.txt. It will not work if it is placed in a subdirectory, for example http://www.yourwebsite.com/sub-domain/robots.txt
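Because the file always sits at the root, its address can be derived from any page URL on the same site. The following is a minimal Python sketch of that idea; the helper name robots_url and the page URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page inside a subdirectory still maps to the root-level file.
print(robots_url("http://www.yourwebsite.com/sub-domain/page.html"))
# prints http://www.yourwebsite.com/robots.txt
```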
Most search engines take robots.txt into account. Spam bots (e-mail harvesters), however, simply ignore it, so it is advisable to keep sensitive files in protected folders rather than trusting robots.txt to do the whole job. robots.txt is about advising search engines, not about security.
What does a robots.txt look like?
# robots.txt created by http://www.pixelbytelab.com
User-agent: *
Disallow:
The rules above tell search engines to crawl and index all directories; nothing is omitted.
# robots.txt created by http://www.pixelbytelab.com
User-agent: *
Disallow: /
The rules above tell search engines not to crawl or index any directory.
As the examples show, robots.txt is very simple: a single “/” can stop search engines from crawling the site, which would certainly affect its SEO rankings.
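To see how much that one character changes, the two rule sets can be checked with the robots.txt parser in Python's standard library. This is only a sketch: the helper is_allowed and the test URL are made up for illustration, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" blocks the whole site

url = "http://www.example.com/about.html"
print(is_allowed(ALLOW_ALL, "Googlebot", url))  # True
print(is_allowed(BLOCK_ALL, "Googlebot", url))  # False
```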
Choose the rules as per your needs and, after uploading robots.txt to the root directory, do not forget to set the file permissions so that search engines can read it.
If your web server has trouble keeping up with requests, a crawl delay can be set so that each crawler waits between successive requests to the site.
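From the crawler's side, honoring such a delay could look roughly like the Python sketch below. The bot name ExampleBot and the helper polite_delay are made up for illustration, and the delay is assumed to be in seconds. Crawl-delay is a non-standard directive, so not every crawler honors it.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 60
Disallow: /log/
"""


def polite_delay(robots_txt: str, agent: str, default: float = 0.0) -> float:
    """Return the crawl delay requested for agent, or default if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)  # None when no Crawl-delay applies
    return default if delay is None else float(delay)


delay = polite_delay(ROBOTS_TXT, "ExampleBot")
print(delay)  # 60.0
# A well-behaved crawler would call time.sleep(delay) between two requests.
```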
The most commonly known search engine bots are:
Googlebot, MSNBot (Bing), Yahoo Slurp, Ask Teoma, Gigabot, Scrubby, Robozilla, Twiceler
The example below combines allowing and disallowing bots:
# robots.txt created by http://www.pixelbytelab.com
User-agent: Googlebot # allows Google
Disallow:
User-agent: MSNBot # allows Bing
Disallow:
User-agent: Slurp # allows Yahoo
Disallow:
User-agent: Teoma # disallows Ask
Disallow: /
User-agent: Gigabot # disallows Gigablast
Disallow: /
User-agent: Scrubby # disallows Scrub The Web
Disallow: /
User-agent: Robozilla # disallows DMOZ
Disallow: /
User-agent: * # assumed: the two rules below are meant for all other bots
Crawl-delay: 60 # delay between requests, in seconds
Disallow: /log/ # directory to disallow
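A combined file like this can be checked per bot before uploading. The Python sketch below uses a shortened subset of the rules with the explanatory notes written as # comments, and the helper name allowed and the test URL are made up for illustration. It also suggests why the notes belong in comments: at least with Python's standard-library parser, extra text after the bot name becomes part of the name, so the rule no longer matches.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Shortened subset of the combined example, notes moved into comments.
RULES = """\
User-agent: Googlebot # allows Google
Disallow:
User-agent: Teoma # disallows Ask
Disallow: /
"""

# Same Teoma rule, but with the note left on the User-agent line.
ANNOTATED = "User-agent: Teoma (DisAllow ASK)\nDisallow: /\n"

url = "http://www.example.com/index.html"
print(allowed(RULES, "Googlebot", url))   # True
print(allowed(RULES, "Teoma", url))       # False
print(allowed(RULES, "ExampleBot", url))  # True: no rule names this bot
print(allowed(ANNOTATED, "Teoma", url))   # True: the rule no longer matches
```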