AfricaTopForum
How to keep robots out of your web site
Posted by Perfect (Administrator) on November 29, 2011, 02:08:11 AM

The robots.txt file

As you know, search engines were created to help people find information on the Internet quickly, and they gather much of that information through robots (also known as spiders or crawlers) that browse web pages on their behalf.

Robots, spiders or crawlers scan the Web, recording all kinds of information. They usually start from URLs submitted by users, links found on websites, sitemap files, or the top level of a site.

Once a robot reaches a site's home page, it recursively visits every page linked from that page. A robot can also look at all the pages it finds on a particular server.
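To make the crawling description concrete, here is a minimal Python sketch of how a robot might follow links outward from a home page. It is an illustration under stated assumptions, not how any particular search engine works: the `crawl` and `LinkExtractor` names, the `example.com` URLs and the in-memory `SITE` dictionary (standing in for real HTTP requests) are all hypothetical. It uses a queue instead of literal recursion, which reaches the same set of linked pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Return every page reachable from start_url on the same host, in visit order.

    `fetch` is a hypothetical callable returning a page's HTML, or None if missing.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute = urljoin(url, href).split("#")[0]
            # Stay on the same server and never queue a page twice.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# A tiny in-memory "website" standing in for real HTTP requests (hypothetical).
SITE = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="/e-books/">E-books</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/e-books/": '<a href="guide.html">Guide</a> <a href="http://other.example/">Elsewhere</a>',
    "http://example.com/e-books/guide.html": "No links here.",
}

print(crawl("http://example.com/", SITE.get))
```

Note that nothing in this sketch stops it from entering `/e-books/`; keeping a robot out is exactly what the exclusion rules below are for.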

When a robot finds a website, it indexes the titles, keywords, text and so on. Sometimes, however, you may want to keep search engines from indexing some of your web pages, such as press releases or specially marked pages (for example, affiliate pages). Keep in mind that whether an individual robot follows these conventions is purely voluntary.

The Robots Exclusion Protocol

So if you want to keep robots away from some of your web pages, you can ask them to ignore the pages you do not want indexed. To do that, place a robots.txt file in the root directory of your website on the server.

For example, if you have a directory called e-books and want robots to stay out of it, the robots.txt file should say:

User-agent: *
Disallow: /e-books/
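The e-books rule, written one directive per line as robots.txt expects, can be checked with Python's standard `urllib.robotparser` module. This sketch assumes a site at the hypothetical `example.com` and a hypothetical robot named `ExampleBot`; it parses the rules from a string instead of downloading them, and also shows that a robot looks for `/robots.txt` at the site root rather than in a subdirectory.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# The e-books rules, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /e-books/
"""


def robots_url(page_url):
    """A robot looks for robots.txt at the root of the site, whatever page it starts from."""
    return urljoin(page_url, "/robots.txt")


parser = RobotFileParser()
parser.set_url(robots_url("http://example.com/e-books/guide.html"))
parser.parse(ROBOTS_TXT.splitlines())  # parse local text instead of fetching it

for url in ("http://example.com/index.html", "http://example.com/e-books/guide.html"):
    # can_fetch() answers: may this user agent request this URL?
    print(url, parser.can_fetch("ExampleBot", url))
```

A well-behaved robot calls a check like `can_fetch` before requesting each page; a robot that skips the check is not stopped by the file at all.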

If you do not have enough control over the server to create a robots.txt file, you can instead add a META tag to the head section of any HTML document.

For example, a tag like this tells robots not to index a particular page and not to follow its links:

<meta name="ROBOTS" content="noindex, nofollow">

Support for the robots META tag is not as widespread as support for the Robots Exclusion Protocol, but most major web indexes now recognize it.
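On the robot's side, honoring the tag means reading it from the page before indexing. Here is a minimal Python sketch using the standard `html.parser` module; the `RobotsMetaParser` class and `robots_directives` helper are hypothetical names for illustration, and real crawlers handle more directives and edge cases than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        # The name and the directives are compared case-insensitively,
        # so "ROBOTS" and "robots" are treated alike.
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="noindex, nofollow"></head><body>...</body></html>'
directives = robots_directives(page)
print("may index:", "noindex" not in directives)
print("may follow links:", "nofollow" not in directives)
```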

Usenet news postings

To keep search engines from archiving your Usenet news postings, you can add an "X-No-Archive" line to the headers of your posts:

X-No-Archive: yes

However, although most common news clients let you add an X-No-Archive line to the headers of your postings, some of them do not.
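As a rough illustration of how the header works, this Python sketch builds a posting with the standard `email` package (news articles use the same `Name: value` header layout as e-mail) and shows the check an archive that honors the convention might apply. The address, newsgroup name and the `may_archive` helper are hypothetical; like robots.txt, the header is only a request that an archive may ignore.

```python
from email import message_from_string
from email.message import EmailMessage

# Build a hypothetical Usenet posting. News articles use the same
# "Name: value" header format as e-mail, so the email package can handle them.
post = EmailMessage()
post["From"] = "reader@example.com"
post["Newsgroups"] = "misc.test"
post["Subject"] = "Test posting"
post["X-No-Archive"] = "yes"
post.set_content("This post asks archives not to keep a copy.")

raw = post.as_string()


def may_archive(raw_article):
    """An archive that honors the header skips articles marked X-No-Archive: yes."""
    article = message_from_string(raw_article)
    # Header lookup is case-insensitive, so "X-no-archive" matches too.
    value = article.get("X-No-Archive", "")
    return value.strip().lower() != "yes"


print(may_archive(raw))
```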

The problem is that most search engines assume all the information they find is public unless it is marked otherwise.

So be careful: although robots.txt files and exclusion rules can help keep your material out of the major search engines, some robots do not respect these rules.

If you are very concerned about the privacy of your e-mail and Usenet messages, you should use anonymous remailers and PGP. You can read about them here:

http://www.well.com/user/abacard/remail.html
http://www.io.com/~combs/htmls/crypto.html
http://world.std.com/~franl/pgp/

Even if you are not particularly concerned about privacy, remember that anything you write may be indexed and archived somewhere indefinitely, so use the robots.txt file wherever you need it.