
Thoughts on SEO


  • From: Gunnar Hjalmarsson  
  • Date: Sun, 03 Jul 2005 06:53:02 +0200

Hi, all!

I have started to think about how Ringlink may help in promoting the
participating member sites with respect to search engine optimization
(SEO). This is an area I don't know much about, so I'm sharing these
thoughts primarily because I'd like input from others.

robots.txt files
----------------
The first observation I have made is that an existing robots.txt file may
play a significant role: http://www.robotstxt.org/wc/norobots.html

Since Ringlink is a CGI script, the URLs to most installations include
'/cgi-bin/'. It seems to be common to use the robots.txt file to prevent
robots from spidering *everything* in the cgi-bin:

User-agent: *
Disallow: /cgi-bin/

That's what I (accidentally) have done myself up to now, and this is an
example of what it has resulted in:
http://www.google.com/search?q=site%3Aquotationring.net

I fear that all those URLs without titles or descriptions mean that the
robots.txt file has not only prevented Google from indexing those pages,
but may also have prevented it from spidering the target pages. Not good,
if that's the case. After all, Ringlink is a redirect program.

I've now changed my robots.txt file so that it no longer disallows the
Ringlink program. Hopefully that will make a difference during the next
few months...
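
To illustrate the effect of such a rule, here is a minimal sketch using
Python's standard robots.txt parser. The example.com URL and the narrower
'/cgi-bin/private/' rule are made up for illustration; they are not taken
from any real installation.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# The blanket rule quoted above.
BLANKET = ["User-agent: *", "Disallow: /cgi-bin/"]

# Hypothetical narrower rule that leaves the webring script reachable.
NARROW = ["User-agent: *", "Disallow: /cgi-bin/private/"]

# Hypothetical URL of a webring redirect script.
URL = "http://www.example.com/cgi-bin/ringlink/next.pl?ringid=demo"

if __name__ == "__main__":
    print("blanket rule allows spidering:", allowed(BLANKET, URL))
    print("narrow rule allows spidering: ", allowed(NARROW, URL))
```

With the blanket rule a well-behaved robot never requests the redirect
URL at all, so it never learns where the redirect points.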

Redirection status code
-----------------------
Another observation I made relates to HTTP status codes. A client, e.g.
a browser or a spider, receives a bunch of HTTP headers along with the
actual content when it requests a resource on the web. Unlike the
content, which you can study by viewing the HTML source through your
browser, the HTTP headers are handled behind the scenes.

The status code is sent on the first line of the response, ahead of the
HTTP headers:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
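
One way to make the status code visible is a small script that requests a
URL without following the redirect. The sketch below is self-contained: it
starts a throwaway local server (a hypothetical stand-in for a redirect
script, always answering 302 with a made-up target) and then reads the
status code and Location header with Python's http.client, which does not
follow redirects on its own.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a redirect script: always answers 302."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", "http://www.example.com/target.html")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(host, port, path):
    """Request `path` and return (status, reason, Location header)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.reason, resp.getheader("Location")
    finally:
        conn.close()


def demo():
    """Start the local server, query it once, and shut it down again."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_status("127.0.0.1", server.server_port,
                            "/cgi-bin/ringlink/next.pl")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Pointing fetch_status() at a real host (port 80) and path shows what a
spider receives from that URL.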

As a starting point for explaining my thoughts in this respect, let's
take a look at one of Pete's webrings. Unlike me, Pete hasn't played
with a robots.txt file without considering the consequences carefully
enough. ;-)

If you click http://www.google.com/search?q=site%3Awebring.cc, the first
URL with a title and description is
http://www.webring.cc/cgi-bin/ringlink/home.pl?ringid=oceanliner;siteid=maritime
The URL that you are redirected to if you click the link, i.e. the
actual URL of the "Ocean Liner Ringlink" homepage, is
http://www.webring.cc/ringlink/oceanliners.html

Now to the interesting part: If you click
http://www.google.com/search?q=site%3Awebring.cc%2Fringlink%2Foceanliners.html

you'll see that Google has not listed the actual homepage URL!

Why is that? My theory is that it's because if you request
http://www.webring.cc/cgi-bin/ringlink/home.pl?ringid=oceanliner;siteid=maritime

you receive HTTP status "302 Found". That code is intended for temporary
redirections, and according to the HTTP specification "the client SHOULD
continue to use the Request-URI for future requests".

You don't receive the 302 status code because the Ringlink program sends
it explicitly. I believe the reason is that Apache uses that code by default.

Furthermore, I believe that if Ringlink would output a "301 Moved
Permanently" status code instead, Google (and other search engines)
would list the actual target URL instead of the redirecting Ringlink
URL. According to the HTTP specification, when a client receives the 301
code, "any future references to this resource SHOULD use one of the
returned URIs".
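
As a rough sketch of the difference at the script level: under the CGI
convention a script can set the response status itself by printing a
"Status:" line along with "Location:". The function below is written in
Python purely for illustration; its name and the example URL are made up
and this is not Ringlink's actual code.

```python
def redirect_response(target_url, permanent=True):
    """Build the CGI header block for a redirect to `target_url`.

    With permanent=True the script asks for "301 Moved Permanently";
    otherwise it asks for "302 Found".
    """
    status = "301 Moved Permanently" if permanent else "302 Found"
    # A blank line ends the header block; a redirect needs no body.
    return f"Status: {status}\nLocation: {target_url}\n\n"


if __name__ == "__main__":
    # Hypothetical target URL of a member site.
    print(redirect_response("http://www.example.com/member.html"), end="")
```

Leaving out the "Status:" line and printing only "Location:" is what lets
the web server pick its own default code.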

Somebody may now object and point out that nothing has been moved.
That's true. OTOH, we have reasons to believe that webrings and webring
programs were not a main concern when the HTTP specification was formulated.

What do others do? When clicking e.g. a 'next' link, WebRing.com outputs
the 302 status code and redirects to the HUB page! The actual
redirection seems to be handled by cookies, i.e. if you try to navigate
a WebRing.com ring with cookies disabled, you end up at the HUB page
whatever you do!! (I don't know why they do it in such a complicated way;
maybe it is in order to be absolutely sure that they don't help their
members to get listed with the search engines?)

RingSurf outputs the 302 status code as well. I wasn't able to evaluate
the effect of it because of their weird reordering of the ring sites all
the time...

Anyway, Ringlink represents a decentralised approach to webrings, and it
was designed to serve the ringmasters and the participating member
sites. I believe that the right thing to do is to let it output the
301 status code when redirecting, and I have provisionally made that
change in the development copy of Ringlink (which I'm using for
http://www.gunnar.cc/ringlink.html as well as "Ringlink Demo").

Measures to prevent redundant search engine listings
----------------------------------------------------
The page that comes up if you click for instance
http://www.ringlink.org/cgi-bin/demo/next.pl?ringid=demo;siteid=site15
is an error page. To prevent such pages from being indexed by the search
engines, I have changed Ringlink so that they are sent with the "404
Not Found" status code instead of "200 OK".

In my opinion, there is no point in having the pages for managing rings
and sites, e.g. http://www.gunnar.cc/cgi-bin/ringlink/siteadmin.pl,
indexed by the search engines. Accordingly, those pages now include the
element '<meta name="robots" content="noindex,follow" />' in the HTML head.
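
Both measures can be sketched in a few lines. As before, this is Python
used only for illustration, with a made-up function name and made-up page
content; it shows the idea, not how Ringlink implements it.

```python
ROBOTS_META = '<meta name="robots" content="noindex,follow" />'


def cgi_page(body_html, status="200 OK", noindex=False):
    """Build a complete CGI response: status, content type and HTML page.

    status  -- value for the CGI "Status:" line, e.g. "404 Not Found"
    noindex -- if True, put the robots meta element in the HTML head
    """
    head = ROBOTS_META if noindex else ""
    html = f"<html><head>{head}</head><body>{body_html}</body></html>"
    return f"Status: {status}\nContent-Type: text/html\n\n{html}"


if __name__ == "__main__":
    # Error page: sent with 404 so search engines don't list it.
    print(cgi_page("<p>No such site in this ring.</p>",
                   status="404 Not Found"))
    # Admin page: served normally, but marked as not to be indexed.
    print(cgi_page("<p>Site administration</p>", noindex=True))
```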


Okay, so those were my SEO thoughts. Your comments would be very much
appreciated.

/ Gunnar


Follow-Ups from:
Pete
