PHP5: Articles, News, Tutorials, Interviews, Software and more
Some SEO Tips You Would Not Like to Miss


By dennisp on Friday, 02 February 2007, 21:11
Published under: article   seo

Of course, you are all experts in search engine optimization. But there are some points that even experienced webmasters miss, and addressing them can help improve your search engine rankings.

Search engine optimization is often a trial-and-error affair. You do your best to get quality incoming links, properly interlink your site, and remove extra keywords so that you don't look like a spam site, only to see a month later that your SE rankings are much worse. Here we will deal with one common class of errors that webmasters are often not aware of, and whose correction cannot hurt your search engine placement: overlooked duplicate content on your own site.

You're kidding! I don't have duplicate content!

Of course you, as the webmaster, know what content you publish, so you are pretty sure that there are no two identical articles or pages on your site. However, you might not realize that search engines may see two mirrors of your site! A quick test will answer the question: if your domain name is www.example.com, type example.com in your browser. If you can see your site, and the address does not change to www.example.com, then your site may well have two copies in search engine indexes. To a search engine these two host names look like two separate sites (indeed, you can configure your web server to serve two different virtual hosts under these host names, or even point the www. subdomain to another IP).

So, you must ensure that one of these domains redirects to the other. Note that this must be an external redirect (i.e., when you go to example.com, the server must reply with the HTTP/1.1 301 Moved Permanently status code). You can achieve this in two ways:

  • using your webserver's URL rewriting feature:
    assuming you use Apache, you can achieve this by placing the following directives into .htaccess or the server's configuration file:

    RewriteEngine on
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^/?(.*)$ http://www.example.com/$1 [L,R=301]

    These three lines tell Apache to enable URL rewriting for the virtual host or your site's document root directory, then to check whether the Host header matches example.com (ignoring case), and, if it does, to issue an external redirect to www.example.com and skip any rewrite rules that follow.

  • If, however, you don't have access to the server configuration files, or if .htaccess or URL rewriting is disabled by your hosting provider, you can simulate this behavior with PHP:

    if (strtolower($_SERVER['HTTP_HOST']) == 'example.com') {
        // Permanent redirect to the www host, keeping the requested path and query string
        header("HTTP/1.1 301 Moved Permanently");
        header("Location: http://www.example.com{$_SERVER['REQUEST_URI']}");
        exit;
    }

    Of course, these lines must run before any output is sent to the browser, and for this protection to be effective, the code block must be executed for every single page you generate (please note that the PHP method won't work for static content). So you should place it at the beginning of any common include file you might have, or paste it into every PHP file if you don't have such a common include and don't use the front controller pattern.

Please note that these two methods redirect every GET (and HEAD) request, not just requests for the home page. Indeed, as you can see, the redirection accounts for any request URI you type, so going to example.com/mypage.php will ultimately redirect to www.example.com/mypage.php. On the other hand, POST requests will not get properly redirected, as their data will be lost on such a redirection. So you should take care that your POST forms either submit to the correct domain or just use relative paths (without the http://example.com part).
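The redirect decision described above can also be isolated in a small function, which makes the GET/HEAD versus POST distinction explicit. This is only a sketch: the function name canonicalRedirectTarget() and its parameters are hypothetical, and the two host names are the article's example.com placeholders.

```php
<?php
// Hypothetical helper: returns the URL to redirect to, or null when no redirect is needed.
// Assumes example.com is the bare host and www.example.com is the canonical one.
function canonicalRedirectTarget(string $host, string $requestUri, string $method): ?string
{
    // Only GET and HEAD are redirected; a redirected POST would lose its data.
    if (!in_array(strtoupper($method), ['GET', 'HEAD'], true)) {
        return null;
    }
    // Redirect only requests that arrived on the bare host (case-insensitive).
    if (strtolower($host) !== 'example.com') {
        return null;
    }
    // Keep the requested path and query string unchanged.
    return 'http://www.example.com' . $requestUri;
}
```

A common include could call it with $_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI'] and $_SERVER['REQUEST_METHOD'], and, when the result is not null, send it in a Location header with a 301 status and exit.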

Duplicate home page

Just like the whole mirrored site, your home page can in most cases be accessed as either http://www.example.com/ or http://www.example.com/index.php. This can also be solved either with URL rewriting performed by the server or within PHP:

RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{REQUEST_METHOD} ^GET$ [NC]
RewriteRule ^/?$ /index.php [L,R=301]

Here we check whether the request URI is a single slash (when the user just types the domain name) and whether the request method is GET (if you use the front controller pattern, your POST forms' action attributes may be set to /). If both conditions hold, we again issue an external redirect. The equivalent PHP code, with the same two checks, would look like this:

if ($_SERVER['REQUEST_URI'] == '/' && $_SERVER['REQUEST_METHOD'] == 'GET') {
    header("Location: http://www.example.com/index.php", true, 301);
    exit;
}

Of course, just like in the previous example, this code must be executed before any output. You can put it into your common include as well, or just leave it in the index.php file. If your home page is static HTML, URL rewriting is your only option.

Other possible dupe pages

Unfortunately, the home page is not the only page vulnerable to duplication. Many sites that present dynamically generated content (product catalogs and script directories are the best examples) paginate it. Often the first page of the directory can be requested as either /list.php or /list.php?page=1. These will be considered dupes. Also, some sites don't react properly when a non-existent page number is requested: they may show the last page. While search engines normally will not index such pages (since your site probably won't have links to them), such links may still exist on other sites (consider your competitors!), placed with the sole intent of worsening your SE rankings. So your scripts should check for such situations and simply respond with HTTP/1.1 404 Not Found.
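A check of this kind might look like the sketch below. The function name pageStatus() is hypothetical, and answering ?page=1 with a redirect to the parameterless URL (rather than with a 404) is an assumption of this sketch, not something the approach above prescribes.

```php
<?php
// Hypothetical helper: maps the raw "page" parameter to an HTTP status code.
// 200 = render the page, 301 = redirect to the URL without the parameter, 404 = no such page.
function pageStatus(?string $rawPage, int $numPages): int
{
    // No parameter at all: /list.php is the one and only URL of the first page.
    if ($rawPage === null) {
        return 200;
    }
    // Accept only plain decimal digits without leading zeros, so "03", "1abc" and "-1" are rejected.
    if (!preg_match('/^[1-9][0-9]*\z/', $rawPage)) {
        return 404;
    }
    // Page numbers beyond the last page do not exist.
    if ((int) $rawPage > $numPages) {
        return 404;
    }
    // Assumption: an explicit ?page=1 duplicates /list.php, so send the visitor there.
    return (int) $rawPage === 1 ? 301 : 200;
}
```

The calling script would pass the raw value of the page parameter (or null when it is absent) and send the matching status line before producing any output.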

While the scenario just depicted is not very harmful (it affects only a few pages on your site), it brings us to the conclusion that virtually any page on your site can be intentionally duplicated. Indeed, consider the following URLs: www.example.com/index.php and www.example.com/index.php?x=abc. If your home page does not use the x parameter to render different content, these pages will be considered dupes. To make things worse, any page on your site can be a source of duplicate content. If an attacker crawls your site to get the list of all valid URLs, he can then link to each of them multiple times (using different URL parameters for a single valid URL), and your site can get seriously penalized by search engines. Often the sites themselves multiply dupe pages when constructing pagination links with, say, the following code:

for ($i = 1; $i <= $numPages; $i++) {
    $links[$i] = preg_replace('~(\?|&)page=\d+~', '$1page=' . $i, $_SERVER['REQUEST_URI']);
}

Now, when the attacker links to just, say, /list.php?page=1&junk=spam, your own site will duplicate all the pages in the example directory, because every generated pagination link carries the junk parameter along.
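One way to avoid this is to build pagination links from a fixed base path rather than from the incoming request URI. The sketch below assumes a hypothetical helper named paginationLinks() and a directory that needs no query parameter other than page.

```php
<?php
// Hypothetical helper: builds pagination links from a fixed base path instead of
// the request URI, so stray parameters such as junk=spam are never copied into them.
function paginationLinks(string $basePath, int $numPages): array
{
    $links = [];
    for ($i = 1; $i <= $numPages; $i++) {
        // Page 1 gets the bare path so that it has exactly one URL.
        $links[$i] = $i === 1 ? $basePath : $basePath . '?page=' . $i;
    }
    return $links;
}
```

With paginationLinks('/list.php', $numPages), a request for /list.php?page=1&junk=spam still produces clean links only, so the junk parameter does not spread through the site.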

While the depicted situation is rare, it can seriously affect your SE rankings if this technique is deliberately used against your site. To guard against it, the following scheme can be recommended:

  • your site must only use rewritten URLs

  • the rewrite rules must be strict enough so that they won't allow any invalid URL by redirecting them to a 404 page

  • all the links that are generated by your site must be passed through a special function that will recognize all valid URLs and rewrite them accordingly (so that the server rewrite rules can be applied to them)

  • all pages that render their content based on URL parameters must check their values (so that they are within permitted bounds; for our paginated directory example they must be between 1 and the number of pages; your URL rewriting function should not generate links like /list.php?page=1, just /list.php for page #1)
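Where the rewrite rules cannot be made strict enough on their own, a script can also refuse URLs that carry parameters it does not know about. The following is a minimal sketch under that assumption; the function name hasOnlyAllowedParams() and the idea of a per-page whitelist are illustrative, not part of the scheme above.

```php
<?php
// Hypothetical helper: true when the query string contains only whitelisted parameter names.
function hasOnlyAllowedParams(string $queryString, array $allowed): bool
{
    // parse_str() turns "page=2&junk=spam" into ['page' => '2', 'junk' => 'spam'].
    parse_str($queryString, $params);
    // Any parameter name outside the whitelist makes the URL invalid.
    return array_diff(array_keys($params), $allowed) === [];
}
```

A paginated directory might call hasOnlyAllowedParams($_SERVER['QUERY_STRING'], ['page']) and answer with HTTP/1.1 404 Not Found when it returns false.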


© 2021 onPHP5.com