onPHP5.com - Clickable, Obfuscated Email Addresses
 


Clickable, Obfuscated Email Addresses


By dennisp on Friday, 19 January 2007, 18:21
Published under: article   security
Views: 88196, comments: 24

This short article shows how to keep email addresses clickable and working in your pages while still obfuscating them from spam bots.


Of course, you all know that keeping email addresses in plain form on web pages means giving them away to spam bots for free. Almost every website owner uses images to hide real (textual) representations of emails from spam bots: only a human can read them from the image (a good spam bot that can OCR images could break them, but this requires a good deal of time and processing power, so this security measure is almost bulletproof).

This approach, while strong, leaves your pages without a very convenient piece of functionality. Your visitors are no longer able to click on the mailto: links and send emails from the comforts of their mail clients. Moreover, they cannot copy and paste the address. Visitors nowadays expect enough comfort while browsing that they may simply ignore the address rather than retype it from the screen. Now imagine how many potential clients are lost if your Contact Us page obfuscates your email address with an image.

This means that another, non-image method of protecting email addresses is needed: one that keeps the addresses clickable while still hiding them from spam bots.

Character substitution


One such approach, though not a strong one, is to replace the characters in the href attribute of the <a> tag with their ASCII codes using the &#xx; notation. Simple spam bots won't be able to match the address since it will be totally obscured: no '@', no user name, no domain name, not even dots! However, cleverer bots may be capable of converting the entities back to characters, especially in tag attributes, so this method is not very strong. Still, it is much better than no protection at all.

So in your code you could use a function like this:

<?php

function hideEmail($email) {
  $rv = '';
  // Replace every character with its numeric entity, e.g. 'm' becomes &#109;
  for ($i = 0; $i < strlen($email); $i++) {
    $rv .= '&#' . ord($email[$i]) . ';';
  }
  return $rv;
}

// Sample usage:

$email = 'me@example.com';
$href = hideEmail('mailto:' . $email);
$email = hideEmail($email);

echo "<a href=\"$href\">$email</a>";

?>


This would produce the following result (all on the same line, of course):

<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#109;&#101;&#64;&#101;&#120;
&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;">&#109;&#101;&#64;&#101;
&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;</a>


Please note that as long as email addresses contain only ASCII characters (those with codes below 128), this function will work with any ASCII-compatible encoding (UTF-8 or ISO-8859-1, for example), since these characters are represented by the same codes in every such encoding.
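To see why this method is called "not so strong" above, here is a minimal sketch of how a harvester could undo the encoding. It assumes the bot is itself written in PHP and simply runs the page source through html_entity_decode() before searching for addresses; the variable names and the address pattern are illustrative assumptions, not taken from any real bot.

```php
<?php
// Re-create the encoded link from the example above. The encoder is an
// anonymous function here only so that this sketch runs on its own.
$encode = function ($s) {
  $rv = '';
  for ($i = 0; $i < strlen($s); $i++) {
    $rv .= '&#' . ord($s[$i]) . ';';
  }
  return $rv;
};
$html = '<a href="' . $encode('mailto:me@example.com') . '">'
      . $encode('me@example.com') . '</a>';

// One extra step is enough: turn the numeric entities back into characters.
$decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// Deliberately naive address pattern, for illustration only.
preg_match_all('/[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,}/i', $decoded, $matches);
$found = array_unique($matches[0]);

print_r($found);
?>
```

The point is only that the decoding step is cheap, which is why the following sections add JavaScript on top of the entity encoding.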

Good old JavaScript


Besides serving our AJAX and other trendy client-side stuff, JavaScript can be very helpful in fulfilling our task. We can use document.write in place of the real email address to generate the whole <a> tag. We will create a more advanced function that will allow us to specify the attributes of the tag, as well as the subject line:

<?php
/**
 * Generate JS that will render obfuscated email address into the document
 * @param  string $to  the email address
 * @param  string $subject  the subject header
 * @param  array $attrs  optional array of additional attrs for the resulting <a> tag
 * @return  string  the resulting JavaScript
 */
function jsEmail($to, $subject = null, $attrs = array()) {
  // Split the email into user name and domain
  list($u, $d) = explode('@', $to);

  // Form the href attribute
  $href = "mailto:$u@$d";
  if ($subject) {
    $href .= '?Subject=' . rawurlencode($subject);
  }

  // Now also split the href attribute
  // We split them so that they do not contain the email address in one string literal
  // - otherwise the bot will have no trouble finding the email address
  list($hu, $hd) = explode('@', $href);

  // If we have more attributes, prepare JS for them
  // (the leading space keeps each attribute separate from the one before it)
  $attr = '';
  foreach ($attrs as $k => $v) {
    $v = '"' . $v . '"';
    $attr .= "document.write(' $k=$v');\n";
  }

  // Generate return value
  $rv = <<<EOT
  <script>
     document.write('<a href="$hu' + '@');
     document.write('$hd' + '"');
     $attr
     document.write('>$u' + '@');
     document.write('$d</a>');
  </script>
EOT;
  return $rv;
}

// Example of use

echo "Contact email: ";
echo jsEmail('me@example.com', 'Sales enquiry form submission',
  array('class' => 'link'));
?>


This would produce the following output:

Contact email: <script>
document.write('<a href="mailto:me' + '@');
document.write('example.com?Subject=Sales%20enquiry%20form%20submission' + '"');
document.write(' class="link"');

document.write('>me' + '@');
document.write('example.com</a>');
</script>


As you can see, the function is quite useful: you can supply a subject line that will automatically appear in the visitor's email client, as well as additional attributes (like the class attribute in the example above).

Combining the two


To further strengthen the protection, we can combine the two approaches: use JavaScript to generate the link and convert all characters to the &#xx; notation. To achieve this, we simply call the hideEmail() function from the first example, so that our jsEmail() function looks like this:

<?php
/**
 * Generate JS that will render obfuscated email address into the document
 * @param  string $to  the email address
 * @param  string $subject  the subject header
 * @param  array $attrs  optional array of additional attrs for the resulting <a> tag
 * @return  string  the resulting JavaScript
 */
function jsEmail($to, $subject = null, $attrs = array()) {
  // Split the email into user name and domain
  list($u, $d) = explode('@', $to);

  // Form the href attribute
  $href = "mailto:$u@$d";
  if ($subject) {
    $href .= '?Subject=' . rawurlencode($subject);
  }

  // Now also split the href attribute
  // We split them so that they do not contain the email address in one string literal
  // - otherwise the bot will have no trouble finding the email address
  list($hu, $hd) = explode('@', $href);

  // Hide letters
  $u = hideEmail($u);
  $d = hideEmail($d);
  $hu = hideEmail($hu);
  $hd = hideEmail($hd);

  // If we have attributes, prepare JS for them
  // (the leading space keeps each attribute separate from the one before it)
  $attr = '';
  foreach ($attrs as $k => $v) {
    $v = '"' . $v . '"';
    $attr .= "document.write(' $k=$v');\n";
  }

  // Generate return value
  $rv = <<<EOT
  <script>
     document.write('<a href="$hu' + '&#64;');
     document.write('$hd' + '"');
     $attr
     document.write('>$u' + '&#64;');
     document.write('$d</a>');
  </script>
EOT;
  return $rv;
}
?>


And the output might look like this (long lines are wrapped here for readability; in the real output each string literal stays on one line):

<script>
document.write('<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#109;&#101;' + '&#64;');
document.write('&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;
&#63;&#83;&#117;&#98;&#106;&#101;&#99;&#116;&#61;&#83;&#97;&#108;&#101;&#115;&#37;
&#50;&#48;&#101;&#110;&#113;&#117;&#105;&#114;&#121;&#37;&#50;&#48;&#102;&#111;
&#114;&#109;&#37;&#50;&#48;&#115;&#117;&#98;&#109;&#105;&#115;&#115;&#105;&#111;
&#110;' + '"');
document.write(' class="link"');

document.write('>&#109;&#101;' + '&#64;');
document.write('&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;
&#111;&#109;</a>');
</script>


Finishing up


It must be understood that the described technique is not as secure as image protection. Sooner or later some spam bot developer may teach a bot to parse the document.write calls and read the &#xx; sequences. If security is more important than usability, you should stick to image protection. You may also implement a neat feature: require the user to enter a valid CAPTCHA somewhere to display textual email addresses, and display images otherwise.


Comments

#1  By Ajay on Monday, 22 January 2007, 06:42
Good one


#2  By Anonymous on Thursday, 25 January 2007, 18:23
I've been using this forever and it's been incredibly effective. I figure it's only a matter of popularity before spammers start getting smarter.

I think the next step would be to ajaxify it in a way that forced the spammer to support xmlhttp requests.


#3  By Anonymous on Sunday, 28 January 2007, 09:53
IMHO document.write is wrong way. Use document.createElement :)


#4  By dennisp (editor) on Sunday, 28 January 2007, 10:43
In reply to #2:
Yes, this technique is vulnerable to spammers that will analyze how it works. Next step could be creating a string that contains junk letters before and after the user and domain name, then some trivial calculation (in JS code) of offsets so that they are "hidden" (JS block must be actually executed to get them). Also some random calculations not related to these offsets (like viruses randomly stick opcodes not related to their functionality into their bodies). This would really strengthen the security.

In reply to #3:
document.createElement() has the same vulnerabilities as document.write(), so that spammer can easily counterpart it.
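A rough sketch of the junk-letter idea from this comment, under stated assumptions: the helper names junk() and jsHidden() are made up for illustration, the padding lengths are arbitrary, and the address parts are assumed to contain no quote characters. Each part is buried in random letters, and the generated JavaScript has to evaluate a small sum to find where the real text starts.

```php
<?php
// Hypothetical helper: random lowercase junk of the given length.
function junk($len) {
  $s = '';
  for ($i = 0; $i < $len; $i++) {
    $s .= chr(mt_rand(97, 122));
  }
  return $s;
}

// Hypothetical helper: hide $text inside junk and return a JS expression
// that cuts it back out with slice(start, end).
function jsHidden($text) {
  $pre = mt_rand(3, 9);                        // junk before the text
  $padded = junk($pre) . $text . junk(mt_rand(3, 9));
  // Write the start offset as a small sum instead of a literal number.
  $a = mt_rand(1, $pre);
  $b = $pre - $a;
  $end = $pre + strlen($text);
  return "'" . $padded . "'.slice($a + $b, $end)";
}

list($u, $d) = explode('@', 'me@example.com');
echo "<script>document.write(" . jsHidden($u) . " + '@' + " . jsHidden($d) . ");</script>";
?>
```

A bot that only pattern-matches the source sees padded strings such as random letters around "example.com"; it would have to run or emulate the JavaScript to recover the exact address. This sketch writes plain text only; wrapping it in a mailto: link works the same way.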


#5  By Alex@Net on Tuesday, 30 January 2007, 10:02
I use just
<a href="#do_nothing" onclick="this.href='amliotw:leocema@elaxnttec.mo'.replace(/(.)(.)/g, '$2$1');">Email me</a>


Works very good.
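For pages generated by PHP, the scrambled string in this comment could be produced on the server. The sketch below is an assumption about how one might do it, not the commenter's code; swapPairs() and swappedMailLink() are hypothetical names. It relies on the fact that swapping neighbouring characters twice restores the original string, so the same regular expression scrambles in PHP and unscrambles in the onclick handler.

```php
<?php
// Hypothetical helper: swap every pair of neighbouring characters.
// Applying it twice returns the original string.
function swapPairs($s) {
  return preg_replace('/(.)(.)/', '$2$1', $s);
}

// Hypothetical helper: build a link whose real href appears only on click.
function swappedMailLink($email, $label) {
  $scrambled = swapPairs('mailto:' . $email);
  // The JS replace() in onclick reverses the swap done above.
  $js = "this.href='" . addslashes($scrambled) . "'.replace(/(.)(.)/g, '\$2\$1');";
  return '<a href="#do_nothing" onclick="' . htmlspecialchars($js, ENT_QUOTES) . '">'
       . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}

echo swappedMailLink('me@example.com', 'Email me');
?>
```

Note that the '@' character still appears in the scrambled string, so this hides the address from simple pattern matching only.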


#6  By Wilco on Tuesday, 06 February 2007, 07:46
I don't quite like either one of your solutions. Sure the ascii might fool a collector, but only with those who haven't figured out yet this is being done. Writing about it doesn't help that either.

As for javascript. You really wouldn't want people who have no javascript to be unable to contact you. This is especially poor since if people come across problems because they have javascript disabled they can't even ask for help. This hardly seems as a solution.


#7  By dennisp (editor) on Tuesday, 06 February 2007, 08:08
In reply to #6:
Of course, disabled JavaScript will completely hide the email address. But nowadays people don't usually disable JS. On the other hand, you can disable images, too, so you can't be 100% sure with either method.


#8  By Robin Haswell on Tuesday, 06 February 2007, 09:27
I call bullshit on "Almost every website owner uses images to hide real (textual) representations of emails from spam bots". I have never, ever seen this technique in practise and I have seen tens of thousands of websites as part of my job.

Note to anyone who cares about web standards: <script> is not a valid child element of <body>.


#9  By dennisp (editor) on Tuesday, 06 February 2007, 11:10
In reply to #8:
Robin, what technique are you talking about? Images or the method described in this article?

Regarding validity - here it says:
The SCRIPT element places a script within a document. This element may appear any number of times in the HEAD or BODY of an HTML document.


#10  By Anonymous on Tuesday, 06 February 2007, 18:22
I have been using this technique for years.
See "How to Avoid Being Harvested by Spambots"
http://www.projecthoneypot.org/how_to_avoid_spambots_3.php


#11  By Anonymous on Wednesday, 07 February 2007, 10:21
> document.createElement() has the same vulnerabilities as document.write(), so that spammer can easily counterpart it.

Yeah but document.write went out with the 90s; DOM scripting is a lot more flexible.

> Note to anyone who cares about web standards: <script> is not a valid child element of <body>

W3C HTML 4.01 spec says: "The SCRIPT element places a script within a document. This element may appear any number of times in the HEAD or BODY of an HTML document."


#12  By Anonymous on Friday, 09 February 2007, 00:17
when a spambot scans a page it doesn't have a builtin browser the output of which it could scan for emails. it just gets the sources. document.write can mean anything. document.createElement gives away the trick.

as for whether talking about the technique causes spammers to know about the technique, we cannot forget some very important facts:

it's twice as easy to break a technique than to make it.
it's three times as easy to break a technique than to fix it.
when you hide the technique from the world you make it twice as hard for the cracker for your particular site.
when you hide the technique from the world you make it five times as hard to reproduce it for your particular site.
when you hide the technique from the world you make it seven times as hard to fix it for your particular site.

therefore always remember that when you hide information you make it twice as hard for the bad guys and five or seven times as hard for the good guys. the end result is that the bad guys get a huge lead.


#13  By Denver on Sunday, 25 February 2007, 01:01
If you encrypt the mailto: portion of the A HREF tag, the result is a bunch of broken links archived by the search engines, which generates a lot of Error 404 pages.


#14  By Anonymous on Sunday, 25 February 2007, 10:37
I don't put ANY email addresses in the sites I build -- haven't for years! I provide a contact us form with a drop down list of possible contacts. Now always protected by captcha. Even 'guest' addresses are handled the same way -- via a direct link to the form, with the correct recipient already selected. The data is always passed to another file for the actual mailing, via redirect with session checks, then finally on to a printable 'receipt' page. I use regular expression filters to block all the potential attacks I can think of.

Even the form used to write these comments is based on this principle... I'm not sending this in as an email via my mail server... and I don't feel uncomfortable.

Other than dealing with a few minor security upgrades over the years, this method has worked without a hitch. No one has ever complained that they felt deprived of "the comforts of their mail clients". In fact many have appreciated the anonymity. I suppose I could enable BBCode or an editor like FCK with a simplified 'safe' menu, if really necessary - to help make them feel real comfortable and all.

For situations where logged in members have a legitimate need to upload pictures, pdf files, etc. I include file browse/upload capability -- again heavily filtered -- and always checked for viruses.

But on the other hand, it might just be a lot easier for some folks to obfuscate and hope the bots don't get 'em. To each their own.

Now I sure wish I could enjoy those supposed comforts of my email client... mainly its been a dam chore, with all too frequent side orders of heaping headaches.


#15  By dennisp (editor) on Sunday, 25 February 2007, 18:46
In reply to #13:
It seems you missed the point a bit. There are no <a> tags in the source of the page. They are written with JS so search bots won't archive any broken links.


#16  By Kostyantyn Shakhov on Monday, 12 March 2007, 04:19
The following php5 class could be useful. It was written based on this article sample:

<?php
class email_protector {
    private $_email;
    private $_subject;
    private $_attributes;
    private $_tag; //also can be used 'area' of an images map, for instance
    private $_tag_type; //can be either 'single' or 'paired'

    public function __construct($email, $tag = 'a', $tag_type = 'paired', $subject = null, $attributes = array()) {
        $this->_email = $email;
        $this->_tag = $tag;
        $this->_tag_type = $tag_type;
        $this->_subject = $subject;
        $this->_attributes = $attributes;
    }

    public function obfuscate() {
        //split the email into a username and a domain
        list($username, $domain) = explode('@', $this->_email);

        //form the href attribute
        $href = 'mailto:' . $username . '@' . $domain;

        if (!is_null($this->_subject)) {
            $href .= '?Subject=' . rawurlencode($this->_subject);
        }

        //split the href attribute in order it doesn't contain the email address in one string literal
        list($href_start, $href_end) = explode('@', $href);

        //hide letters
        foreach (array('username', 'domain', 'href_start', 'href_end') as $v) {
            $$v = $this->_str_to_ascii($$v);
        }

        //if we have attributes, prepare java script for them
        //(the leading space keeps each attribute separate from the one before it)
        $attributes_js = '';
        if (!is_null($this->_attributes)) {
            foreach ($this->_attributes as $name => $value) {
                $value = '"' . $value . '"';
                $attributes_js .= 'document.write(\' ' . $name . '=' . $value . '\');' . "\n";
            }
        }

        //generate java script
        $js = "<script type=\"text/javascript\">
              document.write('<{$this->_tag} href=\"$href_start' + '&#64;');
              document.write('$href_end' + '\"');
              $attributes_js";
        if ('single' == $this->_tag_type) {
            $js .= "document.write(' />');</script>";
        } elseif ('paired' == $this->_tag_type) {
            $js .= "document.write('>$username' + '&#64;');document.write('$domain</{$this->_tag}>');</script>";
        }
        return $js;
    }

    public function __toString() {
        return $this->obfuscate();
    }

    private function _str_to_ascii($string) {
        $ascii_string = '';

        //square brackets are used here: the $string{$i} form was removed in PHP 8
        for ($i = 0, $k = strlen($string); $i < $k; ++$i) {
            $ascii_string .= '&#' . ord($string[$i]) . ';';
        }
        return $ascii_string;
    }
}
?>


#17  By kshakhov at yahoo dot com on Tuesday, 13 March 2007, 12:08
There are two notices re your hideEmail() function:

1) it's better to use $email{$i} instead of $email[$i];
2) don't call strlen($email) in each loop circle. The following loop is much faster:
<?php
for ($i = 0, $k = strlen($email); $i < $k; ++$i) {
  ...
}
?>


#18  By dennisp (editor) on Wednesday, 14 March 2007, 07:34
In reply to #17:
The $string{$i} syntax is deprecated, however the idea of precalculating the string length is really good!


#19  By kshakhov at yahoo dot com on Saturday, 17 March 2007, 00:19
They suggested to use {} in the Zend PHP 4 Certification Study Guide but to use strings as arrays in Zend PHP 5 Certification Study Guide. So, it seems you are right. Could you point out where it is said that this syntax is deprecated? I'm not completely sure yet. Thanks.


#20  By dennisp (editor) on Saturday, 17 March 2007, 13:03
In reply to #19:
Note: They may also be accessed using braces like $str{42} for the same purpose. However, using square array-brackets is preferred because the {braces} style is deprecated as of PHP 6.
Found here, just above example 11.5


#21  By kshakhov at yahoo dot com on Saturday, 17 March 2007, 18:00
You are completely right. Many thanks for the link.


#22  By John on Thursday, 05 April 2007, 05:50
As an addition, you may want to capture the whole script output (i.e., the whole page) and then use preg_replace_callback() to substitute plain emails with the output of the jsEmail() function. That way you won't have to rewrite many pages if they have emails hardcoded.
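A minimal sketch of this suggestion, using PHP's output buffering. The names obfuscateAddress() and obfuscateEmailsInPage() are hypothetical, the address pattern is a deliberately simple assumption, and obfuscateAddress() is only a trivial stand-in for the article's jsEmail() so that the example runs on its own.

```php
<?php
// Hypothetical stand-in for the article's jsEmail(); swap in the real one.
function obfuscateAddress($email) {
  list($u, $d) = explode('@', $email);
  return "<script>document.write('$u' + '@' + '$d');</script>";
}

// Output-buffer callback: replace every plain address in the finished page.
function obfuscateEmailsInPage($html) {
  // Simple pattern for illustration, not a full address grammar.
  $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/';
  return preg_replace_callback($pattern, function ($m) {
    return obfuscateAddress($m[0]);
  }, $html);
}

ob_start('obfuscateEmailsInPage');  // capture everything echoed from here on
echo '<p>Write to sales@example.com for a quote.</p>';
ob_end_flush();                     // runs the callback and sends the result
?>
```

One caveat with this design: a blanket replacement also hits addresses inside attributes (an existing mailto: href or a form field value), where a <script> block would break the markup, so such pages would need a stricter pattern or an exclusion rule.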


#23  By Anonymous on Tuesday, 24 April 2007, 10:43
GOOD TUTORIALS


#24  By Letitia on Thursday, 13 October 2011, 05:50
That's not even 10 mnueits well spent!



© 2014 onPHP5.com