
8 defensive programming best practices to prevent breaking your sites - PHP Classes blog


Author: Manuel Lemos

Posted on:

Categories: PHP Tutorials, PHP Performance, PHP Security

This article describes software development practices that have been used to prevent problems that can break Web sites.

This message also explains recent changes that were made to the site newsletter user options to reduce the site bandwidth usage to keep the hosting costs on budget.




Contents:

* Newsletter delivery option changes
* Distributed newsletter system
* If anything can go wrong, it will
* Using defensive programming to avoid things that may go wrong
- 1. Handle unexpected conditions
- 2. Process external systems data properly
- 3. Test your code
- 4. Monitor your site errors and act upon them
- 5. Do not disclose errors to the users
- 6. Damage control
- 7. Backup
- 8. Do what you can as you can never get defensive enough
* Recommended reading
* Share your defensive practices


* Newsletter delivery option changes

Last week there was a problem in the PHPClasses site newsletter delivery system that made it stop for several days. However, thanks to defensive programming techniques, no harm was done. It only caused a delay in the delivery of the newsletters that were left in the queue for several days.

Almost at the same time I got a warning from the hosting people that the site is taking over 350GB of bandwidth per month. Therefore, I decided to make a few changes to keep the site hosting expenses on budget.

The site sends about 5 million newsletter messages every month. 4 million of those messages are the new class daily digest alert messages. So, I have disabled the option to get such messages for all users that had it set.

From now on, the new class alert messages will only be sent to the users that go on the user options page and re-enable the option labeled "New class alerts".

http://www.phpclasses.org/user_options.html

This option is only being reset for users that had it set to receive the digest version of the new class alert messages. Users receiving the individual version are not affected.

If you prefer to get a weekly digest of all published packages, just get the main site newsletter by setting the option labeled "Site content newsletters".


* Distributed newsletter system

But back to the original subject, defensive programming, let me tell you a bit more about the PHPClasses site newsletter system.

Today the whole site runs on a single dedicated server. This includes the Apache Web server, a MySQL database server and a qmail e-mail server.

In the past the site used to run on a shared server. Once the site reached a significant number of subscribers, it became unviable to send so many newsletters from a shared server.

I would have needed to upgrade to a hosting plan that allowed greater bandwidth usage. That was not financially viable, as the site did not have advertising or any other form of generating revenue then.

The solution that I found was to off-load the newsletter delivery by delivering the newsletters using a separate machine. Initially I used my desktop machine at home from my dial-up connection.

The newsletters are generated automatically by scripts running on the main site. Since the messages would be delivered by a different machine, I had to implement a solution for queuing and distributing the newsletter delivery jobs to be run by my home desktop machine.

I implemented an interesting solution that used POP3 mailboxes as queues. The main site would send a message to a private mailbox with the newsletter content and the list of subscribers' e-mail addresses. Then the home desktop machine would regularly poll that mailbox, using POP3 to retrieve the newsletter body and the recipients list.

It is a solution that is a bit odd, but it worked well for a while. Since the volume of newsletters kept increasing, this was not really acceptable to my home connection ISP, even though I already had a domestic ADSL connection by then.

I asked the site users for volunteers willing to provide a spare server to deliver the newsletter. Soon, I got an offer from Larry Rosenman, who kindly offered an account on a machine that could be used for the newsletter delivery.

http://www.phpclasses.org/blog/post/3-Mirroring-the-site.htm ...

Later, the site hosting was moved to a better host. It could be used to handle the newsletter distribution from the same server on which the site was running.

http://www.phpclasses.org/blog/post/4-New-host-for-PHP-Class ...

Although there was no longer a need to distribute the newsletter delivery process among different machines, the POP3 mailbox based queuing is still in use today.


* If anything can go wrong, it will

One problem of this sort of distribution system is that POP3 mailboxes are associated with public mail addresses. If somebody sends a message to that mailbox, it could cause undesired effects.

Since the mailbox address was never published, it is very unlikely that anybody will send messages to that address. However, one day, when I was testing the delivery system on my development machine, it sent a message to the newsletter queue mailbox. That made several users receive unintended newsletters.

Fortunately, I could stop the delivery of those newsletters soon enough to avoid annoying a large part of the users. Still, some users got upset and rightfully complained.

This reinforces the belief that Murphy's Law is right:

http://www.murphys-laws.com/murphy/murphy-laws.html


* Using defensive programming to avoid things that may go wrong

After the incident above, the robustness of the newsletter system was improved. I made the system ignore messages that do not contain special authentication information. This prevents arbitrary messages from being processed as if they were real newsletters.
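
One possible way to implement this kind of check is to sign each queued message with a secret shared by the site and the delivery machine. The sketch below only illustrates the idea; it is not the code the site uses. The function names, the secret and the choice of an HMAC as the "special authentication information" are all assumptions.

```php
<?php
// Hypothetical sketch: sign and verify queue messages with an HMAC.
// The shared secret and the function names are assumptions for illustration.

function sign_newsletter_job(string $body, string $secret): string
{
    return hash_hmac('sha256', $body, $secret);
}

function is_authentic_newsletter_job(string $body, string $signature, string $secret): bool
{
    // hash_equals compares in constant time to avoid timing leaks.
    return hash_equals(sign_newsletter_job($body, $secret), $signature);
}

$secret = 'replace-with-a-long-random-secret';
$body = "Subject: Weekly newsletter\n\nHello!";
$signature = sign_newsletter_job($body, $secret);

var_dump(is_authentic_newsletter_job($body, $signature, $secret));   // genuine job
var_dump(is_authentic_newsletter_job('spam', $signature, $secret));  // forged job
```

A message that arrives in the queue mailbox without a valid signature would simply be ignored and logged.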

This practice of developing software systems that gracefully handle unexpected situations is called defensive programming.

Applying defensive practices to your programming can be very simple but you have to apply these practices systematically. To know if you are already applying defensive practices you should ask yourself several important questions.


- 1. Handle unexpected conditions

Are you handling all the possible conditions under which your programs will run?

For instance, do you always have a "default" case in your "switch" statements?

switch($some_value)
{
    case 1:
        $another_value = 1;
        break;
    case 2:
        $another_value = 4;
        break;
    // No "default" case: $another_value stays undefined for any other value.
}
return(1 / $another_value);

What if $some_value is neither 1 nor 2?

Notice: Undefined variable: another_value

Warning: Division by zero

What about "if" conditions? Do you have an "else" code section to all important "if" statements?

If your program does not expect certain conditions, but those conditions are not impossible, simple calls to error_log can make you aware of problems when the unexpected happens.

switch($some_value)
{
    case 1:
        $another_value = 1;
        break;
    case 2:
        $another_value = 4;
        break;
    default:
        // Record the unexpected value and stop instead of continuing blindly.
        error_log('unexpected some_value '.$some_value.' found ');
        exit;
}

See below about monitoring errors.


- 2. Process external systems data properly

Are you processing data from external systems with proper care?

External systems, often called actors, include everything that interacts with your software. It may be a user, a remote server, or even a database with information that was not produced by your application.

Most application security problems arise from missing or inappropriate handling of data obtained from external systems.

Not every problem caused by the lack of defensive practices leads to security bugs. However, security bugs are more problematic because they may lead to system abuse by people aware of the security holes.

There are two main defensive practices that are widely recommended:

a) Validate your input

What do you do with the data entered by the users? What about data of files or data retrieved from remote sites? Are you verifying whether it comes in the format that your application expects?

I recommend always validating external data. This means that you should always check whether data comes in the expected format, and not proceed if the data is not valid.
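
As a minimal illustration of this rule, the sketch below uses PHP's built-in filter_var function to check two request values and refuses to proceed when either is invalid. The field names (email, page) and the accepted range are assumptions made up for the example.

```php
<?php
// Hypothetical sketch: validate request parameters before using them.
// The field names and limits are assumptions for illustration.

function validate_subscription(array $input): array
{
    $errors = [];

    $email = filter_var($input['email'] ?? '', FILTER_VALIDATE_EMAIL);
    if ($email === false) {
        $errors[] = 'email';
    }

    $page = filter_var(
        $input['page'] ?? '1',
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1, 'max_range' => 1000]]
    );
    if ($page === false) {
        $errors[] = 'page';
    }

    return ['valid' => $errors === [], 'errors' => $errors];
}

$result = validate_subscription(['email' => 'not-an-address', 'page' => '0']);
if (!$result['valid']) {
    // Do not proceed with invalid data; record what was wrong instead.
    error_log('invalid fields: ' . implode(', ', $result['errors']));
}
```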

In the PHPClasses site and other Web projects I use this popular forms generation and validation class. It can be used to perform most common types of validation checks.

http://www.phpclasses.org/formsgeneration

Besides that, it can also discard invalid values and restore safe defaults when validation conditions are not met. This is an important detail, for instance to minimize the damage that could be caused, not by real users submitting invalid form values, but rather by robots spoofing data to be submitted via hidden inputs.

Another interesting feature implemented recently by a plug-in named secure_submit is meant to avoid CSRF (Cross-Site Request Forgery) attacks.

I am not going into much detail about these attacks, but they can be used to make your browser access sites and perform actions on your behalf even when you do not want to execute such actions.

This kind of attack was used in the past to forge votes of users on certain Digg posts, and even to add unwanted items to carts in shopping sites.
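
A common general defense against CSRF is a secret token stored in the user session and repeated in a hidden form input. The sketch below shows that general technique only; it is not a description of how the secure_submit plug-in works internally. The function names are hypothetical, and $session stands in for $_SESSION.

```php
<?php
// Hypothetical sketch of a per-session anti-CSRF token.
// $session stands in for $_SESSION; all names are assumptions.

function csrf_token(array &$session): string
{
    if (empty($session['csrf_token'])) {
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $session['csrf_token'];
}

function csrf_check(array $session, string $submitted): bool
{
    return isset($session['csrf_token'])
        && hash_equals($session['csrf_token'], $submitted);
}

$session = [];
$token = csrf_token($session);   // embed this value in a hidden form input

var_dump(csrf_check($session, $token));     // the form came from our own page
var_dump(csrf_check($session, 'guessed'));  // forged request
```

A page on another site cannot read the token, so a request forged from there fails the check.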


b) Encode your output

Are you properly encoding all your data when you serve HTML pages?

Some text characters need to be properly encoded when they are displayed in HTML pages. If you overlook the possibility that such characters appear in the data that you want to display, you risk introducing cross-site scripting security holes.

I have already talked about these security problems and how to prevent them in a past post:

http://www.phpclasses.org/blog/post/55-Improved-browsing-and ...

In many cases the solution may be as simple as using the PHP function htmlspecialchars.

http://www.php.net/htmlspecialchars
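
For example, a small wrapper like the one below can be applied to every value that is printed inside HTML. The helper name h() and the sample comment are assumptions for illustration.

```php
<?php
// Sketch: encode untrusted text before placing it in an HTML page.
// The helper name h() is an assumption for illustration.

function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("x")</script>';
echo '<p>', h($comment), '</p>', "\n";
// prints: <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```

With the markup characters encoded, the browser displays the text instead of executing it.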


Are you using the same encoding of data taken from a database to be displayed in your site Web pages?

Nowadays, most browsers and databases support Unicode encodings such as UTF-8. If you take data from a database encoded in UTF-8, it must be displayed in your Web pages as UTF-8 or be converted to whatever encoding is used in the Web page.


Are you escaping literal text values when you execute database queries?

Text literal values used in SQL queries are usually delimited by single quote ' characters. Some less experienced developers just take whatever text they want to use in a query and add single quotes before and after the text.

If the text also contains single quotes, the query may either fail with an error or do something different from what was intended. This is often exploited by crackers to perform SQL injection attacks.

The solution is to use either text escaping functions or prepared queries. The PHPClasses site uses the Metabase database abstraction package. It provides a database independent API to encode text literal values and also supports prepared queries. If you use a different database API, you should look for equivalent features.

http://www.phpclasses.org/metabase
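
To show what a prepared query looks like, here is a sketch that uses PDO rather than Metabase, with an in-memory SQLite database so that it is self-contained. The choice of PDO and SQLite, and the table and column names, are assumptions made for the example.

```php
<?php
// Sketch using PDO with an in-memory SQLite database (an assumption;
// the site itself uses Metabase). Table and column names are hypothetical.

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE authors (name TEXT)');

$name = "O'Reilly";  // contains a single quote

// The value is bound as a parameter, never pasted into the SQL text.
$insert = $db->prepare('INSERT INTO authors (name) VALUES (?)');
$insert->execute([$name]);

$select = $db->prepare('SELECT COUNT(*) FROM authors WHERE name = ?');
$select->execute([$name]);
$count = (int) $select->fetchColumn();

echo "rows matching: $count\n";
```

Because the quote travels as data and not as part of the SQL statement, it cannot change what the query does.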


- 3. Test your code

Regression tests are great. You just build scripts that execute your application components and then you verify whether the results are what is expected.

Usually you run regression tests before you install or update your application in production. If you changed something that breaks your application behavior, you will be able to fix the application (or the tests) before the eventual bugs that were introduced cause major problems to your site.
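
A regression test does not need a framework to be useful. The sketch below is a plain script that runs one component against known inputs and reports any mismatch. The slugify() function is a hypothetical component invented for the example.

```php
<?php
// Hypothetical sketch of a tiny regression test script with no framework.
// slugify() stands in for any application component under test.

function slugify(string $title): string
{
    $slug = strtolower(trim($title));
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

// Known inputs and the results that are expected for them.
$cases = [
    'Hello World'       => 'hello-world',
    '  PHP & MySQL  '   => 'php-mysql',
    '8 defensive tips!' => '8-defensive-tips',
];

$failures = 0;
foreach ($cases as $input => $expected) {
    $actual = slugify($input);
    if ($actual !== $expected) {
        $failures++;
        echo "FAIL: slugify('$input') returned '$actual', expected '$expected'\n";
    }
}
echo $failures === 0 ? "all tests passed\n" : "$failures test(s) failed\n";
```

A deployment script could run this file first and refuse to update the site when any test fails.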

But I have to be honest with you. Regression tests are boring and expensive to produce. Often you do not have the time and the patience to write good regression tests.

There are some tools to make that task easier, but it is still boring and expensive to produce tests that cover all situations that your applications have to handle.

If you like to build test scripts and whoever pays your salary can afford the additional time that producing test scripts will take you, congratulations! Otherwise, I will not blame you for not bothering with that.

I have written many unit test scripts but those were mostly to test base components, like the database abstraction package that I use, or the e-mail composing components, forms generation, etc...

I see a lot of preaching towards test driven development. But often I also notice that many regression test implementations are mostly for base components and were only produced after the fact, i.e. after someone reported a bug that probably already caused trouble to a site. That is better than nothing, but it is far from what the test driven development theory preaches.

Alternatively, you can publish your application or its components as Open Source and let the users help you with the testing. If there are any problems, chances are that users will report the problems to you.

That is one of the reasons why I created the PHPClasses site: have my PHP components be tested by as many people as possible.


- 4. Monitor your site errors and act upon them

No matter how much you try to prevent possible errors, you should always be prepared to handle them.

You should monitor your site and your server to check its health. If possible, act pro-actively.

In the PHPClasses site there are several scripts, run periodically by the cron program, that check things like the available disk space.

Exhausting the disk space is not a normal thing to happen with this site. So I do not wait till the disk is full to do something about it. Whenever the disk space is below a certain threshold, the script sends me an e-mail so I can promptly check what is going on.
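
A check of this kind can be very short. The following sketch is an illustration and not the site's actual script; the path and the 10% threshold are assumptions.

```php
<?php
// Hypothetical sketch of a cron-run disk space check.
// The path and the threshold are assumptions for illustration.

function disk_space_is_low(string $path, float $min_free_fraction): bool
{
    $free = disk_free_space($path);
    $total = disk_total_space($path);
    if ($free === false || $total === false || $total <= 0) {
        // Could not measure: treat that as a problem worth reporting.
        return true;
    }
    return ($free / $total) < $min_free_fraction;
}

if (disk_space_is_low('/', 0.10)) {
    // On a real server this is where an e-mail would be sent,
    // for instance with mail().
    error_log('Low disk space: less than 10% free on /');
}
```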

When the site is executing CPU intensive tasks, like delivering newsletters, I use a small class, yet unreleased, that monitors the CPU usage.

When the CPU usage reaches a very high value, the script that is running is forced to rest for a while, using the PHP sleep function from within the script. Later it checks whether the CPU usage is below a threshold before resuming.

This prevents the site from becoming too slow for the users that are browsing it while heavy, low priority tasks run in the background.
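
The idea can be sketched as follows. Since the class mentioned above is unreleased, this is not its code: the sketch uses the 1-minute load average from sys_getloadavg as a stand-in for CPU usage, and the function name, thresholds and the injectable $get_load parameter are assumptions.

```php
<?php
// Hypothetical sketch: pause a background job while the server is busy.
// Uses the 1-minute load average as a stand-in for "CPU usage"; the
// unreleased class mentioned in the article may work differently.

function wait_until_idle(
    float $max_load,
    int $pause_seconds,
    ?callable $get_load = null,
    int $max_pauses = 60
): int {
    $get_load = $get_load ?? function (): float {
        $load = sys_getloadavg();   // not available on Windows
        return $load === false ? 0.0 : (float) $load[0];
    };

    $pauses = 0;
    while ($get_load() > $max_load && $pauses < $max_pauses) {
        sleep($pause_seconds);      // rest, then measure again
        $pauses++;
    }
    return $pauses;                 // how many times the job had to rest
}

// In a delivery loop: rest before each batch while the load is above 4.0.
// wait_until_idle(4.0, 30);
```

The $max_pauses limit keeps a permanently busy server from blocking the job forever.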

These cases above are well anticipated situations. When unexpected situations happen, they must be detected and notified so I can do something about them.

What I do is enable the PHP error log by setting options in php.ini like these:

error_reporting = E_ALL
display_errors = Off
display_startup_errors = Off
log_errors = On
log_errors_max_len = 0
ignore_repeated_errors = On
ignore_repeated_source = Off
report_memleaks = On
track_errors = On
html_errors = Off
error_log = /path/to/php_error_log

Then I use a small class named Log Watcher to keep monitoring the PHP error log file.

http://www.phpclasses.org/logwatcher

This class composes and sends a message to me with the latest lines added to the PHP error log.
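
The principle behind such a watcher is to remember how far the log was read last time and to report only what was appended since. The sketch below illustrates that principle; it is not the API of the Log Watcher class, and the function name and the state file are assumptions.

```php
<?php
// Hypothetical sketch of the idea behind a log watcher: remember how far
// the log was read last time and return only what was appended since.
// This is not the Log Watcher class API; all names are assumptions.

function read_new_log_lines(string $log_file, string $state_file): string
{
    clearstatcache();   // make sure file sizes are not stale
    $offset = is_file($state_file) ? (int) file_get_contents($state_file) : 0;
    $size = is_file($log_file) ? filesize($log_file) : 0;

    if ($size < $offset) {
        $offset = 0;    // the log was rotated or truncated
    }

    $new = '';
    if ($size > $offset) {
        $new = (string) file_get_contents($log_file, false, null, $offset, $size - $offset);
    }

    file_put_contents($state_file, (string) $size);
    return $new;
}

// A cron script could call it like this and send $new by e-mail:
// $new = read_new_log_lines('/path/to/php_error_log', '/path/to/php_error_log.offset');
```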

This way I can act promptly whenever an error occurs. I lost count of how many times this simple class saved me from major trouble.


- 5. Do not disclose errors to the users

If a script that serves a page to the users unexpectedly fails with an error, just present a user-friendly message like this: "Sorry, for the time being the site is not available. The site administration is already aware of the problem. Please come back later."

Do not disclose any details of the error to the users, nor tell them to contact you. It will only make the users panic, and the situation will not be solved by telling them to send you panic messages.

If you need to be notified, make your error handling code send you a message with the error details.
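
One way to combine both rules is a global exception handler that writes the details to the error log and shows the visitor only a generic page. This is a sketch under assumptions: the function name is hypothetical, and it relies on the error log being monitored as described in the previous section.

```php
<?php
// Hypothetical sketch: log the details, show the visitor a generic page.
// The function name is an assumption for illustration.

function handle_fatal_problem(Throwable $error): string
{
    // Full details go to the PHP error log, which the administrator monitors.
    error_log(sprintf(
        '%s: %s in %s:%d',
        get_class($error),
        $error->getMessage(),
        $error->getFile(),
        $error->getLine()
    ));

    // The visitor sees nothing about the internals.
    return 'Sorry, for the time being the site is not available. '
        . 'The site administration is already aware of the problem. '
        . 'Please come back later.';
}

set_exception_handler(function (Throwable $error): void {
    http_response_code(503);
    echo handle_fatal_problem($error);
});
```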

I have seen many sites that display ridiculous stack traces with all the names of functions, classes, parameters that have been called wherever the error occurred.

It is OK to dump that information to the page if you are running the site in your development environment. Do not do that in the production environment.

If you disclose too much sensitive error information in your site pages, that information may be used by malicious users to abuse your site.


- 6. Damage control

No matter how much care you take, bad things may still happen. Since you do not know yet what may go wrong, at least you should be concerned about minimizing losses.

For instance, sites are always subject to Denial of Service attacks that flood the server with excessive requests. In some cases you can take preventive measures.

That happened some time ago to the PHPClasses site. I noticed that many users love the site so much that they wanted to mirror it on their own computers.

Unfortunately that is not viable because it takes too much bandwidth and slows down the server for every user accessing the site at the same time. Therefore I had to take some precautions to prevent excessive downloads. I also asked the users to just download what they need, as I explained in this past post.

http://www.phpclasses.org/blog/post/43-Site-growing-pains.ht ...

In other cases it may not be possible to do much. If the site takes too many Web requests, one solution is to refuse connections from the machines that are performing too many requests. I have used Apache mod_throttle in the past, but it was not quite stable.

Still, I need to log in to the server machine and take some actions. The problem is that if the site takes too many requests, it may exhaust the server memory with excessive database connections, until it becomes unresponsive.


To prevent that problem, I had to configure Apache to not accept requests above a limit according to the available memory with a directive like this:

MaxClients 200

In this other article I explain a bit more about this and other directives:

http://www.meta-language.net/metabase-faq.html#7.2.2

If the site remains unresponsive for too long, there is also a script that restarts the Web server automatically. It is not an ideal solution but at least the site will not remain stuck for too long, especially when I am away and I cannot do anything about it.


- 7. Backup

Data loss may be one of the bad things that may happen. It may be caused by a buggy application, erroneous database schema upgrade, damaged disks, or even invasion of your servers by crackers. In any case it is always good to have a fresh backup at hand, so you can minimize the loss.

If you use MySQL, running the mysqldump program or a similar script at least once a day is better than not doing any backup at all. If possible, transfer the backup files to several other machines, preferably in a data center different from the one where the site server is running.

But I have an additional tip. Some time ago I read an article about a security company whose backup tapes of credit card databases were stolen while being transported in a security vehicle to a different building. It seemed to me that some insider knew what the vehicle was transporting and arranged the robbery.

All the hassle could have been avoided if the backup data had been encrypted. That is what my backup scripts do. The important detail is that the backups are generated using GnuPG/PGP.

The data is encrypted with the public key of a special recipient. So, if the backup data is stolen, it cannot be recovered by a person that does not have the private key of the recipient. Obviously the private key file is not on the server where the backup is taken.
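
As an illustration of such a script, the sketch below builds a command that pipes mysqldump straight into gpg, so that the unencrypted dump is never written to disk. It is not the site's actual backup script. The database name, the recipient and the output path are assumptions, and mysqldump is assumed to find its credentials in its own option file.

```php
<?php
// Hypothetical sketch: build a command that dumps a MySQL database and
// encrypts it with GnuPG before it touches the disk. The database name,
// recipient and output path are assumptions for illustration.

function encrypted_backup_command(string $database, string $recipient, string $output_file): string
{
    return sprintf(
        'mysqldump --single-transaction %s | gpg --output %s --recipient %s --encrypt',
        escapeshellarg($database),
        escapeshellarg($output_file),
        escapeshellarg($recipient)
    );
}

$command = encrypted_backup_command('site_db', 'backup@example.com', '/backups/site_db.sql.gpg');
echo $command, "\n";

// A cron script could then run it with exec($command, $output, $status)
// and report a non-zero $status as an error.
```

Only the public key of the recipient has to be on the server; decrypting the file requires the private key, which is kept elsewhere.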


- 8. Do what you can as you can never get defensive enough

Despite all my efforts to develop and run a site that does not give me any trouble, sometimes I have to deal with occasional problems because I have not used practices that are defensive enough.

The problem I mentioned at the beginning of this article was that the newsletter service got jammed by a spam message. I suppose that the spammer guessed the address of the newsletter service queue mailbox.

The newsletter service skipped that message. However, the message also prevented the processing of all the newsletters that were queued after it. The result was over 100 newsletters and alert messages left pending for 5 days.

Nothing was lost, but I only realized what was going on when some users that missed their newsletters wrote to me asking if there was a problem with their accounts.

All the newsletters were processed over the weekend. I would like to apologize to all users that missed their newsletters and only got them all at once several days later.

Meanwhile the system was improved. This kind of jamming is no longer possible. In addition, the site now sends me alert messages when there are more than 10 messages waiting in the queue, because the system may still get stuck for other reasons, for instance when there is not enough disk space.

The bottom line is: relax, do what you can, and handle new problems later as they happen.


* Recommended reading

I think I have covered most of the defensive practices that I apply. Still I would like to point you to a couple of pages where you can learn more about this subject. One is the Wikipedia page about this matter:

http://en.wikipedia.org/wiki/Defensive_programming

The other is a chapter of the Getting Real book written by the fine folks of 37 Signals. They prefer developing in Ruby. I prefer PHP. That does not mean we cannot agree on important matters.

http://gettingreal.37signals.com/ch09_Get_Defensive.php


* Share your defensive practices

In this article I shared a lot of what I know about defensive programming practices for developing Web sites. I am sure there are many situations that the sites I work on are not yet ready to handle.

I think that many of you that have read this long article up to this point have understood the defensive spirit, and also have your own bag of tricks.

If you are one of those people, it would be great if you could share your tactics, so we could all learn from you. Feel free to post a comment in this article forum.

Manuel Lemos





