Archive for the ‘PHP’ Category

Announcement: php[tek] 2017 Conference Talks

Thursday, January 12th, 2017

I’m pleased to announce that in May, I will be giving two talks at the excellent php[tek] conference in Atlanta, GA. One will be a technical talk on computational algorithmic complexity. The other is a comparison of long-distance hiking and software development, which I developed over the course of my Appalachian Trail thru-hike, and which I’m particularly looking forward to giving.

I missed last year’s php[tek] because I was on the Appalachian Trail at the time. This was the first tek I’ve missed since 2010, so I’m happy that I get to go this year partially to talk about why I wasn’t there last year!

For more information on my Appalachian Trail thru-hike, please feel free to see my hiking blog, longstride.net.

PHP Misleading Error: Maximum execution time of 0 seconds exceeded

Friday, December 2nd, 2016

Yesterday, on freenode #phpc, someone posted this curious error message:

PHP Fatal error: Maximum execution time of 0 seconds exceeded

Jokes aside about how, only eight hours into December, they had already exceeded their monthly allotment of PHP time, this is a rather curious error message to receive: PHP’s set_time_limit() function and the associated max_execution_time ini setting define a limit of 0 to mean no limit. A script with no limit should not be hitting a maximum execution time of zero seconds!
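For reference, here’s a minimal sketch (my own, not from the original discussion) demonstrating the documented meaning of a zero limit; the loop is arbitrary busy-work:

<?php
set_time_limit(0); // 0 means "no limit"

// Busy-work that would blow past any small limit. Note that on unix the
// limit tracks CPU time, so sleep() wouldn't count against it anyway.
for ($i = 0; $i < 1e9; $i++);

echo "Finished without hitting a time limit\n";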

While several of us tried brainstorming possible causes, additional details were provided, including that the script, running on PHP 7 on unix, under php-fpm, was twice terminated after running for roughly two hours.

I set out to look at PHP’s source code, certain that there would be something that would shed light on this mystery.

Let’s investigate PHP internals

The first thing I did was search for “Maximum execution time” in the PHP sources, hoping to find exactly one hit. Ignoring test cases, that string appears exactly once in the source, in the zend_timeout() function in zend_execute_API.c.

When zend_timeout() is called, PHP calls a function specified by the SAPI (the interface between PHP and the web server) if one is defined, and then emits an error and exits.

Knowing now where the error message is generated (and that it only comes from one place), I needed to determine what calls zend_timeout(), so I searched the code for that function’s name. Excluding the function’s own definition, and its declaration in a header file, there were four results. Two of those results were specific to Windows support, and could be ignored.

The other two results were conveniently located next to each other in zend_set_timeout(), both used as the callback function to a signal handler.

This means that, in normal use, the execution time limit message can only be generated in response to receiving a signal.

Signals

In UNIX-like systems, signals provide a limited form of inter-process communication, in which a process sends or receives an interrupt identified by a number. The Wikipedia page on signals provides additional detail.

The C standard defines six signals, but most operating systems define and use many more.

Some signals sent to a process cause its unconditional termination (e.g. SIGKILL) or have other operating-system level effects (such as SIGSTOP and SIGCONT). Most other signals can be caught by the target process if it has installed a signal handler to receive that notification. (If a signal handler is not installed, a default handler is used, which often, but not always, results in the self-termination of the process.)
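As an aside, a PHP script can install its own signal handlers through the pcntl extension. A minimal sketch, assuming a CLI build of PHP with the pcntl and posix extensions available:

<?php
// Install a handler for SIGUSR1. Without one, the default action
// for SIGUSR1 is to terminate the process.
pcntl_signal(SIGUSR1, function ($signo) {
    echo "Caught signal $signo\n";
});

// Send the signal to ourselves, then run any pending handlers.
posix_kill(getmypid(), SIGUSR1);
pcntl_signal_dispatch();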

How PHP’s execution time limit works

So, then, what signal triggers zend_timeout()? Looking a few lines up reveals the answer: SIGPROF. (Alternatively, SIGALRM is used when running under Cygwin on Windows.)

SIGPROF was a new one to me. Wikipedia says:

The SIGALRM, SIGVTALRM and SIGPROF signal is sent to a process when the time limit specified in a call to a preceding alarm setting function (such as setitimer) elapses. SIGALRM is sent when real or clock time elapses. SIGVTALRM is sent when CPU time used by the process elapses. SIGPROF is sent when CPU time used by the process and by the system on behalf of the process elapses.

And indeed, there is a call to setitimer() before PHP installs the signal handler on SIGPROF. The man page for setitimer() describes how to use it to set a timer that counts time the process spends executing, triggering a SIGPROF after the timer elapses.

Some more searching through PHP’s code makes clear how the entire process works: when PHP loads, when set_time_limit() is called, or when max_execution_time is changed, PHP clears the timer (if one is set) and re-sets it (if the new timeout is non-zero). Additionally, during request start-up, the signal handler for SIGPROF is installed (regardless of whether a timeout is actually set).

When the timer set by setitimer() fires, the SIGPROF handler, zend_timeout(), runs: it displays the error message, with the number of seconds filled in from the current configuration, and exits.
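You can watch the whole mechanism fire from the command line (assuming a unix build of PHP; the infinite loop supplies the CPU time):

$ php -r 'set_time_limit(2); while (true);'

Fatal error: Maximum execution time of 2 seconds exceeded in Command line code on line 1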

Reading between the lines

That answers the question of how the execution time limit error was displayed. But it doesn’t answer why: one would naturally assume that since the timer isn’t set, the operating system won’t send SIGPROF, and the script wouldn’t be terminated.

The answer lies in one further detail of the UNIX signal mechanism: any process can send any other process any signal (except where disallowed by security policy, such as for processes owned by a different user).

This means that a script doesn’t actually need to hit the time limit to be killed. PHP just needs to be told that it has.

You can test this yourself by doing something similar to this:

$ php -r 'posix_kill(getmypid(), SIGPROF);'

Fatal error: Maximum execution time of 0 seconds exceeded in Command line code on line 1

So, what really happened?

Unfortunately, a signal does not come with the identity of its sender. Were that the case, it would have been easy to determine what was killing the process and what configuration to change to stop it. In this case, the resolution was to modify the PHP code to “only” take 20 minutes to run, so the timeout was no longer an issue.

If we make the assumption that there wasn’t a malicious user on the server sending unwanted signals as a denial of service attack, the documentation for max_execution_time hints at one possibility:

Your web server can have other timeout configurations that may also interrupt PHP execution. Apache has a Timeout directive and IIS has a CGI timeout function. Both default to 300 seconds. See your web server documentation for specific details.

Some further research showed that neither Apache nor nginx explicitly sends SIGPROF, though. And while nginx does use setitimer(), its use triggers a SIGALRM.

My best guess is that there likely was some sort of watchdog process on the server that killed off the process running php after it consumed too many resources (either memory or time).

It’s probably a bug (or at least a source of undesirable confusion) in PHP that a SIGPROF that doesn’t arise from a timer expiration displays the same message as if the timer had expired, but this looks to be correctable.

20 Years of PHP

Monday, June 8th, 2015

Today is PHP’s 20th birthday. Ben Ramsey has called on us to blog about our history with PHP, so here’s mine.

Way back in 1999, still in college, I got my first real software development job at a small company in DC, the predecessor to my current employer. My job was to write an Apache log analyzer, because the software package we were using at the time was very slow and produced inconsistent results between runs.

So I wrote it in C++, because that’s what I knew, and what I was using for my personal projects. But, we were a web services company, so why shouldn’t our log analyzer be accessible via the web?

We were using a couple of different web languages at the time. Some of our early stuff was in Perl, which I had tried before and didn’t like. We also had a site using this awful language called SQLWEB. But it was suggested to me that I write the web interface using this scripting language called PHP. I had never heard of it before, but I quickly learned it (because, frankly, with PHP 3 there wasn’t much to learn), and soon became enamored with the language.

Sure, it didn’t have many of the features we’ve come to take for granted in modern PHP, such as OOP or closures, or even the foreach keyword (hooray for PHP 4!). But its key feature was that it didn’t need to be compiled. Up until then, every program I’d ever written had a slow write-compile-debug cycle, because compiling a new build and relaunching the app to test took time. With PHP, all I needed to do was change my code and refresh the browser window, and the changes were immediately visible. PHP may have been slower than C, but I was far more productive.

We no longer use that log analyzer, but PHP is the foundation for every website we currently manage, and is the vast majority of the code I’ve written over my professional career. And since then, the PHP community has become so much bigger, with several different application frameworks, thousands of open source libraries made easily available through Composer and Packagist, more conferences every year than one person could possibly attend, and a great community that I’m happy to be a part of.

Happy birthday, PHP! Here’s to another 20 years of powering the web.

BowerBundle Released

Monday, November 24th, 2014

Last night, I released BowerBundle, a super-simple bundle for Symfony that enables running Bower update/install automatically after running Composer update/install. I’ve used this for several different Symfony apps, and now it’s available for your use as well.

You can download it via GitHub, or via jbafford/bower-bundle in Composer.
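With a standard Composer setup, something like this should be all that’s needed to pull it in:

$ composer require jbafford/bower-bundle

(You’ll still need to register the bundle in your Symfony kernel, as with any bundle.)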

Slides for “Writing OOP Modules for Drupal 7” Posted

Wednesday, November 19th, 2014

The slides for “Writing OOP Modules for Drupal 7”, my first talk at last week’s php[world], have been posted to Speaker Deck.

Slides for “Stupid PHP Tricks” Posted

Tuesday, November 18th, 2014

The slides for “Stupid PHP Tricks”, my second talk at last week’s php[world], have been posted to Speaker Deck.

Twitterslurp open source release

Tuesday, June 30th, 2009

Last month, I wrote about Twitterslurp, the twitter searching tool I developed at The Bivings Group, which displays a constantly-updating stream of tweets, as well as a leaderboard and stats graphs.

Today, we are very happy to release it as open source. You can download Twitterslurp from its Google Code project page at http://twitterslurp.googlecode.com/.

Since last month, I’ve made a lot of changes to improve the quality (and ease of configuration) of the Twitterslurp code. Twitterslurp’s error handling has been improved, and I added the ability to start and stop the tweet stream and show more than the most recent 20 tweets. Our graphics team also created a spiffy logo.

Yesterday and today, Twitterslurp has been driving a video wall of tweets at the Personal Democracy Forum conference in NYC. The conference, which just ended, had over 17,000 tweets in the last two days.

Previously, we ran test versions of Twitterslurp during mysqlconf and php|tek, and officially on behalf of the Dutch PHP Conference. Twitterslurp started as a project for a client to allow them to track tweets, and give members of their website rewards for tweeting with a particular hashtag.

We’ve also set up a copy of Twitterslurp tracking itself.

We’d love for you to check out Twitterslurp, and we’re open to any and all feedback.

Tracking php|tek Tweets With Twitterslurp

Wednesday, May 20th, 2009

For a client at work a few months ago, I created a Twitter search tool, Twitterslurp, that put all the tweets related to the client’s project on their webpage, updated in (close to) real-time via AJAX.

I’ve since added a lot of features, including a set of graphs, and we’ve set up a version of Twitterslurp for php|tek 2009.

Pickens Plan Scaling Talk Slides Are Available

Friday, May 15th, 2009

On Wednesday night, I gave a talk at the DC PHP developers’ group about how The Bivings Group scaled the Pickens Plan website after it stopped working when a national advertising campaign, following the first 2008 presidential debate, drew thousands of people to the site.

The slides are now available for download.

Also, I want to thank everyone who showed up. It was great talking to all of you!

MySQL Performance Benefits of Storing Integer IP Addresses

Monday, March 9th, 2009

On ##php on freenode over the weekend, there was a brief discussion regarding the performance benefits of storing IP addresses in a database as integers, rather than as varchars. One commenter argued that the numeric-to-string and string-to-numeric conversion costs involved in storing IPs as integers were more significant than the space savings.

While the space savings are easily apparent, and I’ve seen demonstrations of how much faster search operations are, I could not recall anyone ever doing an analysis of how long it takes to insert IPs as strings or integers into the database. Naturally, I had to determine an answer to this so I’d know for future reference.

My findings show that, beyond the disk space savings and the much faster index queries possible with integer-based IP addresses, even with the database doing the string-to-numeric conversion, it is 9-12% faster to store IP addresses into the database as integers, rather than strings.

Background

There are two ways you can store an IPv4 address in a database. In its native format, an IP address is a four-byte integer, and can be represented as an int unsigned.

In its human-readable format, an IP address is a string, with a minimum length of 7 characters (0.0.0.0) and a maximum length of 15 (255.255.255.255). This gives it an average length (assuming uniform random distribution) of 13.28 characters. Accordingly, one would store it in a database field of type varchar(15). In order for the database to keep track of exactly how much data is in the column, an additional byte of data must be added to store the length of the string. This brings the actual data-storage costs of an IP represented as a string to an average of 14.28 bytes (assuming the characters can be represented by one byte per character, as with latin1 or utf8).

This means that storing an IP address as a string requires, on average, about 10 bytes of extra data. In an application that saves access logs, that 10 bytes of data will eventually add up, and that should be reason enough to store IPs as an integer, rather than a string.
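As an aside, PHP can convert between the two representations itself, using its built-in ip2long() and long2ip() functions:

<?php
$asInt = ip2long('172.16.254.1');  // 2886794753 (on 64-bit builds)
$asStr = long2ip(2886794753);      // '172.16.254.1'

// Caveat: on 32-bit builds, PHP's integers are signed, so ip2long() can
// return a negative value; sprintf('%u', ...) recovers the unsigned form.
$unsigned = sprintf('%u', ip2long('172.16.254.1'));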

Disk Space Is Cheap, So No Big Deal, Right?

There are other costs associated with having larger data fields. If the column is indexed, the index will be larger as well. Larger indexes tend to perform slower than smaller indexes. Additionally, while disk space is plentiful and cheap, RAM is considerably more limited, so more memory will be used to cache the data or indexes, potentially pushing other, more valuable content out of the cache.

Further, while disks have been gradually getting faster, it still takes a relatively long time to read data from (and even longer to write to) a disk, and CPUs have gotten faster much more quickly than disks. The more data that has to be moved around, the more time the CPU wastes moving that data instead of performing more interesting work.

So, What’s the Actual Cost?

I wrote a PHP script that generates and inserts 1,000,000 random IP addresses into four mysql tables and timed the results. All tests were done with PHP 5.3alpha2 and MySQL 5.0.67 on a 2.16 GHz MacBook Pro (Intel Core Duo), with MyISAM tables.

The test uses four tables. Each table has three columns: an id int unsigned not null auto_increment primary key column, plus ip and s as described below:

table                    ip                     s                    description
inetbench_long           int unsigned not null  char(1) not null     fixed row length
inetbench_long_dynamic   int unsigned not null  varchar(1) not null  dynamic row length
inetbench_varchar        varchar(15) not null   varchar(1) not null  dynamic row length; IP stored as string
inetbench_varchar_utf8   varchar(15) not null   varchar(1) not null  same as inetbench_varchar, but charset = utf8

The PHP benchmark script generates 1 million random IP addresses, and then inserts that data into MySQL. To make MySQL do as much work as possible, for the inetbench_long tables, the SQL query string uses INET_ATON() to convert the IP to an integer, rather than attempting to do it in PHP.
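The heart of the benchmark looks something like the sketch below (a simplified reconstruction from the description above, not the actual script; the connection details are placeholders):

<?php
// Generate 1 million random dotted-quad IPs up front, so that generation
// time isn't counted against the insert timing.
$ips = array();
for ($i = 0; $i < 1000000; $i++) {
    $ips[] = mt_rand(0, 255) . '.' . mt_rand(0, 255) . '.'
           . mt_rand(0, 255) . '.' . mt_rand(0, 255);
}

$db = new mysqli('localhost', 'user', 'password', 'test');

$start = microtime(true);
foreach ($ips as $ip) {
    // INET_ATON() makes MySQL, not PHP, do the string-to-integer conversion
    $db->query("INSERT INTO inetbench_long (ip, s) VALUES (INET_ATON('$ip'), 'x')");
}
printf("insert time: %.2f sec\n", microtime(true) - $start);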

The results:

table                    insert time   avg row length   data length   index length   total length
inetbench_long           132.35 sec    10 bytes         10,000,000    22,300,672     32,300,672
inetbench_long_dynamic   132.71 sec    20 bytes         20,000,000    22,300,672     42,300,672
inetbench_varchar        144.44 sec    24 bytes         24,504,148    36,341,760     60,845,908
inetbench_varchar_utf8   148.86 sec    24 bytes         24,504,148    36,341,760     60,845,908

MySQL is adding 1 byte of overhead per row to inetbench_long, 10 bytes/row to inetbench_long_dynamic, and an average of 5 bytes/row of overhead to inetbench_varchar and inetbench_varchar_utf8.

The s column was added to test the performance difference of fixed vs. dynamic row storage, but it turns out there’s not much difference.

You can download the code used to generate these results here.

Analysis

Storing IPs as strings, besides requiring more disk space, takes 9% longer than storing them as integers, even with the overhead of converting the IP from a string to an integer. If the table uses utf8 encoding, it’s 12% slower. (This should not be surprising: UTF-8 is inherently slower to process than a strictly 8-bit encoding.) Storing data as an integer in a table with a dynamic row length is not appreciably slower. And the indexes on the string tables are 63% larger.

For good measure, I tested a few select queries against the tables. The results were actually somewhat interesting.

Searching for a specific IP, or for a range of IPs that can be satisfied with a LIKE clause, showed no significant difference between the integer and varchar storage. (This was somewhat surprising to me.)


select benchmark(10000000, (select count(*) from inetbench_long where ip between inet_aton('172.0.0.0') and inet_aton('172.255.255.255')));
1 row in set (0.41 sec)

select benchmark(10000000, (select count(*) from inetbench_varchar where ip like '172.%'));
1 row in set (0.42 sec)

However, as expected, when the search range can’t be represented with a simple LIKE query, the speed difference between numeric and string indexes really shows:


select benchmark(35000000, (select count(*) from inetbench_long where ip between inet_aton('172.16.0.0') and inet_aton('172.31.255.255')));
1 row in set (1.43 sec)

select count(*) from inetbench_varchar where inet_aton(ip) between inet_aton('172.16.0.0') and inet_aton('172.31.255.255');
1 row in set (1.47 sec)

select count(*) from inetbench_varchar_utf8 where inet_aton(ip) between inet_aton('172.16.0.0') and inet_aton('172.31.255.255');
1 row in set (1.72 sec)

Note that the integer query ran through benchmark() 35 million times in about the same time the string query took to run once: the integer search is roughly 35 million times faster. Also, the utf8 table is about 17% slower than the latin1 table (which isn’t too surprising, given UTF-8’s overhead).

In Conclusion…

With “only” a difference of 12 microseconds per insert query, it may not make sense to change an existing database if you’re not doing many queries against stored IPs. If you’re doing IP range queries, though, you probably want to convert your tables. Any new development should be storing IP addresses in the database as integers by default. The space and time savings are worth it.

Once IPv6 becomes more prevalent, the savings will only become larger: a 128-bit (16 byte) IPv6 address can be up to 39 characters long when represented in a “human readable” format. (Storing IPv6 addresses in the database is going to be a bit more difficult, as MySQL doesn’t have a native 16-byte-wide data type.)
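One common workaround (a sketch of the general approach, not something from this benchmark) is a binary(16) column, with PHP doing the conversions via inet_pton() and inet_ntop():

<?php
// inet_pton() packs an address into raw binary: 16 bytes for IPv6
// (4 bytes for IPv4), which fits a MySQL binary(16) column.
$packed = inet_pton('2001:db8::1');

// inet_ntop() converts the packed form back into a readable string.
echo inet_ntop($packed), "\n"; // 2001:db8::1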

The load on MySQL when inserting integer IPs could likely be slightly reduced by doing that conversion in your application, rather than using MySQL’s INET_ATON() function.