Caching Using Zend

Monday, October 22nd, 2007

The Zend Framework provides an interesting set of PHP5 libraries for caching. There’s a nice architecture to it, providing a number of different backends and frontends for caching. However, I recently found it very frustrating to try and figure out a decent caching strategy for the new version of a site I was working on.

And the documentation did not help at all. So, allow me to elaborate for the benefit of the huddled masses.

The introduction in the caching section of the manual gives a decent enough overview of the basics: if I want to cache a page with a nice simple ID, such as “Page1”, with a set lifetime, I can do so in a few lines of code.

However, another page goes on to mention how you can also “tag” records with multiple tags. Another page talks about how to clear the cache by a single tag, or combination of tags.

But nothing explains the relationships between tags and ids, and how the clear works with tags or ids.

Now, I’m working on a system which has two views of a music catalogue for a radio station. There is the requests system and there is the discography system. They both present differing views of the same data. The request system filters the list of artists, albums and tracks on the station to those that can be requested and displays a “request optimised” set of screens with some of the data. The discography section shows everything about an artist, all their albums, all tracks, reviews and so forth, without the cruft needed for the request sub-system.

We cache these screens for obvious reasons. What we need is the ability to clear the cache of an artist, album or track, possibly within the requests or the discography, or both. So, I figured on a system of unique keys like artist_123 and album_123, and then using the tags to “lump” things together. So album_123 would have the artist_123 tag in both the discography and the request view, plus the discography tag in the discography view and the requests tag in the requests view. Something like:

$cache->load("album_123", array("album_123", "artist_123", "requests", "en_GB", "album"));

I could then simply invalidate the cache of all albums, request pages, artist_123 pages or British English generated pages, or any combination of those items.

This does not work.

The first important thing is that the ID is the unique key. Not the combination of the ID and the tag(s). So if you save page1 to the cache with the tags tag1 and tag2, then try and load page1 from the cache with tags tag3 and tag4, you’ll get the result of saving with tag1 and tag2!

Insane, but true. Try it. The tag has no effect whatsoever it seems on the load code. If it can find an item by ID, then it loads it, irrespective of tags. I’m not sure if this is behaviour by design, or a bug on my system using the file backend, but it is consistent. I just think it’s mad.
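To make the observed behaviour concrete, here is a minimal sketch of a cache that behaves the way I saw Zend_Cache behave. The TinyTaggedCache class and its method names are made up for illustration; this is a model of the behaviour, not the framework's code.

```php
<?php
// Hypothetical stand-in for the cache, written only to illustrate the
// behaviour described above. It is NOT Zend_Cache code.
class TinyTaggedCache
{
    private $entries = array(); // id => array('data' => ..., 'tags' => ...)

    public function save($data, $id, array $tags = array())
    {
        // The ID alone decides which slot is written.
        $this->entries[$id] = array('data' => $data, 'tags' => $tags);
    }

    public function load($id, array $ignoredTags = array())
    {
        // Tags passed here are accepted but never consulted.
        return isset($this->entries[$id]) ? $this->entries[$id]['data'] : false;
    }

    public function cleanMatchingTags(array $tags)
    {
        // Remove every entry that carries ALL of the given tags.
        foreach ($this->entries as $id => $entry) {
            if (count(array_diff($tags, $entry['tags'])) === 0) {
                unset($this->entries[$id]);
            }
        }
    }
}

$cache = new TinyTaggedCache();
$cache->save('requests view', 'page1', array('tag1', 'tag2'));

// Loading with different tags still returns what was saved under the ID.
$loaded = $cache->load('page1', array('tag3', 'tag4'));

// Cleaning by tag is where the tags finally matter.
$cache->cleanMatchingTags(array('tag1'));
$afterClean = $cache->load('page1');
```

In this model the tags only ever matter when cleaning, which matches what I saw with the file backend.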

To get what I desire, I’m going to have to cache with:

$cache->load("requests_artist_123_album_123_en_GB", array("requests", "artist_123", "album_123", "en_GB", "albums"));

Then I can still do a clear on the appropriate tags, if for example I want to remove all albums from the requests cache:

$cache->clean(Zend_Cache::CLEANING_MODE_MATCHING_TAG, array("requests", "albums"));

It’s a bit of a pain in the rear, but, once you’ve figured out the $tags argument on the load method is pointless, it’s fine.
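Since the ID now has to carry everything that makes a page unique, it may help to build the ID and the tag list from the same parts so they cannot drift apart. The following is only a sketch: buildCacheKey() is a made-up helper, and restricting IDs to letters, digits and underscores is my assumption about what is safe.

```php
<?php
// Hypothetical helper, not part of Zend_Cache: builds a unique cache ID and
// the matching tag list from the same parts.
function buildCacheKey($view, $artistId, $albumId, $locale)
{
    $tags = array(
        $view,
        'artist_' . (int) $artistId,
        'album_' . (int) $albumId,
        $locale,
        'albums',
    );

    // The ID is the first four parts joined; the trailing 'albums' tag is
    // only a grouping label and is left out of the ID.
    $id = implode('_', array_slice($tags, 0, 4));

    // Assumption: keep only letters, digits and underscores in the ID.
    $id = preg_replace('/[^a-zA-Z0-9_]/', '_', $id);

    return array('id' => $id, 'tags' => $tags);
}

$key = buildCacheKey('requests', 123, 123, 'en_GB');
```

The resulting $key['id'] and $key['tags'] can then be handed to whatever save and clean calls you use.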

Of course, figuring all this out was further complicated by the fact that the examples in the manual often use load() with $key and $tags and save() with no arguments. I assumed therefore that the point of the $tags argument on load() was to set the tags that would be used auto-magically on save(). Only, if you don’t pass $tags to save(), it saves with no tags. Which is also silly, since it does respect the $key used in the load() method.


Parsing XML with PHP5

Sunday, July 8th, 2007

Recently, I had cause to re-visit some old code I had lying around. The code in question is designed to extract data from the administrative XML interface to a SHOUTcast radio station and display the latest track listings. The code in question was written in PHP4 using the core PHP parsing functions.

That API set is pretty horrible. Essentially, you set a couple of callback functions for starting tags, ending tags, tag content and so on, then you need a huge if statement and global variables to co-ordinate which tag you’re looking at. I decided that piece of code needed a 100% re-write in PHP5 using the DOM Document API, i.e. doing it the proper way. The way I would in any other language.

Unfortunately, the php.net documentation is a little wrong. Or out of date. The documents seem to be relevant for PHP4.3 only. Which results in lots of errors when you write code referencing the documents. So, without further rambling, here’s a quick rule which appears to help:

Where the documentation states function_name(), use an attribute functionName instead.

This is not true of all things. Some things are still functions, and I think they are mostly functionName() instead of the documented function_name.

Look at the documentation for any DOM XML components from other vendors, such as the Microsoft XML 4.0 object; PHP’s methods seem to match those.

So, here’s what I came up with using the manual:

$values = array();
$dom = new DOMDocument();
$dom->loadXML($xml);
$nodes = $dom->document_element()->first_child()->child_nodes();
foreach($nodes as $node) {
    $childNodes = $node->child_nodes();
    $value = $childNodes[1]->node_value();
    $values[] = $value;
}

Which of course is all-sorts of not working. The correct version is in fact:

$values = array();
$dom = new DOMDocument();
$dom->loadXML($xml);
$nodes = $dom->documentElement->firstChild->childNodes;
foreach($nodes as $node) {
    $childNodes = $node->childNodes;
    $value = $childNodes->item(1)->nodeValue;
    $values[] = $value;
}
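One caveat with item(1): it counts every child node, so if the XML is pretty-printed the whitespace between elements shows up as text nodes and shifts the positions. A variation that looks elements up by name avoids that. This is only a sketch, and the sample document and its element names are assumptions for illustration; they may not match the real SHOUTcast output.

```php
<?php
// Assumed sample document; the element names are illustrative only and
// may not match the real SHOUTcast admin XML.
$xml = '<SERVER><SONGHISTORY>'
     . '<SONG><PLAYEDAT>1183900000</PLAYEDAT><TITLE>Artist A - Track One</TITLE></SONG>'
     . '<SONG><PLAYEDAT>1183899800</PLAYEDAT><TITLE>Artist B - Track Two</TITLE></SONG>'
     . '</SONGHISTORY></SERVER>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$values = array();
// Look elements up by name rather than by position.
foreach ($dom->getElementsByTagName('TITLE') as $title) {
    $values[] = $title->nodeValue;
}
```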

I hope this saves at least one person a few lost minutes.


Internationalisation Take 2 - Zend vs Cheap-o Arrays

Saturday, June 23rd, 2007

In response to my emails to the Zend Framework I18N list and my previous post, Thomas, the author of the Zend_Translate framework items mailed back to the list here:

> 2) gettext is a more expensive version of using the arrays backend.

No… it’s a less expensive version. What takes time is reading the original
source. Your processor is always faster than your harddrive.
It is better to do some computations than reading a bigger file. And mo
files are much smaller than the same sized array files.

This still seems wrong to me, so I’ve done a bit more analysis. I have now got XDebug up and running in my portable environment, so I can really see the details of the costs. Now, to caveat all this, I’m running all this from a USB key on a laptop that’s doing a number of other background tasks, so the performance is not isolated. Due to this I’ll be looking at percentages of time in WinCacheGrind, not actual execution times.

Now, to test this I have generated two files. One is a .po containing 1000 phrases, which I have compiled to a .mo file. The other is a PHP file containing a PHP array of the same 1000 translations. I generated these with a script, so the translations are a bit simple:

From the .po file:

msgid "String 0"
msgstr "String Translated 0"

From the .php file:

'String 0' => 'String Translated 0',

I have then written a simple PHP file which translates 50 of these items. A reasonable enough test I think. Firstly, to test the translation using the fast Zend_Translate gettext options:

require_once 'Zend/Translate.php';
$i = new Zend_Translate('gettext', '/development/language/test.en.mo', 'en');
 
function _($s)
{
    global $i;
    $s = $i->_($s);
    echo($s."<br/>\n");
}
 
_('String 1');
_('String 2');
...

I then ran this file and loaded the cachegrind output into WinCacheGrind. 87.88% of the execution time was spent in Zend_Translate_Adapter_Gettext->_loadTranslationData. Performing translations took 1.99% of the time.

Next I used my PHP array and the Zend_Translate array backend:

require_once 'Zend/Translate.php';
require_once '/development/language/test.en.php';
$i = new Zend_Translate('array', $LANGUAGE, 'en');

(The rest of the file remains the same). I then ran this and checked the output. loadTranslationData took 78.36% of the time. Performing translations took 2.86% of the time.

My third test was just to use the test.en.php file and a simple translation function:

require_once '/development/language/test.en.php';
function _($s)
{
    global $LANGUAGE;
    // Fall back to the original string when there is no translation.
    $s = (array_key_exists($s, $LANGUAGE)) ? $LANGUAGE[$s] : $s;
    return $s;
}

The first thing to note was that the Zend_Translate items took over 20ms each. This one not using Zend at all took 2.8ms. The require_once statement took 1.83% of that time. Then it was just repetition of an un-recordably-fast translation 50 times.

So what do I draw from this? I draw from this that for simple translations, you can’t beat a very, very simple system with just an array of translations. I haven’t looked in any depth at the other services offered by Zend_Translate, but it does allow you to add multiple translations and translate in multiple languages. But, do you have a use-case for that?

If your UI needs to display in a single language, but translate that language, take the simple approach. It needs a little extension to support modular languages, but look at the PHPBB3 implementation and you can’t go far wrong. That loads modular translation files (just to keep that trivial require_once cost down) each of which array_merge’s back into a single translation array which is key’d by constants.
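As a rough sketch of that modular approach: each module contributes its own array, and the arrays a page needs are merged into one lookup table. The module names and the translate() function are made up for illustration, and this is not the PHPBB3 code. I have called the function translate() here because PHP already defines _() when the gettext extension is loaded.

```php
<?php
// Hypothetical module arrays; in practice each would live in its own file
// and be pulled in with require_once only when that module is used.
$coreLanguage = array(
    'String 0' => 'String Translated 0',
    'String 1' => 'String Translated 1',
);
$requestsLanguage = array(
    'Request this track' => 'Request this track (translated)',
);

// Merge only the modules that this page needs into one lookup table.
$language = array_merge($coreLanguage, $requestsLanguage);

// The table is passed in explicitly rather than read from a global.
function translate($s, array $language)
{
    // Fall back to the untranslated string when there is no entry.
    return array_key_exists($s, $language) ? $language[$s] : $s;
}

$known = translate('String 1', $language);
$unknown = translate('No such string', $language);
```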

Fast.

My cachegrind files for your reading pleasure:
Zend_Translate - Array
Zend_Translate - Gettext
Non-Zend_Translate


Profiling PHP With XDebug - Portably!

Saturday, June 23rd, 2007

When you are working on a web site or web application, something that all too often gets little thought at development time is its performance. You need to know where your bottlenecks are and what the costs of each architectural and implementation choice you make are. You should routinely profile your application’s core functions to see how they behave. How do we do this? You could put little timers into the code and log timings to see how things work, or you could use XDebug.
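For comparison, the “little timers” approach looks something like the sketch below. The timeCall() helper is made up for illustration; it wraps one call with microtime() and reports the elapsed time in milliseconds.

```php
<?php
// Hypothetical helper: times one call and returns its result together with
// the elapsed wall-clock time in milliseconds.
function timeCall($callback, array $args = array())
{
    $start = microtime(true);
    $result = call_user_func_array($callback, $args);
    $elapsedMs = (microtime(true) - $start) * 1000;

    return array('result' => $result, 'ms' => $elapsedMs);
}

// Example: time a built-in function and print the measurement.
$timing = timeCall('str_repeat', array('x', 1000));
printf("str_repeat took %.3f ms\n", $timing['ms']);
```

It works, but you have to decide in advance what to wrap, which is exactly the guesswork a profiler removes.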

XDebug is a PHP extension that provides a number of critical features to developers. It supports step through debugging (assuming your code editor can hook into it) and it supports profiling of your scripts. This gives you a detailed breakdown of every command executed in your application, how many times it was executed and what it cost.

This is invaluable when tracing your performance work. You can identify exactly which routines are costing you too much. It can give valuable insight into the performance landscape of your software. So I’m going to hook it up into my development environment on its USB key.

Firstly, pop over to the XDebug site and download the relevant version for your PHP version. I downloaded the Windows binary of 2.0RC4 for PHP 5.2.1+. XDebug is a “Zend Extension”, not a standard PHP extension: it extends the Zend Engine that powers PHP. This changes how we configure it in php.ini and means it doesn’t have to go in the extensions folder of your PHP install. But I keep it there for consistency. Once it is placed in that folder you need to edit your php.ini. The commands can go anywhere in the file, but I placed them after the PHP extension loading commands, again for consistency:

zend_extension_ts=/development/php/ext/php_xdebug-2.0.0rc4-5.2.1.dll

Note that this is loaded with the zend_extension_ts command instead of the extension command (the ts denotes Thread Safe mode). Also note that we specify the full path to the extension. The zend_extension_ts command (and the other zend_extension commands) need the full path, as they don’t pay attention to our extension directory command.

Once this is done, go to your PHPInfo() test page and check, you should have XDebug information included:

[Image: XDebug Enabled]

Ok so far so good. If you have a page which currently throws a php error, go check it now. You’ll notice that just having XDebug installed gives you much more information. XDebug makes developing easier just by being there.

Now, we’re mainly going to use it to profile performance of our applications and third party libraries, so we need to enable profiling, this is done with a couple of new entries in php.ini. I placed them just after my command to load the extension so it’s all in one place:

xdebug.profiler_enable = 1
xdebug.profiler_output_dir=/development/

Now restart Apache and hit your PHPInfo() page. In the development folder on your USB key you will have some files called cachegrind.out.[some number]. This is the profiling information in its raw form and of no use to you on its own.
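If you would rather check from a script than hunt through the folder by hand, something like the sketch below lists the profiler output, newest first. The listProfiles() helper is made up for illustration, and the directory is simply the one configured above.

```php
<?php
// Hypothetical helper: returns the profiler output files in $dir,
// most recently modified first.
function listProfiles($dir)
{
    $files = glob(rtrim($dir, '/\\') . '/cachegrind.out.*');
    if ($files === false) {
        return array();
    }
    // Sort by modification time, most recent first.
    usort($files, function ($a, $b) {
        return filemtime($b) - filemtime($a);
    });
    return $files;
}

// Whether the extension is loaded can also be checked from code.
$xdebugLoaded = extension_loaded('xdebug');
$profiles = listProfiles('/development');
```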

You need a cachegrind analysis program. I use WinCacheGrind as I’m on Windows. You can use this to open up the cachegrind file and see what took what time. Visit your PHPInfo() page, pick up the cachegrind output and take a look. You can see a lot of detail.

[Image: CacheGrind]

I don’t propose to detail how to use WinCacheGrind and do a full analysis; a bit of poking should show you what’s going on. But I’ll be using XDebug and WinCacheGrind to get under the covers of some third party libraries I’m considering for use in the development of Multiblog, so we’ll see more information then.


Internationalisation

Thursday, June 21st, 2007

The web is global.

Lots of websites do not cope with this. They do not provide a user setting for the language and deliver their content in that language.

Clearly, this is bad. If you are producing an application, like Multiblog, then you need to make it international. It needs the UI at least (content is a more thorny issue) to work in the user’s preferred language. Otherwise they will experience friction trying to use the confusing foreign thing.

There are a lot of ways to achieve this. Geeklog and PHPBB3 use arrays to translate content and allow the user to pick things. Drupal uses the GNU GetText system. And there are other approaches.

Choosing the right approach and using it correctly is difficult. I’m currently experimenting with approaches for Multiblog and other projects. I’m currently looking into the very interesting Zend Framework’s Zend_Translate class. This allows a number of different approaches, including both Array and GetText.

GetText appears to be the recommended choice. There are a number of free tools that can generate your translation files, as the translation files are not human readable. It’s fast and threadsafe. The Zend Framework Manual offers some advice on how to structure your translation files. There are several suggested methods, but, there is no suggestion of how to structure your translation modules.

The question I asked was “What’s the best practice?”, and no-one seems to know, so I guess I need to figure it out for myself from basic principles.

Now, GetText was written to provide internationalisation for the GNU software. Including the core of the Linux OS. Here, the GetText file is (I assume) parsed once at start up and held in memory to translate as things go. Web applications are different. Every page view is essentially a new start up. That GetText translation source is going to be loaded hundreds and thousands of times. Not just once on boot of the web server.

So, if we want to get this right, we need to know what our best bet is. Do we want a monolithic all translations file, or do we want to modularise this file and load it as needed? Does it use the file like a database and seek things out, or does it parse the whole thing every time and process it internally?

I’ve done some simple testing. I produced a basic test catalogue with poEdit and compiled a mo file from it:

msgid ""
msgstr ""
"Project-Id-Version: Test Zend GetText\n"
"POT-Creation-Date: \n"
"PO-Revision-Date: 2007-06-21 12:20-0000\n"
"Last-Translator: THEMike \n"
"Language-Team: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Poedit-Language: English\n"
"X-Poedit-Country: UNITED KINGDOM\n"
"X-Poedit-SourceCharset: utf-8\n"
msgid "This is a test."
msgstr "[Translated]This is a test.[/Translated]"

I then wrote a simple test harness PHP file which loads a Zend_Translate using gettext and translates a single line. Before performing a translation, I var_dump the Zend_Translate instance to see what’s in it:

<?php
/* Configuration: */
define('PATH_TO_ENGINE', '/development/engine/');
define('PATH_TO_LANGUAGE', '/development/language/');

/* Put engine on the include path */
$curPHPIncludePath = ini_get('include_path');
if (defined('PATH_SEPARATOR')) {
    $separator = PATH_SEPARATOR;
} else {
    // prior to PHP 4.3.0, we have to guess the correct separator ...
    $separator = ';';
    if (strpos($curPHPIncludePath, $separator) === false) {
        $separator = ':';
    }
}
if (ini_set('include_path', PATH_TO_ENGINE . $separator . $curPHPIncludePath) === false) {
    die('Buggered');
}

require_once 'Zend/Translate.php';
$t = new Zend_Translate('gettext', PATH_TO_LANGUAGE . 'test.en.mo', 'en');

/* Dump the instance before translating anything */
echo('<pre>');
var_dump($t);
echo("</pre><hr/>\n");
echo($t->_('This is a test.'));
?>

The result of the var_dump being:

object(Zend_Translate)#1 (1) {
  ["_adapter:private"]=>
  object(Zend_Translate_Adapter_Gettext)#2 (6) {
    ["_bigEndian:private"]=>
    bool(false)
    ["_file:private"]=>
    resource(21) of type (stream)
    ["_locale:protected"]=>
    string(2) "en"
    ["_languages:protected"]=>
    array(1) {
      ["en"]=>
      string(2) "en"
    }
    ["_options:protected"]=>
    array(1) {
      ["clear"]=>
      bool(false)
   }
    ["_translate:protected"]=>
    array(1) {
      ["en"]=>
      array(2) {
        [""]=>
        string(339) "Project-Id-Version: Test Zend GetText
POT-Creation-Date:
PO-Revision-Date: 2007-06-21 12:20-0000
Last-Translator: THEMike
Language-Team:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Poedit-Language: English
X-Poedit-Country: UNITED KINGDOM
X-Poedit-SourceCharset: utf-8
"
        ["This is a test."]=>
        string(40) "[Translated]This is a test.[/Translated]"
      }
    }
  }
}

As you can see, before I’ve even called a translate call, the entire mo translation catalogue has been loaded into memory and parsed internally to form a PHP array. Which is then used for translation.

Clearly, this argues for a very modular translation system. I would want a core.lang.mo file for “common” translations used throughout the application, and then a controller.lang.mo file for each controller, holding that controller’s specific phrases and loaded only by that controller.

However, note that the translation is done to a PHP array. Essentially, it seems the gettext translator is really a front-end loader of the array translator. So why not use the array translator?

The only downside I can see is that it’s harder to get non-programmers to generate valid PHP arrays when supplying your translation. Other than that, anything that the PHP extension does to optimise compilation and processing of PHP code will kick in and give you a significantly improved performance. Put extra things on top of that like the Zend Optimisers and so forth, and you have a compelling reason to use highly modular array based translation.

The problem then remains getting valid PHP array files back from your translators, and frankly, that can be solved by writing a simple front end for your translators so they have a GUI to use.
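The file-writing half of such a front end can be very small, because var_export() already produces valid PHP and handles the quoting. A minimal sketch follows; exportTranslations() is a made-up name, and the input array stands in for whatever your GUI collects.

```php
<?php
// Hypothetical helper: turns translations collected by some front end into
// the source of a PHP file that returns an array.
function exportTranslations(array $translations)
{
    // var_export() emits valid PHP and escapes quotes for us.
    return "<?php\nreturn " . var_export($translations, true) . ";\n";
}

$source = exportTranslations(array(
    'This is a test.' => '[Translated]This is a test.[/Translated]',
    "It's here" => "C'est ici",
));

// Round trip: write the file, then load it the way the application would.
$file = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($file, $source);
$loaded = include $file;
unlink($file);
```

The translators never see the PHP syntax, and the application just includes the generated file.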


Choosing a Language is Choosing a Platform

Tuesday, April 17th, 2007

I’ve already posted an article about choosing a platform for your development. I missed a very important point: when choosing a language to develop in, you also buy into the platform available to you. In this context, that means the third party libraries and supporting frameworks. And as Jeff Atwood points out:

When you choose a language, like it or not, you’ve chosen a platform. And as Steve so patiently and calmly explained to all the Lisp enthusiasts, the platform around the language, more than the language itself, sets the tone for your development experience. The availability of common, popular libraries and the maturity of the development environment end up trumping any particular significance the language holds.

This was something I was planning to move on to talk about in more detail later. When you pick a language, you do need to look around at the choices of libraries available to you. PHP is lucky. It’s mature. There are a lot of libraries out there.

But, you must be very careful when confronted by such choice. Take a look at the options available for templating. Pear (a major source of libraries) have several implementations. There’s Smarty, and numerous other “just templating” libraries. The new Zend Framework also has a templating implementation.

There is a lot to choose from. Making the choice to go Pear::Flexy because that fits in with your use of Pear::DB, or to make use of the entire Zend Framework might be completely the wrong thing to do. You might want to create your own lightweight abstraction. Or go for something like SimpleT.

Picking a language with a small platform footprint can actually be better than picking a language with a large platform footprint.

But I’ll be talking about the costs associated with blindly using a library from the “PHP platform” for every project. When all you have is a hammer, everything looks like a nail. If you only have one type of hammer, that’s not worse than having a whole basket of different hammers.
