Archive for the 'Development' Category

Chrome, Gears & XUL

Tuesday, September 2nd, 2008

I’m writing this post in Google Chrome, Google’s beta release of its own web browser. It took me some time to download and install Chrome because the download site is obviously swamped mere hours (if not minutes) after its “exciting” launch.

First impressions? Very clean and simple. The Chrome is almost not there.
And it’s fast. It renders pages very fast. Very fast. Faster than anything I’ve used before. Wordpress is running nicely; the embedded Gears and the Javascript optimisations probably have a huge benefit for Wordpress sites because Wordpress makes such heavy use of Javascript. And Gmail? Scorching performance.

So far, the performance is the killer USP.

No rendering issues on any sites I’ve tried so far.

The big drawback? Lack of features. I guess it took Firefox quite some time to get to the point where I have my perfect profile, with Firebug, HTML Validator etc. But Chrome is so simple and clean. To quote page 24 of the comic (which I can’t find a good way to deep-link, so won’t):

 

“We don’t want to interrupt anything the user is trying to do. If you can just ignore the browser we’ve done a good job.”

 
They have done a good job. The browser is not there, in a way that Firefox and IE7 sadly are.

The question for me earlier today was Why? Dare says it’s because they want to drive the web forward, which is backed up by their comic and what they say on their blogs and posts and brochure pages. They live and die by the web. They are producing some of the best, or most influential at least, web based applications. And their ability to develop these and move them forward is stifled by the pace of browser development in IE, Firefox and other products.

The pages in the comic complaining about “Speed Limits” on the web seem to back this up. For them, the claimed point of Chrome is to show people what a fresh look at the web could be. Faster. More powerful. Safer. Better. More driven by the Google ethos and approach to building great, innovative products that push the industry forward in big jumps.

Of course, not everyone buys into this. Microsoft Watch sees it as a cynical move: killing their partners, some kind of anti-trust issue, Google doing evil, making the wrong move for a fair market etc.

I don’t think I buy into that.

Earlier today, my thoughts ran along the lines of: this is a bad idea and it won’t work. According to Google’s figures, 40% of their users are on IE6, 40% on IE7 and 16% on Firefox. My colleagues at work are all internet developers. Geeks. The people who live on the web as much as Google does. Yet a lot of them use IE because it’s on their PC and it works. They don’t want or need anything more. That 16% Firefox figure comes from Firefox being the best thing on Linux. From Firefox being the best non-Microsoft solution (for the religious out there). And being pretty damn good for customisation. But I would hazard the opinion that the contestable market is not 100%. There is a significant portion of that market that is not, and never will be, up for grabs.

Take my family: they’ll only use what’s on their desktop. They don’t know or care why Firefox might be better for me to use. That’s not going to stop them using IE, because it’s there. Microsoft’s bundling of IE was clearly anti-competitive black hat because of this. And then there are the many corporate environments with lock-downs that allow the net, or their intranet, but only via IE, as it’s part of their OS and can be locked down as such.

So, within the part of the market that can actually move, Firefox’s share is bigger than it looks. Chrome is competing for maybe the 16% Firefox market share, plus 10% of the IE share that hasn’t moved but could. And the niche things.

So, is Chrome really about forcing innovation? It’ll force it in that niche away from IE, and Microsoft might be pulled along. But only if it’s really good. I didn’t believe I’d use it for more than 10 minutes. Now, having used it for a little while, I plan to install it at work too. And live with it for a while. It’s that good. In beta 1. With very few features.

And it’s so fast. Firefox 3 was a lot faster than Firefox 2. If Chrome had come out before FF3, I may never have installed FF3. Because it’s so fast. CLEARLY faster than FF3. And FF3 felt so much faster when I first switched to it from FF2 that going back to FF2 on an older machine I hadn’t upgraded was painful. Will I hurt in Firefox 3 at work in the morning?

So Firefox needs to start worrying. It needs to look at V8. It needs to pull some of this stuff in and catch up.

Chrome could get a good fast uptake and make a very significant dent in that 16%.

And it doesn’t have adBlock.

Ah, now I’m suspicious. Of that 16% market share that Firefox has, how many users do you think have adBlock or GreaseMonkey or another script that takes that crap off the web? That removes Google’s ad revenue stream? I’ve not seen an advert on the internet since I switched to Firefox with early versions of adBlock. And I use the web a lot. And I know that most Firefox users have something like adBlock running.

So is it about revenue? Chrome is Open. I assume it will have/does have plugin support. Will adBlock for Chrome come? Or will Google keep that out of its ecosystem? Is Chrome about revenue? Switch people to Chrome, remove adBlock, get the chance to show people more ads. In return for faster, safer browsing with richer applications.

(I’ve just noticed my CPU pegged at about 30% on task manager. I’ve checked the task manager in Chrome. Shockwave was eating CPU. I’ve killed the tabs with Shockwave in, and now the browser is a ghost in my system eating no resources at all. Neat.)

But Chrome is about innovation, we’re told. Yet it doesn’t support XUL, Mozilla’s XHTML + Rich GUI markup language. That is something I thought would be a great thing to see grow. It could give a real edge to the rich applications that Google want Chrome to enable. I assume it’s lacking due to the use of WebKit. But with V8-powered XUL, the things you could build would be awesome.

It’s going to be a case of time will tell, I assume. I think it looks positive. This morning when I heard the news, I couldn’t see the point. The competition (Firefox, Safari and Opera) seemed too far ahead, Google too far behind. Now I’ve used the “vision”, the speed, cleanliness and disruptive new thinking are converting me.

Chrome + Gears + Google’s web applications could be a massive boost in creativity, development and forward motion that the web as an operating system/platform for the future of computing really needs.

Summer of Code 2008

Tuesday, March 18th, 2008

The Geeklog Project (for which I am a Core Team Member) has been confirmed as a participating organisation in the Google Summer of Code 2008. We have our ideas list up and are recruiting for students.

So, if you are a student at university and interested in a Free Google T-Shirt and Some Money for contributing to a PHP Open Source Project, then check it out. I’ve identified three projects that I will be mentoring on if suitable students are found, and there are a number of other projects available if you hate me ;-)

Faster, Easier Wordpress Upgrade

Tuesday, February 5th, 2008

In the last 38 days there have been two urgent security fix releases of Wordpress. In the last 3 and a tiny bit months there have been three maintenance releases of Wordpress. In the last 4 and a tiny bit months there have been four releases of Wordpress.

Other than the fact I’m starting to get pretty worried about the security and stability of the software in general, it’s a pain in the rear to have to keep upgrading. So I’m making it easier for me. Wordpress themselves have helped by having a decent system in place for making it easy to get the latest.

I now have the simplest of shell scripts which:

  1. Backs up my database.
  2. Backs up my Wordpress folder.
  3. Gets the latest Wordpress release.
  4. Unpacks that release.
  5. Deploys that release live.

Being nice, I’m going to share it with you:

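# Back up the database and the live folder
mysqldump --host=localhost --user=wordpress --password=wordpress wordpress > wordpress.sql
tar -zcf wordpress_backup.tgz wordpress_live
# Fetch and unpack the latest release (-O overwrites any earlier download
# instead of saving a stale-shadowing latest.tar.gz.1)
wget -O latest.tar.gz http://wordpress.org/latest.tar.gz
tar -xzf latest.tar.gz
# Deploy over the live folder, then remove the temporary unpack folder
cp -r wordpress/* wordpress_live/
rm -r wordpress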

Of course this assumes that you have a wordpress database in a localhost MySQL instance with username and password wordpress and that your live wordpress folder is wordpress_live so you can cope with a temporary wordpress folder from the unpack. It also assumes that mysqldump, tar and wget are available in your shell.
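If you want the same five steps to stop at the first failure rather than deploy a half-downloaded release, something like the following sketch might do. It is only an illustration under the same assumptions as above; the DRY_RUN switch, the run helper and the timestamped backup names are my own additions for the example. It defaults to printing what it would do.

```shell
#!/bin/sh
# Sketch only: same five steps, but fail-fast and dry-run by default.
# DRY_RUN, run() and the timestamped file names are illustrative additions.
set -eu

LIVE_DIR="wordpress_live"
STAMP="$(date +%Y%m%d%H%M%S)"

# Print the command instead of executing it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

upgrade_wordpress() {
    # 1. Back up the database (--result-file avoids a shell redirect).
    run mysqldump --host=localhost --user=wordpress --password=wordpress \
        --result-file="wordpress_${STAMP}.sql" wordpress
    # 2. Back up the live folder.
    run tar -zcf "wordpress_backup_${STAMP}.tgz" "$LIVE_DIR"
    # 3. Get the latest release, overwriting any earlier download.
    run wget -O latest.tar.gz http://wordpress.org/latest.tar.gz
    # 4. Unpack it into the temporary wordpress folder.
    run tar -xzf latest.tar.gz
    # 5. Deploy it live, then tidy up.
    run cp -r wordpress/* "$LIVE_DIR"/
    run rm -r wordpress latest.tar.gz
}

upgrade_wordpress
```

Run it once as-is to read the plan, then again with DRY_RUN=0 ./upgradewordpress.sh when you are happy with it.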

Also, I don’t just do this on live. I back up my live, put it on my portable instance and test the new version first. Then I do it on live. Then I update the versions of my plugins.

What an arse. This is why I prefer Geeklog. It’s more secure and doesn’t change at an alarming rate.

Now I can SSH into my server and type ./upgradewordpress.sh when I’m ready then hit http://inanger.com/[secretlocationofadmin]/wp-upgrade.php and finish things off. Job done. I still have a pain in the rear as I have to test the release locally first (./upgradewordpress.sh on local instance of course, after restoring a fresh backup of live into it and adjusting the config to refer to my local instance).

And I think this is less risky than tracking Wordpress via SVN on live.

Database Change Control

Monday, February 4th, 2008

K. Scott Allen recently published a five part series on the importance of version control in creating and maintaining the database behind your product. This starts with something pretty important and fundamental, three rules for database work. Rules #2 and #3 are vital. No argument there. Rule #1 I’ll come to later. Jeff Atwood has also written on the subject before, and highlighted his previous post and a few further comments in support of what K. Scott Allen had written in an apparently unscheduled post on his blog. Looking at the trackbacks and comments on the posts, this appears to have generated a lot of interest, and I feel compelled to critique the posts.

Firstly, you must get your database under version control if you ever plan on releasing more than one version. You must have an authoritative source of schema and procedures etc, you must do versioned releases. This is not in contention. But other points in K’s series and Jeff’s unquestioning support for this are.

Secondly, some background, I’m a Development Manager for a Microsoft Technology Stack web based application which is maintained and released as a product. We have a lot of tables, a lot of stored procedures and other database entities. And a lot of developers.

Taking the approaches outlined by K in his posts on Change Scripts and Views, Stored Procedures and the Like on face value, they are hopeless. They’ll get you into trouble. Fast.

K’s process is one script for the database schema and one script per stored procedure in the master definition. He then references Phil Haack’s Bullet Proof Sql Change Scripts post for an idea of how to provide better change programs than his simplistic ones.

Phil’s stuff is good in principle, making sure that your SQL change scripts can safely execute many times, but the checks are laborious to write. We’ve rolled all the checks you could possibly want into a set of functions: dbo.ColumnExists(TableName, ColumnName), dbo.IndexExists(IndexName, TableName) etc.

But with a system like ours, with (literally) thousands of stored procedures, re-deploying every SP is painful. So we script only the updates and roll them out. But that will still be a lot of updates in a release designed to take any installation from our oldest supported version up to our newest major release (iApplication XP Panoramic Web 2.0 Version). Executing each of these individually and tracking the results is a hassle.

The next step is to roll these scripts together into one to make a quick deployment, just get the installer/dba to execute “Update_1.1.0.1.sql” on the target database and there you go. *

Only that’s when it gets really problematic. SQL Query Optimiser pulls the rug out from under your feet. What it will do is change the order of execution to optimise it. It’ll create new tables, add rows and script stored procedures at times to suit its own optimisation desires. And then everything will fail. Columns won’t exist, so stored procedures won’t compile. Views will fail. Commands to add indexes will explode. There will be bits of schema all over the place.

So then you have to wrap every alter table command, every DROP usp_ and CREATE usp_ in sp_executeSQL. Which means you have to escape every ‘ character correctly. So then you need to write a tool to generate the change scripts cleanly.

Then you find that if you are executing hundreds of changes in batch in a single change program, it’s hard to stop a change logging itself as applied when only part of it has fallen over. So you have to wrap each item in a check for @@ERROR to see if anything went wrong, and wrap the whole lot in a transaction that you roll back if any errors happened at all.

Right, so now we have something approaching bullet proof. Now we need to make sure our team of developers do it right every time. We write a process document and run a training session. We explain how each development starts with getting the latest SP/table definition from source control, making your changes and generating your change program. We explain that your change program must be tested and shown to execute cleanly, leaving the full trace etc, and must contain the right assigned version number.

Whoever writes this change script will test it thoroughly and against a variety of test data, then commit the change script into source control. The schema change is officially published. The schema change will start to appear in developer workspaces as they update from source control, and on test machines as new builds are pushed into QA and beyond.

That just doesn’t work. You will despair as even your best guru programmers, the ones who really care about their craft, make mistakes and take short-cuts because the process is onerous and is seen as a tax. So then you need to have a nightly build process that restores a clean database, executes every change against it, checks the results and the database. That runs a parse on the change program, beyond the SQL execution, to make sure the rules are adhered to and that Change_1_0_1_0_2.sql actually records itself as 1.0.1.0.2 and not 2.0.1.3.4, which is its number on a different branch, and so on. K’s text is incredibly hand-wavy.
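To give a flavour of that rule check, here is a minimal shell sketch of the "does the script record the version its file name claims" test. It is not our actual tooling. It assumes, purely for illustration, that every change script carries a comment line of the form "-- Version: 1.0.1.0.2", and the function names are made up for the example.

```shell
#!/bin/sh
# Sketch only: compare the version implied by a change script's file name
# with the version the script records about itself.

# Change_1_0_1_0_2.sql -> 1.0.1.0.2
version_from_filename() {
    base="$(basename "$1" .sql)"
    printf '%s\n' "${base#Change_}" | tr '_' '.'
}

# Hypothetical convention: the script contains a line "-- Version: 1.0.1.0.2".
# Only the first such line is used.
version_from_script() {
    sed -n 's/^-- Version: *\([0-9.]*\).*$/\1/p' "$1" | head -n 1
}

# Prints OK and returns 0 on a match; reports the mismatch on stderr
# and returns 1 otherwise (including when no version line is found).
check_change_script() {
    expected="$(version_from_filename "$1")"
    recorded="$(version_from_script "$1")"
    if [ "$expected" = "$recorded" ]; then
        echo "OK $1 ($expected)"
    else
        echo "MISMATCH $1: name says $expected, script records '$recorded'" >&2
        return 1
    fi
}
```

A nightly job could loop over the change scripts, call check_change_script on each, and fail the build on the first non-zero return, before any SQL is executed at all.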

1. Never use a shared database server for development work.

The convenience of a shared database is tempting. All developers point their workstations to a single database server where they can test and make schema changes. The shared server functions as an authoritative source for the database schema, and schema changes appear immediately to all team members. The shared database also serves as a central repository for test data.

Like many conveniences in software development, a shared database is a tar pit waiting to fossilize a project. Developers overwrite each other’s changes. The changes I make on the server break the code on your development machine. Remote development is slow and difficult.

Avoid using a shared database at all costs, as they ultimately waste time and help produce bugs.

So having a single instance per developer on their local machine is a panacea for all your “shared database” problems. All those bugs created by people over-writing each other’s schema changes etc.

What about when a developer neglects to update their local instance and produces a fix based on an out-dated schema or stored procedure? Same thing. How do you ensure all your developers are keeping their local instance sufficiently up to date? Auto-update their instances with each commit as it’s stabilised? What if that wipes out their changes?

I’m not saying a shared database solves these issues, or that you won’t run into the issues he mentions on your shared database. But it’s only one point to control. We periodically re-stabilise our test and development databases where developers are patching their work for peer testing. We’re working on improving this process all the time. Some development requires an isolated environment to avoid breaking everyone’s ability to work, and they do have isolated local environments, but only for the length of that development.

Database version management is an incredibly hard problem to resolve. And although getting people started with it as K and Jeff have done is good, it’s not enough. You have to go further. And if you’ve gone further than us, please tell me where we go next!

* - (Using isql with the right voodoo to suppress all the line numbers and pointless messages just leaving us the completion state, piping the output to a text file which we can parse to ensure that everything completed AOK and the database is not left in an inconsistent state, prior to us then needing to validate that the database IS actually in a decent state and the scripts haven’t falsely reported success…)
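For what it’s worth, the "parse the output file" part of that footnote can be sketched in a few lines of shell. This is an illustration, not our real parser. It assumes the captured log contains SQL Server’s usual "Msg <number>, Level <number>" lines when something fails, and that the change script’s final statement prints a marker line. The marker text UPDATE COMPLETE and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch only: decide from a captured output log whether an update ran cleanly.
# Returns 0 only if the log has no error lines AND contains the marker.
update_succeeded() {
    log="$1"
    # Treat any "Msg <number>, Level <number>" line as a failure.
    # (Deliberately strict: it does not try to tell severities apart.)
    if grep -Eq '^Msg [0-9]+, Level [0-9]+' "$log"; then
        return 1
    fi
    # Hypothetical marker printed by the last statement of the change script;
    # if it is missing, the script stopped part-way through.
    grep -q 'UPDATE COMPLETE' "$log"
}
```

However you capture the output into, say, update.log, the check is then just: update_succeeded update.log || echo "update failed, go and look". It still says nothing about whether the database is actually in a decent state, which is the second half of the footnote.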

It’s About Breadth

Thursday, January 31st, 2008

I was reading a blog entry about hot technology in Java over at Manageability.org. The second paragraph in the entry slapped me out of my non-blogging frenzy with its wrongness.

It’s been suggested that Polyglot programming be in the list. Even though I do subscribe to the notion that learning other languages are beneficial to one’s craft, it simply is not pragmatic advice. It is not practical to recommend that someone study Ruby, Groovy, Scala and who knows what other language is vying for your attention. Stick to a couple of languages and do it well. Some languages are better than others for certain tasks. However the biggest fallacy of all is that, a dynamic language is not considerably better than a static one. It’s no magic bullet.

The main thing to note is that the paragraph is not completely wrong. The author does “subscribe to the notion that learning other languages are beneficial to one’s craft”, but unfortunately caveats that by saying it’s just not pragmatic or practical, and that the reader should stick to a “couple” of languages.

I strongly disagree.

As a professional programmer, in your day job you should code in one technology set. Note I say technology set, not language. For some that may be one language. For others that may be several. For me, the last time I was a hands-on-programmer as my day job, that was Javascript, CSS, XHTML, ASP(VBScript), Visual Basic 6 and T-SQL.

You should strive for a deep and complete understanding of that technology set. This should clearly start with a basic competency in the limited set of those technologies that relate to the product/project you are working on. But you should deepen and broaden this understanding as fast and as well as humanly possible.

You should know how to do anything that VB6 can do, not just within the context of your web development. You will need to learn the aspects of VB6 programming that can never be used in a web context, but along the way, you will learn many things you would not have otherwise learnt. These things may be things you can directly use in a web context, or things that just improve your approach to problem solving, design and development issues, a fresh perspective on the language.

The next step from here is to take that solid grounding in your primary weapon and mature and expand it with exposure to other languages and technologies.

The development communities around each language are akin to separate nations. Sometimes diplomatic channels are open and citizens freely move between the nations. Other times there is open hostility. Each nation has its own way of life. There is always some common ground between all languages, but often they have vastly differing ways of approaching a common problem. Continual exposure to these different languages opens you up to more ways to solve the problems you are faced with. You will be able to deal with a vastly wider range of problems as a result.

And this is a critical skill to develop.

Do not let yourself become an island nation. Have that deep mastery of a key language/technology set and use it daily, but make sure you are also constantly looking around for other languages and ideas to broaden your understanding of your craft. Travel widely. Use the languages for “real” in anger development to really understand the different pain points. Ruby may solve one pain for you at the cost of other deeper pains.

Without you, and people like you, doing this language tourism, building this breadth, there won’t be a new top 5 interesting technologies in [whatever language] in 2009, because the [whatever language] community will stagnate as it examines its own navel.

Caching Using Zend

Monday, October 22nd, 2007

The Zend Framework provides an interesting set of PHP5 libraries for caching. There’s a nice architecture to it, providing a number of different backends and frontends for caching. However, I recently found it very frustrating to try and figure out a decent caching strategy for the new version of a site I was working on.

And the documentation did not help at all. So, allow me to elaborate for the benefit of the huddled masses.

The introduction in the caching section of the manual gives a decent enough overview of the basics: if I want to cache a page with a nice simple ID, such as “Page1”, with a set lifetime, I can do so in a few lines of code.

However, another page goes on to mention how you can also “tag” records with multiple tags. Another page talks about how to clear the cache by a single tag, or combination of tags.

But nothing explains the relationships between tags and ids, and how the clear works with tags or ids.

Now, I’m working on a system which has two views of a music catalogue for a radio station. There is the requests system and there is the discography system. They both present differing views of the same data. The request system filters the list of artists, albums and tracks on the station to those that can be requested and displays a “request optimised” set of screens with some of the data. The discography section shows everything about an artist, all their albums, all tracks, reviews and so forth, without the cruft needed for the request sub-system.

We cache these screens for obvious reasons. What we need is the ability to clear the cache of an artist, album or track possibly within the requests or the discography, or both. So, I figured on a system of unique keys like artist_123 and album_123 etc then to use the tags to “lump” things together. So album_123 would have the artist_123 tag in both discography and request view plus the discography tag in the discography and the requests tag in the requests view. Something like:

$cache->load("album_123", array("album_123", "artist_123", "requests", "en_GB", "album"));

I could then simply invalidate the cache of all albums, request pages, artist_123 pages or British English generated pages, or any combination of those items.

This does not work.

The first important thing is that the ID is the unique key. Not the combination of the ID and the tag(s). So if you save page1 to the cache with the tags tag1 and tag2, then try and load page1 from the cache with tags tag3 and tag4, you’ll get the result of saving with tag1 and tag2!

Insane, but true. Try it. The tag has no effect whatsoever it seems on the load code. If it can find an item by ID, then it loads it, irrespective of tags. I’m not sure if this is behaviour by design, or a bug on my system using the file backend, but it is consistent. I just think it’s mad.

To get what I desire, I’m going to have to cache with:

$cache->load("requests_artist_123_album_123_en_GB", array("requests", "artist_123", "album_123", "en_GB", "albums"));

Then I can still do a clear on the appropriate tags, if for example I want to remove all albums from the requests cache:

$cache->clean(Zend_Cache::CLEANING_MODE_MATCHING_TAG, array("requests", "albums"));

It’s a bit of a pain in the rear, but, once you’ve figured out the $tags argument on the load method is pointless, it’s fine.

Of course, figuring all this out was further complicated by the fact that the examples in the manual often use load() with $key and $tags and save() with no arguments. I assumed therefore that the point of the $tags argument on load() was to set the tags that would be used auto-magically on save(). Only, if you don’t pass $tags to save() it saves with no tags. Which is also silly, since it does respect the $key used in the load() method.

The Wrong Answer to the Right Question?

Saturday, September 29th, 2007

I’m often faced with the need to do a one-off crunch of data to answer questions management ask about raw data. Not the kind of thing they’ll be asking for on a regular basis. Just a need to scratch an individual itch. One-off reports on specific aspects of code metrics. Predictions of data growth in the application across different aspects of its user base.

On these occasions, I either turn to Query Analyser to mine our SQL Server databases directly, or use the data import tool in Excel and try and crunch the data swiftly in that. Sometimes, those needs become a long running need to manage some data, where Excel is often the preferred format, because I can do some initial crunching and manage the data in there and the rest of management can then take a copy and further manipulate it and play with it to get additional information as and when it occurs to them they need it.

The problem I face is that Excel is designed for accountants and management types with no programming knowledge to manage spreadsheets of data they understand. It’s too damn user friendly. I find it very hard sometimes to find a good way to manage my data in Excel. I often throw my hands up in despair and lash up a software tool specifically to manage the data. I’m talking about a full on database driven web application in most cases. It’s so much faster for me to work with the data that way, and I can then use the Data Import tool in Excel to shove the data in raw forms into spreadsheets if the people asking for the data want to take it away and play with it.

There has to be a better way. There has to be a more productive way for me to do this. A more developer focussed tool for doing this, that allows you to achieve with scripting/programming what you would achieve in Excel by mucking around with excessively user friendly wizards and obscure dialogue screens.

Jon Udell thinks the answer might look something like Resolver, which is a new spreadsheet application written in Python that allows you to use Python directly in cells and to have full access to .NET and IronPython through the whole application.

This just seems to be the wrong solution to me. In Excel we have a spreadsheet product that non-programmers use and love, and that is so good it has destroyed all competition. It can be extended by programmers with add-ins and macros. You can write .NET code or VBA code (easy for non-programmers to learn) in the macros etc. However, the formulae are restricted to the old style “icky” functions. Stick =IF(condition, formula, formula) in a cell and programmers recoil from the keyboard in horror.

The right answer to the question is to have a simple option to enable direct access to the .NET runtime in cells. Then people can code formulae in any .NET enabled language they choose, including IronPython.

Do not throw the baby out with the bathwater.

Internationalisation Take 2 - Zend vs Cheap-o Arrays

Saturday, June 23rd, 2007

In response to my emails to the Zend Framework I18N list and my previous post, Thomas, the author of the Zend_Translate framework items mailed back to the list here:

> 2) gettext is a more expensive version of using the arrays backend.

No… it’s a less expensive version. What takes time is reading the original
source. Your processor is always faster than your harddrive.
It is better to do some computations than reading a bigger file. And mo
files are much smaller than the same sized array files.

This still seems wrong to me, so I’ve done a bit more analysis. I have now got XDebug up and running in my portable environment, so I can really see the details of the costs. Now, to caveat all this, I’m running everything from a USB key on a laptop that’s doing a number of other background tasks, so the performance is not isolated. Due to this I’ll be looking at percentages of time in WinCacheGrind, not actual execution times.

Now, to test this I have generated two files. One is a .po file containing 1000 phrases, which I have compiled to a .mo file. The other is a PHP file containing a PHP array of the same 1000 translations. I generated both with a script, so the translations are a bit simple:

From the .po file:

msgid "String 0"
msgstr "String Translated 0"

From the .php file:

'String 0' => 'String Translated 0',

I have then written a simple PHP file which translates 50 of these items. A reasonable enough test I think. Firstly, to test the translation using the fast Zend_Translate gettext options:

require_once 'Zend/Translate.php';
$i = new Zend_Translate('gettext', '/development/language/test.en.mo', 'en');
 
function _($s)
{
    global $i;
    $s = $i->_($s);
    echo($s."<br/>\n");
}
 
_('String 1');
_('String 2');
...

I then ran this file and loaded the cachegrind output into WinCacheGrind. 87.88% of the execution time was spent in Zend_Translate_Adapter_Gettext->_loadTranslationData. Performing translations took 1.99% of the time.

Next I used my PHP array and the Zend_Translate array backend:

require_once 'Zend/Translate.php';
require_once '/development/language/test.en.php';
$i = new Zend_Translate('array', $LANGUAGE, 'en');

(The rest of the file remains the same). I then ran this and checked the output. loadTranslationData took 78.36% of the time. Performing translations took 2.86% of the time.

My third test was just to use the test.en.php file and a simple translation function:

require_once '/development/language/test.en.php';
function _($s) {
    global $LANGUAGE;
    $s = (array_key_exists($s, $LANGUAGE)) ? $LANGUAGE[$s] : $s;
    return $s;
}

The first thing to note was that the Zend_Translate items took over 20ms each. This one not using Zend at all took 2.8ms. The require_once statement took 1.83% of that time. Then it was just repetition of an un-recordably-fast translation 50 times.

So what do I draw from this? I draw from this that for simple translations, you can’t beat a very, very simple system with just an array of translations. I haven’t looked in any depth at the other services offered by Zend_Translate, but it does allow you to add multiple translations and translate in multiple languages. But, do you have a use-case for that?

If your UI only needs to display in one language at a time, but that language must be translatable, take the simple approach. It needs a little extension to support modular language files, but look at the phpBB3 implementation and you can’t go far wrong. That loads modular translation files (just to keep that trivial require_once cost down), each of which is merged with array_merge back into a single translation array keyed by constants.

Fast.

My cachegrind files for your reading pleasure:
Zend_Translate - Array
Zend_Translate - Gettext
Non-Zend_Translate

Profiling PHP With XDebug - Portably!

Saturday, June 23rd, 2007

When you are working on a web site or web application, its performance is all too often something little thought is spared for at development time. You need to know where your bottlenecks are and what the costs of each architectural and implementation choice you make are. You should routinely profile your application’s core functions to see how they behave. How do we do this? You could put little timers into the code and log timings to see how things work, or you could use XDebug.

XDebug is a PHP extension that provides a number of critical features to developers. It supports step through debugging (assuming your code editor can hook into it) and it supports profiling of your scripts. This gives you a detailed breakdown of every command executed in your application, how many times it was executed and what it cost.

This is invaluable when tracing your performance work. You can identify exactly which routines are costing you too much. It can give valuable insight into the performance landscape of your software. So I’m going to hook it up into my development environment on its USB key.

Firstly, pop over to the XDebug site and download the relevant version for your PHP version. I downloaded the Windows binary of 2.0RC4 for PHP 5.2.1+. XDebug is a “Zend Extension” rather than a standard PHP extension: it extends the Zend Engine that powers PHP. This changes how we configure it in php.ini and means it doesn’t have to go in the extensions folder of your PHP install. But I keep it there for consistency. Once it is placed in that folder you need to edit your php.ini. The commands can go anywhere in the file, but I placed them after the PHP extension loading commands, again for consistency:

zend_extension_ts=/development/php/ext/php_xdebug-2.0.0rc4-5.2.1.dll

Note that this is loaded with the zend_extension_ts command instead of the extension command (the ts denotes Thread Safe mode). Also note that we specify the full path to the extension. The zend_extension_ts command (and the other zend_extension commands) need the full path as they don’t pay attention to our extension_dir setting.

Once this is done, go to your PHPInfo() test page and check, you should have XDebug information included:

XDebug Enabled

Ok so far so good. If you have a page which currently throws a php error, go check it now. You’ll notice that just having XDebug installed gives you much more information. XDebug makes developing easier just by being there.

Now, we’re mainly going to use it to profile the performance of our applications and third party libraries, so we need to enable profiling. This is done with a couple of new entries in php.ini. I placed them just after my command to load the extension so it’s all in one place:

xdebug.profiler_enable = 1
xdebug.profiler_output_dir=/development/

Now restart Apache and hit your PHPInfo() page. In the development folder on your USB key you will have some files called cachegrind.out.[some number]. This is the profiling information in its raw form and of no use to you on its own.

You need a cachegrind analysis program. I use WinCacheGrind as I’m on Windows. You can use this to open up the cachegrind file and see what took what time. Visit your PHPInfo() page, pick up the cachegrind output and take a look. You can see a lot of detail.

CacheGrind

I don’t propose to detail how to use WinCacheGrind and do a full analysis; a bit of poking should show you what’s going on. But I’ll be using XDebug and WinCacheGrind to get under the covers of some third party libraries I’m considering for use in the development of Multiblog, so we’ll see more information then.

Internationalisation

Thursday, June 21st, 2007

The web is global.

Lots of websites do not cope with this. They do not provide a user setting for the language and deliver their content in that language.

Clearly, this is bad. If you are producing an application, like Multiblog, then you need to make it international. It needs the UI at least (content is a more thorny issue) to work in the user’s preferred language. Otherwise they will experience friction trying to use the confusing foreign thing.

There are a lot of ways to achieve this. Geeklog and PHPBB3 use arrays to translate content and allow the user to pick things. Drupal uses the GNU GetText system. And there are other approaches.

Choosing the right approach and using it correctly is difficult. I’m currently experimenting with approaches for Multiblog and other projects, and looking into the very interesting Zend_Translate class from the Zend Framework. This allows a number of different approaches, including both Array and GetText.

GetText appears to be the recommended choice. There are a number of free tools that can generate your translation files, as the translation files are not human readable. It’s fast and threadsafe. The Zend Framework Manual offers some advice on how to structure your translation files. There are several suggested methods, but, there is no suggestion of how to structure your translation modules.

The question I asked was “What’s the best practice?”, and no-one seems to know, so I guess I need to figure it out for myself from basic principles.

Now, GetText was written to provide internationalisation for the GNU software. Including the core of the Linux OS. Here, the GetText file is (I assume) parsed once at start up and held in memory to translate as things go. Web applications are different. Every page view is essentially a new start up. That GetText translation source is going to be loaded hundreds and thousands of times. Not just once on boot of the web server.

So, if we want to get this right, we need to know what our best bet is. Do we want a monolithic all translations file, or do we want to modularise this file and load it as needed? Does it use the file like a database and seek things out, or does it parse the whole thing every time and process it internally?

I’ve done some simple testing. I produced a basic test catalogue with poEdit and compiled a mo file from it:

msgid ""
msgstr ""
"Project-Id-Version: Test Zend GetText\n"
"POT-Creation-Date: \n"
"PO-Revision-Date: 2007-06-21 12:20-0000\n"
"Last-Translator: THEMike \n"
"Language-Team: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Poedit-Language: English\n"
"X-Poedit-Country: UNITED KINGDOM\n"
"X-Poedit-SourceCharset: utf-8\n"
msgid "This is a test."
msgstr "[Translated]This is a test.[/Translated]"

I then wrote a simple test harness PHP file which loads a Zend_Translate using gettext and translates a single line. Before performing a translation, I var_dump the Zend_Translate instance to see what’s in it:

<?php
/* Configuration: */
define('PATH_TO_ENGINE', '/development/engine/');
define('PATH_TO_LANGUAGE', '/development/language/');

/* Put engine on the include path */
$curPHPIncludePath = ini_get('include_path');
if (defined('PATH_SEPARATOR')) {
    $separator = PATH_SEPARATOR;
} else {
    // prior to PHP 4.3.0, we have to guess the correct separator ...
    $separator = ';';
    if (strpos($curPHPIncludePath, $separator) === false) {
        $separator = ':';
    }
}
if (ini_set('include_path', PATH_TO_ENGINE . $separator . $curPHPIncludePath) === false) {
    die('Buggered');
}

require_once 'Zend/Translate.php';
$t = new Zend_Translate('gettext', PATH_TO_LANGUAGE . 'test.en.mo', 'en');

echo('<pre>');
var_dump($t);
echo("</pre><hr/>\n");
echo($t->_('This is a test.'));
?>

The result of the var_dump being:

object(Zend_Translate)#1 (1) {
  ["_adapter:private"]=>
  object(Zend_Translate_Adapter_Gettext)#2 (6) {
    ["_bigEndian:private"]=>
    bool(false)
    ["_file:private"]=>
    resource(21) of type (stream)
    ["_locale:protected"]=>
    string(2) "en"
    ["_languages:protected"]=>
    array(1) {
      ["en"]=>
      string(2) "en"
    }
    ["_options:protected"]=>
    array(1) {
      ["clear"]=>
      bool(false)
    }
    ["_translate:protected"]=>
    array(1) {
      ["en"]=>
      array(2) {
        [""]=>
        string(339) "Project-Id-Version: Test Zend GetText
POT-Creation-Date:
PO-Revision-Date: 2007-06-21 12:20-0000
Last-Translator: THEMike
Language-Team:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Poedit-Language: English
X-Poedit-Country: UNITED KINGDOM
X-Poedit-SourceCharset: utf-8
"
        ["This is a test."]=>
        string(40) "[Translated]This is a test.[/Translated]"
      }
    }
  }
}

As you can see, before I’ve even made a translate call, the entire mo translation catalogue has been loaded into memory and parsed internally to form a PHP array, which is then used for translation.

Clearly, this argues for a very modular translation system. I would want a core.lang.mo file for “common” translations used throughout the application, and then a controller.lang.mo file for each controller, holding that controller’s specific phrases and loaded only by that controller.

However, note that the translation is done to a PHP array. Essentially, it seems the gettext translator is really a front-end loader of the array translator. So why not use the array translator?

The only downside I can see is that it’s harder to get non-programmers to generate valid PHP arrays when supplying your translation. Other than that, anything that the PHP extension does to optimise compilation and processing of PHP code will kick in and give you a significantly improved performance. Put extra things on top of that like the Zend Optimisers and so forth, and you have a compelling reason to use highly modular array based translation.

The problem then remains getting valid PHP array files back from your translators, and frankly, that can be solved by writing a simple front end for your translators so they have a GUI to use.
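As a rough sketch of what the back end of such a front end could do, the snippet below turns translator-supplied pairs into the source of a PHP file that returns an array. The helper name build_language_file is hypothetical, and so is the choice of a file that returns its array instead of setting a global. var_export handles the quoting, so translators never have to write PHP syntax themselves.

```php
<?php
// Hypothetical helper: build the source of a PHP language file from
// original => translation pairs. var_export() escapes quotes for us.
function build_language_file(array $translations) {
    return "<?php\nreturn " . var_export($translations, true) . ";\n";
}

$source = build_language_file(array(
    'This is a test.' => '[Translated]This is a test.[/Translated]',
));

// Write the file out and load it back to confirm it is valid PHP.
$path = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($path, $source);
$loaded = require $path;
unlink($path);

echo $loaded['This is a test.'], "\n";
```

The GUI itself is then just a form that collects the pairs; the generated file can be loaded with a plain require like any other array based language file.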
