25th September 2008

This search has gone on for years. I am looking for a newsletter system that is easy to install (most likely on a LAMP-based server), allows sign-up to multiple newsletters, and is reliable and flexible.

The most important thing is that once configured, I need to be able to hand over the keys to a client so that they can manage it from that point on. That is the hardest part, because no matter what I try, there are always highly technical aspects to sending out a newsletter that really stump the non-technical clients.

This is what I have come up with (and tried) so far:

PHPlist (Tincan)

This has some lovely features, but some pretty dire ‘gotchas’ that can trap an unwary user. For a start the admin screens are as ugly as hell, and the terminology can make it hard to fathom out what each option does (for example, to create a draft newsletter, you need to use the ‘send a message’ option). It is just not intuitive.

Problems for end users involve its uncanny ability to drop stylesheets when editing templates. Another is signing lists of imported users up to text-only e-mails even when ‘HTML’ is selected. Yes, these can be worked around, but only if you know the problems exist.

On the plus side, the e-mail sending system is brilliant. You queue messages to send, and leave it to do its job. If it fails (and anything that involves a browser window staying open will fail) then you just start it again and it carries on from where it left off. It keeps track of who a newsletter has been sent to, so you can requeue a newsletter every day until the next newsletter is ready, and it will send just to new subscribers.
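To make the ‘carries on from where it left off’ idea concrete, here is a minimal shell sketch of a resumable queue. It is not PHPlist's actual code: the file layout (one address per line), the function names and the delivery command are all assumptions of mine. The point is only that logging each successful send is what makes both resuming and daily requeueing possible.

```sh
#!/bin/sh
# Hypothetical sketch of a resumable send queue (NOT PHPlist's own code).
# Assumed files: a subscribers file and a sent log, one address per line.

# Print the addresses in $1 (subscribers) that are not yet in $2 (sent log).
pending_recipients() {
    touch "$2"
    # -v invert, -x whole-line match, -F fixed strings, -f patterns from file
    grep -vxFf "$2" "$1" || true
}

# Deliver to everyone still pending, logging each success so that a re-run
# after a failure resumes at the first address that was not sent.
# $3 is an assumed delivery command that takes the address as its only
# argument and returns non-zero on failure.
process_queue() {
    queue_subscribers=$1
    queue_sent_log=$2
    queue_send_cmd=$3
    queue_pending=$(pending_recipients "$queue_subscribers" "$queue_sent_log")
    printf '%s\n' "$queue_pending" | while IFS= read -r address; do
        [ -n "$address" ] || continue
        if "$queue_send_cmd" "$address" </dev/null; then
            printf '%s\n' "$address" >> "$queue_sent_log"
        else
            break   # stop here; the next run picks up from this address
        fi
    done
}
```

Running process_queue again after new people have subscribed sends only to the new addresses, because everyone already in the sent log is filtered out.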

The ability to pick up bounces from an IMAP or POP3 mailbox is also great. Each bounce will increment a counter, and subscribers can be disabled when a preset number of bounces is reached.
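The counting rule itself is simple enough to sketch in a few lines of shell. This assumes the bounced addresses have already been pulled out of the mailbox into a file, one line per bounce; that file and the function name are hypothetical, and the sketch does not reflect how PHPlist stores its counters.

```sh
#!/bin/sh
# Hypothetical sketch: list addresses whose bounce count has reached a limit.
# $1: file with one bounced address per line (one line per bounce)
# $2: number of bounces at which a subscriber should be disabled
addresses_to_disable() {
    # sort groups identical addresses, uniq -c prefixes each with its count
    sort "$1" | uniq -c | awk -v max="$2" '$1 + 0 >= max + 0 { print $2 }'
}
```

For example, addresses_to_disable bounces.txt 3 would print every address that appears three or more times in bounces.txt.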

This system also supports tracking, counting each individual e-mail as it is opened, using an image embedded in HTML e-mails.

Another downside is the lack of APIs, making integration difficult. Signing users up to a list automatically, while they are signing up to a CMS, is impossible without some major hacking.

The problems with PHPlist stem from it being around a long time. It has grown and grown over the years, but is creaking at the seams. Everything it does is great, but the way it does it is not so hot. It is time, IMO, to throw away the old code and start again with all the same features on an established framework (e.g. Zend or CakePHP); then it will be a killer application.

I must also add that I believe PHPlist got the data schema right from the start. Most other newsletter systems I have looked at have gone for a much more simplistic approach, that makes it much harder for them to advance their products beyond very simple functionality.

With all that it is, I do use it, and find it great at what it does. What I can’t do, however, is roll it out to clients, because it results in nothing but hassle from the non-technical users who just can’t get to grips with its mix of high-level features, and low-level ‘black art’ knowledge that is needed.

poMMo

This one has to get the award for ‘hardest to find mailing list manager’. It was formerly the bMail project and is hosted on SourceForge. Trying to find it when you can’t quite remember its ‘web 2.0’ name is quite an effort, but the result is well worth it.

Checking the project’s subversion repository, it does not seem there has been any activity on this project since August 2008, which is not very encouraging.

First of all, the admin screens are lovely. They are smooth, AJAX/jQuery-based, clear and well designed. The whole thing feels smooth – it works with you.

The first thing to note is that it does not support multiple mailing lists. In fact, it does not support any ‘mailing lists’ at all. However, what it does provide is a means to mail out to groups of users depending on attributes of those users.

For example, you could provide a series of checkboxes for a subscriber to select what subjects they are interested in. When sending out a newsletter, the author would send it to all subscribers who have expressed an interest in the subject of that newsletter. Now, that does sound rather like a multiple newslist system, but the subtle difference is that there is no ‘newslist’ object in the database – anything that looks like a newslist subscription is specified by the administrator, and not enforced as any kind of fundamental part of the system’s structure.
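A small sketch may help show what ‘no newslist object’ means in practice. Here the recipients for a newsletter are worked out at send time from each subscriber's attributes. The tab-separated file format and the function name are my own assumptions for illustration, not poMMo's schema.

```sh
#!/bin/sh
# Hypothetical sketch of attribute-based selection (not poMMo's code).
# Assumed subscriber file format, one subscriber per line:
#   address<TAB>comma-separated list of interests
# $1: subscriber file, $2: the subject of the newsletter being sent
recipients_for_subject() {
    awk -F '\t' -v subject="$2" '
        {
            n = split($2, interests, ",")
            for (i = 1; i <= n; i++)
                if (interests[i] == subject) { print $1; break }
        }' "$1"
}
```

Nothing here records a subscription as such; the ‘list’ only exists as the result of the query, which is exactly why there is nowhere to store a subscription date.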

I think that approach works well. The system is simpler, and the flexibility is increased. There are a few downsides though.

The first downside is that without a newslist for a user to subscribe to, there is no place in which you can find out exactly when a user subscribed to that newslist. That means sending out additional copies of a newsletter to late subscribers simply cannot be done. Remember PHPlist knows who subscribed to what and when, and so is able to send out ‘catchup’ newsletters so a subscriber will always get the latest newsletter very soon after subscribing.

The other downside is in organisation. Without being able to group newsletters into newslists, it is much harder to provide archives organised by subject. On the other hand, the archives are there, with very easy access to the newsletters from a web-based front end.

The e-mail sending backend is both genius and a little frightening for a control-freak such as myself. It also misses a few tricks, I think.

In order to send e-mails, the system first creates a list of who to send to (in a text file, I believe). It then spawns a process through the HTTP protocol to send as many e-mails as it can in the PHP timeout period (you would set this to as long as possible). Before that script times out, it spawns another process to carry on where it left off, and so on until all the e-mails have been sent, whereupon a lock file created right at the start is released.

This is genius in that the developers have worked out a way to run a background script that can keep going even when the browser window is closed. It is frightening for much the same reason. I just don’t feel comfortable letting something like that loose on my server. It could be running all night in some kind of endless loop, and I would never know until I get irate e-mails from subscribers saying that I have filled up their inboxes.

Another minor flaw in the e-mail sending is that no record is kept of who each newsletter has been sent to. Doing that would make it very easy to sort out who has received a copy and who needs a catch-up copy sent. Following from that, without records of e-mails sent, there is no place to hang any flags to say whether that e-mail has been opened and read, so e-mail tracking is out. Clients need to know these things: how well read are those newsletters? Everyone has a master up the chain to report to, and someone up that line, often holding the purse-strings, likes simple measures of performance.

My ideal newsletter system would contain the database and functional features of PHPlist, with the administration front-end of poMMo.

Other Systems

I’ll add a few more when I get the time. I’m mainly looking at Open Source newsletter systems, aimed at sending to self-subscribed users (i.e. not bulk spam systems). Integration with an existing site or CMS is high on the priority list, along with ease of use for non-technical users. Other systems we will be evaluating are:

  • ListMessenger – Free light version, but dirt cheap Pro version with all the features you would want. This one seems to include captchas for registration.
  • Dada Mail – Old, well known, Perl-based, and it looks like a pain to install.
  • Sympa – Again, Perl. This one seems to have been designed by engineers. That is to say the system looks very robust and complete, but there is little gloss to the system. Like Dada Mail, you need root access to install it, so you have to be careful about dependencies with other modules on your server.

20th September 2008

So you’ve decided on a framework over a CMS for your web application? Me too! Here’s fundamentally why:

  1. Expandability. A job board is a complex application, and I want to be able to expand it with various modules as my time and resources increase.
  2. Flexibility. I want to know where things are going at a low level so I can better understand the system, and control the overall direction of the project.
  3. Change/Experience. I’m interested to see how a web application is designed and built from a low level, and how PHP5 and OOP can deliver it.

The Contenders

After looking through all the entries in Wikipedia, I’ve picked out the 24 frameworks roughly at the level that I’m after. The rest were either stupidly under-developed like BarebonesMVC, or were at too high a level like Joomla.

Akelos, ATK Framework, Atomik Framework, CakePHP, CodeIgniter, Fuse, Horde, Jelix, KISSMVC, Kohana, Lion Framework, PEAR, PHP on Trax, PHPOpenBiz, phpPeanuts, PHPulse, Prado, Qcodo, Simplicity PHP, SiteSupra, SolarPHP, Symfony, Zend Framework, Zoop Framework

Early Elimination

The first round eliminates those that aren’t regularly updated, or have a very small community behind them. Call me superficial, but I don’t like to put my time and trust into lifeless projects. If I’m abroad, I’ll choose a restaurant that’s bustling with people over one with dim lights and empty tables.

  • Akelos: last release was October 2007.
  • Fuse: Seemingly no community behind the project.
  • Lion Framework: Small community.
  • Jelix: A fork of Copix, a French-developed framework. Unsupported.
  • KISSMVC: No-one home.
  • PHP on Trax: Slow development.
  • phpPeanuts: Again, no community.
  • PHPulse: A one man project.
  • Qcodo: They admit themselves that it’s had its day. It’s now slowing and looking to change direction.
  • Simplicity PHP: Less than a year old so too early to judge.
  • Zoop Framework: Sounded promising, but nearly two years without a release…

I’m going to give SolarPHP the benefit of the doubt until I’ve explored features. It’s taken three years to get to v1 alpha, but the documentation is superb and there’s a heck of a lot of output from such a small community.

Round 2

Now that we’re down to 13, it’s time to explore some of the fundamental differences between the frameworks. I need to find out what they are, and who their audience is.

High Level (application platforms)

  • ATK Framework: Designed to churn out applications very very rapidly. It calls itself a ‘business framework’.
  • Horde: This took me a while to figure out. Horde is made up of two subpackages, ‘Horde’ and ‘Framework’. Horde is the higher level one. It’s like an application hub, in that it essentially is an application that glues together other applications. Framework is a code library that’s based on PEAR.
  • PHPOpenBiz: Bit different this one. This is an XML metadata driven framework, and they claim there’s no PHP coding involved.
  • SiteSupra: Found a good descriptive word for high level frameworks – application platforms.

Low Level

  • Atomik Framework: Simple framework not using MVC, but instead uses two ‘layers’ – application logic followed by the template.

Class libraries

  • CakePHP: A model framework. Uses strict MVC and ORM conventions so it forces you to code in a certain way. Well supported and highly recommended.
  • CodeIgniter: Another class library managed by the EllisLab company.
  • Kohana: A fork of CodeIgniter. Main differences are that it takes advantage of strict PHP5 OOP and is supported by a community.
  • PEAR: The grandad of frameworks. PEAR is a massive library that set the benchmark for other frameworks to follow. It has been described as more of a library and lacks the cohesion of a structured framework. Each component is divided up into a separate project called a package, which can lead to crossovers in functionality.
  • Prado: The only event driven framework to feature here. For anyone who feels comfortable programming with Visual Basic, this could be the PHP framework for you.
  • SolarPHP: Almost like a lightweight Zend Framework.
  • Symfony: The hardest framework to get my head into. There seems to be a lot of code generation via command line tools. Well recommended.
  • Zend Framework: PHP’s ‘official’ framework. A massive library of code. This is the only framework that I’ve tried prior to writing this article and I agree with a lot of other developers that it is one of the most flexible frameworks out there.

I’m going to dismiss one framework straight away – Atomik. This is not what I’m looking for and I can’t see a scenario where I would find it useful. In my opinion a simple framework is an incomplete framework.

All the high level frameworks are out too. I want something slightly lower level than what they offer, though I can understand the point in having them. If a client wants an application quickly and cheaply to use purely for in house purposes then I would recommend using one of these frameworks. Horde looks like an ageing giant. There are some fantastic applications that have already been released, namely IMP the webmail client. PHPOpenBiz I don’t really get, a framework for PHP developers that don’t like coding PHP? XML driven templates maybe, but XML driven logic… ATK Framework looks like the best choice for data management applications, whereas SiteSupra is like the new Horde and can serve a variety of applications for different purposes.

Down into the libraries, or perhaps more appropriately ‘the rest of them’. PEAR is too disjointed; I want something that all fits together nicely to keep all duplication to a minimum. I’m not too keen on the idea of event driven PHP, it sounds a bit too alien, which puts Prado off the list. And after looking at some of the tutorials I’ve decided not to delve further into Symfony. I don’t like the idea of using command line tools to generate code and databases. The last elimination is SolarPHP. It was a simple choice between it and Zend Framework. Zend has a big organisation and community behind it, and the killer Lucene search module which could be massive in my final decision.

Read part 2

18th September 2008

This article lists some of the PDF tools we have used on our server projects. They all have one thing in common: they can be driven from the command line, and therefore can be run on servers lacking any kind of GUI.

Please note that I won’t be publishing every comment that starts “I work for XYZ and we have a PDF product…” (especially if I see the same posts listed on a hundred other sites with ‘PDF’ somewhere in the title). These are mainly the open source, free or cheap tools that I have personally found useful in projects. I still appreciate any tips or recommendations.

Most of these tools fall into the data conversion category – they convert to- or from- PDF formats.

pdf2svg (PDFTron)

Convert a PDF document to an SVG document. The command line tool pdf2svg.exe is a Windows-only tool, but is useful for off-line conversions. It is not free, but significantly cheaper than the Adobe CS3 suite that you would otherwise have to use.

Using the tool without license inserts several watermark layers. The resulting SVG can be manipulated easily using Inkscape (a free SVG editor that every developer should have in their toolbox). The thumbnails generated from each page contain the same watermarks.

This tool will also extract all images (as PNGs) from the PDF document, which is very handy.

iText

This free Java tool is platform-independent, and provides a few features that make it very suited to batch PDF manipulation. Features include the ability to:

  • Split up a document and manipulate pages.
  • Generate PDF content on-the-fly.
  • Fill out PDF forms.
  • Add digital signatures.
  • Create PDFs from scratch, including barcodes.

iText can also output Rich Text Format (RTF) documents. I have used it in the past to extract text from PDFs for indexing on a website, though there are lighter tools for doing this.

pdftk (PDF Tool Kit, by AccessPDF)

This is surely the lightest Swiss Army Knife of PDF tools. Features include:

  • Bursting into single-page PDFs and recombining.
  • Encrypting/decrypting.
  • Inserting and extracting form data (for older style forms, though Adobe seems to have changed the way forms work in later versions of the PDF format, so that extracting from filled forms is no longer straight-forward).
  • Extracting and manipulating metadata.

PDFBox

This is another free Java library, with features including:

  • Extracting text from a PDF (for indexing, e.g. with Lucene or mnoGoSearch).
  • Manipulating pages (inserting, extracting, reordering).
  • Filling and extracting form data (PDF version below 1.6) using FDF and XFDF data files.
  • Creating images from PDF files – good for thumbnails and creating ‘page flipper’ applications.

To use the text extraction in a search engine, I use the following shell script to wrap it all up:

#!/bin/sh
# Convert a PDF document to text
# Usage: $0 [OPTIONS] <PDF file> [Text File]
#   -password  <password>    Password to decrypt document
#   -encoding  <output encoding>    (ISO-8859-1,UTF-16BE,UTF-16LE,...)
#   -console    Send text to console instead of file
#   -html    Output in HTML format instead of raw text
#   -sort    Sort the text before writing
#   -startPage <number>    The first page to start extraction(1 based)
#   -endPage <number>    The last page to extract(inclusive)
#   <PDF file>    The PDF document to use
#   [Text File]    The file to write the text to

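PDFBOX_BASE=/usr/local/lib/pdfbox

# Put PDFBox and its FontBox dependency on the Java classpath
CLASSPATH="$PDFBOX_BASE/external/FontBox-0.1.0-dev.jar:$PDFBOX_BASE/lib/PDFBox-0.7.3.jar"
export CLASSPATH
# Pass all command-line arguments straight through to the extractor
java org.pdfbox.ExtractText "$@"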

This assumes that PDFBox has been installed under /usr/local/lib.

SWFtools

Although this toolkit is primarily about SWF files, it does have some neat PDF to SWF conversion scripts. Versions are available for Windows and Linux under an Open Source licence.

Notes

One of these tools also extracts images from PDFs, which can be very useful when converting PDF to HTML formats.

There are a number of PHP tools, other libraries (e.g. Image Magick) and more heavy-weight tools (e.g. Ghostscript) that I will cover later. Hopefully this selection will help in the meantime. If you have any further suggestions, I would love to hear of them. Even if a tool duplicates much of what these do, it only needs to have one extra feature that the others don’t cover to be worthwhile using.

16th September 2008

I’ve figured I need to take a step back and re-analyse exactly what it is I’m looking for. There are now a lot of CMSs and frameworks in the market, all boasting about exactly the same. The most irritating claim of all is:

“Framework/CMS XXYY is designed to make life as easy as possible to develop your robust, user-friendly websites”

… REALLY? NEVER!

The time has come where some investigation is needed to separate the men from the boys. What do these features mean? How can you tell straight away that a CMS/framework isn’t for you? Guess what – I’m going to find out!

Step 1 – CMS vs Framework

The first question you need to ask yourself is do I want to use, A – a framework, or B – a content management system. Distinguishing between the two will halve your list of alternatives straight away.

Frameworks

A framework is a library of code that you tie together. It’s like another layer that sits on top of PHP and helps to speed up the development of an application. It provides common snippets of code that you are likely to use within your application, allowing you to focus your time and energy on the really bespoke stuff.

If coding was a piece of cake… a framework would be the ingredients. You’d bake and ice it.

Content Management Systems (CMSs)

Content management systems are exactly what you’d think them to be – they’re there to manage content! Essentially these are applications that are already written for a purpose. They’re the next level up from frameworks. A lot of these systems provide APIs and hooks that allow developers to write modules/plugins to extend them. The trade-off is that they’re often not as flexible as frameworks, e.g. you may be restricted to using a specific technique/architecture.

If coding was a piece of cake… a CMS would be a Victoria Sandwich. You can decorate how you please and add some filling if needed.

Which should I choose?

Frameworks are for:

  • Standalone bespoke applications
  • Application rich websites

CMSs are for:

  • Standard websites
  • Open ended websites where the requirements haven’t yet been released

15th September 2008

So, I want to stream my own video, from my own server? Why would I want to do that? Control, I guess. It can be argued that there are enough sites around already that can host videos for you (Youtube, Flickr, et al), and it makes sense to eat into their bandwidth rather than your own, but let’s just see – can it be done using Open Source tools?

Enter Red5

Red5 is an Open Source Flash video streaming application. It is written in Java and serves the same purpose as the very expensive Adobe Flash Media Server.

Before I take it for a test run, it is worth looking at the Red5 hall of fame. Noteworthy is openmeetings, which – if it supports video – could fill a niche that clients often ask for.

In the next part I will install the server and run some tests.

————————

The next part is going to take a little longer than I expected. The installation instructions start with “download the source code, you will need Eclipse too…”. So my idea of ‘just installing, trying and tweaking’ goes out the window at the first step. It may take an hour to set up, it may take three days – that’s the way with Linux and Java, and I just don’t have the time to take a risk just to see how well something is going to work.

If anyone knows of a packaged version of Red5 that I can simply drop onto a non-GUI Linux server and run, I’d love to know.

————————

But Do I Really Need Streaming?

This is the point at which a realisation that I have been chasing the wrong horse hits me. I have been imagining that YouTube is streamed to the browser. It turns out it is not. It is delivered as a pseudo-stream through HTTP, which is probably one of the main reasons for its success, since HTTP gets through firewalls without a thought.

So, to deliver Flash videos from a website, and allow the user to fast-forward to any point in that video, only a simple PHP script is needed. The PHP script will start streaming, byte-by-byte, the Flash video file from any given point. The Flash player then handles the rest; it just makes GET requests: give me the file X starting at byte Y; keep going until I break the connection.
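The core of ‘give me the file X starting at byte Y’ is tiny. The sketch below shows just that step in shell rather than PHP. The function name is hypothetical, and it deliberately ignores everything else a real script has to deal with, such as HTTP headers and any file header the player expects at the start of the response.

```sh
#!/bin/sh
# Hypothetical sketch of the byte-offset step behind pseudo-streaming.
# $1: path to the video file
# $2: zero-based byte offset to start from (defaults to 0)
serve_from_offset() {
    stream_file=$1
    stream_offset=${2:-0}
    # Reject anything that is not a plain non-negative integer
    case $stream_offset in
        ''|*[!0-9]*) echo "invalid offset: $stream_offset" >&2; return 1 ;;
    esac
    # tail -c +N starts output at byte N, counting from 1
    tail -c "+$((stream_offset + 1))" "$stream_file"
}
```

Calling serve_from_offset video.flv 1048576 would write everything from the second megabyte onwards to standard output, which is the part the server hands back to the player.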

The xmoov-php script here can deliver this stream, and there is an associated xmoov FLV player that hooks up to it. At the time of writing, the player has not been released, so it is difficult to take it for a test-run. However, the JW FLV Player supports the same protocol and that is available under an Open Source licence for non-profit organisations (though check out the definition of ‘non-profit’ before assuming your organisation counts as one), or under a commercial licence for a small and reasonable fee. As a bonus, the player now supports true RTMP streaming, as provided by the Red5 server.

Having tried to get this player working, I am having some Java troubles and am waiting for a response to my bug report.

————————

‘bug’ fixed: it turns out the FLV file needs metadata added to it to list where all the keyframes are. The FLVMDI tool does that nicely by scanning the file and inserting the metadata. Once the FLV video was post-processed to add this metadata, the whole pseudo-streaming worked like a dream.
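My understanding of why the keyframe list matters is that a seek has to land on the start of a keyframe, so the requested position is snapped back to the nearest keyframe at or before it. The sketch below shows that lookup under my own assumptions: the keyframe byte positions have been dumped to a text file, one per line, and the function name is made up. It is not how FLVMDI or the player actually stores or reads the metadata.

```sh
#!/bin/sh
# Hypothetical sketch: snap a requested byte position back to a keyframe.
# $1: file listing keyframe byte positions, one integer per line
# $2: requested byte position
# Prints the largest keyframe position <= $2, or fails if there is none.
nearest_keyframe() {
    awk -v want="$2" '
        $1 + 0 <= want + 0 && (!found || $1 + 0 > best + 0) {
            best = $1
            found = 1
        }
        END { if (found) print best; else exit 1 }' "$1"
}
```

The result of nearest_keyframe could then be used as the starting byte for the request, so that the returned data begins on a frame the player can decode.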

The demo can be seen here. Try to random-access parts of the video; it works. What would be a nice addition would be for the player to cache each downloaded section and be able to piece them together. At the moment it will cache just one segment of the video, and you can seek to any point within that cache, but if you click outside of it (i.e. after the current loading point or before the segment starting point) then the cache is thrown away completely, forcing the player to start downloading – quite possibly – the whole segment again.

————–

See also ffmpeg FLV converter.
