Thursday, July 22, 2010

Faking PHP functions

Usually when I want to fake some kind of functionality provided by PHP, I create a new library that maps all that crazy API stuff baked into PHP to something that makes more sense. Maybe it even uses exceptions rather than -1/0/false/null. Holy shit. I almost never write something that did a straight pass-through - no parameter changes, method signature identical to the underlying function. I'd find myself spending time and thinking a lot about this library I was building. So PHP does it “wrong”, what does “right” look like? This can be a time-consuming exercise, but these kinds of libraries can get reused a lot. Writing tests for what you've done can be a problem as well. One of these problems is the inspiration for this post.

Here's a dumbed-down version of some code I was writing for a MySQL class (Yes, I'm actually writing one of those... in 2010... really!):


class Mysql {
public function execute($query) {
$result = mysql_query($query);
if($result === false) {
if(mysql_errno() === self::LOST_CONNECTION) {
throw new MysqlLostConnectionException();
} else {
throw new MysqlException();
}
}
}
}


How do I get mysql_errno() to return the code for a lost connection? I should write a class to wrap this MySQL functionality... oh wait... I am. I needed something much simpler. Enter "THE SIMPLEST CLASS EVER!!!"...


class PHP_Functions {
public function __call($phpFunction, $args) {
return call_user_func_array($phpFunction, $args);
}
}


Now we can rewrite the other code by taking the PHP_Functions object in the constructor of our MySQL class and modify two other lines...


class Mysql {
public function execute($query) {
$result = $this->_phpFunctions->mysql_query($query);
if($result === false) {
if($this->_phpFunctions->mysql_errno() ===
self::LOST_CONNECTION) {
throw new MysqlLostConnectionException();
} else {
throw new MysqlException();
}
}
}
}


In our tests its now easy to mimic these MySQL error conditions. We just have to mock our PHP_Functions object.


/**
* @expectedException MysqlLostConnectionException
*/
public function testShouldThrowLostConnectionException() {
$phpFunctions = $this->getMock('PHP_Functions',
array('mysql_query', 'mysql_errno'));
$phpFunctions->expects($this->any())
->method('mysql_query')
->will($this->returnValue(false));
$phpFunctions->expects($this->any())
->method('mysql_errno')
->will($this->returnValue(Mysql::LOST_CONNECTION));

$mysql = new Mysql($phpFunctions);
$mysql->execute('SELECT 1');
}


Generally speaking, if someone shows me a magic method like __call() I find myself unable to reach their heart through their throat but I keep trying. My arms are big and their throats are small. It just doesn't work... I keep “breaking” these people. Anyways, I'm not a fan. I haven't found them to be the least bit useful except for adapting some legacy code (like I am sort of doing in my example) and maybe making a Decorator base class if working with code with no type hints (I don't like those systems either).

There are other options in this case if you find yourself more upset about magic methods than I am. You can move calls to the MySQL functions to protected methods and override them in testing to provide mock functionality. I use this trick a lot when getting legacy code under test (See Working Effectively with Legacy Code. Excellent book.). Write a class like below. Then, for your tests, subclass this, override the methods to provide mock functionality, then run the test on the subclass you created.


class Mysql {
public function execute($query) {
$result = $this->_mysql_query($query);
if($result === false) {
if($this->_mysql_errno() === self::LOST_CONNECTION) {
throw new MysqlLostConnectionException();
} else {
throw new MysqlException();
}
}
}

protected function _mysql_query($query) {
return mysql_query($query);
}

protected function _mysql_errno() {
return mysql_errno();
}
}


I had several cases where I needed to mock consecutive calls so it just seemed like PHPUnit was ready to go with that, so I opted for that route. Either approach would work well.

Wednesday, July 21, 2010

Code smell: Too many private methods

I mentioned this in my last post so I figured I'd add some examples to clarify. Consider the following code:


class Cache {
public function set($key, $data) {
if($this->_useMemcache()) {
$this->_setInMemcache($key, $data);
} elseif($this->_useFilesystem()) {
$this->_setInFilesystem($key, $data);
} else {
$this->_setInMemory($key, $data);
}
}
}


Its part of a class that would be used for caching data. If you had a view of all the methods on this class there would be groups of similar methods: those with "memcache" in the name, "filesystem" in the name, and "memory" in the name. On any given operation when one "memcache" method would be used other "memcache" methods are used with it. The same would be true for the "filesystem" and "memory" methods.

If this class were around for a while eventually you'd find yourself saying "Damn, I want to reuse this functionality for writing to memcache, but I don't want the whole Cache system."

If this class was built without tests and you ended up with the task to add them you'd be saying "Holy fucking shit of shit, this is fuckin hard to test" in a Lewis Black voice as you wildly shook from side to side. This is your Minnesota winter (google it).

What we really want is separate classes for caching in memcache, the filesystem, and in memory. You could probably wire these together using a Chain of Responsibility patttern. The Chain of Responsibility essentially lets you create a series of fallbacks.

Client: "I need to cache this data."
Memcache: "I don't want to cache this data but I know someone who might. Do you want to cache this data?"
Filesystem: "I'm watching my show, fuck off. Memory, get the fuck in here? Cache this for me."
Memory: "I have no feeling of self-worth and will grow up to be an unsuccessful empty shell of a man, so I may as well comply. OK, filesystem, I'll cache it."

Anyways, once you understand the Single Responsibility Principle you'd never write this Cache class in the first place. Its obviously flawed. Clear as day. Sometime you don't find out until you're already coding though.

The less obvious case



About a year and a half ago I was writing a class that could read and write language translations from a excel file. The class was going to be used to import it into a database or export what was in the database into a spreadsheet. I already had an interface I used to access translations from other sources so I started implementing it. I used a 3rd-party library, PHPExcel, for traversing the grid, reading, and writing.

I was learning PHPExcel's API as I went. We also had somewhat complex schema that the spreadsheet needed to fit. There could be a variable number of columns depending on how many languages were translated, and any number of rows because new strings were always being added or removed.

I got it working but was left with lots of private methods that did what I needed to conveniently on the spreadsheet. I had a good tool to work with (PHPExcel) but I needed more. I should have had a class which used PHPExcel but traversed the data in the way I needed.

Since then our translation files have grown and we found that PHPExcel is a memory hog. Exporting a few hundred (maybe up to a thousand) rows takes several minutes. I've profiled the code and the problem is PHPExcel (FWIW, I hear there are some newer features in later versions that may fix this). Unfortunately, if we wanted to replace PHPExcel with a faster library we have to tear apart the class I wrote because its all cooked in there, hiding complexities in private methods.

Had I separated my concerns properly replacing the underlying Excel reader/writer would be simply a matter of writing a new class with the interface needed to work with my translation importer/exporter class.

Tuesday, July 20, 2010

Private visibility in open-source

The statement in question: "Private has absolutely no useful role in open-source code."

There's already lots of drawn out discussion about this so I'll keep it short.


  • Lots of private methods are bad, but not because of OSS. Its a code smell that usually means too many responsibilities (thereby violating the Single Responsibility Principle). If you don't believe me look at the classes in your code base with the most private methods - ugly right?

  • Private methods are a good way to keep methods short by dividing up work into small well-described chunks. These well-described methods and can be used in place of more heavy-weight documentation in some, but certainly not all, cases. I haven't seen any large scale project with perfect documentation, so its important that the code speaks to you. Self-describing code won't degrade like documentation. (Excellent examples of this in the book Clean Code)

  • Code that relies on the internals of 3rd-party code is fragile. If the package author changes the internal implementation of a class that you depend on then your shit is going to break. Technically speaking, of course. If you want to extend code for other OSS projects use inheritance or a form of delegate to use the underlying package without having to modify it or depend on its internals.

  • Plenty of successful programming languages do no have private methods. It can be done. It just requires discipline and technical expertise.



At the end of the day its almost a silly point. Public or private, when talking about these internal-use-only methods programmers should avoid mucking with them. What are you trying to get at anyways? If its something significant it should be extracted into a new class which is in turn made a dependency of the original.

Why I'll never recommend the "Universal constructor" pattern

I read the PlanetPHP feed every day and honestly it usually leaves me scratching my head a bit. Recently I saw a post referencing the "universal constructor" pattern in the PHP framework, Solar. I hadn't heard this pattern name before but I've seen the code a million times looking through other people's code (and maybe even my own 3+ years ago). The people who use it generally think its the best thing ever. It solves that way-too-many parameters problem. In my experience, however, it almost always indicates poor design elsewhere.

The pattern is simple. Constructors take only one parameter ever. That one parameter contains a hash of option/setting pairs (configuration). Here's an example of where you might see something like this...


$userImporter = new UserImporter(array(
'enable_deduping' => true,
'email_admin_on_failure' => true,
'import_batch_size' => 1000
));


Code that uses this pattern is notorious for breaking the Single Responsibility Principle. Breaking this principle leads to tightly coupled, hard-to-test code. Its fairly obvious problem to catch when you think about it. If someone showed you a class with a constructor that had maybe a few required parameters then 10 optional ones you'd probably run for the hills.

My example class apparently reads some sort of importable data structure, knows how to interact with the data's destination, dedupes users, breaks work into batches, sends alert emails, and probably nailed Jesus to the cross.

This approach appears to solve one problem: it gets rid of that long optional parameter list which leads to code like so...


class UserImport {
public function __construct($enableDeduping = true, $enableEmailOnFailure = false, $batchSize = 1000) {
// ...
}
}

$userImport = new UserImport(
true, // default
false, // default
2000 // custom setting!
);


You're repeating defaults all over the place to get to that last optional value. You can never remember which parameter it is without looking. Is batch size the 5th or 7th param??? Guess I'll go read the code AGAIN.

We can fix this with a better API. Unfortunately the the UserImport example I am giving fails at this point. These parameters shouldn't live together because there are too many responsibilities that correspond to. Here a slightly modified example that might be a scenario where you'd actually use setters for your optional parameters.


class UserImport {
public function __construct(Framework_DatabaseConnection $databaseConnection) {
$this->_databaseConnection = $databaseConnection;

// implements logger interface but does not write logs or actually do anything else. this is my default logger value.
$this->_logger = new Framework_Logger_Null();
}

public function setLogger(Framework_Logger $logger) {
$this->_logger = $logger;
}
}


In this case rather than having an optional logger in the constructor with default value I just set the object's logger to the default, a null object called Framework_Logger_Null. If you want to supply a real logger just use out setLogger setter method.

In summary, constructors take required parameters. Setters take optional parameters.

Next, don't think of using types here. Can't be done. You're writing all your validation in PHP. You could write a base class for that functionality and waste the one chance of inheritance PHP gives you on that. You could couple your class to some validation library you instantiate statically (using "new") or using static method calls. Granted you're on the hook here for scalar types anyways in PHP since it does not offer types for integers, strings, and booleans for example.

In a project like Solar you probably always have good documentation that's kept up-to-date, but for your home-grown PHP project that almost never the case. These things over time always end up with new configuration options added making them even more complex and the documentation more out-of-date. What I am getting to is - how do you know what options are available? Manual documentation works but decays over time. Auto-generated documentation isn't going to help you. Users could just read the implementation of your class to figure it out (I see this way too much). Essentially you are hiding how to use your class. This is true both when you are using these configuration hashes in the constructor and also a method parameter.

The easy way to expose the functionality your classes offer is by exposing that functionality through the public API.

Lastly, although this is on a required symptom of this pattern, I almost never see dependency injection used with this pattern. People don't want to mix simple configuration settings with dependency management because this pattern is suppose to be all about convenience, and how convenient is it to instantiate all kinds of dependency objects when you're creating this object in some client code? Not very. The general solution Universal-constructor people use is to just instantiate dependencies right in the same class, thereby tightly coupling it to those implementations. There are better ways to handle dependency management but thats another post for another day.

Monday, July 19, 2010

Configurable return types kill puppies

Here's some code you could see using PHP's Adodb library. The library isn't the point but its demonstrates a pattern I see a fair amount.


$oldFetchMode = $conn->setFetchMode(ADODB_FETCH_ASSOC);
$recordSet = $conn->execute("SELECT * FROM table");
$objectThatDoesWork->work($recordSet);
$conn->setFetchMode($oldFetchMode);


To me this is a multi-fail.

1) If an exception is thrown before the fetch mode is reset it doesn't get reset unless you catch it. That can add up to a lot of boiler-plate try/catch blocks.

2) If the next call to execute() doesn't set the fetch mode first in order to guarantee you get the return type you expect bad things can happen. I've seen plenty of code that doesn't explicitly set the fetch mode every time and it works... until it doesn't. Code somewhere else in your system changes the fetch mode. The error is coming from code that source control tells you hasn't changed recently. Your tests prove to you that the code works. Your query logs show you that the query you expected to be run was actually run. It sends you in the wrong direction every time.

3) Setting the fetch mode every time with this long method and verbose constant adds a bunch of boilerplate.

4) $objectThatDoesWork->work() should also have its first parameter be typed or you'll only know something went wrong when it tries to access that first row the wrong way rather than failing immediately. I'd prefer it to fail fast (pdf).

At first glance it could look flexible and convinient. You don't need to set the fetch mode ever in theory. You can just use the default, and that's pretty simple right? I'd be happy to rip this method out and pretend it never existed, but it does and its scaring the kids.

Luckily, there is actually a easy enough solution to this problem. Take a look at this rewrite.


$recordSetFactory = $conn->execute("SELECT * FROM table");
$objectThatDoesWork->work($recordSetFactory->asAssoc());


$recordSetFactory has the underlying record set resource stored privately.
$recordSetFactory object might have different ways to access the data such as asAssoc(), which would return an iterator where the current row is returned as an associative array. It could have another that returns an iterator which returns row data with numeric indexes (asList() maybe). There are plenty of ways to access this data conveniently.

Best of all that configuration is explicitly requested every time. Its used and it gets thrown away when the object's lifecycle ends. Done and done.