PCI DSS compliance and spaghetti code, Part 1

Anyone taking a substantial volume of credit card payments will eventually stumble onto the Payment Card Industry Data Security Standard (PCI DSS). Meeting this standard likely adds significant complexity to your application, comes at fairly great expense, and will leave your developers and sysadmins with at least a few head-scratching moments where they ask themselves "Why do I have to do this? It has nothing to do with security." The project I'm working on has been audited, but the final paperwork is not in yet. Things look good, but pass or fail, there's already plenty to learn from the experience.

Minimizing the PCI DSS burden

My employer's web application has a fundraising module which needed to become PCI DSS compliant. There are many things we considered when deciding how to modify our fundraising module. Here are a few of the larger concerns:

  • Hardware which handles credit card numbers is subject to audit (these systems combined are called the Cardholder Data Environment, or CDE)
  • CDE hardware changes must be documented
  • CDE code changes are subject to audit
  • CDE code changes must go through a documented security-focused peer review
  • Quality testing must be documented for all CDE code changes
  • There are restrictions on who is permitted to deploy CDE code

As you can see, maintenance to a CDE carries lots of extra overhead. It didn't make sense to add that overhead to anything other than our fundraising module, so we decided to separate it from the rest of our application code and serve it from its own cluster of web servers. This solution also reduces the attack surface of this sensitive part of the application.

We would obviously have to make some changes that would cost us this extra overhead, but we realized we could keep that cost low by reducing the need for change within the CDE code. To do this we tried to keep the CDE code... well... as dumb as possible. Our application has many customizable types of fundraising forms used in various workflows, but we don't want the CDE code to know about any of that. That stuff is complicated. It changes, grows, and has a higher bug potential than something smaller and simpler. All that complexity should continue to live within our original application, which the CDE code can communicate with via a simple API. This sounds great, but how do we rip our application in two?

Plotting the course

Segmenting our application this way had huge implications for our existing fundraising code. The fundraising code had feature after feature piled on over 5 or 6 years. The user workflow stumbles through a 5000-line top-down PHP file of spaghetti code. Eventually other "helper" functions and classes might output to the browser, redirect, or charge a credit card and then terminate the request with an exit() call. Eww! This code was so complex that it was hard to get a good idea of what all of it did. Without structure or comments it was unreadable, and there was no test coverage.

When I first approached this problem I looked at it in a very linear way. This is roughly how I saw the workflow for a basic fundraising page with all valid data submitted and no credit card processor errors (there's a code sketch of this flow after the list):

  1. User opens fundraising page on web server (not a CDE web server)
  2. User submits form which posts data to CDE web server
  3. CDE code pulls out credit card number and other sensitive data and validates it
  4. CDE passes the validation results and non-sensitive data to a validate function on the web server API
  5. Web server validates the rest of the posted data using some validation extracted from the spaghetti code
  6. Validation passes so some other extracted code decides to return a "sale command" to the CDE (see Command Pattern)
  7. CDE executes the command
  8. CDE calls web server API telling it the transaction was successful
  9. Web server uses some extracted code to determine if the user is to be shown a "thank you" message or redirected to a "thank you" page (each of which could be different depending on the workflow the user was executing)
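
In code, the CDE's side of steps 3 through 8 might look something like this. This is only a rough sketch with made-up class and method names ($api, CardData, and friends), not our production code:

// Step 3: extract and validate the sensitive fields (hypothetical helpers).
$cardData   = CardData::fromPost($_POST);
$cardErrors = $cardData->validate();

// Steps 4-6: send the validation results and non-sensitive data to the
// web server API, which validates the rest and, on success, replies
// with a serialized "sale command".
$response = $api->validate($cardErrors, $cardData->nonSensitiveFields());
$command  = unserialize($response);

// Step 7: execute the command, charging the card.
$result = $command->execute($cardData);

// Step 8: tell the web server the transaction succeeded.
$api->transactionSucceeded($result);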

All this code extraction was going to be a real pain. The validation and form building were tightly coupled through an old PEAR library called QuickForm. The various user workflows were tangled together and extremely overcomplicated. Making a few huge tears in the middle of the application seemed high-risk given our relatively short timeline.

A-ha!

I soon realized this wasn't necessary. I didn't need to do all this extraction. I didn't need to rip this thing apart with a few huge, dangerous tears. What I needed to do was encapsulate those end points which triggered the output of data to the user's browser, a user redirect, or the charging of a user's credit card.

These encapsulated end points form Command classes which can be serialized and returned to the CDE in an API response. These commands are simple. It's not important what kind of page you are outputting, what kind of page you're redirecting to, or what kind of fundraising charge you're making. The CDE code doesn't need to know. The CDE code just knows how to handle certain sensitive fields, perform a credit card transaction, and execute these generic commands. All other decision making is delegated to the web server API.
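
A minimal sketch of what these command classes could look like (class names here are illustrative, not our real ones):

interface Cde_Command
{
    // $sensitive carries card data for commands that need it (e.g. a sale).
    public function execute($sensitive = null);
}

// A generic output command: the CDE neither knows nor cares what kind
// of page this is. That decision was made upstream by the web server API.
class Cde_Command_Output implements Cde_Command
{
    private $body;

    public function __construct($body)
    {
        $this->body = $body;
    }

    public function execute($sensitive = null)
    {
        echo $this->body;
        exit();
    }
}

// A generic redirect command. Same story: any workflow-specific decision
// about where to send the user already happened upstream.
class Cde_Command_Redirect implements Cde_Command
{
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
    }

    public function execute($sensitive = null)
    {
        header('Location: ' . $this->url);
        exit();
    }
}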

For our first iteration I created these command classes, took all that structureless top-down spaghetti code, and tossed it into a single class method (we'll call this class a Controller). I extracted global variable usage so those values could be passed in through the Controller's constructor. I also extracted the validation of the sensitive data and ran it outside the Controller, with its results injected into the Controller (as would eventually happen with the validation running on the CDE web server). I used several legacy code refactoring methods described in one of my favorite programming books, Working Effectively with Legacy Code. These changes produced something I could release immediately.
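
The first-iteration Controller amounted to something like this (hypothetical names again; the real run() body is the old top-down code, mostly untouched):

class Fundraising_Controller
{
    private $request;
    private $cardValidationResult;

    // Former global variables now arrive through the constructor,
    // as do the results of the extracted sensitive-data validation.
    public function __construct(array $request, $cardValidationResult)
    {
        $this->request = $request;
        $this->cardValidationResult = $cardValidationResult;
    }

    public function run()
    {
        // ...the old 5000-line workflow logic, now returning generic
        // commands instead of echoing, redirecting, or exit()ing itself...
        if (!$this->cardValidationResult->isValid()) {
            return new Cde_Command_Output('...the rebuilt form with errors...');
        }
        // ...
        return new Cde_Command_Redirect('/thank-you');
    }
}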

Continue to Part 2!

Exceptions as part of "regular" control flow

I've heard this rule a lot: "Never use exceptions for control flow." It's an interesting statement to parse. We know what an exception is, but the definition of control flow is a little fuzzy, so let's clarify things a bit.

In computer science, control flow (or alternatively, flow of control) refers to the order in which the individual statements, instructions, or function calls of an imperative or a declarative program are executed or evaluated.

http://en.wikipedia.org/wiki/Control_flow

Immediately I'm confused. Is it possible to use an exception and not affect control flow? Nope. Exceptions are a means of controlling flow. Sometimes people making this point specify that it's "normal" or "regular" control flow. "Normal" and "regular" are delightfully relative and unhelpful. Normal? Compared to what? What they're trying to get at is that they don't believe exceptions should be used unless there is an application error.

This seems odd considering that business layer classes aren't necessarily made for only one code path, or even one application, and therefore lack the context to determine the exceptionality of the condition. I touch on this a bunch in my Exceptions vs Null article.

There are lots of "crazy" things you can do with exceptions, and if you're interested in seeing more, check out this article on c2.com. Here I will only discuss three usages I found particularly interesting.

Exceptions as return values

This example comes from c2.com. I think even people who have never thought deeply about exception usage would never dream up this code. Still, I believe this might be the perfect example of what people are talking about when they say not to use exceptions for control flow.

void search( TreeNode node, Object data ) throws ResultException {
    if (node == null) return;                  // bottom of this branch, no match
    if (node.data.equals( data ))
        throw new ResultException( node );     // "success" escapes as an exception
    else {
        search( node.leftChild, data );
        search( node.rightChild, data );
    }
}

Get it now? As a starting point I think we can all agree that this is batty. It's a search algorithm that throws an exception upon success. Really? The article correctly points out that this is a violation of the Principle of Least Astonishment. I know this code makes me feel violated.
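
For contrast, here is the same search reporting success through its return value. A quick PHP sketch (and yes, it uses null for "not found", which the Exceptions vs Null post below takes issue with, but it isolates the control flow question):

function search($node, $data)
{
    if ($node === null) {
        return null;                // ran out of tree on this branch
    }
    if ($node->data === $data) {
        return $node;               // success reported by returning, not throwing
    }
    $found = search($node->leftChild, $data);
    return ($found !== null) ? $found : search($node->rightChild, $data);
}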

Exceptions as commands to the caller

try {
    $this->myService->reach();  // some call that can fail; illustrative only
} catch (MyService_Exception_CouldNotBeReached $e) {
    throw new MyOtherService_Exception_Retry("Couldn't reach my service, retry!");
}

This exception seems to be commanding the caller to retry something. Like the previous example, this also breaks the Principle of Least Astonishment. In order to benefit from the full functionality of the method you need to be set up to catch the exception commands it throws and carry out some other action to continue. This is an application-agnostic service making application-specific decisions (whether or not to retry). No thank you.
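
To see the astonishment from the caller's side: to use MyOtherService fully you must know its private retry protocol. A sketch, with a hypothetical doSomething() method that contains the catch-and-rethrow above:

$attempts = 0;
do {
    try {
        $result = $myOtherService->doSomething();   // the call we actually care about
        break;                                      // success; stop retrying
    } catch (MyOtherService_Exception_Retry $e) {
        $attempts++;                                // it said "retry!", so we obey
    }
} while ($attempts < 3);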

Exceptions as loop termination conditions

c2.com offers us another gem, and I don't mean that negatively.

try {
    for (int i = 0; /*wot no test?*/ ; i++)
        array[i]++;
} catch (ArrayIndexOutOfBoundsException e) {
    // running off the end of the array is the loop's exit condition
}

The first thing I thought when I saw this was "What about things like Python's StopIteration?" One line later it's mentioned. Yay! This article may be reading my mind. I do wonder if the fact that StopIteration now exists in Python, Ruby, and JavaScript might start to carve away at the exception/null, failure/absence debate. I'd like to know more about the decision making that went into the design of the feature, but so far I have failed to find any discussions on the matter.

All that said, I'm not sure how I feel about the name - "stop" iteration. It sounds like the service commanding its caller. If I were to use an exception-terminated loop I'd prefer to explicitly specify the exception type rather than use a language-generic exception whose name is a command. Maybe something like this (which I would not be surprised to find already exists somewhere):

until(RecordNotFoundException $e) {
    print $recordSet->getNext()->getLabel() . "\n";
}
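
PHP has no until construct like that, but the shape is easy to approximate today with a plain try/catch. A sketch assuming a hypothetical $recordSet whose getNext() throws a RecordNotFoundException when the records run out:

try {
    while (true) {
        print $recordSet->getNext()->getLabel() . "\n";
    }
} catch (RecordNotFoundException $e) {
    // the record set is exhausted; fall through and carry on
}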

Exceptions vs Null

Plenty of developers agree that returning mixed-type results is not a good practice. It leads to conditional statements wherever the method returning the result is used. Everybody agrees that's bad, and then Null walks in, and we're not sure anymore.

Null is supposed to be a magical value which can be passed as any type of object and represent "no object". Usually methods return Null when the thing we want to get is absent, and that absence is considered normal for our particular application. Maybe the caller asked for a row that doesn't exist in a table, or for the next line of a file when you've already reached the end.

If the method you are calling may return Null then you must (almost?) always check for Null. Isn't this why we dislike mixed-type return values? And this is only the beginning of the problems with Null.

Personally, I've chosen this rule: "In OOP never return Null and instead always use exceptions".

Absence vs Failure

Those whom I disagree with seem to draw the line at "absence versus failure". They interpret the absence of something as unexceptional and return Null. The failure to do something within a method call is seen as exceptional, so they throw an exception (or maybe they return some other magic value because they think they're mother-fuckin' David Blaine, and by that I mean talentless and insignificant).

This excitingly-named article - When To Use Exceptions - asks one question that seems rather compelling and butts heads with the "absence versus failure" rule:

"Who exactly are [library programmers] to decide that not finding [something] is a non-exceptional event for my application?"

I like everything about that statement, including the touch of anger. Business layer classes, or libraries, shouldn't be making application layer decisions. It's like a business layer class triggering a fatal error rather than throwing an exception. I determine what's a fatal error, not the tools I'm building with. I always separate my application layer from my business layer, but until recently I never considered determining exceptional conditions as part of that division.

Suppose we had a web app for managing blog entries. The app chooses a search-engine-friendly /english-words-dashed-together/ URL for the blog post based on the blog post title you entered. This gives us two obvious places where we want to look up URLs:

  • We want to use the URL in a new blog post but need to make sure it's not already in use (absence is not exceptional)
  • A blog post URL has been requested and we need to find the blog post and serve it (absence is exceptional)

Both of these scenarios could call upon the same method - $blogPostGateway->getPostByUrl($url). The exceptionality of this scenario depends on the context within the application; yet, as a business class, this gateway knows (or should know) nothing about the application it lives in.

This does not mean that a method call lacks any context. There are still post-conditions for the method even absent an application context. A method call's post-condition is that it returns a value of a pre-specified type. If an object of the proper type cannot be returned because of absent data then we have an exceptional case. We don't need a new tool, Null, to sit in place of that object because we already have a tool to say "I failed!" - an exception.
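
In code, that reasoning comes out something like this. A sketch with hypothetical class names, backed by an array where a real gateway would query the database:

class BlogPost_NotFoundException extends Exception {}

class BlogPostGateway
{
    private $posts;     // url => blog post; stands in for real data access

    public function __construct(array $posts)
    {
        $this->posts = $posts;
    }

    public function getPostByUrl($url)
    {
        if (!isset($this->posts[$url])) {
            // The post-condition (return a post) cannot be met: say "I failed!"
            throw new BlogPost_NotFoundException("No post at URL: $url");
        }
        return $this->posts[$url];      // always a real post, never Null
    }
}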

Null = repetition

Suppose $blogPostGateway->getPostByUrl($url) does return Null. Now the caller is left to interpret this magic value. Should callers be left to duplicate those conditionals everywhere? That's not good design.
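
Contrast that with the sketched gateway above: if getPostByUrl() returned Null instead of throwing, the same conditional would follow the method around to every call site:

// Every caller repeats this dance, or forgets to and blows up later:
$post = $gateway->getPostByUrl($url);
if ($post === null) {
    // ...each call site invents its own handling of "no post"...
} else {
    print $post->getTitle();
}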

Preferably the "absence scenario" here would be handled on the blog post gateway by a method like $blogPostGateway->blogPostExists($url) which returns a boolean. No Null. No new exceptions necessary. This decision not to use Null also forces the developer to make the right choice and write these extra existence-checking methods.
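
Added to the sketched gateway above, that method is a one-liner, and the two scenarios from earlier each get the tool that fits:

public function blogPostExists($url)
{
    return isset($this->posts[$url]);   // a boolean, not a Null-or-object
}

// New blog post: absence is normal here, so ask a boolean question.
if ($gateway->blogPostExists($proposedUrl)) {
    // ...pick a different URL; no exception involved...
}

// Serving a requested post: absence is exceptional, so let it throw
// and translate the exception into a 404 at the application layer.
$post = $gateway->getPostByUrl($requestedUrl);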

What's the difference, really?

This issue is seen by some as a matter of style: exceptions with try/catch blocks are seen as identical to Nulls with if/else blocks. Is there a difference? I think so; the previously mentioned points are huge, and there are functional differences as well.

Exceptions can be handled immediately, eventually, or allowed to go uncaught and fail fast. Null values may not cause problems until much later in a program's execution. That means cryptic error messages and more tracing to get back to where the Null originated.
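
A small illustration of that difference (hypothetical code, reusing the gateway sketch):

// With Null, nothing complains at the source...
$post = $gateway->getPostByUrl($url);       // quietly returns Null
// ...then, much later and several layers away:
print $post->getTitle();                    // fatal "member function on null" - but whose null?

// With an exception, the failure is immediate and names its origin:
$post = $gateway->getPostByUrl($url);       // BlogPost_NotFoundException, right here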

Is there anyone else who knows a lot about Null and doesn't like it?

It seems the guy who invented it, Tony Hoare, calls it his "Billion Dollar Mistake".

Summary

Don't use Null. Don't. Do not.
