Wednesday, December 3, 2008

Users always choose the wrong path - on data flowing through your code

Software development is so much fun



Writing code is a very cool activity, because it is creative and limitless, and because the reward is instant. You figure out what needs to be done - you model it, write it down, and execute it almost immediately. Kaboom! A shiny new '4' appears in your console as the answer to your '2' and '2' input. You just did an amazing piece of work on adding integers. You can move on to the next adventure.

And then software development suddenly becomes mundane


But then, can you really? I would suggest trying '3' and '2' as inputs as well, to make sure that your algorithm is adding (we expect '5') and not multiplying (which would give us '6', and a bit of frustration when trying to claim money back from our savings account), or even better '3' and '4', to make sure that it doesn't simply add '2' to the first argument. For some time this can be a cool experience, as you are still playing the 'what could I have missed' game, but long before we get to '2147483647' + '1' the boredom sneaks in and kills all the fun.
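To make the game concrete, here is a minimal sketch of those checks in Python. The function name `add_int32` is made up for illustration, and the 32-bit wraparound is an assumption: Python integers never overflow, so the sketch imitates the behavior of a 32-bit signed `int` by hand.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest 32-bit signed integer


def add_int32(a: int, b: int) -> int:
    """Add two integers, wrapping around like a 32-bit signed int (assumed semantics)."""
    total = (a + b) & 0xFFFFFFFF  # keep only the low 32 bits
    # Values above INT32_MAX represent negative numbers in two's complement.
    return total - 2**32 if total > INT32_MAX else total


# Each case rules out one wrong implementation.
cases = [
    (2, 2, 4),  # passes for a + b, a * b and a + 2 alike
    (3, 2, 5),  # fails if the code multiplies (would give 6)
    (3, 4, 7),  # fails if the code just adds 2 to the first argument (would give 5)
]
for a, b, expected in cases:
    assert add_int32(a, b) == expected

# The boundary case most of us never get to before boredom wins.
print(add_int32(2147483647, 1))  # -2147483648
```

Three hand-picked cases already took some thought, and the interesting one sits at the very edge of the input range.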

What makes the whole thing worse is that you are locked into what you know about the code you've just written. Your brain hasn't developed significantly since you wrote it, so you test with the same mindset, and the same blind spots, that you coded with.

The bad news is that we have to do it and there is no way around it. The good news is that there are tools and techniques that shorten the work. Today I would like to talk about one of them, called 'data-flow analysis'.

Data flow analysis - what is it about?



So what is it? It is a static code analysis, in the sense that we don't run our software to get results. Instead, it tries to mimic the potential paths through the code, searching for specific path or data patterns that are interesting for some reason. When you think about it, this approach is much more powerful than just testing some of the paths - here all of them get analyzed.

The idea is: let's assume we can collect all possible paths through our application, and then define a subset of them that collects the 'something went wrong' paths. It is an extremely powerful idea - if entirely realizable, it would be equivalent to testing all possible inputs and conditions. The world would be a different place, where software is cheap, and a big part of software developers would be selling coffee at Starbucks.
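As a rough sketch of that idea, assume the program is already available as a control-flow graph. The graph below, its node names, and the `leaks_file` rule are all invented for illustration; a real analyzer builds the graph from your source code and works with much richer rules.

```python
from typing import Dict, List

# Hypothetical control-flow graph of a tiny function: node -> possible next nodes.
CFG: Dict[str, List[str]] = {
    "entry": ["open_file"],
    "open_file": ["read_ok", "read_fails"],
    "read_ok": ["close_file"],
    "read_fails": ["exit"],  # early return that skips close_file
    "close_file": ["exit"],
    "exit": [],
}


def all_paths(cfg: Dict[str, List[str]], node: str, end: str) -> List[List[str]]:
    """Collect every path from node to end (only works for a graph without loops)."""
    if node == end:
        return [[node]]
    return [[node] + rest for succ in cfg[node] for rest in all_paths(cfg, succ, end)]


def leaks_file(path: List[str]) -> bool:
    """The 'something went wrong' rule: a file was opened but never closed."""
    return "open_file" in path and "close_file" not in path


paths = all_paths(CFG, "entry", "exit")
bad_paths = [p for p in paths if leaks_file(p)]

for p in bad_paths:
    print(" -> ".join(p))  # entry -> open_file -> read_fails -> exit
```

Nothing was executed and no file was touched, yet the leaking path was found. The catch is the comment on `all_paths`: as soon as the graph contains a loop, the set of paths is no longer finite.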

Data flow analysis - how useful is it?



Unfortunately for software users, and fortunately for software developers and for the wanna-be actors currently selling coffee, it is not entirely possible. The computation required is too complicated, and the space of all possible paths and inputs is too big to control. Does that mean it's useless? Not at all.

It's actually extremely useful and commonly used. The trick is to limit the 'all paths' set by imposing a maximum path length, a cap on the number of possible transitions within a path, and so on. We can still get extremely valuable results: these algorithms know about transitions and path segments that you usually don't anticipate, such as simple setups that lead to an exception being raised from a standard function, rare paths that leak memory and resources, and weird user scenarios that pump your collections full of excess data. Running your code through such algorithms is an eye-opening experience - there is so much you didn't anticipate while collecting your 'man of the hour' creativity reward earlier that day.
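Here is a minimal sketch of the path-length limit, again on an invented graph. This one has a loop that appends to a collection, so without a bound the enumeration would never finish. The names `bounded_paths`, `max_len` and `CAPACITY` are assumptions made for the example, not the API of any particular tool.

```python
from typing import Dict, List

# Hypothetical control-flow graph with a loop: each pass appends one item.
LOOP_CFG: Dict[str, List[str]] = {
    "entry": ["loop_head"],
    "loop_head": ["append_item", "exit"],
    "append_item": ["loop_head"],
    "exit": [],
}


def bounded_paths(cfg: Dict[str, List[str]], start: str, end: str,
                  max_len: int) -> List[List[str]]:
    """Enumerate paths from start to end that contain at most max_len nodes."""
    found: List[List[str]] = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            found.append(path)
            continue
        if len(path) >= max_len:
            continue  # give up on this path: it exceeds the imposed bound
        for succ in cfg[path[-1]]:
            stack.append(path + [succ])
    return found


CAPACITY = 1  # assumed limit on how many items the collection should ever hold

paths = bounded_paths(LOOP_CFG, "entry", "exit", max_len=8)
overfull = [p for p in paths if p.count("append_item") > CAPACITY]

print(len(paths), "paths explored,", len(overfull), "overfill the collection")
```

The bound is a trade-off: anything that only goes wrong on a path longer than `max_len` stays invisible, but everything shorter is checked without you having to think of it first.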

And remember - no matter how weird and rare these paths seem to be, these are the very ones people will follow as soon as they start using your software. Users are vicious when it comes to using our software - they don't add '2' and '2', they just keep adding whatever they fancy, with no respect for the inputs we tested against. Unless you know how to change this behavior, data flow analysis can help you 'prove' your software (in a limited but still powerful way).

Use it.
