Tech Trained Monkey

Everyday Problem Solvers

Category Archives: General Development

Analysing success

As the 1940s air war in Europe intensified, the Allies faced a major problem. Their bombers would leave England by the hundreds, but too many of them didn’t return, brought down by extremely heavy enemy flak. The Allies desperately needed to beef up the armor on their planes to provide protection, but armoring an entire plane, or even an entire cockpit, involved far too much weight. How could they choose the few especially vulnerable places to be armored?

A couple of clever engineers solved this problem with a counter-intuitive analysis. After comprehensively logging the locations of flak damage inflicted around the fuselages, engines, and cockpits of planes returning from hundreds of bombing runs, they calculated… Read more of this post

What is spaghetti code?

One of the easiest ways for an epithet to lose its value is for it to become over-broad, which causes it to mean little more than “I don’t like this”. A case in point is the term “spaghetti code”, which people often use interchangeably with “bad code”. The problem is that not all bad code is spaghetti code. Spaghetti code is an especially virulent but specific kind of bad code, and its particular badness is instructive in how we develop software. Why? Because individual people rarely write spaghetti code on their own. Rather, certain styles of development process make it increasingly common as time passes. In order to assess this, it’s important first to address the original context in which “spaghetti code” was defined: the dreaded (and mostly archaic) goto statement.  Read more of this post
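
For anyone who never met goto, here is a rough Python imitation of what unstructured jumps do to control flow, using a state variable to stand in for labels since Python has no goto; the tiny parsing task is invented for illustration:

    # A rough imitation of goto-style flow: a "label" variable and a loop
    # that jumps between states at will. Scale this up across a whole
    # program and you get spaghetti: to know how you reached a line, you
    # must trace every jump that could have led there.
    def parse_spaghetti(tokens):
        state, i, out = "READ", 0, []
        while state != "DONE":
            if state == "READ":
                if i >= len(tokens):
                    state = "DONE"
                elif tokens[i] == "#":
                    state = "SKIP"
                else:
                    out.append(tokens[i])
                    i += 1
            elif state == "SKIP":
                i += 1
                state = "READ"
        return out

    # The structured equivalent: the flow of control is visible at a glance.
    def parse_structured(tokens):
        return [t for t in tokens if t != "#"]

    print(parse_spaghetti(["a", "#", "b"]))   # ['a', 'b']
    print(parse_structured(["a", "#", "b"]))  # ['a', 'b']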

I’ve inherited 200K lines of spaghetti code—what now?

kmote asks: I am newly employed as the sole “SW Engineer” in a fairly small shop of scientists who have spent the last 10-20 years cobbling together a vast code base. (It was written in a virtually obsolete language: G2—think Pascal with graphics). The program itself is a physical model of a complex chemical processing plant; the team that wrote it has incredibly deep domain knowledge but little or no formal training in programming fundamentals. They’ve recently learned some hard lessons about the consequences of nonexistent configuration management. Their maintenance efforts are also greatly hampered by the vast accumulation of undocumented “sludge” in the code itself. I will spare you the “politics” of the situation (there’s always politics!), but suffice it to say, there is not a consensus of opinion about what is needed for the path ahead.  Read more of this post

Why integers are lousy primary keys

When I was a noob (about 2 years ago) I took a freelance gig to develop a simple system for a small shop. Very basic stuff. A few weeks ago the owner called me saying that he was very happy but was facing a problem… You see, his business grew and he now has 3 shops, all of them using my system, and he wants to consolidate everything into one database. Well, as you can imagine, I only managed to accomplish this by replacing the primary keys. They were all integers; now they’re GUIDs. The reason? Integers clash very easily!  Read more of this post
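
A quick sketch of the clash, with shop data invented for illustration: two databases that each hand out auto-incremented integers starting at 1 are guaranteed to collide on a merge, while independently generated GUIDs effectively never do.

    import uuid

    # Two shops, each assigning customer ids from its own auto-increment
    # counter: both inevitably hand out 1, 2, 3, ...
    shop_a = {1: "Alice", 2: "Bob"}
    shop_b = {1: "Carol", 2: "Dave"}
    print(shop_a.keys() & shop_b.keys())  # {1, 2} -- a merge conflict

    # GUIDs are generated independently at each shop, yet the chance of a
    # collision is negligible, so merged rows can keep their keys.
    shop_a_guid = {uuid.uuid4(): "Alice", uuid.uuid4(): "Bob"}
    shop_b_guid = {uuid.uuid4(): "Carol", uuid.uuid4(): "Dave"}
    print(shop_a_guid.keys() & shop_b_guid.keys())  # set() -- no clashes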

About commenting code

I will not lie: I have always had a beef with literature and non-technical text interpretation. Don’t get me wrong: I’ve never misunderstood a math problem text, or a chemistry-physics one, but when the literature teacher would say that the poem was about solitude and the search for self I was always like “WTF?!? Where did she take that from?!? This poem is about a man who enjoys long walks in the desert!!!”. Some say that I might be a little too literal and stuff, and I disagree! I’m practical, rational, precise, objective, and I believe that if you want to tell the world about “solitude and the search for self” you should do it clearly and precisely, and not by writing a 200-foot-long poem that tells the story of a man walking in a desert.

And yet no one expects to find something like this in a poem:

Roses are red

Violets are blue

// Here I’ll define the meaning of life

I don’t live for myself

// And now a recursive definition of life

I live for you

I know, very Shakespeare-like… The point is: why should a programmer write comments in his own code?!? If something was very hard to figure out and program, it should be very hard to understand and maintain! Right?!?
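
Sarcasm aside, the usual answer is that good comments record the why, not the what. A minimal sketch, with the flaky-service scenario entirely invented for illustration:

    import random

    def fetch_with_retry(max_retries=3):
        # A comment like "loop over attempts" would only restate the code.
        # This one records the *why*: the (hypothetical) vendor API sometimes
        # returns an empty payload on success, and we were told to treat
        # that as a transient failure and simply retry.
        for attempt in range(max_retries):
            payload = "" if random.random() < 0.3 else "data"
            if payload:
                return payload
        raise RuntimeError("service kept returning empty payloads")

    print(fetch_with_retry())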

The “-2000 lines of code” report

First of all, I’m sorry I’ve been so absent. I’m working on something of my own and I hope I can get it stable and working properly so I can talk about it here.
Meanwhile I read a nice post and I’m reposting it here:

In early 1982, the Lisa software team was trying to buckle down for the big push to ship the software within the next six months. Some of the managers decided that it would be a good idea to track the progress of each individual engineer in terms of the amount of code that they wrote from week to week. They devised a form that each engineer was required to submit every Friday, which included a field for the number of lines of code that were written that week.
Bill Atkinson, the author of Quickdraw and the main user interface designer, who was by far the most important Lisa implementor, thought that lines of code was a silly measure of software productivity. He thought his goal was to write as small and fast a program as possible, and that the lines of code metric only encouraged writing sloppy, bloated, broken code.
He recently was working on optimizing Quickdraw’s region calculation machinery, and had completely rewritten the region engine using a simpler, more general algorithm which, after some tweaking, made region operations almost six times faster. As a by-product, the rewrite also saved around 2,000 lines of code.
He was just putting the finishing touches on the optimization when it was time to fill out the management form for the first time. When he got to the lines of code part, he thought about it for a second, and then wrote in the number: -2000.
I’m not sure how the managers reacted to that, but I do know that after a couple more weeks, they stopped asking Bill to fill out the form, and he gladly complied.

Phases of development

Late last week I wondered: where do the software terms alpha and beta come from? And why don’t we ever use gamma? And why not theta or epsilon or sigma?


Alpha and beta are the first two letters of the Greek alphabet. Presumably they were chosen to designate the first and second rounds of software testing, respectively.

But where did these terms originate? Read more of this post

Logging and software development maturity

According to Wikipedia, maturity is:

a psychological term used to indicate how a person responds to the circumstances or environment in an appropriate manner. This response is generally learned rather than instinctive, and is not determined by one’s age.

It is a widely known and accepted fact that writing logs is a very good thing, since logs help you find out what happened at a given moment in time. It would be rule number one in the universal developer handbook, if such a book existed… But again and again I find programs that just don’t do it.

I’m used to clients calling to say that something is wrong with the product, but it makes me cry when I ask for the logs and hear that there are none. Then I want to kill the dev who didn’t write any logs, and I always find the bastard. If by any chance it is not an intern, oh help me lord, some blood will be shed!

Logging came to me as a very instinctive thing. My first programs didn’t write any logs, but of course, I was the only one using them. As I got better and better I felt the need to better control my “world”. I was scared that a problem would occur and I wouldn’t be able to know about it. In order to really evolve and grow as a developer one needs to build bigger, more complex programs and, more importantly, learn from one’s own mistakes. Logging is perfect for the learning part, because you don’t need to bother the users with questions and other stuff (provided you logged all the information you needed).

One might start to wonder what this has to do with maturity. Just logging is not enough. Logging for the sake of it does not help. You need consistent, useful, categorized and, most of all (I think), easy-to-track log messages. It takes real maturity to know how to log. You should log everything that happens, but each kind of event must be logged in its own particular way. Logging user interactions the same way you log NullReferenceExceptions is not very helpful. Remember that logging provides information, and information is gold!

A lot of people do write a bunch of logs. They actually log the entire thing. Take World of Warcraft for example: everything you do in the entire game can be traced! They can trace every little critter that you killed. They know which weapon you used. They log everything! But, as I said before, just writing messages is not enough. You have to write them in an understandable, easy-to-track way. If a developer on your team can’t trace back, for example, a user’s actions, such as buttons clicked, radio button choices, on-screen info, checkboxes selected, etc., your log is not good. It might help you find and solve a problem, but it could do a lot better. I recently learned an important lesson.
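
To make “categorized and easy to track” concrete, here is a minimal sketch using Python’s logging module, with event names and fields invented for illustration: every entry carries a category and a user id, so UI actions and exceptions can be filtered apart instead of drowning each other out.

    import logging

    logging.basicConfig(
        format="%(asctime)s %(levelname)s [%(category)s] user=%(user)s %(message)s",
        level=logging.INFO,
    )
    log = logging.getLogger("app")

    def log_event(category, user, message, level=logging.INFO):
        # Every entry carries a category and a user id, so "all UI actions
        # by user 42" is a single filter, not an archaeology project.
        log.log(level, message, extra={"category": category, "user": user})

    log_event("UI", 42, "clicked 'Save' button")
    log_event("UI", 42, "selected checkbox 'send newsletter'")
    log_event("ERROR", 42, "NullReferenceException in OrderService.Save",
              level=logging.ERROR)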

I must say that I had never had a problem with production-environment software (yes, my programs have bugs, but always very soon after deploy, so I was always close by to help). Thank god, no program that I wrote ever gave me serious headaches. Until a couple of weeks back. The deploy went fine and everything was peachy. Then one beautiful morning the client called me and reported an issue. As always I asked for the logs; he sent them, and then my world fell apart. I could not trace what was going on. Everything was there, but I could not build a timeline of the events. Due to multiple front-ends and multiple web service servers it was very, VERY hard to track what happened to a user. I had an especially hard time figuring out which messages in the web service logs belonged to whom. It was a nightmare. I wanted to cry. But I managed to do it and I matured a bit… no, I matured a byte… sorry for the pun…

So now I’m developing a new logging lib and a log-reader tool (feel free to “borrow”) that is planned to solve the multi-user/thread/server scenarios. The log lib we use today is perfect for development and for single-user/thread/server scenarios: you can easily follow what happened and when, what came after what and so on. But when multi-threading, multiple users and async operations come out to play, things get ugly. Since the current lib only logs the time and severity of each event, it’s very hard to track the continuous actions of a user, or even the path of a particular thread.
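
I assume any lib solving this has to do something like the sketch below: stamp every entry with a correlation id that travels with the logical request, even across thread hops. The names here are hypothetical, not the actual lib:

    import contextvars
    import logging
    import threading
    import uuid

    # The id travels with the logical flow of control, so every entry a
    # request produces shares one correlation id, even on other threads.
    correlation_id = contextvars.ContextVar("correlation_id", default="-")

    logging.basicConfig(format="%(asctime)s %(levelname)s cid=%(cid)s %(message)s",
                        level=logging.INFO)
    log = logging.getLogger("app")

    def log_msg(message):
        log.info(message, extra={"cid": correlation_id.get()})

    def handle_request(user):
        correlation_id.set(uuid.uuid4().hex[:8])  # one id per request
        log_msg(f"request started for {user}")
        ctx = contextvars.copy_context()
        # Work on another thread keeps the same id via the copied context.
        worker = threading.Thread(target=ctx.run, args=(log_msg, "background step"))
        worker.start()
        worker.join()
        log_msg("request finished")

    handle_request("alice")
    handle_request("bob")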

The lib itself is not enough though. The real magic is the log reader. The reader creates a visual path of the messages: it literally draws a “fork” for multi-threaded code, visually showing parallel operations and such! It’s getting very cool; when it’s ready I’ll post the source code… but now back to maturity…
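
As a rough illustration of the reader side (my own sketch, not the actual tool), grouping parsed entries by correlation id is already enough to rebuild each parallel branch:

    from collections import defaultdict

    # Toy log lines in the "cid=<id> message" shape sketched above.
    lines = [
        "cid=a1 request started for alice",
        "cid=b2 request started for bob",
        "cid=a1 background step",
        "cid=b2 request finished",
        "cid=a1 request finished",
    ]

    # Group entries by correlation id to rebuild each parallel "branch".
    branches = defaultdict(list)
    for line in lines:
        cid, _, message = line.partition(" ")
        branches[cid].append(message)

    for cid, messages in branches.items():
        print(cid)
        for message in messages:
            print("  └─", message)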

I must disclose that I’m not the most seasoned developer out there… for Christ’s sake, I’m only 24, but this much I can say: write log messages that are easy to read and understand. Your kids will be thankful! Ok, maybe not, but you will be when you need to know what’s really going on!

Is “crashing” the worst thing that could happen?

Here’s an interesting thought question from Mike Stall: what’s worse than crashing?

Mike provides the following list of crash scenarios, in order from best to worst:

  1. Application works as expected and never crashes.
  2. Application crashes due to rare bugs that nobody notices or cares about.
  3. Application crashes due to a commonly encountered bug.
  4. Application deadlocks and stops responding due to a common bug.
  5. Application crashes long after the original bug.
  6. Application causes data loss and/or corruption.

Mike points out that there’s a natural tension between…

  • failing immediately when your program encounters a problem, i.e. “fail fast”
  • attempting to recover from the failure state and proceed normally

The philosophy behind “fail fast” is best explained in Jim Shore’s article (pdf).

Some people recommend making your software robust by working around problems automatically. This results in the software “failing slowly.” The program continues working right after an error but fails in strange ways later on. A system that fails fast does exactly the opposite: when a problem occurs, it fails immediately and visibly. Failing fast is a nonintuitive technique: “failing immediately and visibly” sounds like it would make your software more fragile, but it actually makes it more robust. Bugs are easier to find and fix, so fewer go into production.
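
To make the contrast concrete, here is a small sketch of both behaviors (a hypothetical config loader, not an example from Shore’s article): the fail-slow version silently papers over a bad value and misbehaves much later, while the fail-fast version rejects it at the door.

    def load_timeout_fail_slow(config):
        # Fail slow: silently fall back on a default when the value is bad.
        # The program keeps running, and the bad config surfaces much later
        # as mysterious behavior nobody connects to this line.
        try:
            return int(config.get("timeout_seconds", 30))
        except (TypeError, ValueError):
            return 30

    def load_timeout_fail_fast(config):
        # Fail fast: reject the bad value immediately and visibly, at the
        # moment the mistake is cheapest to diagnose.
        value = config["timeout_seconds"]
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"timeout_seconds must be a positive int, got {value!r}")
        return value

    print(load_timeout_fail_fast({"timeout_seconds": 30}))   # 30
    # load_timeout_fail_fast({"timeout_seconds": "soon"})    # raises at once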

Fail fast is reasonable advice– if you’re a developer. What could possibly be easier than calling everything to a screeching halt the minute you get a byte of data you don’t like? Computers are spectacularly unforgiving, so it’s only natural for developers to reflect that masochism directly back on users.

But from the user’s perspective, failing fast isn’t helpful. To them, it’s just another meaningless error dialog preventing them from getting their work done. The best software never pesters users with meaningless, trivial errors– it’s more considerate than that. Unfortunately, attempting to help the user by fixing the error could make things worse by leading to subtle and catastrophic failures down the road. As you work your way down Mike’s list, the pain grows exponentially. For both developers and users. Troubleshooting #5 is a brutal death march, and by the time you get to #6– you’ve lost or corrupted user data– you’ll be lucky to have any users left to fix bugs for.

What’s interesting to me is that despite causing more than my share of software crashes and hardware bluescreens, I’ve never lost data, or had my data corrupted. You’d figure Murphy’s Law would force the worst possible outcome at least once a year, but it’s exceedingly rare in my experience. Maybe this is an encouraging sign for the current state of software engineering. Or maybe I’ve just been lucky.

So what can we, as software developers, do about this? If we adopt a “fail as often and as obnoxiously as possible” strategy, we’ve clearly failed our users. But if we corrupt or lose our users’ data through misguided attempts to prevent error messages– if we fail to treat our users’ data as sacrosanct– we’ve also failed our users. You have to do both at once:

  1. If you can safely fix the problem, you should. Take responsibility for your program. Don’t take the easy way out by placing the burden of dealing with every problem squarely on your users.
  2. If you can’t safely fix the problem, always err on the side of protecting the user’s data. Protecting the user’s data is a sacred trust. If you harm that basic contract of trust between the user and your program, you’re hurting not only your credibility– but the credibility of the entire software industry as a whole. Once they’ve been burned by data loss or corruption, users don’t soon forgive.
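
One concrete way to err on the side of protecting data, sketched below (my illustration, not something from the original post), is the classic write-to-temp-then-rename pattern: the file on disk is always either the old version or the new one, never a half-written hybrid.

    import os
    import tempfile

    def save_atomically(path, data: bytes):
        # Write to a temp file in the same directory, flush it to disk,
        # then rename over the original. os.replace is atomic on POSIX
        # systems, so a crash at any point leaves either the old file or
        # the new one intact.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise

    save_atomically("profile.json", b'{"name": "alice"}')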

The guiding principle here, as always, should be to respect your users. Do the right thing.

Top Project Manager Practice: Project Postmortem

You may think you’ve completed a software project, but you aren’t truly finished until you’ve conducted a project postmortem. Mike Gunderloy calls the postmortem an essential tool for the savvy developer:

The difference between average programmers and excellent developers is not a matter of knowing the latest language or buzzword-laden technique. Rather, it can boil down to something as simple as not making the same mistakes over and over again. Fortunately, there’s a powerful tool that any developer can use to help learn from the past: the project postmortem.

There’s no shortage of checklists out there offering guidance on conducting your project postmortem. My advice is a bit more sanguine: I don’t think it matters how you conduct the postmortem, as long as you do it. Most shops are far too busy rushing ahead to the next project to spend any time thinking about how they could improve and refine their software development process. And then they wonder why their new project suffers from all the same problems as their previous project.

Steve Pavlina offers a developer’s perspective on postmortems:

The goal of a postmortem is to draw meaningful conclusions to help you learn from your past successes and failures. Despite its grim-sounding name, a postmortem can be an extremely productive method of improving your development practices.

Game development is some of the most difficult software development on the planet. It’s a veritable pressure cooker, which also makes it a gold mine of project postmortem knowledge. I’m fascinated with the Gamasutra postmortems, but I didn’t realize that all the Gamasutra postmortems had been consolidated into a book: Postmortems from Game Developer: Insights from the Developers of Unreal Tournament, Black and White, Age of Empires, and Other Top-Selling Games. Ordered. Also, if you’re too lazy for all that pesky reading, Noel Llopis condensed all the commonalities from the Game Developer magazine postmortems.

Geoff Keighley’s Behind the Games series, while not quite postmortems, is in the same vein. The early entries in the series are amazing pieces of investigative reporting on some of the most notorious software development projects in the game industry.

Most of the marquee games in the series suffered massive schedule slips and development delays. It’s a testament to the difficulty of writing A-list games. I can’t wait to read The Final Hours of Duke Nukem Forever, which was in development for over 15 years (so it must be a massive doc). Its vaporware status is legendary— here’s a list of notable world events that have occurred since DNF began development.

Don’t make the mistake of omitting the project postmortem from your project. If you don’t conduct project postmortems, then how can you possibly know what you’re doing right– and more importantly, how to avoid making the same exact mistakes on your next project?