Flutterby™! : Continuous Delivery 2012-02-02 02:34:07.479479+00

Continuous Delivery

2012-02-02 02:34:07.479479+00 by ebradway 10 comments

Last night, as Asha called me up to help her with a common task in Gmail, I realized a major flaw in Google's Continuous Delivery software development model. Some people, especially computer programmers, are used to a constantly changing environment. They know that software A should have feature X and that it may be hidden inside UI element Q (despite the fact that it was in UI element P just the day before). It doesn't bother us that our software is changing all the time.

For people like my wife who just gave up her Windows XP laptop and Internet Explorer only because I enticed her away with a Macbook, this change causes constant reinforcement of their distrust of software. In my house, I hear "The Internet is broke again" multiple times a day. Asha is always befuddled because all that is needed for "the Internet to work again" is my physically standing next to her or touching her computer. This is, of course, one of the basic tenets of Quantum Bogodynamics. I am a bogon sink. When I step near her computer, I absorb the bogons that are causing her computer to malfunction.

I get annoyed when people ask me "How do you know so much about computers?" or "How did you learn to do XYZ?" I really know very little. What I do know is that if I click a button on a computer, it probably won't blow up. So I just start clicking buttons until I get the results I want. Nowadays we have this wonderful thing called "undo" that lets me fix things when I get bad results. It's not like you have to type your computer job on punch cards, submit them to the system operators, and hope you get a printout in the morning that matches your expectations. Click the button already!

Why am I able to format Asha's documents in Apple Pages? I've never touched the software before she did. But I know that there ought to be buttons somewhere to make it work and I just need to find them and click them. It's not like the DOS or Unix shell days when you needed to know these arcane incantations to do simple things like navigate a folder hierarchy. Just start clicking buttons already!

And here's a secret that the geek intelligentsia will string me up for sharing: If you can't figure out how to do something, just type a description of what you want to do into Google. Want to know how to change a lightbulb? Want to know how to change a light switch? Want to know how to format a table in Apple Pages?

And this works for subjects as complex as you want. Rocket science, brain surgery, and even super complex subjects like downloading pictures from email are just a Google search away. You can even click the "I'm Feeling Lucky" button if you don't want to decide which search result to try first.

So please, please, just start clicking buttons. Observe what the computer does. If you don't like the results, try another button. Eventually, you'll know what the buttons do and not too long after, you'll be able to predict whether or not the button should exist. And if the button doesn't exist, then notify the software developers and we'll add the button you need. Unless, of course, all of our time is spent helping our wives download photos in email attachments for the ten thousandth time!

I agree, but it won't stick. I've been telling people this for years, in both personal and in work environments. Their fear of "but what if it DOES blow up?" is simply too strong to dent. You can sometimes make headway with a person on their own personal computer, if they're brave, but at the work place, where the risk/reward calculation is very badly skewed in favor of the person who dares, and does, the least, I have never won over a single convert in damned near three decades of trying.

P.S. I just got finished with a long discussion/argument with a developer friend of mine on Twitter on the topic of how much software changes, and I have to say, while I agree with all the rest of it, I disagree with your first paragraph. I am a developer, I've made my living from wrangling computers in some fashion my entire adult life, and yet I'm NOT copacetic about how often software changes under us, and I probably never will be.

I maintain that in well-designed software it should be possible to improve the underlying mechanics, optimize, fix bugs, et cetera, without changing the visible API or the user experience in significant ways. I prize stability and "set and forget" in my software, and I rarely get as much of it as I want ... to the point where I'm once again pondering reinventing the wheel just so I can get off a constant-upgrade path for certain pieces of web software I rely upon.

I agree, other_todd; I'm not at all happy about the auto-updater trend. Most of the time things work just fine the way they are, and rarely do the hypothetical benefits I might get by upgrading outweigh the loss of stability and possibility that things I can currently do won't work anymore.

I've been using an Android phone for the last month, and have found its ceaseless "upgrade me!" notes incredibly annoying. Where's the "go away and don't bother me again" button? The makers of this software clearly expect that I am *going* to upgrade, it's just a question of when I have a spare moment to let the phone reboot, but really I would just rather they left the software alone.

The auto-updater trend wouldn't bother me so much if they were just fixing bugs. But when they're introducing bugs (and making me go into the config to turn back on me seeing my full address bar), it pisses me off.

I think the fear of "What if it does blow up?" is also one of the struggles I have with software engineering: When organizations get heavily bound up in a software development process, they become so afraid of blowing things up that the ship rate gets way low.

Pseudo-quoting Dan, in the early 90's: "Good software is not what it does when it does, it's what it does when it doesn't." I tend to do some redundant safety steps in modules, especially in how I store critical data in multiple ways/places. I like that if a part blows chunks, it does not kill everything. One of my complaints about overly modularized objectified systems is that a small tweak to one part can have disaster-level system-wide consequences. So, often weird things get a little duplicated into just the place where they get used. Then problems are isolated, but now I have redundant code in different places to maintain.

I've also learned to make heavy use of queuing. Big job to run? Queue the data needed into a table, and some background process does the work. My scripts often have a built-in "how do I reset this back" method, so when things break, they can be fixed and run again. The user/customer never sees the issue.
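A rough sketch of that queue-and-reset pattern, for illustration only. It assumes a SQLite table standing in for the real queue table; the table layout and the names `open_queue`, `enqueue`, `run_pending`, `reset_failed` and `status_counts` are all made up for this example and are not from any real system described here.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the hypothetical jobs table that acts as the queue."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " error TEXT)"
    )
    return conn


def enqueue(conn, payload):
    """Front end: record the work to do and return immediately."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def run_pending(conn, handler):
    """Background worker: process every pending job, recording failures."""
    rows = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, payload in rows:
        try:
            handler(payload)
        except Exception as exc:
            # Keep the failure in the table instead of showing it to the user.
            conn.execute(
                "UPDATE jobs SET status = 'failed', error = ? WHERE id = ?",
                (str(exc), job_id),
            )
        else:
            conn.execute(
                "UPDATE jobs SET status = 'done', error = NULL WHERE id = ?",
                (job_id,),
            )
    conn.commit()


def reset_failed(conn):
    """The 'how do I reset this back' step: requeue failed jobs after a fix."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'pending', error = NULL WHERE status = 'failed'"
    )
    conn.commit()
    return cur.rowcount


def status_counts(conn):
    """Return a {status: count} summary for whoever is babysitting the queue."""
    return dict(
        conn.execute("SELECT status, COUNT(*) FROM jobs GROUP BY status").fetchall()
    )
```

The idea is that a failed job stays in the table with its error text, so someone can fix the cause, call `reset_failed`, and let the worker run it again.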

Re: Interfaces; I'm sometimes living in a world where the cost of retraining the cubicle monkeys is much higher than the cost of interface changes. We geeks expect things to morph, evolve and hopefully get better in the process. We make assumptions, we guess, we poke around and understand what should be going on behind the scenes. We forget the average person cannot distinguish an iPhone from magic. Could we have, 30, 50 or 100 years ago? Once the magic incantation or finger movements are memorized, why does/should it change?

A digression into software development: Meuon, on "what it does when it doesn't": Default behavior on an RPC library I'm using right now is that, if the RPC object gets destroyed without you telling it that you've explicitly handled an error it detected, it exits the program in a way that can't be trapped (i.e., eval { ... } doesn't trap it; it's a real POSIX::_exit(-1)), after emailing the details of the error to the entire operations team.

#Comment Re: made: 2012-02-04 22:44:41.593071+00 by: meuon

You just reminded me, there is someone I need to chide about using 'die' to finish almost everything he does. Even when it did what it should have.

It took me a while to figure out what was strange about that anecdote, Dan. I guess my take on error handling is a bit unusual: I believe in blowing up in as dramatic and obvious a way as possible, as soon as possible, and doing absolutely nothing to attempt to recover. This is annoying, and that's exactly what you want: you want the end user to be annoyed, so they will complain, so you'll actually hear about it, and have a chance to fix it. Otherwise you end up with users who struggle along for years, cursing you every day as they repeat bizarre voodoo dance workarounds for bugs that you could fix in five minutes if you even had a clue they existed.

In my experience people generally just fuck things up worse when they try to recover from errors, because they just create a long delayed chain of side effects between the symptom that ultimately shows up and whatever the original cause was. Better to just assert all over the place and die fast whenever anything looks remotely weird.
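To make the contrast concrete, here is a small sketch of the "die fast" style. The payment example and the name `apply_payment` are invented for illustration. It uses explicit checks that raise rather than Python's `assert` statement, because `assert` is stripped when Python runs with `-O`.

```python
def apply_payment(balance_cents, payment_cents):
    """Apply a payment to a balance, refusing anything that looks weird.

    A 'recovering' version might silently clamp a bad payment to zero or to
    the balance, which hides the original problem. This version stops at the
    first sign of trouble so the symptom stays next to its cause.
    """
    if not isinstance(balance_cents, int) or not isinstance(payment_cents, int):
        raise TypeError("amounts must be whole numbers of cents")
    if balance_cents < 0:
        raise ValueError(f"negative balance: {balance_cents}")
    if payment_cents <= 0:
        raise ValueError(f"payment must be positive: {payment_cents}")
    if payment_cents > balance_cents:
        raise ValueError(
            f"payment {payment_cents} exceeds balance {balance_cents}"
        )
    return balance_cents - payment_cents
```

Nothing here tries to guess what the caller meant. The error message carries the offending values, so whoever hears the complaint has something to work with.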

#Comment Re: made: 2012-02-05 16:11:19.764381+00 by: meuon

Mars, I think your method is correct for most desktop application software and real public world web interfaces. I think my method is correct for systems in which the end user (typically an employee at the utility or an outside partner) should never see anything "break" unless it is a major real problem. We "babysit" our babies, and have to monitor all the things that go on behind the scenes. Typical issues are usually where we interface to other systems.

Mars, for years I've been asked to make things fail silently: better that the users develop funky workarounds than that they increase the load on the customer support call center. Users adapt to quirks; dialog boxes just result in calls and explaining that they need to develop "quirks" (which often means things like "configure your environment right"). I've also been asked to do lots of "try everything possible at the application layer to retry and work around the failure".

Specific example: In Microsoft's .NET 2D graphics subsystem, there are some bugs that cause a munged transform somewhere deep inside to start throwing errors. You can take the sequence of calls and replay them; usually they'll work, but sometimes they'll fail.

Either you can try to figure out what caused the failure and explain the situation to the user, or you can just try { ... } catch {}, let the primary control come up blank, and the users will figure out that they need to save, exit and start the app again. Introducing a dialog box into that will just up the support/call center load.
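A middle path between the dialog box and the empty catch is to replay the calls a bounded number of times, log each failure, and only give up loudly at the end. The sketch below is in Python rather than .NET and is purely illustrative; `replay_with_retry` and its parameters are hypothetical names, and the "operations" are just zero-argument callables standing in for the drawing calls.

```python
import logging

logger = logging.getLogger("replay")


def replay_with_retry(operations, attempts=3):
    """Run a sequence of zero-argument callables, replaying all on failure.

    Returns the list of results from the first attempt in which every
    operation succeeds. Each failed attempt is logged, so the failure is
    recorded somewhere even though the user never sees a dialog box.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return [op() for op in operations]
        except Exception as exc:
            last_error = exc
            logger.warning("attempt %d/%d failed: %r", attempt, attempts, exc)
    # Out of retries: surface the problem instead of swallowing it.
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Whether the final `RuntimeError` turns into a blank control or a support call is still a product decision, but at least the log shows how often the replay was needed.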

I'd be overjoyed that I'm living in an environment where failure is taken seriously now, except that there also seems to be a culture of accepting a certain cognitive load of scanning through the barrage of failure emails, because I've seen one particular one every time I check my email for 3 days straight now.

And, I hate to admit, I've got a number of >&/dev/null tasks in the cron job, because I know when they fail for other reasons, and writing wrappers to sort the output is too much of a hassle.
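For what it's worth, the wrapper does not have to sort the output to be useful. A minimal sketch, assuming the only goal is "stay silent on success, show everything on failure": capture the command's output and print it only when the exit status is non-zero. The names `run_quietly` and `main` are made up for this example.

```python
import subprocess
import sys


def run_quietly(argv):
    """Run a command, returning (exit_code, output_to_show).

    stdout and stderr are captured together. The captured text is returned
    only when the command fails; on success the second element is "".
    """
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if result.returncode != 0:
        return result.returncode, result.stdout
    return 0, ""


def main(argv):
    """Entry point for cron: print captured output only if the job failed."""
    code, shown = run_quietly(argv)
    if shown:
        sys.stdout.write(shown)
    return code
```

Saved as a script that ends with `sys.exit(main(sys.argv[1:]))`, this could wrap a cron entry in place of the `>&/dev/null` redirect, so cron only has output to mail when the job actually fails.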