Tuesday, October 14, 2014

Computer Science vs Software Development

Recently a good friend of mine, whom I consider among the best programmers I've ever worked with, interviewed for my team and was rejected by my coworkers.  I don't necessarily fault my coworkers for this; they were doing the best they could with the information they had.  It was a close decision, but it made me sad that they couldn't see in him what I did.  It took me a while to process, and I went into my shell for a bit during that time (more so than normal).  Reflecting on that experience, I eventually came to a realization about myself that has helped me better understand my own value, because I feel like that friend and I have a lot in common.  Here it is: I am a great software developer, but a mediocre computer scientist.  I've been trying to improve on the latter lately, which is why I upgraded myself to mediocre.  Maybe "software developer" isn't the right name for what I mean, but it's the best one I could think of.  I think to many people these skills are one and the same, but to me they really aren't, so let me clarify a bit what I feel the differences are.

The prime directive

Computer science focuses on algorithms and data structures, whereas software development focuses on ease of use and maintainability.  Computer science concerns itself with low-level data structures and algorithms like search and sort.  I'm not saying these aren't important, but I can count on the number of billions of dollars I have how many times I've implemented any of those in my career (i.e. none).  I vaguely remember some of them, and I've read about them to bone up for interviews, but in actual programming jobs in my industry they come up precisely never.  Every language I've used in a professional capacity has them built in, and when one doesn't, a hand-rolled version is usually too slow to bother with anyway, so you use whatever builtin is closest and make it work.  In nearly every single case, it's far more efficient to offload the searching to the data store.
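
To make that concrete, here's a minimal Python sketch of what I mean by leaning on builtins instead of hand-rolling the classics (the data is made up, obviously):

```python
import bisect

# The builtins are written in C, battle-tested, and faster than anything
# I'd hand-roll in pure Python.
records = [42, 7, 19, 3, 88]

# Timsort, built in: no need to implement quicksort yourself.
records.sort()

# Binary search via the standard library instead of a hand-written one.
index = bisect.bisect_left(records, 19)
found = index < len(records) and records[index] == 19
print(found)  # True
```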

Software development tends to focus more on things like having consistency in the API so that other people can develop an intuitive sense of your code (i.e. if other objects behave a certain way, they can reasonably expect similar behavior from related objects).  It doesn't matter if you implemented the fastest search algorithm ever if one object calls the method 'search' and another calls it 'find'.  That kind of thing makes your system painful to use, increases cognitive overhead, and sadly is ignored or tolerated far too often in the industry.  Software development is more concerned with others being able to grok your code quickly so they can navigate it and add features or fix bugs.  Things like consistent naming conventions, good use of namespaces, and separation of concerns are the subjects of focus. Most importantly, you have to know how to empathize with a consumer of your system.  Put yourself in their shoes: how would you want it to behave if you didn't understand the inner workings of the system?  Are you leaking your abstractions (i.e. does the user have to know how the internals of your system work in order to use it effectively)? The user doesn't care about the difference between bubble sort and insertion sort, unless it means you're getting them responses faster and/or more accurately than before.
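
As a sketch of what that consistency buys you, imagine every collection in a system exposing the same verbs.  The Repository name and its methods here are hypothetical, not any particular library's API:

```python
from abc import ABC, abstractmethod

class Repository(ABC):
    """Every collection in the system exposes the same verbs."""

    @abstractmethod
    def find(self, **criteria):
        """Return all records matching the given criteria."""

    @abstractmethod
    def find_one(self, **criteria):
        """Return the first matching record, or None."""

class UserRepository(Repository):
    def __init__(self, users):
        self._users = users

    def find(self, **criteria):
        return [u for u in self._users
                if all(u.get(k) == v for k, v in criteria.items())]

    def find_one(self, **criteria):
        matches = self.find(**criteria)
        return matches[0] if matches else None

# OrderRepository, ProductRepository, etc. would implement the same
# interface, so callers never have to remember whether this object says
# 'search', that one says 'find', and the other says 'query'.
```

Once someone has used one repository, they can guess how all the others behave, which is exactly the intuition I'm talking about.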

Optimization is the root of all evil

Computer science optimizes for raw algorithmic performance, whereas software development optimizes for responsiveness and user experience.  In my experience as a developer, I've had the opportunity to optimize a lot of slow code.  I once dropped a rate-limiting algorithm from about 80ms to 1ms over a few optimization iterations.  During those iterations the algorithmic complexity didn't really change all that much, but the performance sure did.  I honestly couldn't tell you the Big-O at either end, because the algorithm was fairly complex and the biggest culprit was I/O.  There should really be a Big-I/O notation, because unless you're developing realtime systems or games, and possibly even then, 99% of your optimizations will come from reducing I/O.  You can run through hundreds of thousands of iterations of a loop in the time it takes even an SSD to return the first byte of a file read.  I've seen plenty of O(n) code that was sped up by moving to an O(n^2) version, simply because the new version did less I/O inside the loop.
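
Here's a contrived Python sketch of that last trade-off.  fetch_record and fetch_all_records are hypothetical stand-ins; assume each call costs one network or disk round trip (milliseconds), while an in-memory comparison costs nanoseconds:

```python
def per_item_lookup(ids, fetch_record):
    # O(n) algorithmically, but n separate round trips: I/O dominates
    # the runtime no matter how tight the loop is.
    return [fetch_record(record_id) for record_id in ids]

def bulk_lookup(ids, fetch_all_records):
    # One round trip plus an O(n*m) in-memory scan.  Worse on paper,
    # usually orders of magnitude faster in practice, because memory
    # comparisons are cheap and round trips are not.
    all_records = fetch_all_records()
    return [record for record in all_records if record["id"] in ids]
```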

Software development tends to focus more on how responsive the application is.  If something is going to take a while, offload it to an asynchronous process and feed status back to the user.  Don't lock the UI thread.  Don't lock the browser up while the page is loading. To pull those off, you have to understand the systems you're working with and how they interact with each other.  If you're dealing with a large number of records, give users some way to split them up so each request isn't ridiculously slow and/or large (pagination?  tagging?  groups? column-based filtering? full-text search?).  That last one has ramifications for the load on your system as well.  In distributed or web-based systems, it's a much bigger optimization to reduce the number of network requests than to make each request as fast as possible.  So you might have to make each request slower in order to return enough data that the requester doesn't need to contact you again for more.  That's pretty antithetical to the usual methods of profiling and optimization, but it's probably the biggest optimization you can make.  I'm not saying each request should be wasteful of resources; you should still make it as fast as possible without sacrificing maintainability.  But knocking 5ms off a request that then forces an additional request is kind of silly when the network+protocol+routing overhead of a single request is 50ms.
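
A minimal sketch of the pagination idea, in plain Python (the field names are just illustrative).  Note that the response carries enough metadata that the caller never needs an extra round trip just to find out whether there's more:

```python
PAGE_SIZE = 50

def paginate(records, page=1, page_size=PAGE_SIZE):
    # Return one slice of the results, plus enough metadata that the
    # caller can plan its next request without asking the server again.
    offset = (page - 1) * page_size
    items = records[offset:offset + page_size]
    return {
        "items": items,
        "page": page,
        "page_size": page_size,
        "total": len(records),
        "has_more": offset + page_size < len(records),
    }

# paginate(list(range(120)), page=3) -> items 100..119, has_more: False
```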

If you pay attention, there will be a point

I think what it boils down to is a fairly obvious dichotomy between theory and practice, science and art.  I'm more concerned with practical knowledge, whereas interviews focus mostly on theoretical knowledge.  I've known plenty of people who were great at CS but ended up being terrible programmers, and plenty who were weak on CS but produced great code.  It should be telling that much of the software that comes out of CS research is generally considered subpar.  Don't get me wrong; I'm glad people are doing that work and pushing things forward for all of us.  I just don't want them on my team, because I'm the one who has to maintain that spaghetti mess of a codebase when they move on to the new shiny.

Obviously, all of this is just, like, my opinion, man.  As in all things, there's a balance to be had.  Software development is a craft, equal parts art and science, and all too often we ignore the artistry involved.  People who master that aspect of it are just as valuable as those who firmly grasp the deep theory.  Both are rare, and rarer still is someone who masters both.  I'll let you know if I ever meet one.

