Thursday, October 23, 2014

Inconsistent naming in Computer Science

I can't count the number of times I finally start reading about a concept or technique and realize that I already know it, just under a different name.  Currying gets used interchangeably with partial application, even though they're technically distinct.  Dictionaries in Python are hashes in Perl, associative arrays (or just objects, depending on how pedantic you are) in JavaScript, HashMaps in Java (I think?), and who-knows-what-else in other languages.  Lambdas are just anonymous functions, as far as I can tell.  Promises are called futures in some cliques.  Mixins vs roles (hint: they're the same thing).  Aspect-oriented programming gets casually equated with monkey-patching.  I can keep going, but I think you get the point.
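Just to make the overlap concrete, here's a tiny Python sketch (the names and examples are mine, purely for illustration) of a few of these concepts sitting side by side:

    from functools import partial

    # A dict in Python is a hash in Perl, an associative array (or object) in
    # JavaScript, and a HashMap in Java -- same idea, four names.
    ages = {"alice": 30, "bob": 25}

    # A lambda is just an anonymous function: a function value with no name.
    double = lambda x: x * 2

    # functools.partial pre-fills some arguments of an existing function --
    # partial application, which is what often gets discussed under the
    # banner of currying.
    def power(base, exponent):
        return base ** exponent

    square = partial(power, exponent=2)

    print(ages["alice"], double(4), square(5))  # 30 8 25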

I think it would vastly improve interviews, and conversations with coworkers, if we could all agree on a single name for all these concepts we are expected to know.  But I guess that's life.  Soda vs pop, anyone?

Tuesday, October 14, 2014

Computer Science vs Software Development

Recently a good friend of mine, whom I consider among the best programmers I've ever worked with, interviewed for my team and was rejected by my coworkers.  I don't necessarily fault my coworkers for this; they were doing the best they could with the information they had.  It was a close decision, but it made me sad that my coworkers couldn't see in him what I did.  It took me a while to process it, and I went into my shell for a bit during that time (more so than normal).  Reflecting on that experience eventually led me to a realization about myself that has helped me better understand my own value, because I feel like that friend and I have a lot in common.  Here it is: I am a great software developer, but a mediocre computer scientist.  I've been trying to improve more on the latter lately, which is why I upgraded myself to mediocre.  Maybe software developer isn't the right name for what I mean, but it was the best one I could think of.  I think to many people these skills are one and the same, but to me they really aren't, so let me clarify a bit what I feel are the differences.

The prime directive

Computer science focuses on algorithms and data structures, whereas software development focuses on ease of use and maintainability.  Computer science concerns itself with low-level data structures and algorithms like search and sort.  I'm not saying these aren't important, but I can count on the number of billions of dollars I have how many times I've implemented any of those in my career (i.e. none).  I vaguely remember some of them, and I've read about them to bone up for interviews, but in actual programming jobs in my industry, they come up precisely never.  Every language I've ever used in a professional capacity has them built in, and if it doesn't, implementing them yourself in that language would be too inefficient anyway, so you just use whatever builtin is closest and make it work.  It's way more efficient to offload the searching to the datastore in nearly every single case.
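As a rough illustration of what I mean, here's a hypothetical Python snippet: the classic algorithms are already sitting in the language and standard library, and anything bigger gets pushed down to the datastore (the SQL table and column names below are made up):

    import bisect

    names = ["dave", "alice", "carol", "bob"]

    # The classic algorithms are already there: Timsort behind list.sort(),
    # binary search behind the bisect module.  Hand-rolling them in pure
    # Python would actually be slower than the built-in C implementations.
    names.sort()
    idx = bisect.bisect_left(names, "carol")
    found = idx < len(names) and names[idx] == "carol"
    print(found)  # True

    # And when the data lives in a datastore, push the search down to it
    # instead of filtering in application code (hypothetical table/column):
    # cursor.execute("SELECT * FROM users WHERE name = %s", ("carol",))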

Software development tends to focus more on things like having consistency in the API so that other people can develop an intuitive sense of your code (i.e. if other objects behave a certain way, they can reasonably expect similar behavior from related objects).  It doesn't matter if you implemented the fastest search algorithm ever if one object calls the method 'search' and another calls it 'find'.  That kind of stuff makes your system painful to use, increases cognitive overhead, and sadly is ignored or accepted way too often in the industry.  Software development is more concerned with others being able to grok your code quickly so they can navigate it and add features or fix bugs.  Things like consistent naming conventions, good use of namespaces, separation of concerns, etc., are the subjects of focus.  Most importantly, you have to know how to empathize with a consumer of your system.  Put yourself in their shoes; how would you want it to behave if you didn't understand the inner workings of the system?  Are you leaking your abstractions (i.e. does the user have to know how the internals of your system work in order to use it effectively)?  The user doesn't care about the differences between bubble sort and insertion sort, unless it means you're getting them responses faster and/or more accurately than before.
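Here's a contrived Python example of the kind of inconsistency I'm talking about (the class and method names are invented):

    # Painful: two related classes, two different verbs for the same operation.
    class UserRepository:
        def find_user(self, user_id):
            ...

    class OrderRepository:
        def search(self, order_id):
            ...

    # Better: one verb everywhere, so anyone who has used one repository can
    # guess how the next one behaves without reading its source.
    class UserRepo:
        def find(self, user_id):
            ...

    class OrderRepo:
        def find(self, order_id):
            ...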

Optimization is the root of all evil

Computer science optimizes for raw algorithmic performance, whereas software development optimizes for responsiveness and user experience.  In my experience as a developer, I've had the opportunity to optimize a lot of slow code.  I once dropped a rate-limiting algorithm from about 80ms to 1ms through a few optimization iterations.  During those iterations, the algorithmic complexity didn't really change all that much, but the performance sure did.  I couldn't even honestly tell you what the Big-O was on either end of the process, because the code was fairly complex and the biggest culprit was I/O.  There should really be a Big-IO notation, since unless you're developing realtime systems or games, and possibly even then, 99% of your optimizations will be accomplished by reducing I/O.  You can run through millions of iterations of a loop in the time it takes a disk to return the result of a single read.  I've seen plenty of O(n) code that was sped up by moving to an O(n^2) version, simply because it reduced the amount of I/O done within the loop.
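Here's a deliberately simplified Python sketch of that last point (the db object and its fetch_* methods are hypothetical stand-ins for whatever datastore client you happen to use):

    def orders_per_user_query(user_ids, db):
        # O(n) on paper, but one round trip to the datastore per user id.
        orders = []
        for user_id in user_ids:
            orders.extend(db.fetch_orders_for_user(user_id))  # I/O inside the loop
        return orders

    def orders_single_round_trip(user_ids, db):
        # One bulk query, then a nested in-memory scan.  Worse Big-O (roughly
        # O(n * m)), but only one round trip, and that round trip is what
        # dominates the wall-clock time.
        all_orders = db.fetch_all_orders()  # the only I/O call
        wanted = []
        for user_id in user_ids:
            for order in all_orders:
                if order["user_id"] == user_id:
                    wanted.append(order)
        return wanted

The second version would get laughed out of an algorithms class, but it's the one that makes the latency graph drop off a cliff.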

Software development tends to focus more on how responsive the application is.  If something is going to take a while, offload it to an asynchronous process and feed status data back to the user.  Don't lock the UI thread.  Don't lock the browser up while the page is loading.  To understand how to do those, you have to understand the systems you're working with and how they interact with each other.  If you're dealing with a large number of records, give the consumer some way to split them up so each request isn't ridiculously slow and/or large (pagination?  tagging?  groups?  column-based filtering?  full-text search?).  That last one has ramifications for the load on your system as well.  In distributed or web-based systems, it's a much bigger optimization to reduce the number of network requests than to make each request as fast as possible.  So you might have to make each request slower in order to return enough data that the requestor doesn't need to contact you again for more.  That's pretty antithetical to the normal methods of profiling and optimization, but it's probably the biggest optimization you can make.  I'm not saying that each request should be wasteful of resources, because you should still try to make it as fast as possible without sacrificing maintainability, but knocking 5ms off a request that then requires the caller to make an additional request is kind of silly when the network+protocol+routing overhead of a single request is 50ms.
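As a rough sketch of the pagination idea in Python (the function and field names are made up for illustration), the point is to hand back a bounded page of results plus enough metadata for the client to grab the next page in one follow-up request:

    def list_records(records, page=1, per_page=100):
        # Hand back one bounded page plus the metadata the client needs to
        # ask for the next one, instead of everything at once or one row
        # per request.
        start = (page - 1) * per_page
        chunk = records[start:start + per_page]
        return {
            "results": chunk,
            "page": page,
            "per_page": per_page,
            "total": len(records),
            "next_page": page + 1 if start + per_page < len(records) else None,
        }

A client can then walk the whole data set in a handful of requests rather than thousands, while each individual response stays small and fast.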

If you pay attention, there will be a point

I think what it boils down to is a fairly obvious dichotomy between theory and practice, science and art.  I'm more concerned with practical knowledge, whereas interviews focus mostly on theoretical knowledge.  I've known plenty of people who were great at CS but ended up being terrible programmers, and plenty of people who were weak on CS but produced great code.  It should be telling that much of the software that comes out of CS research is generally considered subpar.  Don't get me wrong; I'm glad people are doing that work and pushing things forward for all of us.  I just don't want them on my team, because I'm the one who has to maintain that spaghetti mess of a codebase when they move on to the new shiny.

Obviously, all of this is just, like, my opinion, man.  As in all things, there's a balance to be had.  Software development is a craft, equal parts art and science, and all too often we ignore the artistry involved.  People who master that aspect of it are every bit as valuable as those who firmly grasp the deep theory.  Both are rare, and even rarer is someone who masters both.  I'll let you know if I ever meet one.


Friday, October 10, 2014

Impostor Syndrome

I mentioned previously that I failed to publish one of my blog articles for over a year due to impostor syndrome, but then I realized that maybe some people don't know what that is, or whether I was even being serious.  So, first of all, yes, I was serious, sort of.  I do think I suffer from what's colloquially known as Impostor Syndrome, but not to the extremes that some do.  The gist of the problem is that I always worry that people will find out that I have absolutely no idea what I'm doing, despite having a fairly successful career as a software developer.  There's more information at Wikipedia if you're curious.  It's the opposite of the Dunning-Kruger Effect, in which incompetent people tend to overestimate their own level of competence, which, sadly, from what I've seen over the years, is what most really successful business people run on.  We, as a society, tend to reward overconfident incompetence instead of self-doubting excellence.  I've read that it's significantly more common in women than men, but I personally believe it affects far more developers than are willing to admit it, based on behavior I've witnessed over the years.  It could just be that more women are willing to admit to it.  I've decided to share my personal experience with this problem, what I've been able to do to overcome it to a degree, and how it still affects a lot of things in my career, usually for the worse, in the hopes that bringing more attention to the problem will help others who are similarly afflicted.  I'm going out on a major limb here, and I hope it doesn't collapse under my weight.

We've got wee, not so wee, and friggin huge

The sheer volume of knowledge in Computer Science and software development is frankly overwhelming.  As the old adage goes, the more I learn, the less I know.  New developments are constantly coming out, and it's impossible to keep up with them all, especially given my relative lack of formal education, which has left me to pick up some of the more fundamental parts of the discipline in an ad-hoc manner over the years.  I feel like I'm constantly catching up with where I should have been years ago.  It doesn't help that the landscape has changed drastically in that time.  What was once considered PhD-level material now seems to be considered by some to be basic knowledge.  Object-oriented programming, functional programming, reactive programming, aspect-oriented programming, set theory, graph theory, Bayesian statistics, machine learning, AI, computer graphics, big data, distributed systems, operating systems, compilers, interpreters, security, encryption, dynamic typing, static typing, weak typing, strong typing, data structures, search algorithms, consensus algorithms, and on and on and on.  And that doesn't include all the peripheral knowledge like project management, agile methodologies, version control systems, bug tracking software, etc.  It's a lot to know, and judging by what you read from the peanut gallery, seemingly all of it is required knowledge.  And that's just a sampling of the things I either know or know I need to know, and I keep finding new examples all the time.  The backlog on my reading list is huge, and I don't have the time or energy to actually catch up.  It would take years of dedicated full-time effort, and that's without a job or a family to maintain.

This one goes to 11.  It's one louder.

So, that's the first part of the problem.  It's impossible to understand all these topics to any significant depth.  Some people are fine just skating by on the surface, understanding enough to talk about it, but I personally don't feel qualified to talk on a subject until I have really absorbed it.  And that takes time, and with things like programming, a lot of practice.  I'm stuck with an unfortunately limited subset of these topics that I really understand, and a larger subset that I sort of get but couldn't really speak to in any depth.  Some would say I'm lacking in some of the basics, but I also really know some of the not-basics, which makes it really difficult for people to gauge my competence level until they've actually worked with me.  My biggest concern is that I'll have a conversation like the one from Spinal Tap where Nigel keeps insisting that his amp is louder because it has an 11 on the knob, where others only go up to 10.  He knows enough about amps to know that higher numbers are louder, but completely misunderstands what the numbers actually represent and assumes that 11 is just louder.  Some people are content to pretend they understand the technology while spouting off incorrect information to people who don't understand it enough to call BS.  As the saying commonly attributed to Mark Twain goes, "It is better to keep your mouth closed and let people think you are a fool than to open it and remove all doubt."  I opt to keep quiet, but when I hear people talking with confidence on so many topics, I falsely assume they actually understand what they're talking about, and I feel lacking by comparison.  I project my ethics and thought processes onto others who don't necessarily share them.

How does he not get fired?

It doesn't help people with this problem that many developers exhibit a tendency that has been labeled "feigning surprise" (see the Hacker School Rules), in which they act shocked when someone doesn't know something that they have learned.  This has a severe negative emotional impact on people who already think they are impostors.  It makes them want to speak up less and just retreat into their safety zone.  I personally think this can be pretty tightly coupled with impostor syndrome, but I can't necessarily prove it; it's just a hunch based on my own experiences.  I think, at least in some cases, that the person feigning surprise is not consciously faking it but is actually surprised that someone else doesn't know something they do.  They feel like they're a fraud, and surely the other person couldn't also be a fraud.  It's like a weird combination of nihilism and narcissism that doesn't make any logical sense to others, and it comes across as demeaning and only serves to worsen the problem.

This is an NP-hard problem

I think a lot of my opinions on interviewing in my previous post revolve around this problem, honestly.  I tend to clam up in interviews when they hit an area I don't understand extremely well, out of fear that I'll say something extremely stupid on the subject and cast away all doubt as to my imposterosity.  In reality, oftentimes they're purposely testing how the subject reacts to a situation in which they don't know the answer.  Despite logically understanding this, my psychological reaction is to flee inward rather than reveal my ignorance, so I look even worse by not working it out with the interviewer.  My whole career is just a house of cards waiting to crash down as soon as someone realizes I really don't understand all of the intricacies of directed acyclic graphs and Paxos consensus algorithms.  This can come across as incompetence in interview settings, and I've done poorly in a number of interviews and lost out on jobs because of it.

Nevermind the man behind the curtain

Given how interviews are conducted, I'm amazed every time I start a new job at how much it's basically like the old one.  I've had a couple of times where I finally felt like I had gotten in; I was now part of the elite who worked at these places with these ridiculous interviews.  I sure fooled those guys; they're gonna feel pretty silly that they hired me once they realize I'm not a demigod.  Come to find out, my new coworkers were basically of the same competence level as my old ones; some good, some not so good.  I imagine it's that way at places like Google, Microsoft, and Amazon, despite all their pretense of hiring only the best of the best.  They still make stupid decisions and release the same crappy, bug-filled software as the rest of us.  They just don't do it in the open, so people romanticize that things must be perfect behind their shroud of secrecy.

I'm good enough, I'm smart enough, and doggone it, people like me

So, what can someone with impostor syndrome do to combat it?  There are a few things I've found that have helped me immensely, but each is a constant battle.  Just being aware that I have this problem, and that it's a known, not-uncommon issue has helped tremendously.  I can objectively look back over my career and realize that I have actually done well at most jobs I've had. When I left one job, my boss told his boss that they were losing their best developer,  and at another job I was kept on after nearly everyone was let go in order to help transition things to the parent company. Even at the places where I felt completely unworthy after the difficult interview, yet somehow managed to land the job, I ended up being well-respected and valued because of my pragmatism, my attention to detail, and my ability to ramp up quickly.  You would think that would be validation enough, but I tend to dismiss those things because those people obviously didn't see how much I was struggling.  When I feel especially fraudulent, I remind myself of those experiences so I can feel more calm and confident, and it helps.

Another thing that has helped immensely is setting aside time to actually research areas where I feel I'm lacking. If I can at least have a cursory knowledge of an area, enough to actually have a decent conversation on the subject, I feel a lot less incompetent.  This has proven the most difficult, because it's often led to going down a rabbit hole of related subjects that I can never possibly learn to the degree I would prefer.  Nevertheless, it's still been a boon, and I feel like I'm in a much better position than I was a few years ago.  I'm working on being less dismissive of my own ideas, as it's turned out that I've had some good ideas over the years that have worked out pretty well, but I still have a tendency to discount my opinions or second guess myself.

Oh, he comes from a broken home.  So... no coffee then?

I guess that's all the advice I have.  There's no miracle cure here, but it is manageable.  Be objective, be logical, and remind yourself that you actually have done a good job from time to time.  Most of all, remember that nearly everyone else is faking it just as much as you are, if not more.  Some people are just better at hiding it than others.

Thursday, October 9, 2014

Being a part of something special

In my career, I can think of only one time I've felt like I was a part of something truly special.  At Liquid Web, I was part of a small team of developers and engineers that built out Liquid Web's cloud product, Storm on Demand.  It was a pretty amazing accomplishment for a handful of people over a thirteen-month period from inception to public release.  Despite being a very niche player in the market, we missed being the second public cloud provider after Amazon by only a few months.  Unfortunately, during those few months, several other players also got their clouds out, so when we released, it didn't make quite the splash we had hoped.  Don't get me wrong, it was wildly successful; I just think some people had unrealistic expectations.  It was really an amazing accomplishment, and despite the fact that I've since left the company, I still count it as the best experience of my career thus far.

So, I think I'm about to have my second such experience.  At Rackspace, we've formed a Data Services "practice area" (practice as in law practice, not basketball practice; it took me a while).  A year or two ago, we acquired a startup called ObjectRocket that specializes in providing MongoDB as a service.  Their focus is on providing the best MongoDB experience out there and taking all of the headaches of administering a MongoDB installation away from developers so they can just focus on their code.  So now ObjectRocket, the CloudDatabases team, and the CloudBigData team (I'm on this one) are joining forces to form this group.  There's a ton of work to do to get where we need to be, but it's exciting to be a part of, and even if we only accomplish half of what we've set out to do, it should be huge.

It's also possible that this ends up being a giant disaster and implodes.  Given my conversations with the leadership so far, and based on their personalities and management style, I don't feel like this is a big concern.  We've got the resources of Rackspace behind us, and we have the right type of pragmatism and excitement and technical know-how to make this succeed.

If this sounds like something you'd like to be a part of, please contact me.  We need your help.  I think this is a fantastic opportunity, and I'd love to see some of my friends and former coworkers (those aren't mutually exclusive sets) be a part of it.

Wednesday, October 8, 2014

Modern code review practices

I might just be curmudgeonly, but I'm about at the point where I won't even bother contributing to a project that doesn't use the Github pull-request model for code contributions.  There's so much pomp and circumstance with some projects, when they should be doing everything they can to welcome contributions.  Other similar systems like Bitbucket are fine, although Bitbucket's code review UI needs some help.  I've tried a lot of other systems over the years, and none of them match the simplicity and joy of Github.

The 90s called and want their code contribution practices back

I've recently begun trying to contribute to Ambari.  Ambari's contribution instructions read like a microwave oven's operator manual.  The process is so arcane that I'm surprised anyone contributes.  You have to sign up for two services, JIRA and Review Board, and then you submit a patch to the JIRA ticket, as well as create a Review Board review with the same patch and the same information as in the ticket, then link it back to the JIRA ticket and link the JIRA ticket to the review.  Here, read it yourself: Ambari's How To Contribute Page.  I don't know if the Apache Foundation dictates this broken process or if the projects just take it on themselves to make the process horrible, but either way, someone needs to be informed that the 90s called and want their code contribution practices back.  Some of you might be saying, "What's the big deal?  Submitting patch files to two places and manually inviting people to review your code when you don't know who should review your code, then manually linking the review back to the ticket, and then enlisting yet another person to actually commit the code on your behalf isn't that bad" (if you're saying this, please punch yourself in the face and save me the trouble).  But there's another catch that might be a little less obvious: attribution.  If you submit a patch and someone else actually commits it for you, the git log attributes the change to the person who committed it, not to you (unless the committer goes out of their way to preserve your authorship).  You may not realize this, but people like receiving credit for their work, even in Open Source.  Some people will simply refuse to contribute if they don't get credit, and frankly, that's as it should be.  Can you imagine an actor or a writer doing work and then letting someone else put their name on it?  I can't.  Ambari puts your name on a contributors page, so it's not like there's no attribution, but that's not the same as having it show up on Github, which is commonly used as a portfolio these days.

Barriers, barriers everywhere, and not a drop to drink

OpenStack's contribution model isn't as terrible as Ambari's.  It at least gets attribution right.  OpenStack uses a system called Gerrit for code reviews.  You simply have to sign a CLA, sign up for Launchpad, download a magical tool called git-review (it handles all the arcane Gerrit shenanigans for you), and then run git-review on your committed local branch.  Then Gerrit takes it from there.  More info here: OpenStack's How To Contribute page.  You'll also need to read all of the coding standards, and all the information about how to format your commit messages, and oh yeah, make sure you know everything that's talked about on the mailing list, because if you don't conform, your code will be rejected.  You'll likely not pass your first code review and will have to re-submit the same branch for review (your new friend is git commit --amend).  Assuming you actually make it through the gauntlet of feedback and pass all the automated tests, which cover things like OpenStack's extremely limiting and pedantic coding standards, Gerrit will automatically merge your change and push it up to Github.  So, it's usable, even if a bit overblown.  The biggest problem is that the process takes forever.  Even working with the same developer on an OpenStack project and another project that's just on Github, it's amazing how much faster progress is made on the Github-based project.  I understand the need for verification of new code before it's released, but this is just a bit much.  Github lets you integrate with other systems to verify pull requests, and it works extremely well without requiring a ton of hoop-jumping from the developers.

I'll quit while I'm ahead

I really think it's antithetical to the spirit of Open Source to make it difficult to contribute to your project.  Github gets it right.  You fork my code, write your own code, then submit a pull request.  All the feedback and code is in the same interface, you can update your pull request based on feedback, and when it's good to go, the committer can merge it in.  Done.  If you don't want to use their issue tracker, you can integrate it with most popular issue trackers and have it automatically update tickets for you.  Whatever you do, don't make the contributor have to do this manually.

Full disclosure: I don't work for Github and I'm not a paid shill.  I just really think they nailed this process, and there's a good reason they're becoming a de facto standard.  I wish other projects would catch on.