Tuesday, November 18, 2014

Introducing python-ambariclient

Apache Ambari is an open-source project that configures and manages Hadoop clusters.  The product I work on configures and manages Hadoop clusters... in the cloud (ooooh).  Now that Ambari has matured enough to be stable and have a fairly usable API, we've decided to discontinue the part of our product that overlaps with Ambari and instead use and contribute to that project.  Our product isn't going anywhere; just the ~50% of our codebase that does pretty much exactly what Ambari does will be replaced with Ambari, and the resources we would have spent building and maintaining our own code will go into improving Ambari.  Given that Ambari's primary audience is Java developers, their main client-library effort is written in Groovy, another JVM-backed language that interoperates seamlessly with Java.  They also have a Python client, but it's fairly immature, incomplete, and buggy.  Efforts to contribute to it proved onerous, mostly due to concerns about breaking backwards compatibility, so we decided that I would create a new client that we'd release to the public.  And so I'm here to announce our Python Ambari client library, aptly named python-ambariclient.

There were a few things I wanted out of the client that I felt we weren't able to accomplish easily with the existing one:
  1. An easy-to-intuit, consistent interface that mimics the API structure.
  2. Native support for polling the long-running background operations that are common when working with Ambari.
  3. Easy extension with new types of objects as the Ambari API adds new features.
  4. A minimal number of actual HTTP requests.
  5. An ORM-style interface that feels natural coming from projects like SQLAlchemy and libcloud.
To accomplish all of those goals, I felt that a vaguely promises-style API would suit it best.  This would allow us to delay firing off HTTP requests until you actually need the response data to proceed, and I wanted the method-chaining style reminiscent of Javascript projects like jQuery.  I was able to accomplish both, and I think it turned out pretty well.  It's a good example of what I've always wanted in an API client.  So, let's dive into some of the design decisions.

Delegation and Collections and Models, oh my

The main API client is just an entry point that delegates all of the actual logic to a set of collection objects, each of which represents a collection of resources on the Ambari server.  For those who are used to REST APIs, this might make sense already, but here are some examples to show what I mean:
# get all of the users in the system
users = ambari.users
for user in users:
    print user.user_name
# get all of the clusters in the system
clusters = ambari.clusters
for cluster in clusters:
    print cluster.identifier
The collections are iterable objects that contain a list of model objects, each representing a resource on the server.  There are some helper methods on the collections to do bulk operations, such as:
# delete all users (this will likely fail or break everything if it doesn't)
ambari.users.delete()
# update all users with a new password (bad idea, but hey)
ambari.users.update(password='new-password')
If you want to get a specific model out of a collection, that's easily accomplished by passing a single parameter into the accessor for the collection.
# get the admin user
admin_user = ambari.users('admin')
# get a specific cluster
cluster = ambari.clusters(cluster_name)
# get a specific host
host = ambari.hosts(host_name)
Additionally, you can get a subset of a collection by passing in multiple arguments.
# get a subset of all hosts
hosts = ambari.hosts([hostname1, hostname2, hostname3])
So, this is just the basic entry point into the model collections.  In Ambari, there's a large hierarchy of related resources and sub-resources.  Users have privileges, clusters have hosts, services have components, etc.  To handle that, each model object can have a set of related collections for the objects it contains.  So, for example:
# get all hosts on a specific cluster
ambari.clusters(cluster_name).hosts
# get a specific host on that cluster
host = ambari.clusters(cluster_name).hosts(host_name)
Some of the hierarchies are very deep.  These are the deepest examples I can find so far:
# get a repo for a specific OS for a specific version of a specific stack
ambari.stacks(stack_name).versions(stack_version).operating_systems(os_type).repositories(repo_id)
# get a component for a specific service for a specific version of a specific stack
ambari.stacks(stack_name).versions(stack_version).services(service_name).components(component_name)
Obviously those are outliers; in general use, you only need to go one or two levels deep for most things, but it's good to know the pattern holds even for deep hierarchies.

When you get to the individual model objects, they behave much like a normal ORM.  They have CRUD methods like create, update, delete, and they use attribute-based accessors for the fields returned by the API for that resource.  For example:
cluster = ambari.clusters(cluster_name)
print cluster.cluster_id
print cluster.health_report
There's no fancy data validation or type coercion like in SQLAlchemy, just a list of field names that defines which attributes are available, but really that's all I think is necessary in an API client.  The server will do more robust validation, and I didn't see any places where automatic coercion made sense.  By automatic coercion, I mean automatically converting datetime fields into datetime objects, and things of that nature.  I'm not doing that, and it's possible that decision will turn out to be shortsighted, but I'm guessing the simplicity of the current design will win out.
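To give a flavor of how little machinery this needs, here's a minimal sketch of the idea (my illustration, not the actual python-ambariclient internals), where attribute access is just a lookup against the declared field names:
# Minimal sketch: expose only the attributes declared in 'fields'
# (hypothetical, not the real base classes).
class Model(object):
    fields = ()

    def __init__(self, **data):
        self._data = data

    def __getattr__(self, name):
        # called only when normal attribute lookup fails
        if name in self.fields:
            return self._data.get(name)
        raise AttributeError(name)

class Cluster(Model):
    fields = ('cluster_id', 'cluster_name', 'health_report')

assert Cluster(cluster_name='testcluster').cluster_name == 'testcluster'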

Wait for it...

Because the client is a promises-style API, it doesn't necessarily populate objects when you expect.  For the most part, if it can't accomplish what you're requesting without populating the object with data from the server, it will do that automatically for you.  Many operations are also asynchronous, and what you as a user really care about is that you are safe to operate on a resource.  To accomplish that, there is a method called wait() on each object.  Calling wait() will do whatever is required for that model or collection to be in a "ready" state for you to act on it.  Whether that's simply requesting data from the server, waiting for a background operation to complete, or waiting for a host to finish registering itself with the Ambari server, the method is the same: .wait().
# wait for a recently-added host to be available in Ambari
ambari.hosts(host_name).wait()
# wait for a bootstrap call to finish and all hosts to be available
ambari.bootstrap.create(hosts=[hostname1, hostname2], **other_params).wait()

I have a request

In the Ambari API, if your POST or PUT command triggers a background operation, a 'request' object is returned in the response body.  It will look something like this:
{
  "href" : "http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/requests/1",
  "Requests" : {
    "id" : 1,
    "status" : "InProgress"
  }
}
If any API call returns this information, the Ambari client will automatically recognize it and store it away.  Then, if you call .wait() on the object, it will poll the Ambari API until that request has completed.  At some point it will start throwing exceptions if the request doesn't complete successfully, but that logic hasn't been built in yet.
# install all registered components on a host and wait until that's done
ambari.clusters(cluster_name).hosts(host_name).components.install().wait()
And to be consistent and obey the principle of least surprise, you can chain off wait() calls to do further actions, so this also works:
# install and start all registered components on a host and wait until it's done
ambari.clusters(cluster_name).hosts(host_name).components.install().wait().start().wait()
It's not generally a great idea to have a huge method chain like that, but it's possible.  It would be better written as:
components = ambari.clusters(cluster_name).hosts(host_name).components
components.install().wait()
components.start().wait()
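For those wondering what backs those wait() calls, the gist is just a poll-and-sleep loop against the stored request href.  Here's a rough sketch of the idea; the helper and its names are mine, not the client's actual internals:
import time

# Rough sketch of request polling (illustrative only); the status values
# follow the example response above.
def wait_for_request(client, request_href, interval=5, timeout=3600):
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = client.get(request_href)['Requests']['status']
        if status != 'InProgress':
            return status  # completed, failed, aborted, etc.
        time.sleep(interval)
    raise RuntimeError('request %s did not finish in time' % request_href)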

Wait, that's it?

I wanted it to be extremely easy to add new model classes to the client, because that was one of my biggest complaints with the existing client.  So most of the common logic is built into two base classes, called QueryableModel and DependentModel.  Now defining a model class is as simple as defining a few pieces of metadata, for example:
class Cluster(base.QueryableModel):
    path = 'clusters'
    data_key = 'Clusters'
    primary_key = 'cluster_name'
    fields = ('cluster_id', 'cluster_name', 'health_report', 'provisioning_state',
              'total_hosts', 'version', 'desired_configs',
              'desired_service_config_versions')
    relationships = {
        'hosts': ClusterHost,
        'requests': Request,
        'services': Service,
        'configurations': Configuration,
        'workflows': Workflow,
    }
  1. 'path' is the piece of the URL that is appended to access this model, i.e. /api/v1/clusters
  2. 'data_key' defines which part of the returned data structure contains the data for this particular model.  The Ambari API returns the main model's data in a subordinate structure because it also returns a lot of related objects.
  3. 'primary_key' is the field used to generate the URL to a specific resource, i.e. /api/v1/clusters/cluster_name (see the sketch after this list)
  4. 'fields' is the list of field names that should be returned in the model's data.
  5. 'relationships' is a mapping of accessor names to model classes that builds related collection objects, i.e. ambari.clusters(cluster_name).hosts == a collection of ClusterHost models
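With just 'path' and 'primary_key', the base class can derive every URL it needs.  Roughly like this, though this is a simplified sketch rather than the actual QueryableModel code:
# Simplified sketch of URL generation from the class metadata above
# (illustrative names and logic, not the real base class).
def collection_url(parent_url, model_class):
    # e.g. '/api/v1' + 'clusters' -> '/api/v1/clusters'
    return '%s/%s' % (parent_url.rstrip('/'), model_class.path)

def resource_url(parent_url, model_class, data):
    # e.g. appends data['cluster_name'] -> '/api/v1/clusters/testcluster'
    return '%s/%s' % (collection_url(parent_url, model_class),
                      data[model_class.primary_key])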
Some objects are not represented by actual URLs on the server and are only returned as related objects to other models.  These are called DependentModels in my client.  Here's a pretty simple one:
class BlueprintHostGroup(base.DependentModel):
    fields = ('name', 'configurations', 'components')
    primary_key = 'name'

class Blueprint(base.QueryableModel):
    path = 'blueprints'
    data_key = 'Blueprints'
    primary_key = 'blueprint_name'
    fields = ('blueprint_name', 'stack_name', 'stack_version')
    relationships = {
        'host_groups': BlueprintHostGroup,
    }
When you get a specific blueprint, it returns something like this:
{
  "href" : "http://c6401.ambari.apache.org:8080/api/v1/blueprints/blueprint-multinode-default",
  "configurations" : [
    {
      "nagios-env" : {
        "properties" : {
          "nagios_contact" : "greg.hill@rackspace.com"
        }
      }
    }
  ],
  "host_groups" : [
    {
      "name" : "namenode",
      "configurations" : [ ],
      "components" : [
        {
          "name" : "NAMENODE"
        }
      ],
      "cardinality" : "1"
    }
  ],
  "Blueprints" : {
    "blueprint_name" : "blueprint-multinode-default",
    "stack_name" : "HDP",
    "stack_version" : "2.1"
  }
}
As you can see, the 'Blueprints' key is the 'data_key', so that structure has the data related to the blueprint itself.  The 'host_groups' and 'configurations' structures are related objects that don't have URLs associated with them.  For those, we can define DependentModel classes to automatically expand them into usable objects.  So, now this works:
for host_group in ambari.blueprints(blueprint_name).host_groups:
    print host_group.name
    for component in host_group.components:
        print component['name']
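Under the hood, expanding those embedded structures can be as simple as wrapping each embedded dict in its declared model class.  Something like this hypothetical helper, which assumes models take their fields as keyword arguments; the real base classes handle it generically:
# Hypothetical sketch of dependent-object expansion (not the real code).
def expand_dependents(model, raw_json):
    related = {}
    for key, rel_class in model.relationships.items():
        if issubclass(rel_class, base.DependentModel):
            # no href to follow; just wrap each embedded dict
            related[key] = [rel_class(**item) for item in raw_json.get(key, [])]
    return related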
I tried to make things act consistently even where they weren't consistent in the API.  It should be noted that objects that are backed by URLs are also returned in related collections like this, and the client will automatically use that data to prepopulate the related collections to avoid more HTTP requests.  For example, here is a very trimmed down cluster response:
{
  "href" : "http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster",
  "Clusters" : {
  },
  "requests" : [
    {
      "href" : "http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/requests/1",
      "Requests" : {
        "cluster_name" : "testcluster",
        "id" : 1
      }
    }
  ],
  "services" : [
    {
      "href" : "http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/services/GANGLIA",
      "ServiceInfo" : {
        "cluster_name" : "testcluster",
        "service_name" : "GANGLIA"
      }
    }
  ]
}
As you can see, both the 'requests' and 'services' related collections were returned here.  So, if you were then to do:
for service in ambari.clusters(cluster_name).services:
    print service.service_name
It would only have to do the single GET request to populate the cluster object, then use the data returned there to populate the service objects.  There is a caveat here.  When fetching collections, the Ambari API generally returns only a minimal subset of information, usually just the primary_key and possibly the primary_key of its parent (in this case, service_name and cluster_name).  If you want to access any other fields on that object, the client has to do another GET call to populate the remaining fields.  It does this for you automatically:
for service in ambari.clusters(cluster_name).services:
    print service.maintenance_state
'maintenance_state' was not among the fields returned by the original call, so the client will do a separate GET request to http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/services/GANGLIA to populate that information and then return it.
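That lazy population can hang off the same fields mechanism shown earlier.  A sketch of the model's __getattr__ with one extra step; attribute names like _client, url, and data_key are hypothetical here, not the client's real internals:
# Sketch of lazy field population on the model class (hypothetical names).
def __getattr__(self, name):
    if name in self.fields:
        if name not in self._data:
            # collection responses are minimal; GET the full resource once
            self._data.update(self._client.get(self.url)[self.data_key])
        return self._data.get(name)
    raise AttributeError(name)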

Smoothing out the rough edges

The Ambari API is mostly consistent, but there are some warts left over from old designs and one-off pieces.  The bootstrap API and the configurations are the worst offenders in this regard.  I made every effort to make those areas behave like the others as much as possible.  I didn't want the user to have to know that, for example, bootstrap requests aren't the same as every other asynchronous task, or that even when a bootstrap finishes, the hosts are not visible to Ambari until their agents have booted up and registered themselves.  So, I overloaded the wait() method on those objects so that it just does the needful.
# wait until these hosts are in a ready state
ambari.hosts([hostname1, hostname2]).wait()
Similarly, adding a host to a cluster normally involves manually assigning all of the components, but an upcoming Ambari feature will make it so you simply pass in a blueprint and host_group and it will do the assignment for you.  I preemptively smoothed this out in the client so you can do this now; it just involves a few more API requests being made automatically on your behalf.  Wherever things are inconsistent on the API server, my client makes them consistent for the user.
# add a new host to an existing host_group definition
ambari.clusters(cluster_name).hosts.create(host_name, blueprint=blueprint_name, host_group=host_group_name)
When the server side is updated to support this, I can simply pass the information along and let the server sort it out.  There are a few other cases where warts in the API were smoothed over, but for the most part the idioms in the client match up with the API server pretty well.

Where do we go from here?

There was one feature that I really wanted to have that I wasn't able to wrap my head around sufficiently to implement in a clean, intuitive way.  That is the ability to act on collections of collections.  Wouldn't it be awesome if this worked?
# restart all components on all hosts on all clusters
ambari.clusters.hosts.components.restart().wait()
The .wait() would get a list of clusters, then get a list of hosts per cluster in parallel, then get a list of components for each host in parallel, then call the restart API method for each of those, gobble up all the request objects, and wait until all of them completed before returning.  This should be possible, but it will require a bit more thought into how to implement it sanely, and there wasn't enough bang for the buck for our use-cases to justify spending the time right now.  But maybe I'll get back to it later.

What's it to me?

I realize Ambari is a niche product, and that most of this post will be gobbledygook to most of you, but I think the general principles behind the client's design apply well to any REST-based API client.  I hope people find them useful and maybe lift a few of them for their own projects.  Most of all, I think this is probably the best client library I've ever written, and it embodies pretty much everything I've wanted in a client library in the past.  We plan on rewriting the client library for our own API in a similar fashion and releasing it to the public in the near future*.

* Usual disclaimer about forward-looking statements and all that.  I make no guarantee that this will actually happen.

Monday, November 17, 2014

Promise-based REST API clients

In software development, the concept of promises (sometimes also called futures) is deceptively simple.  There's a pretty good Wikipedia article that explains it better than I can, and its opening line is about as good a summary as I can think of:
In computer science, future, promise, and delay refer to constructs used for synchronizing in some concurrent programming languages. They describe an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is yet incomplete.
What that means in normal people English is that you call a method to do something (i.e. compute a value, gather some data, etc) and instead of doing what you requested immediately and making you wait, it returns a "promise" that it will eventually do what you requested.  While this is generally done to make concurrency easier, I think the concept works well for hierarchical REST API clients as well.

Let me explain.  No, no, there is too much.  Let me sum up.

REST APIs are generally hierarchical.  You often see structures like:

GET /artists - get a list of artists
GET /artists/metallica - get the artist Metallica
GET /artists/metallica/albums - get a list of albums by Metallica

And so on.  What I wanted from an API client was to do something along the lines of this Python snippet:
for album in client.artists('metallica').albums:
    album.delete() # take that Metallica

What I didn't want to happen in the above snippet was for it to do all of these HTTP requests:

GET /artists
GET /artists/metallica
GET /artists/metallica/albums
DELETE /artists/metallica/albums/killemall
DELETE /artists/metallica/albums/ridethelightning
...

But instead just do:

DELETE /artists/metallica/albums

(or if the API didn't support bulk-delete in that way, just do a single GET followed by a series of DELETE calls)

So, enter promises.  If each method in that chain simply returned the promise of fetching the underlying data, I could chain off of it and only actually fire off HTTP requests for the things I actually needed to get.
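To make that concrete, here's a toy version of the idea, my sketch rather than any particular library, using the requests package for HTTP.  Accessors and calls only do URL math; nothing talks to the server until you iterate or act:
import requests

# Toy promise-style collection: builds URLs eagerly, fetches lazily.
class Collection(object):
    def __init__(self, url):
        self.url = url
        self._items = None

    def __getattr__(self, name):
        # sub-collections chain off the current URL, e.g. .albums
        return Collection('%s/%s' % (self.url, name))

    def __call__(self, identifier):
        # narrowing to one resource is still just URL math, no HTTP yet
        return Collection('%s/%s' % (self.url, identifier))

    def __iter__(self):
        # iterating is what finally triggers the GET
        if self._items is None:
            self._items = requests.get(self.url).json()
        return iter(self._items)

    def delete(self):
        # acting on the collection never required fetching it
        requests.delete(self.url)

client = Collection('http://api.example.com')
client.artists('metallica').albums.delete()  # one DELETE, zero GETs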

I should've taken the blue pill

So, that makes pretty good sense, but that's not all.  For your 3 easy payments of $29.95 you get not only this nice, easy-to-use method chaining and minimal HTTP request overhead, but also implicit parallelized requests.  What if, given the above example, you could also do something like:
for album in client.artists.albums:
    print album.tracklisting
And that just automatically did basically this:
  1. GET /artists
  2. for every $artist, GET /artists/$artist/albums (in parallel)
  3. for every album from every $artist, print the tracklisting
Now, let's see how deep this rabbit hole can go:
for track in client.artists.albums.tracks:
    print "%d - %s" % (track.number, track.name)
  1. GET /artists
  2. for every $artist, GET /artists/$artist/albums (in parallel)
  3. for every $album from every $artist, GET /artists/$artist/albums/$album/tracks (in parallel)
  4. for every track for every $album from every $artist, print the number and name
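Before going any deeper down the rabbit hole, the parallel half of this could ride on a thread pool.  Conceptually, something like this sketch, which assumes the requests package and that each artist record carries an 'id' field; both are illustrative:
import requests
from concurrent.futures import ThreadPoolExecutor

# Sketch of the parallel fan-out described above (illustrative only).
def albums_for_all_artists(base_url):
    # step 1: one GET for the artist list
    artists = requests.get('%s/artists' % base_url).json()
    urls = ['%s/artists/%s/albums' % (base_url, a['id']) for a in artists]
    # step 2: one GET per artist, all in flight concurrently
    with ThreadPoolExecutor(max_workers=10) as pool:
        return list(pool.map(lambda u: requests.get(u).json(), urls))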

I am serious, and stop calling me Shirley

I think this is a really powerful idea that could make an API client very easy to use and actually, you know, fun.  But maybe I'm alone here, since every API client ever is just the same old boring bunch of methods, sometimes in namespaces, sometimes with actual model-style classes for the resources, if you're lucky.  Granted, this idiom could be abused to fire off hundreds of HTTP requests in parallel and potentially overwhelm the server, either by accident or malice.  But, in my opinion the ease-of-use outweighs the potential for mass destruction.  I'd like to see a client that works this way, even if I have to build it myself (especially if I get to build it).  I've managed to build a client that does the first half of this, and does it pretty well IMO (blog post coming soon).  I have a good idea of how to accomplish the second part, but I haven't yet overcome the effort-to-payoff ratio on that one.  It would be cool, and potentially useful, but it's a lot of work for such a small benefit.  Still, that itch needs scratching, and one day it shall be scratched.  Even though I realize that maybe it means I'm insane.

Friday, November 7, 2014

Call me Elmer

My wife commented to me the other day that I'm really good at picking up the slack.  She's kind of slammed with a lot of things right now, and I just do what needs to be done to make things work at home.  It got me thinking, and I think that's really me in a nutshell, both personally and professionally.  When I was interviewing with Rackspace, they asked me what role I filled on teams and I said I was The Glue.  I'm not sure my interviewer really understood what I meant by that and whether it was a good thing, so let me elaborate.  I make sure what needs to get done gets done; I bind together the pieces and make sure things stick.  

To me there are a few archetypes that most good programmers seem to fall into.  Many end up having attributes from multiple archetypes, but generally one is the most prominent.  I'm not just The Glue, I'm also some other things, but what's most obvious and where my biggest contributions come is in being The Glue.  I'm going to avoid talking about the negative archetypes like The Hero, The Recluse, The Sheep, and The Manchild, but certainly even good programmers can revert into those at times, too.

The Glue


A glue programmer is very valuable.  They are skilled in a wide variety of tasks, they pick things up quickly, and they're fearless, or else they wouldn't be able to be The Glue.  The Glue binds everything together and makes sure everything holds up.  The Glue is selfless, caring more about the project or team's success than about their own individual satisfaction.  They'll do the mundane work that nobody else wants to do, not because they can't do more interesting things, but because it needs to be done for the project to succeed.  They tend to flit about between systems and responsibilities and make sure the details are being covered.  Their weakness is that their feeling of responsibility for everything can keep them from going deep in any one particular area, lest something else falter.  Because of that, they're not the best choice for doing upfront architectural work, which requires a lot of dedicated focus on a specific problem for a long period of time.  They can lose sight of the big picture while they're focusing on keeping everything together.

The Architect


Architects really enjoy thinking about problems and coming up with solutions.  They're good at looking at things from a distance and seeing how all the components will need to work together to make the whole function.  They're great at starting projects, but often are not as good at finishing them.  When they get down to the details and have to make the hard decisions about compromising their vision, they can be paralyzed.  All projects require some tradeoffs to get finished, and often they feel that this betrays their artistic ideals.  They're a great resource to have, but when you have too many, you'll notice a lot of grand, lofty ideas being discussed and planned with little actual end result to show for it.  These are the types that do well in interviews because they like coming up with solutions to challenging problems, but if they don't also have some bits of other archetypes in them, they'll end up spending all their time coming up with the perfect theoretical solution without actually shipping software.

The Builder


A builder is perfect to pair with an architect, because they're great at seeing another person's vision and bringing it to life.  They are good at tracking the details and ensuring constant progress on the project.  They can be known as finishers, and are very task-oriented.  They tend to be pretty reliable at making visible progress on projects and getting things out the door on time.  They know when to make tradeoffs to ensure the project gets completed, but sometimes they can be known to cut corners too much.  The flip side is that they can sometimes, in their rush to get something functional out there, build something out of duct tape and glue, or something so convoluted and cumbersome to maintain that nobody else dares work on it.  When guided by a quality architect, they're extremely useful, though, and will make sure that you have something to show for all those lofty ideas.

The Firefighter


Production's down?  Who you gonna call?  The firefighter, that's who.  They're great at jumping into a tense situation, keeping composure, finding the problem, and fixing it.  They're not afraid to put a band-aid solution in place to keep things going while they work on a more permanent solution.  That can break down when there are too many fires for them to actually get to the permanent solution, as the adrenaline of the chase will keep them putting band-aids on everything rather than solving fundamental problems.  They are great at debugging systems and understanding their complex interactions so that they can see where the problem is quickly and resolve it.  They should not be confused with The Hero, who rushes to get things out and then gets praised for quickly band-aiding the problem that he created to begin with.  I often refer to that as someone who jumps on the grenade that they themselves threw.

The Fixer


This is the guy you call in when there's blood and guts everywhere and the cops are on the way.  You can add a fixer to a delayed project and actually negate the mythical man-month, as they will turn things around.  They're not afraid to step on toes or even run people over if they have to, as long as it serves to move things forward.  In doing that, they can burn bridges and drive people away, even though their intent is more altruistic and less personal than how it's received.  They're still useful to have as they can rescue a doomed project and turn it into something useful for the company, just be aware that they might also drive some people from the team in the process.  Then again, if those people were finishing their projects, you wouldn't have had to call in the fixer to begin with.

What are you?


I think I fall pretty well into The Glue for the most part, with a good chunk of The Builder, and maybe a touch of the others.  I don't enjoy having to be The Fixer, as I know I've offended some coworkers when I've needed to do that (usually when asked, but sometimes of my own volition).  Then again, at other times it's worked out wonderfully when those on the doomed project really wanted the help.  To me, the team is paramount.  If your lack of progress is going to prevent the team from succeeding, I'll try to give you a chance to correct course by offering help.  I can't stop you from hanging yourself once you have enough rope, though, and I'll be the jerk who takes over your project if I have to.  I'd prefer you didn't make that necessary, and hope that you don't take it personally.  Maybe that makes me a bad person.

Where do you fit?  Do you disagree with my self-assessment (assuming you've worked with me)? Or am I way off-base in my over-generalized archetypes?

Thursday, October 23, 2014

Inconsistent naming in Computer Science

I can't count the number of times I start finally reading about a concept or technique and realize that I already know it, just under a different name.  Currying is sometimes called partial application.  Dictionaries in Python are hashes in Perl, associative arrays (or just objects, depending on how pedantic you are) in Javascript, HashMaps in Java (I think?), and who-knows-what-else in other languages.  Lambdas are just anonymous functions, as far as I can tell.  Promises are called futures in some cliques.  Mixins vs roles (hint: they're the same thing).  Aspect-oriented programming is also referred to as monkey-patching.  I can keep going, but I think you get the point.

I think it would vastly improve interviews, and conversations with coworkers, if we could all agree on a single name for all these concepts we are expected to know.  But I guess that's life.  Soda vs pop, anyone?

Tuesday, October 14, 2014

Computer Science vs Software Development

Recently a good friend of mine, whom I consider among the best programmers I've ever worked with, interviewed for my team and was rejected by my coworkers.  I don't necessarily fault my coworkers for this; they were doing the best they could with the information they had.  It was a close decision, but it made me sad that my coworkers couldn't see in him what I did.  It took me a while to process it, and I went into my shell for a bit during that time (more so than normal).  I eventually came to a realization about myself upon reflecting on that experience that has helped me better understand my own value, because I feel like that friend and I have a lot in common.  Here it is: I am a great software developer, but a mediocre computer scientist.  I've been trying to improve more on the latter lately, which is why I upgraded myself to mediocre.  Maybe software developer isn't the right name for what I mean, but it was the best one I could think of.  I think to many people these skills are one and the same, but to me they really aren't, so let me clarify a bit what I feel are the differences.

The prime directive

Computer science focuses on algorithms and data structures, low-level things like search and sort, whereas software development focuses on ease-of-use and maintainability.  I'm not saying search and sort aren't important, but I can count on the number of billions of dollars I have how many times I've implemented either in my career (i.e. none).  I vaguely remember some of those algorithms, and I've read about them to bone up for interviews, but in actual programming jobs in my industry, they come up precisely never.  Every language I've ever used in a professional capacity has them built in, and if it doesn't, then it's too inefficient to implement them yourself, so you just use whatever builtin is closest and make it work.  It's way more efficient to offload the searching to the datastore in nearly every single case.

Software development tends to focus more on things like having consistency in the API so that other people can develop an intuitive sense of your code (i.e. if other objects behave a certain way, they can reasonably expect similar behavior from related objects).  It doesn't matter if you implemented the fastest search algorithm ever, if one object calls the method 'search' and other objects calls it 'find'.  That kind of stuff makes it painful to use your system, increases cognitive overhead, and sadly is way too often ignored or accepted in the industry.  Software development is more concerned with others being able to grok your code quickly so they can navigate it and add features or fix bugs.  Things like consistent naming conventions, good use of namespaces, separation of concerns, etc, are subjects of focus. Most importantly, you have to know how to empathize with a consumer of your system.  Put yourself in their shoes; how would you want it to behave if you didn't understand the inner workings of the system?  Are you leaking your abstractions (i.e. does the user have to know how the internals of your system work in order to use it effectively)? The user doesn't care about the differences between bubble sort and insertion sort, unless it means that you're getting them responses faster and/or more accurately than before.

Optimization is the root of all evil

Computer science optimizes for raw algorithmic performance, whereas software development optimizes for responsiveness and user experience.  In my experience as a developer, I've had the opportunity to optimize a lot of slow code.  I once dropped a rate-limiting algorithm from about 80ms to 1ms through a few optimization iterations.  During those iterations, the algorithmic complexity didn't really change all that much, but the performance sure did.  I couldn't even honestly tell you what the Big-O was on either end of the algorithm because it was fairly complex and the biggest culprit was I/O.  There should really be a Big-IO notation, since unless you're developing realtime systems or games, and possibly even then, 99% of your optimizations will be accomplished by reducing I/O.  You can run through hundreds of thousands of loop iterations in the time it takes an SSD to return a single byte from a file read.  I've seen plenty of code that was O(n) that was sped up by moving to an O(n^2) version of the code, simply because it reduced the amount of I/O done within the loop.
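A contrived illustration of that last point, with db.get() and db.get_all() standing in as hypothetical datastore calls that each cost one round-trip:
# Contrived example; db.get()/db.get_all() are hypothetical datastore calls.
def enrich_slow(items, db):
    # O(n) algorithmically, but n I/O round-trips dominate the runtime
    return [(item, db.get(item.key)) for item in items]

def enrich_fast(items, db):
    # O(n^2) comparisons in memory, but only a single I/O round-trip;
    # assumes every item's key is present in the bulk result
    records = db.get_all()
    return [(item, next(r for r in records if r.key == item.key))
            for item in items]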

Software development tends to focus more on how responsive the application is.  If something is going to take a while, offload it to an asynchronous process and feed status data back to the user.  Don't lock the UI thread.  Don't lock the browser up while the page is loading. To understand how to do those, you have to understand the systems you're working with and how they interact with each other.  If you're dealing with a large number of records, give them some way to split it up so each request isn't ridiculously slow and/or large (pagination?  tagging?  groups? column-based filtering? full-text search?).  That last one has ramifications for the load on your system as well.  In distributed or web-based systems, it's a much bigger optimization to reduce the number of network requests than to make each request as fast as possible.  So you might have to make each request slower in order to return enough data that the requestor doesn't need to contact you again for more.  That's pretty antithetical to the normal methods of profiling and optimization, but it's probably the biggest optimization you can make.  I'm not saying that each request should be wasteful of resources, because you should still try to make it as fast as possible without sacrificing maintainability, but knocking 5ms off a request that then requires them to make an additional request is kind of silly when the network+protocol+routing overhead of a single request is 50ms.

If you pay attention, there will be a point

I think what it boils down to is a fairly obvious dichotomy between theory and practice, science and art.  I'm more concerned with practical knowledge, whereas interviews focus mostly on theoretical knowledge.  I've known plenty of people who were great at CS that ended up being terrible programmers, and plenty of people who were weak on CS but produced great code.  It should be telling that much of the work that comes out of CS research is generally considered to be subpar software.  Don't get me wrong; I'm glad people are doing that work and pushing things forward for all of us.  I just don't want them on my team because I have to maintain that spaghetti mess of a codebase when they move on to the new shiny.

Obviously, all of this is just like, my opinion, man.  As in all things, there's a balance to be had.  Software development is a craft, equal parts art and science, and all too often we ignore the artistry involved.  People who master that aspect of it are equally valuable to those who firmly grasp the deep theory.  Both are rare, and even rarer is someone who masters both.  I'll let you know if I ever meet one.


Friday, October 10, 2014

Impostor Syndrome

I mentioned previously that I failed to publish one of my blog articles for over a year due to impostor syndrome, but then I realized that maybe some people don't know what that is or if I was even being serious.  So, first of all, yes, I was serious, sort of.  I do think I suffer from what's colloquially known as Impostor Syndrome, but not to the extremes that some do.  The basic gist of the problem is that I always worry that people will find out that I have absolutely no idea what I'm doing, despite having a fairly successful career as a software developer.  There's more information at Wikipedia if you're curious.  It's the opposite of the Dunning-Kruger Effect, in which incompetent people tend to overestimate their own level of competence, which sadly is what most really successful business people have from what I've seen over the years.  We, as a society, tend to reward overconfident incompetence instead of self-doubting excellence.  I've read that it's significantly more common in women than men, but I personally believe it affects far more developers than are willing to admit it, based on behavior I've witnessed over the years.  It could just be that more women are willing to admit to it.  I've decided to share my personal experience with this problem, what I've been able to do to overcome it to a degree, and how it still affects a lot of things in my career, usually for the worse, in the hopes that bringing more attention to the problem will help others who are similarly afflicted.  I'm going out on a major limb here and I hope that it doesn't collapse under my weight.

We've got wee, not so wee, and friggin huge

The sheer volume of knowledge in computer science and software development is frankly overwhelming.  As the old adage goes, the more I learn, the less I know.  There are new developments constantly coming out, and it's impossible to keep up with them all, especially given my relative lack of formal education, which has left me to pick up some of the more fundamental parts of the discipline in an ad-hoc manner over the years.  I feel like I'm constantly catching up with where I should have been years ago.  It doesn't help that the landscape has changed drastically in that time.  What was once considered PhD-level material now seems to be considered by some to be basic knowledge.  Object-oriented programming, functional programming, reactive programming, aspect-oriented programming, set theory, graph theory, Bayesian statistics, machine learning, AI, computer graphics, big data, distributed systems, operating systems, compilers, interpreters, security, encryption, dynamic typing, static typing, weak typing, strong typing, data structures, search algorithms, consensus algorithms, and on and on and on.  And that doesn't include all the peripheral knowledge like project management, agile methodologies, version control systems, bug tracking software, etc.  It's a lot to know, and if you believe the peanut gallery, seemingly all of it is required knowledge.  And those are just a sampling of the things I either know or know I need to know, and I keep finding new examples all the time.  The backlog on my reading list is huge, and I don't have the time or energy to actually catch up.  It would take years of dedicated full-time effort, and that's without a job or a family to maintain.

This one goes to 11.  It's one louder.

So, that's the first part of the problem.  It's impossible to understand all these topics to a significant depth.  Some people are fine just skating by on the surface, understanding enough to talk about it, but I personally don't feel qualified to talk on a subject until I have really absorbed it.  And that takes time, and with things like programming, a lot of practice.  I'm stuck with an unfortunately limited subset of these topics that I really understand, and a larger subset that I sort of get but couldn't really speak to to any depth.  Some would say I'm lacking in some of the basics, but I also really know some of the not-basics, which makes it really difficult for people to gauge my competence level until they've actually worked with me.  My biggest concern is that I'll have a conversation like the one from Spinal Tap where Nigel keeps insisting that his amp is louder because it has an 11 on the knob, where others only go up to 10.  He knows enough about amps to know that higher numbers are louder, but completely misunderstands what the numbers actually represent and assumes that 11 is just louder.  Some people are content to pretend they understand the technology while spouting off incorrect information to people who don't understand it enough to call BS.  As Mark Twain said, "It is better to keep your mouth closed and let people think you are a fool than to open it and remove all doubt."  I opt to keep quiet, but when I hear people talking with confidence on so many topics, I falsely assume they actually understand what they're talking about and I feel lacking by comparison.  I project my ethics and thought processes on to others who don't necessarily share them.

How does he not get fired?

It doesn't help people with this problem that many developers exhibit a tendency which has been labeled "feigning surprise" (see the Hacker School Rules), in which they pretend to be shocked when someone doesn't know something that they have learned.  This has a severe negative emotional impact on people who already think they are impostors.  It makes them want to speak up less and just retreat into their safety zone.  I personally think this can be pretty tightly coupled with impostor syndrome, but I can't necessarily prove it; it's just a hunch based on my own experiences.  I think, at least in some cases, that the person feigning surprise is not consciously faking it but is actually surprised that someone else doesn't know something they do.  They feel like they're a fraud, and certainly the other person shouldn't also be a fraud.  It's like a weird combination of nihilism and narcissism that doesn't make any logical sense to others, and it comes across as demeaning and only serves to worsen the problem.

This is an NP-hard problem

I think a lot of my opinions on interviewing in my previous post revolve around this problem, honestly.  I tend to clam up in interviews when they hit an area I don't understand extremely well, out of fear that I'll say something extremely stupid on the subject and just cast away all doubt as to my imposterosity.  In reality, often times, they're purposely testing how the subject reacts to a situation in which they don't know the answer. Despite my logically understanding this, my psychological reaction is to flee inward rather than reveal my ignorance, so I look even worse by not working it out with the interviewer.  My whole career is just a house of cards waiting to crash down as soon as someone realizes I really don't understand all of the intricacies of directed acyclic graphs and paxos consensus algorithms. This can come across as incompetence in interview settings, and I've done poorly on a number of interviews and lost jobs because of it.

Nevermind the man behind the curtain

Given how interviews are conducted, I'm amazed every time I get to a new job how much it's basically like the old job.  I've had a couple times where I finally felt like I got in; I was now part of the elite who worked at these places with these ridiculous interviews.  I sure fooled those guys; they're gonna feel pretty silly that they hired me once they realize I'm not a demigod.  Come to find out that my new coworkers were basically of the same competence level as my old ones; some good, some not so good. I imagine it's that way at places like Google, Microsoft, and Amazon, despite all their pretense of hiring only the best of the best.  They still make stupid decisions and release the same crappy, bug-filled software as the rest of us.  They just don't do it in the open, so people romanticize that things must be perfect behind their shroud of secrecy.

I'm good enough, I'm smart enough, and doggone it, people like me

So, what can someone with impostor syndrome do to combat it?  There are a few things I've found that have helped me immensely, but each is a constant battle.  Just being aware that I have this problem, and that it's a known, not-uncommon issue has helped tremendously.  I can objectively look back over my career and realize that I have actually done well at most jobs I've had. When I left one job, my boss told his boss that they were losing their best developer,  and at another job I was kept on after nearly everyone was let go in order to help transition things to the parent company. Even at the places where I felt completely unworthy after the difficult interview, yet somehow managed to land the job, I ended up being well-respected and valued because of my pragmatism, my attention to detail, and my ability to ramp up quickly.  You would think that would be validation enough, but I tend to dismiss those things because those people obviously didn't see how much I was struggling.  When I feel especially fraudulent, I remind myself of those experiences so I can feel more calm and confident, and it helps.

Another thing that has helped immensely is setting aside time to actually research areas where I feel I'm lacking. If I can at least have a cursory knowledge of an area, enough to actually have a decent conversation on the subject, I feel a lot less incompetent.  This has proven the most difficult, because it's often led to going down a rabbit hole of related subjects that I can never possibly learn to the degree I would prefer.  Nevertheless, it's still been a boon, and I feel like I'm in a much better position than I was a few years ago.  I'm working on being less dismissive of my own ideas, as it's turned out that I've had some good ideas over the years that have worked out pretty well, but I still have a tendency to discount my opinions or second guess myself.

Oh, he comes from a broken home.  So... no coffee then?

I guess that's all the advice I have.  There's no miracle cure here, but it is manageable.  Be objective, be logical, and remind yourself that you actually have done a good job from time to time.  Most of all, remember that nearly everyone else is faking it just as much as you are, if not more.  Some people are just better at hiding it than others.

Thursday, October 9, 2014

Being a part of something special

In my career, I can think of only one time I've felt like I was a part of something truly special.  At Liquid Web, I was part of a small team of developers and engineers that built out Liquid Web's cloud product, Storm on Demand.  It was a pretty amazing accomplishment for a handful of people over a thirteen-month period from inception to public release.  Despite being a very niche player in the market, we were only a few months too late to be the second public cloud provider after Amazon.  Unfortunately, during those few months, several other players also got their clouds out, so when we released, it didn't make quite the splash we had hoped.  Don't get me wrong, it was wildly successful; I just think some people had unrealistic expectations.  It was really an amazing accomplishment, and despite the fact that I've since left the company, I still count it as the best experience of my career thus far.

So, I think I'm about to have my second such experience.  At Rackspace, we've formed a Data Services "practice area" (practice as in law practice, not basketball practice; it took me a while).  A year or two ago, we acquired a startup called ObjectRocket that specializes in providing MongoDB as a service.  Their focus is on providing the best MongoDB experience out there and taking all of the headaches of administering a MongoDB installation away from the developer so they can just focus on the code.  So now ObjectRocket, the CloudDatabases team, and the CloudBigData team (I'm on this one) are joining forces to form this group.  There's a ton of work to do to get where we need to be, but it's exciting to be a part of, and even if we only accomplish half of what we've set out to do, it should be huge.

It's also possible that this ends up being a giant disaster and implodes.  Given my conversations with the leadership so far, and based on their personalities and management style, I don't feel like this is a big concern.  We've got the resources of Rackspace behind us, and we have the right type of pragmatism and excitement and technical know-how to make this succeed.

If this sounds like something you'd like to be a part of, please contact me.  We need your help.  I think this is a fantastic opportunity, and I'd love to see some of my friends and former coworkers (those aren't mutually exclusive sets) be a part of it.

Wednesday, October 8, 2014

Modern code review practices

I might just be curmudgeonly, but I'm about to the point where I won't even bother contributing to a project that doesn't use the Github pull-request model for code contributions.  There's so much pomp and circumstance with some projects, when they should be doing everything they can to welcome contributions.  Other similar systems like Bitbucket are fine, although Bitbucket's code review UI needs some help.  I've tried a lot of other systems over the years, and none of them match the simplicity and joy of Github.

The 90s called and want their code contribution practices back

I've recently begun trying to contribute to Ambari.  Ambari's contribution instructions read like a microwave oven's operator manual.  The process is so arcane that I'm surprised anyone contributes.  You have to sign up for two services, JIRA and Review Board, and then you submit a patch to the JIRA ticket, as well as create a Review Board review with the same patch, and the same information as in the ticket, then link it back to the JIRA ticket and link the JIRA ticket to the review.  Here, read it yourself: Ambari's How To Contribute page.  I don't know if the Apache Foundation dictates this broken process or if the projects just take it on themselves to make the process horrible, but either way, someone needs to be informed that the 90s called and want their code contribution practices back.  Some of you might be saying, "what's the big deal?  Submitting patch files to two places and manually inviting people to review your code when you don't know who should review your code, then manually linking the review back to the ticket, and then enlisting yet another person to actually commit the code on your behalf isn't that bad" (if you're saying this, please punch yourself in the face and save me the trouble).  But there's also another catch that might be a little less obvious: attribution.  If you submit a patch, and then someone else actually commits the patch for you, the git log attributes the patch to the person who committed it, not you.  You may not realize this, but people like receiving credit for their work, even in Open Source.  Some people will simply refuse to contribute if they don't get credit, and frankly that's as it should be.  Can you imagine an actor or a writer doing work and then letting someone else put their name on it?  I can't.  Ambari puts your name on a contributors page, so it's not like there's no attribution, but that's not the same as having it show up on Github, which is commonly used as a portfolio these days.

Barriers, barriers everywhere, and not a drop to drink

OpenStack's contribution model isn't as terrible as Ambari's.  It at least gets attribution right.  OpenStack uses a system called Gerrit for code reviews.  You simply have to sign a CLA, sign up for Launchpad, download a magical tool called git-review (it handles all the arcane Gerrit shenanigans for you), and then run git-review on your committed local branch.  Then Gerrit takes it from there.  More info here: OpenStack's How To Contribute page.  You'll also need to read all of the coding standards, and all the information about how to format your commit messages, and oh yeah, make sure you know everything that's talked about on the mailing list, because if you don't conform, your code will be rejected.  You'll likely not pass your code review and then have to re-submit the same branch for review (your new friend is git commit --amend).  Assuming you actually make it through the gauntlet of feedback and pass all the automated tests, which cover things like OpenStack's extremely limiting and pedantic coding standards, then Gerrit will automatically merge it for you and push it up to Github.  So, it's usable, even if a bit overblown.  The biggest problem is that the process takes forever.  Even working with the same developer on an OpenStack project and another project that's just on Github, it's amazing how much faster progress is made on the Github-based project.  I understand the need for verification of new code before it's released, but this is just a bit much.  Github lets you integrate with other systems to verify pull requests, and it works extremely well and doesn't require a ton of hoop-jumping from the developers.

I'll quit while I'm ahead

I really think it's antithetical to the spirit of Open Source to make it difficult to contribute to your project.  Github gets it right.  You fork my code, write your own code, then submit a pull request.  All the feedback and code is in the same interface, you can update your pull request based on feedback, and when it's good to go, the committer can merge it in.  Done.  If you don't want to use their issue tracker, you can integrate it with most popular issue trackers and have it automatically update tickets for you.  Whatever you do, don't make the contributor have to do this manually.

Full disclosure: I don't work for Github and I'm not a paid shill.  I just really think they nailed this process and there's a good reason they're becoming a de-facto standard.  I wish other projects would catch on.

Monday, July 21, 2014

Even the best software sucks

I rant a lot about user experience.  If you know me, you've probably heard me ramble on about it.  And when I talk about user experience, I don't just mean a slick UI.  There are a lot of aspects to user experience, like performance, responsiveness, obviousness, etc.  I'm not going to go into all of that here, as this is a rant about traveling.

I recently embarked upon a cross-country drive with my family to attend my wife's family reunion.  It was a good trip overall, except for one thing.  Google Maps!  Oh, how I have learned to hate thee.  Now, don't get me wrong, I love Google Maps.  It's saved my bacon on more than one occasion.  There have been times when it had wrong information and led me astray, but that's understandable.  It's a difficult task to keep that much map data up-to-date and accurate with how frequently things change in our world.  What happened to me was far more annoying.  I use the built-in turn-by-turn navigation feature of Google Maps when traveling by car, and it's almost always been great.  However, a recent update has made it extremely unstable.  It crashes every hour or two.  No notification; it just silently exits, and my phone eventually goes to sleep if I don't notice in time.  This wouldn't be a huge deal, except for two factors that the developers seem to be unaware of:


  1. Google Maps actually does crash.
  2. There are areas of the country that have unreliable or nonexistent cellular data signals. 


You would think this would be common knowledge, but I guess over in California, they have miles and miles of reliable data signals, and they never, ever leave their little bubble.  There's no other explanation for a few of the behaviors of Google Maps.  Let's go into them:

Google, Google, where art thou, Google?

After restarting from a crash, Google Maps forgets that you were mid-navigation.  I can understand that it doesn't want to just assume you didn't exit on purpose.  That's fine.  But it could pretty easily prompt you: "I see you were in the middle of a trip.  Resume?"  This would have saved me so much pain that I might have forgiven the fact that the app crashed approximately 40 times on my trip.  It would also depend upon Google Maps actually storing ANY trip data on the phone itself, which apparently it does not, or else you wouldn't have to do the following.

Houston, I'm broadcasting in the blind.  Is anybody out there?

To restore your trip, you have two options.

  1. Type in the destination again.  Wait until it finds it.  Click on the navigate link.  Pick the route.  Click navigate again.  Wait for it to get the GPS sorted out.  Go.
  2. Click in the search box.  Wait for the recent history to appear.  Click on the navigate link, etc, as above.

So option 2 would be acceptable, if a bit tedious, if Google Maps actually kept your recent history on your device.  Alas, no, it does not.  It has to go out to the internet and get your history so you can start your trip again.  Did it never occur to the developers to follow basic mobile app development guidelines and assume that the data connection is unreliable?  I can understand that Google has to spy on everything you ever do and wants to store that data online, but there's absolutely no reason it doesn't also keep a local copy that it just asynchronously updates when the internet reappears.  No data connection?  We got you covered.  But, no, that isn't how they do it.  There's nothing quite so infuriating as having the map die a few miles before a turn in the middle of nowhere, furiously trying to get it going again before you miss your turn, only to see a "no connection" error when trying to restart the trip.  Why isn't the trip already downloaded?  Don't you know what local storage is for?  It got to the point where my wife just loaded the trip in Waze on her phone as a backup, and while having the same directions parroted to me from two devices within seconds of each other was kind of amusing at first, it was also a sad statement about how badly Google Maps was performing for this trip.

To err is program

To me, this is just a basic feature of a navigation program that should have been solved years ago.  I'm not even getting into nice-to-have features like displaying speed limits on the map.  Seriously, why isn't that a thing?  You already have the data or you wouldn't be able to calculate how long the trip takes.  So, instead of asking for the bare minimum of caching the trip after you search for it, and not requiring an internet connection to see your recent searches, I'm going to one-up that and say what Google Maps really should provide: the ability to pre-cache an entire trip from your cushy, safe wifi connection at home that never needs to touch the internet again the entire trip.  It should download the entire route and all maps within a reasonable distance of the trip (say, 1 mile on both sides of the road, to cover pit stops and small detours).  I've got gigabytes of free space just begging to be used for this.  Bring it on, Google.  Based on the difficulty of your interviews, even your HR interns should be capable of providing these features.  They aren't that difficult to program.  Now, where's that bug tracker...

Tuesday, July 15, 2014

Making sense of the Hadoop ecosystem

I've been learning Hadoop for my job at Rackspace, so I wrote up a rather lengthy treatise as a primer on the various pieces of software in the Hadoop ecosystem.  After asking for some fact-checking from my team, it was suggested that I post it to the Rackspace blog instead, so I did.  They said it was too much awesome for a single post, so they had me split it into two posts, kind of like Kill Bill or the last Twilight movie.  Writing these posts is what got me to finally start this blog, so now you know where to point the blame.

Part 1

Part 2

Things have changed a little in the last month since I wrote the posts.  Maybe I'll do a Part 3 at some point with what I've learned since then.

Edit: Fixed the link to Part 1 - it had changed since they originally published it for some reason.

Thursday, July 3, 2014

On recruiting at conferences

I always find it interesting, if a bit sad, that at every conference I attend, every talk has an obligatory "we're hiring" slide.  It's always delivered with such a lackluster, almost apologetic tone, that it probably only serves to turn people off of the option, honestly.  It feels like whoever is presenting was forced to include it by an overeager recruiting department.  The same thing goes for trying to recruit by having a booth or by networking at the conferences.  A girl came up to the table I was having lunch at recently, and very bluntly said she was looking for people to come work for her company.  My response was "so is everyone else".  The fact of the matter is that people who come to conferences are sent there by their employer, which means that a) they're employed and b) their employer thinks highly enough of them to pay a lot of money to send them to a conference.  It's just not a receptive audience.  The best thing you can expect out of conferences is to get your name out there so when people are looking, they're aware of you, and all the awesome stuff you're doing, and how happy all your employees seem, and the fact that you didn't force your presenter to post a "we're hiring" slide in their talk. You're better off investing that effort and money in getting involved in the community by hosting local user groups, contributing back to open source projects, and building a good, developer-friendly reputation.

Tuesday, July 1, 2014

Anger-Driven Development

If you've been a programmer long enough, then you've come to realize one simple, universal, unassailable truth: all other programmers are morons.  Then if you've been a programmer a little bit longer, you realize that to every other programmer, you *are* the other programmer, but I digress.  There's been a slew of X-Driven Development methodologies talked about over the years: Test-Driven Development, Behavior-Driven Development, etc.  While those are all good in theory, what I find actually gets things done is what I like to call Anger-Driven Development, or Frustration-Driven Development.

The basic gist of it is "these floors are dirty as hell and I'm not gonna take it any more!".

Here's how it goes:

1. You repeatedly have to do some mundane or menial task that the idiot who designed the system should have automated (note: the idiot might be you).
2. You get sick of doing it, and yell "Khaaaaaaaannnnnnn!!!!!" a lot.
3. You channel that anger, ignore your deadlines, and automate it away so you never have to do it again.

Or alternatively:

1. You have to maintain some software written by a moron.  You know you've been there.
2. You keep having to use some awkward-to-use, obtuse interface that only makes sense to the moron who wrote it.  Good thing he no longer works there or you'd give him a piece of your mind!
3. You cry when nobody's looking (hopefully nobody's looking).
4. You channel the rage and rewrite or wrap the problematic code so that you don't have to waste one more second of your precious life dealing with it.

That's basically it.

1. Situation sucks
2. You get angry enough to do something about it
3. You fix the glitch.
4. ???
5. Profit

The problem with all that anger is, if you bottle it up too much, it tends to come out in powerful bursts.  You become the Hulk and rage on the source of your frustration, which is counter-productive.  Joss Whedon, the wise sage, has taught us the secret in the recent Avengers movie: "That's my secret, Captain.  I'm always angry."

I think that's about where we need to be as developers: always angry.  We maintain a base level of righteous indignation about the code we have to work with, but we keep the Hulk at bay by fixing just enough of the brokenness to keep the blood from boiling over.  Always just on the cusp of blowing up, but channeling that anger into productive change.  You let the anger drive you to produce better code, better systems, and better processes just so you can sleep at night, despite someone being wrong on the internet.

I'd like to say I'm at that point of anger zen, but I still let the Hulk out at times.  Fortunately for me, my current situation is very empowering, so I'm able to fix things as I run into them for the most part, and Hulk stays within.  For now.

Monday, June 23, 2014

Designing an Access Control System

Designing the Proper Thing

One thing I've learned in my career is that in order to build something that will outlast its original intention, you need to boil the idea down to a fundamental use case and design the interface to your system around that.  A clean interface, whether that's a user interface or an API, makes all the difference.  It's really the Unix philosophy of "do one thing and do it well".  Once you have that clean, well-defined interface, the actual implementation can be done and redone or built upon as needed without causing consumers of your code to make any changes to compensate.  And not forcing users to redesign around your short-sightedness goes a long way toward making them want to use your software.

Shortly after I started at Liquid Web, I was tasked with building a system that would allow us to selectively restrict access to portions of our internal applications to employees whose jobs required them to have said access.  This idea is common, and there are systems out there designed to handle similar workloads, LDAP being the most notable.  However, what I found while researching options was that many of them foisted the bulk of the actual restriction logic onto the user interface, which I found problematic from a maintenance perspective.

Imagine you're writing an interface, and you want to selectively hide parts of the UI so people who can't do those actions aren't distracted by them being there.  You can go about that in a few ways.  For a group-based approach, you can say "Is Joe a member of the flibbertygibbit group?  Ok, he sees the flibbertygibbit widget".  You can take a role-based approach, and then the question becomes "Is Nancy a SuperHeadAdminPerson?  Ok, she can see the 'delete user' button."  In either of these cases, what happens when the organization restructures itself and suddenly the flibbertygibbit group has all new responsibilities that don't include the flibbertygibbit widget?  What if a new level is inserted above SuperHeadAdminPerson (SuperDuperHeadAdminPerson), and now SuperHeadAdminPerson is actually no longer allowed to delete users?  Why, you get to wait until the developers have time to retool the entire interface to fix the issues with people seeing the wrong things.  That's just bad voodoo, so I wasn't too keen on taking that approach.  I had to find a better way.

After much pondering, some caffeine, probably a nap or two, and much wasting time on Reddit, I found a light bulb and turned it on.  What access control boils down to, after you strip away all the groups, and roles, and everything else, is: "Can this person do this action?".  It's deceptively simple.  Can Jimbob give this customer a $1 billion credit?  Can Suzie delete this SuperUser account?  Can Dr Evil get sharks with friggin laser beams on their heads?  It's really that simple.  So, given that simple premise, could we design a system around the question "Can $user do $action?".  I should note at this point that LDAP could do that at the time, but the way it did it wasn't very natural, and it wouldn't scale to the level of thousands of possible 'actions' we were envisioning.  It's possible that has since changed, but I haven't had to revisit it so I don't know.  So we went about designing a system that would answer this question quickly, frequently, and with maximum flexibility.
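
Sketched in Python (illustrative only: the real system, as the next section explains, was built on Perl and Postgres, so every name here is made up), the entire public surface looks something like this:

class AccessControl:
    # The whole public surface is one question with a boolean answer;
    # everything else (groups, roles, storage) hides behind it.

    def __init__(self, allowed):
        # A trivial backing store for illustration: {user: set of actions}.
        # The implementation can swap in databases, caches, whatever, and
        # consumers never notice, because they only ever call can().
        self._allowed = allowed

    def can(self, user, action):
        # Can `user` do `action`?
        return action in self._allowed.get(user, set())

acl = AccessControl({'jimbob': {'Account.Create'}})
print(acl.can('jimbob', 'Account.Create'))       # True
print(acl.can('drevil', 'Sharks.AttachLasers'))  # False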

Designing the Thing Proper

They say when you have a nail, you want to hit it with a hammer.  Wait, no, what was it again?  I can't remember.  Anyway, when we went to design the system, we opted to use a) Perl and b) Postgres, because both are excellent tools for building performant, scalable systems, and they just happened to be what everything else at the company was using (except for some crappy legacy systems using mysql).  Oh, and I forgot one other critical piece: memcached.  Now we had a stew going.  Actually, not quite yet.  We didn't want a situation where we had to go in for each user and every possible action and flip a toggle.  That just wouldn't scale.  Thousands of users, potentially tens of thousands of actions, that's just begging to grind to a halt.  I was wishing that there was just some nice hierarchical data structure where we could define blanket permit/deny statements, then give more granular exceptions to that, like "Jimbob can do anything to an account... except delete it."  While I was pondering this, my boss kindly walked over and yelled something in my ear.  "LTREE, you idiot," he said.  His version of reality may vary on that.  If you don't know ltree, go check it out, it's amazing.  It's basically exactly what I wanted.  With that tidbit, I was off to the races.  Now I could set permissions:

jimbob CAN: Account
jimbob CAN'T: Account.Delete

And then when I ask the question:

Can jimbob create an Account? - yes!
Can jimbob update an Account? - yes!
Can jimbob delete an Account? - no!

But I didn't have to specifically tell the system about Account.Create and Account.Update, because jimbob can already do Account (there's an implicit .*, so Account = Account.*).
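
The matching semantics are easy to mimic in a few lines of Python (a sketch of the behavior only; the real system leaned on Postgres's ltree operators, and the tie-goes-to-the-deny rule is my assumption of the safe default):

def covers(rule, action):
    # Every label carries an implicit '.*', so 'Account' covers
    # 'Account.Delete' as well as 'Account' itself.
    return action == rule or action.startswith(rule + '.')

def allowed(permits, denies, action):
    # The most specific (longest) matching rule wins.
    permit_matches = [r for r in permits if covers(r, action)]
    deny_matches = [r for r in denies if covers(r, action)]
    if not permit_matches:
        return False
    if not deny_matches:
        return True
    return max(len(r) for r in permit_matches) > max(len(r) for r in deny_matches)

permits, denies = ['Account'], ['Account.Delete']
print(allowed(permits, denies, 'Account.Create'))  # yes!
print(allowed(permits, denies, 'Account.Update'))  # yes!
print(allowed(permits, denies, 'Account.Delete'))  # no!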

So going back to my original comment about clean interfaces, as long as the system would continue to answer "Can $user do $action?", it didn't matter how much complexity was involved in arriving at those answers.  All any consumer of my system needed to know was the answer to that question, and its thousands of siblings.

So, despite my earlier comments about groups and roles, they are still useful constructs for defining what people can or can't do, as long as you're not exposing that level of detail through the interface.  So, to make a long story just a wee shorter, this was the basic structure we came up with for how the system would answer the ultimate question:

Action - the thing that can or can't be done
Role - a collection of rules about which actions can or can't be done
Group - a collection of users that are assigned the same roles
User - can have roles, be assigned to groups that have other roles, and can also have user-specific rules about additional actions outside the scope of its other roles

So, to answer the question, it was a possible multi-step process:

1. Can this user do this action?
2. Can any role this user has do this action?
3. Does any role on any group assigned to this user have permission to do this action?
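
In Python pseudo-code (structure and names invented for illustration, not lifted from the real system), that chain looks like:

def check_rules(rules, action):
    # rules is a list of (label, allowed) pairs.  Returns the answer from
    # the most specific label covering the action, or None if none match.
    matches = [(label, allowed) for label, allowed in rules
               if action == label or action.startswith(label + '.')]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]

def user_can(user, action):
    # 1. User-specific rules take priority...
    answer = check_rules(user['rules'], action)
    if answer is not None:
        return answer
    # 2. ...then rules from roles assigned directly to the user...
    for role in user['roles']:
        answer = check_rules(role, action)
        if answer is not None:
            return answer
    # 3. ...then rules from roles attached to the user's groups.
    for group in user['groups']:
        for role in group['roles']:
            answer = check_rules(role, action)
            if answer is not None:
                return answer
    return False  # no rule matched anywhere: default deny

jimbob = {
    'rules': [('Account.Delete', False)],          # user-specific exception
    'roles': [[('Account', True)]],                # a role: full Account access
    'groups': [{'roles': [[('Billing', True)]]}],  # a department group
}
print(user_can(jimbob, 'Account.Update'))  # True, via the role
print(user_can(jimbob, 'Account.Delete'))  # False, via the user exception
print(user_can(jimbob, 'Billing.Refund'))  # True, via the group's role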

This setup gave us a huge amount of flexibility in defining permissions for various users, and we then mapped all the company departments to groups, and put their employees (users) in those groups.   Now we had the best of both worlds: broad definition of permissible actions to whole departments, with the ability to make exceptions where needed.  And the consumer of the system still only cared about one, simple thing.

We made pretty extensive use of memcached to prevent repeated calculations of the same data in quick succession, and from that we had a system that still uses very few resources despite powering access control for both our public API and internal intra-department API, as well as many other internal systems.  Not bad for a few days' work (ok ok, it really took about a month).
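
The caching layer can be as simple as memoizing the question itself.  Here's a sketch using the python-memcached client; the key scheme, the TTL, and the stand-in lookup are all inventions for illustration, not the real system's:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def compute_user_can(user_name, action):
    # Stand-in for the real database-backed permission lookup.
    return action.startswith('Account')

def cached_user_can(user_name, action, ttl=300):
    # Answer "Can $user do $action?" out of memcached when possible, so
    # rapid-fire repeats of the same question never touch the database.
    key = 'acl:%s:%s' % (user_name, action)
    answer = mc.get(key)
    if answer is None:  # cache miss; a cached False still comes back as False
        answer = compute_user_can(user_name, action)
        mc.set(key, answer, time=ttl)
    return answer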

What's the big deal?

So, I keep going on about how a clean interface enables you to become one with the universe or something.  Due to the flexible design of the system and the single clear point of interaction, we were able to adapt it for use by customer accounts with our public API and web UI with about 2 hours of work, mostly to allow for running multiple copies pointing at different databases.  When we needed to add rate-limiting to the API, we knew that it was really just a slight adjustment of "Can $user do $action" into "Can $user do $action... again?".  All of the same idioms lined up, so we simply added a layer to track requests to the methods in a performant way, so that asking that question became cheap enough to be useful for that purpose.
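
Layering rate-limiting on top is then mostly a matter of counting.  A fixed-window sketch (same hypothetical memcached client as above, with made-up limits):

import time

def can_again(mc, user_name, action, limit=60, window=60):
    # Allow at most `limit` calls per `window` seconds.  `mc` is a
    # memcache.Client like the one in the previous sketch.
    bucket = int(time.time()) // window
    key = 'rate:%s:%s:%d' % (user_name, action, bucket)
    count = mc.incr(key)
    if count is None:  # incr can't create keys, so seed the first request
        mc.set(key, 1, time=window * 2)
        count = 1
    return count <= limit

There's a small race between the incr and the set, but for rate-limiting purposes an occasional miscount is harmless.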

This was really a fluke project.  I've never had another project that lasted as long as it has with as few modifications required to keep up with the needs of the company.  Given hindsight, I should have made a simpler way to define the roles, as that was the biggest stumbling block for people who worked on the system later, and I was maybe a tad over-aggressive with the memcached use, but besides that, it's held up extremely well.  Perhaps those who are now maintaining it will disagree with that assessment.

Tuesday, June 17, 2014

My thoughts on technical interviews

If you know me at all, you've probably heard me rant about technical interviews.  For those who don't, perhaps a little background is in order.  I didn't finish my CS degree (well, technically I was enrolled in Computer Engineering, but I didn't finish that either).  I started working in the dot-com boom of the late 90s to save up some money to go back to school, and then I started making too much money to qualify for aid, but not enough to actually pay for school.  So, I kept working.  I've consistently been a top performer on each team I've been a part of, and I've written some fairly complex systems over the years.  However, I've always worked with web-based technologies and always in dynamic, high-level languages (Perl, Javascript, Python).  Undoubtedly, if I had finished my degree, I'd have probably made different choices as to how to approach those problems, but for the most part, my lack of degree hasn't hindered my ability to write good software (several people with degrees, some advanced, have complimented me on my code over the years).  So, when I rant about interviews, understand where I'm coming from.

The problem with technical interviews is they are almost solely focused on scholastic CS knowledge or trivia.  Despite all the evidence to the contrary, people continue to insist that this is an appropriate way to interview programmers.  Before I explain how I think interviews should go, I want to provide a few thoughts on what's wrong with the current process.

Google says there's no correlation between interview and job performance

Google, a company well-known for its difficult technical interviews, and also a company well-known for making data-based decisions, keeps statistics on interview performance and compares them against on-the-job performance.  They've concluded that there is absolutely no correlation between the two.  This means that as a measure of ability to perform on the job, the types of questions that Google asks (mostly advanced CS style problems) are useless.  If you don't believe me, believe it straight from the mouth of the person who tracks this data at Google.  In an interview with the NY Times, SVP of People Operations Laszlo Bock said:
Years ago, we did a study to determine whether anyone at Google is particularly good at hiring. We looked at tens of thousands of interviews, and everyone who had done the interviews and what they scored the candidate, and how that person ultimately performed in their job. We found zero relationship.
See http://www.nytimes.com/2013/06/20/business/in-head-hunting-big-data-may-not-be-such-a-big-deal.html for the full synopsis; there are some other gems to be found.

I fully agree with this assessment (it's hard to disagree with the data, anyway).  The ability to solve complex CS questions in an interview setting is so far removed from what someone will encounter in a day-to-day programming job that I don't even bother asking these sorts of questions when I do interviews.  It tells you so very little.  I've worked with people with advanced degrees who ace the interviews, then peter out on the job because they're bored with the mundane tasks they get assigned.  The mentality of someone who enjoys building a software product is vastly different from that of someone who enjoys understanding advanced CS theory.  If you can find someone who does both well, then kudos to you.  I've only met maybe a handful of them in the last 15 years.

Stress is the mindkiller

Interviews are stressful.  Most introverts don't do well under stressful conditions.  You might say "well the job will be stressful", but it's not the same kind of stress.  Many introverts, myself included, view an interview as more of an amalgamation of an interrogation and a test.  Under that sort of scrutiny, it can be overwhelming to be asked something that we don't readily have an answer to.  Even after conducting a few dozen interviews myself, I still have to subvocalize reminders that not knowing the answer is ok.  This took a long time for me to realize, and I still don't always remember.  At a minimum, you should help the interviewee feel comfortable before digging into the difficult questions.  Even better, if you can see they're totally stumped, rather than beating a dead horse and turning a bad interview into a nightmare, move on to another question that might be more apropos to their experience.

I know he can get the job, but can he do the job?

The person coming in has no idea what sorts of questions they're going to be asked, because frankly, most interviewers ask about things completely irrelevant to the job they're hiring for.  I once got asked how to reverse a string in place for a PHP developer job.  With my background in Perl, the answer is simply: reverse $string.  It's built in to the language.  This wasn't acceptable, obviously, because it's a programming interview.  Having never encountered this problem before, I got stuck for a while, and then finally worked it out after one of the interviewers reminded me that in C, strings are arrays of characters (a fact I'd forgotten over the previous decade of working with Perl and Javascript, where strings are built-in types that can't be manipulated as arrays).  Looking back on it now that I have solved it once, it seems rather silly that I couldn't figure it out quickly, but that's how interviews are.  Chances are the interviewer took much longer than an hour to learn what they're quizzing you on, so how is it fair to expect someone to figure it out in less time than that, under the scrutiny of the interview?  I quickly answered much more challenging problems, so at the wrap-up, one interviewer even called that out and asked me why I struggled with what he considered to be a basic problem while acing more difficult ones (I found the others to be trivial, given my experience).
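
For what it's worth, the trick is much easier to see outside the pressure cooker.  Here it is in Python, whose strings are similarly immutable built-in types:

# The high-level answer: strings are a built-in type, so just ask.
s = 'interview'
print(s[::-1])  # 'weivretni'

# The C-style answer: fake the char array, then swap from both ends
# toward the middle.
chars = list(s)
lo, hi = 0, len(chars) - 1
while lo < hi:
    chars[lo], chars[hi] = chars[hi], chars[lo]
    lo += 1
    hi -= 1
print(''.join(chars))  # 'weivretni'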

For another job, the interviewer asked all sorts of in-depth theoretical questions like "how would you design a system to determine addresses within a certain radius of a certain address, given a billion addresses".  I sure expected to be doing super-complex stuff when I managed to get the job (despite not answering those questions well, I did well enough otherwise that they took a chance on me).  Guess what I worked on for my first project?  Nope, not the next Google Maps, not even the next Gmail; we added a blog feature to our website builder.  A simple blog: posts, comments, tags.  That's it.  And since it was assumed that I couldn't do anything complicated, the boss pre-wrote all the model layer for me.  Never mind that I'd done this sort of thing for years, and went on to fix several bugs in his implementation, add new features, and optimize some of his queries.

So, try to keep the technical questions to things relevant to the job they've applied for.  A web developer might not know what a directed acyclic graph is, and they really shouldn't need to in order to be effective.  Likewise, an embedded systems programmer might not know how to optimize the delivery of web assets to the browser.

To whiteboard or not to whiteboard, that is the question

Whiteboarding code or pseudo-code is fairly common in programming interviews, but I've never had a good experience with it, on either side of the table.  It's hard to write code on a whiteboard.  Save the whiteboard for designing object hierarchies or other high-level concepts, like the things you'd actually use a whiteboard for in your real job.

Some who agree with the uselessness of whiteboard coding have moved on to something which is, IMO, far worse: live-coding.  I'd say never have someone live-code, but if you are going to, at least don't make them do it on a machine that is vastly different from their normal environment.  Ask ahead of time and make sure you get their editor of choice and OS of choice available to them, and try to avoid awkward hardware like tiny keyboards that they might not be used to.  Programmers spend a great deal of time optimizing their environments to their particular taste.  Asking them to just sit down and code in a foreign environment under time pressure is a recipe for disaster.

Let me share another anecdotal bad example.  I once had an interviewer sit me down, and without spending any time on pleasantries apart from "this is my name", handed me his tiny laptop and gave me the choice of vim or emacs to start coding in.  I can get by in vim, but it's hardly my editor of choice, and I have a hard time typing on tiny keyboards.  He and the other interviewer (who said maybe two words the entire time) sat on either side of me and watched over my shoulders as I coded solutions to their trivia problems.  The first one was relatively easy, and despite the environment, I knocked it out pretty quickly.  After that, he gave me what he described as an "NP hard" problem (note: using terms like "NP hard" makes you sound like an asshat).  The solution to this problem was only sort of complex if you'd solved it before, but having never had a problem even remotely similar to this, I was stumped.  When I'm stumped, I clam up for a bit.  In real job situations, I would turn to google for a bit, and if still stumped, I'd engage coworkers.  In an interview, I have trouble discussing the problem, since I don't know the interviewer and I'm horribly shy around new people until I know where they're coming from.  So, I futzed my way through it, and we ended the interview with my having a vague idea of the direction we needed to go, but nothing even close to working.  It took me a while to recognize the need for recursion, despite using recursion fairly frequently in my work.  Needless to say, I didn't get the job.  They actually interrupted the next interview (which was going shockingly well) and sent me packing.  Another note: don't do this; it's humiliating to all involved: the people interviewing me, the person who had to escort me out, and obviously me.  Given that experience, I wouldn't ever apply to that company again, and I've steered more than one colleague away from them.  Frustrated by my inability to solve the problem, I spent an hour the next morning in my own environment and hashed out a solution.  Not to toot my own horn too much, but I seriously doubt the interviewer solved it that quickly the first time they encountered it.

People can only implement algorithms they've already seen

The justification I always see for live-coding or whiteboarding in interviews is that people need to understand the "basics" or they won't be able to handle the bigger things.  There are two faulty assumptions in that statement.

1. What you consider the "basics" is not universally true.  Some people consider set theory and graph theory to be basics.  Other people consider advanced data structures or algorithms to be basics.  Given that the interviewee is not you, what they consider necessary knowledge differs, and you shouldn't expect them to have all the same experiences as you.  Someone who gets stumped on a question about data structures, but has only worked with higher-level languages, might not be a bad hire; they've simply never had to implement what you're asking about.  Talking to them about building distributed systems, if that's something they've spent a good deal of time doing, is a much better way to gauge their level of skill.
2. If the person hasn't seen the pattern before, they aren't going to be able to solve it under the stress and time constraints of an interview.  Human brains are pattern matching machines.  So, if the interviewee has seen a pattern before, even if the problem was different, they might well be able to solve your problem with a little prodding.  If they've never seen it before, no matter how much prodding you do, they aren't going to be able to solve it.  On the job, they'd have additional resources to discover the solution, so having them do it in an interview setting doesn't give you any indication of how they'd do it on the job.  I've read about this issue in several articles and books, but I don't keep notes like I should, and I can't find the references to share.  Perhaps I'll rediscover them and update this at a later date.

But, without CS trivia and whiteboard coding, what is left?

Given that I think that most traditional programming interview techniques are largely worthless, you may be wondering how one should actually conduct a programming interview.  I basically consider this a three-step process:

Code is worth a thousand words

The first part of an interview should involve evaluating code the candidate has written in a normal setting.  You should get this code ahead of the interview, evaluate it beforehand, then discuss your findings with the candidate during the interview.  

As for how to get the code, there are basically two options:

1. Ask for a code sample of something they've worked on that they're proud of.  A github profile is ideal, but barring that, they can email in some code. It shouldn't matter if the code is in a language that isn't used at your job.  Good programmers can pick up new programming languages.
2. If they really don't have any code to share (this should be a red flag, but is not always a nonstarter), then give them a problem to solve in the domain of the job they're applying for and a set amount of time to complete it.  The problem should not take an experienced developer more than a few hours to solve, so keep it basic.  Again, avoid CS theory quiz type problems.  Use something practical (if the job is working with REST APIs, have them write a client for an existing REST API, for example). Now you have a code sample.
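
To give a sense of the scope to aim for with option 2, the whole exercise might be no bigger than this (a sketch using the requests library against httpbin.org, a public request-echo service; the class and its scope are just an example of the kind of thing to ask for):

import requests

class HttpBinClient:
    # A toy REST client: small enough to write in an afternoon, but with
    # enough surface to discuss structure and error handling afterwards.

    BASE_URL = 'https://httpbin.org'

    def get(self, path, **params):
        response = requests.get(self.BASE_URL + path, params=params)
        response.raise_for_status()  # surface HTTP errors to the caller
        return response.json()

client = HttpBinClient()
print(client.get('/get', q='hello')['args'])  # {'q': 'hello'}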

Given the code sample, you can get a feel for how the programmer thinks.  Look for things like how they structure their object hierarchy, how well they comment their code, how readable it is, does the API to their objects make sense, is it consistent, are there any glaring performance issues, does it actually work, did they write tests, etc.

Dig into their experience

From the same interview with Google mentioned above, there's this gem:

Behavioral interviewing also works — where you’re not giving someone a hypothetical, but you’re starting with a question like, “Give me an example of a time when you solved an analytically difficult problem.” The interesting thing about the behavioral interview is that when you ask somebody to speak to their own experience, and you drill into that, you get two kinds of information. One is you get to see how they actually interacted in a real-world situation, and the valuable “meta” information you get about the candidate is a sense of what they consider to be difficult.

This jibes with my experience conducting interviews.  Everyone applying to be a programmer has some experience working on a programming project.  For recent grads, that might just be a project they did for school.  Regardless, having someone speak to their experience, and drilling in for details on parts you find interesting or disturbing, is a great way to get a feel for how a potential candidate thinks.  Go in depth on their design decisions, what other approaches they considered, what they'd do differently now, etc.  This will tell you far more about their abilities than any CS knowledge question possibly can.

Determine cultural fit

Last, but certainly not least, is to determine cultural fit.  Is the candidate going to work well with your team?  Do they react badly when confronted?  If you're a startup type company, do they work well with shifting priorities and quick iterations?  If you're a huge behemoth, do they work well having to coordinate with 15 other departments just to get their work done?  Do they have an agreeable personality that you would be able to work with?  I once gently corrected an interviewee, who then went on to tell me I was wrong (I wasn't).  Despite my objections, he was hired, and he lasted maybe 6 months before he left.  He was abrasive and people weren't sad to see him leave.  He left because he didn't think people listened to him (they likely didn't, because he was wrongheaded a lot and didn't react well to constructive criticism).  This one is harder to accomplish than the previous two, as some people are experts at hiding their personalities in interviews, but do your best to weed out the people who aren't going to fit in on your team, no matter how technically brilliant they might be.

The exception to the rule

Every rule has exceptions, so let me state that there are exceptions here.  If your team is developing a programming language, or designing a new data storage system, or working in embedded systems, the normal types of programming interview questions probably apply.  However, these sorts of positions are exceedingly rare, yet these sorts of questions are exceedingly common, so there's a huge disconnect here.