Friday, February 26, 2016

In need of a muse

You often hear about artists having a muse, someone or something that inspires them and pushes them to excel at their craft.  I think that concept applies to programmers as well, although I've never heard someone use it in that context before.

muse

 noun

Definition of MUSE

1. capitalized: any of the nine sister goddesses in Greek mythology presiding over song and poetry and the arts and sciences
2. a source of inspiration; especially: a guiding genius

(Source: http://www.merriam-webster.com/dictionary/muse)

I've had a few coworkers over the years whose excellence and creativity pushed me to try harder and reach further than I thought myself capable of.  I do have a fair bit of intrinsic motivation that sees me through in their absence, but sometimes I wonder how much more I could have become if those folks were still around to push me.  I don't know exactly what makes the relationship so special, or why it doesn't seem to exist with most coworkers.  In those special cases, the coworker just seems to get where you're coming from, but rather than serving as an echo chamber, helps you refine and improve your ideas.

The first muse I can remember really pushed me to understand the performance characteristics of the code I was producing.  He understood the internals of the programming language we were using better than I did, and the things he brought up in code reviews helped me to start really considering what my code was doing under the hood.

In another case multiple coworkers actively disliked my muse, but for me, he always cut through the BS and pushed me to do a better job, which is what I personally needed.  He was able to explain what I meant to others so that we actually made the tangible improvements I'd wanted to see but couldn't adequately sell.  He had great ideas but often would not get much beyond the minimum viable product before moving on.  But seeing what he started, I could easily envision where it needed to go and was able to more fully flesh things out.  And then he'd build on the ideas I added.  And so on.  I've always been fairly microcosmic in my approach, and he taught me to look a bit more at the macrocosm.

A different muse was able to talk me through my ideas and whittle away the weak points so I could both better understand them myself and better explain them to others.  He would often just ask "why", which required me to think more objectively about the issue.  He pushed me not to give up on making improvements to the system even when I was initially shot down by management.  He taught me the value of selling my ideas to others, especially those with more decision-making authority, because ideas don't carry the same weight with others as they do with me solely on their merits.  He pointed out the flaws in my logic and made me rethink some misguided decisions of my own.  He helped me put things in context.

Maybe calling them muses is inaccurate, but to me that's basically what they were.  I miss them.  I miss how great I felt learning and growing at such a rapid pace.  I miss the rapid feedback we gave each other.  I'm forever grateful for their influence.  I hope I was able to be for them even a fraction of what they were for me.

I'm not dissing my current coworkers; they are great and I'm glad to be on the team I am.  We really are doing some cool things, and the company is far better to work for than any of my previous employers. I guess I'm just feeling a little nostalgia for unreclaimable moments of great influence on my career and life.

Sentiment Analysis and so can you!

I wrote a 3-parter for the Rackspace blog about our Sentiment Analysis sample application.  Mad props to Pankaj Channe (scoring algorithm) and Jeremy Arntz (UI) for working with me on it.


Part 1 (how to set things up):
http://blog.rackspace.com/how-to-twitter-sentiment-analysis-demo-on-rackspace-cloud-big-data/

Part 2 (Spark code):
http://blog.rackspace.com/behind-the-curtain-twitter-sentiment-analysis-demo-sample-app-code/

Part 3 (node.js code):
http://blog.rackspace.com/behind-the-curtain-visual-sentiment-results-with-node-js/

Software Development Best Practices Aren't

Every once in a while I hear someone mention, in relation to developing software, that we should follow "best practices".  I often wonder whether the speaker is just couching their suggestion in vague language because they have no idea what it means, or whether they really think there is some grand list of "best" practices that everyone should follow to guarantee success.  Really, there are two types of programming practices: "hipster" (or bleeding-edge, unproven, ooh-squirrel, or whatever you want to call the new shiny) and "legacy" (the stuff the old guys do because it works, but it's boring, and really a lot of times it's kind of crappy, but it's a known evil).  I think there's some imaginary middle ground there where "best" lives, but it's such a moving target that nobody can hit it.  Just in my time as a developer, web development has gone through at least these "best" practices:


  1. LAMP stack (Linux, Apache, MySQL, Perl)
  2. LAMP stack (P is now PHP)
  3. MVC Frameworks (Ruby on Rails, Django, etc)
  4. Async all the things (memcached, node.js, nginx, etc)
  5. MEAN stack (MongoDB, Express.js, Angular, Node.js)
  6. Single-page apps
  7. MSA (Micro-Service Architecture)
  8. I could go on, but I won't
Which of those is the "best" practice now?  I honestly don't know.  I'm totally ignoring the insane number of client-side frameworks that have come and gone (prototype/scriptaculous, yui, dojo, jquery, angular, gwt, meteor, I'm forgetting about a thousand).

RSS was cool, then it was all SOAP, but SOAP was too complex, so XML-RPC, but wait, we hate XML, so RSS again, wait, Javascript is easy and XML is basically a piece of garbage, so JSON, and why don't we just use HTTP for its designed purpose, so REST all the things!  Chroot works for sharing hardware, but no, we should use VMs because security, oh wait, VMs actually have a lot more exploits than containers, and containers are more performant, and ooh neat, Docker makes them not painful, and ooh, Google uses containers.  Configuration management for a fleet of servers is hard, so cfengine, but Perl is icky, so Puppet/Chef, but Ruby is ugly, so Ansible/Salt, but centralized servers don't scale that great, so Consul/etcd.  And service monitoring, and metrics, and distributed filesystems, and and.

Don't get me wrong, a lot of the things in there are neat and could definitely prove themselves long-term, but given how quickly the community has rallied around some of them and then dropped them just as quickly (like Neil deGrasse Tyson dropped the mic on that clueless rapper), I'd guess that you shouldn't be adopting half of them just yet.  And often the better technology loses to the one with better marketing, or the one that's easier to get started with.

So, the next time you say "best practice" about software development, you might want to clarify what you think is "best".  But be prepared for a lot of people to think you're the dumbest person that ever lived because you don't use whatever pet technology they think has redefined the game.  Because that will happen if you work with other developers or engineers.

Remember, when people say "use best practices", what they really mean is "do things my way".  Software engineering is a relatively young discipline, and technology moves so rapidly that there is no "best" way.  Build it the best way you know how and try to learn a few new things in the process.  Get feedback via code reviews to learn from others with different backgrounds.  Try new things; see where they improve upon the old and where they fall down because they haven't had the time to harden.  Keep moving forward, but don't abandon everything you already know works well.  That's really the best you can do.

Good luck!

Monday, June 22, 2015

On being a full-stack web developer

One of the recent buzzwords in the industry is "full-stack engineer" or one of its many derivatives.  I've seen some backlash against the term to the effect that nobody can possibly excel in all the disciplines necessary to truly be a full-stack engineer, so we should stop using it.  That saddens me a bit, because a) I do, and b) a lot of people who include the buzzword on their resumes are not, and because of that, people don't believe me when I say I am.  I'm not prone to bragging about myself.  I tend to shy away from the spotlight.  But I'm going to put that aside for today.  I'm here to tell you what I think a good full-stack engineer can accomplish, based entirely on examples from my own career.

At my previous job at Liquid Web, I was hired on as the 3rd developer in the company (not ever, just concurrently; several had come and gone).  With such a small team, there's no room for specialization; you just do what needs to get done and learn what you need.  I excelled in this environment because I had a strong foundation in many of the areas of responsibility and a keen desire to learn the rest.  I was there before we had any QA.  I was there before we had any ops for our internal software (the whole company was basically ops for the customers, so I have to make that distinction).  Here are some highlights of what I worked on there, from frontend to backend to ops to networking.  I do this to better explain what I mean when I say I'm a full-stack engineer, and to show that being such a thing is possible.

Storm Management Console

We built a web interface for managing your account and servers.  Initially, the Storm product was a single page in our old PIMS web interface, but we quickly realized that the new system was going to be vastly more complex and decided to start from scratch.  Our sole designer in the marketing department delivered me some jpegs of mocked-up screenshots of how the site should look.  From those, I wrote the HTML, CSS, and Javascript required to bring it to life.  This was back before the Javascript renaissance that came after node.js got all hip and trendy, but after things like jquery were established, so it wasn't the dark ages or anything.  We decided to use jquery for the DOM manipulation, but the rest of the codebase was entirely custom.  It was first built only for our Storm on Demand sub-brand, but eventually it was repurposed through some clever code-sharing to be used for both brands.  To accomplish what we needed, I built the following:

  • A publish/subscribe event system for objects to interact with each other cleanly (see the sketch after this list)
  • An AJAX caching system to avoid repeated queries for the same data
  • A CSS/Javascript minification/concatenation system that was cache-friendly (i.e. it would expire the caches whenever you made changes, so you never had stale code, but otherwise would serve from cache)
  • A component framework to allow reuse of self-contained sections of functionality among multiple different pages/views (what I wanted was full data-binding of the model/view layer, and I got close, but not quite)
Most of those things are easily found out of the box now in the new shiny frameworks, but in 2008 they weren't common.  
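
For the curious, here's a rough sketch of the publish/subscribe piece.  This is a from-memory illustration, not the original Storm code, but the shape was about this:
var events = {
    topics: {},
    subscribe: function (topic, callback) {
        (this.topics[topic] = this.topics[topic] || []).push(callback);
    },
    publish: function (topic) {
        var args = Array.prototype.slice.call(arguments, 1);
        var callbacks = this.topics[topic] || [];
        for (var i = 0; i < callbacks.length; i++) {
            callbacks[i].apply(null, args);
        }
    }
};
// a component registers interest without knowing who publishes...
events.subscribe('server-updated', function (server) {
    // re-render just this component's view of the server
});
// ...and the AJAX layer publishes when new data arrives
events.publish('server-updated', {id: 42, status: 'running'});
The point was decoupling: components never called each other directly, they just published and subscribed to topics.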

The Storm API

When we were initially discussing building out the Storm product, we all agreed that we wanted to have an API for customers to interact with our system.  Similarly, we wanted to shift towards a service-oriented architecture, because Amazon (and it being a good idea).  So we decided two things:
  1. We would build out our UI entirely from our API, ensuring that we exposed everything necessary for our customers in our API and giving us much better test coverage for our API.
  2. Our public API would wrap an internal-use API that we could expose to other departments to accomplish a true service-oriented architecture internally at the company.
Both of those decisions had performance implications, so we tried to optimize the stack as much as possible, but I was never happy with the overall performance.  I felt, and still feel, that we should have done more to streamline it and improve the responsiveness of the UI.  But I was vetoed due to more pressing concerns.  Anyway, that's tangential to the discussion at hand.

I built the framework on which both the internal use and public APIs were built.  I extended the auth/access control system I had previously built to cover the needs for both APIs, including adding rate-limiting and making it reusable for customers as well as employees.  I built the framework that translated the input validation and code documentation into customer-facing public docs.  I built the versioning framework that allowed us to selectively update methods in new versions of the API and still keep the original versions intact for backwards-compatibility.  I came up with and wrote code to enforce our naming conventions that allowed us to provide a very consistent API to our customers.
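
The original framework was server-side Perl, so this is just a hypothetical Javascript sketch of the versioning idea (the method names are invented): each API version only defines the methods that changed, and lookups walk backwards until they find an implementation, so old versions stay intact for free:
var api_versions = {
    'v1': {
        'server.list': function () { /* original implementation */ },
        'server.create': function () { /* original implementation */ }
    },
    'v2': {
        // only server.create changed in v2; server.list falls back to v1
        'server.create': function () { /* updated implementation */ }
    }
};
function resolve(version, method) {
    var versions = Object.keys(api_versions).sort();
    // walk backwards from the requested version to find the
    // most recent implementation of the method
    for (var i = versions.indexOf(version); i >= 0; i--) {
        var impl = api_versions[versions[i]][method];
        if (impl) {
            return impl;
        }
    }
    throw new Error(method + ' is not defined in ' + version + ' or earlier');
}
resolve('v2', 'server.list');    // falls back to the v1 implementation
resolve('v2', 'server.create');  // finds the v2 override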

The Core

At the core of all of our systems at LiquidWeb was a set of libraries that began life in the LiquidWeb billing system.  They expanded to encompass most of the various areas of the company, including tracking things like accounts, IP assignments, server locations, credentials, and, of course, billing records.  The only thing that wasn't part of that system was HelpDesk, which was used to track customer support interactions.  The internal-use API wrapped around these libraries, which contained all of the core business logic for the company.  A few of the projects I worked on here included:

  • Automating the assignment of IP addresses from available pools of IPs
  • Automating configuration of routers and switches via SNMP and SSH for things like VLAN and ACL management, ARP cache flushing, etc.
  • Adding hourly billing to the old billing system (a colossal hack on top of a cruft-ridden legacy system) and to our new billing system as well
  • Adding support for roles to our core framework
  • Profiling and optimizing our input validation code, which was hit quite frequently
  • Adding generic support for many types of relationships to the model classes in our custom framework

Testing

As part of the rollout of Storm, we hired our first QA engineers via internal promotions.  At first, they mostly focused on manual testing, and we were running into a lot of bugs with the provisioning system, so I extended our unit-testing framework to do some automated functional/integration testing of it.  We already had good unit test coverage for everything in the API layer and a lot of the core code, but the provisioning system didn't have unit tests.  So, I wrote tests that ran as part of our nightly automated unit test run and covered the following:
  • Create a server, wait until it's done, verify that the state is correct in the core system and that you can ping it and log into it via ssh.
  • Resize that server to a larger size and a smaller size, verify that the state is correct in the database, that the billing adjustments were made as expected, and both ping and ssh still work
  • Clone a server, verify that the clone is the same in all respects except for the name and any settings that were overridden as part of the clone process, and verify that both servers are still reachable via ping and ssh
  • Adjust firewall rules, verify that the specified ports are enabled/disabled as expected
  • Delete a server, make sure billing was adjusted, make sure you can no longer ping it or log into it.

Devops

When I started, all of the developers shared a single development server where we could push edits to test, but we had to manually coordinate so we didn't overwrite each other's changes.  My boss had already begun trying to build individual development environments, but was too busy fighting fires, so one of the first tasks he gave me was to build my own development environment in a way that could be repeated for others.  So I did.  I had to learn a lot about the existing application stack, which was different from what I had used in the past (FastCGI vs. mod_perl).

When we decided to build out a staging environment, I pointed out that we should take the opportunity to automate environment deployments so we wouldn't have to keep manually updating servers.  Both Chef and Puppet were in alpha/beta stages of their development, and cfengine seemed too complex for what we needed, so I ended up writing some scripts to automate the installation of our software and the configuration of the servers we deployed on.  I called them the push scripts, because they controlled our code-pushing process.  I iterated on them as we built out the staging environment, and eventually we migrated all of our production and dev environments to use them as well.

The system was built on a layered YAML configuration file, where you could paint broad strokes at the application, service, or environment level, then have environment- or server-specific overrides for the differences.  Some people found the layering confusing, so maybe it wasn't the best idea in retrospect, but it was a powerful concept once you wrapped your head around it.  It prevented a lot of repetition of the same values across environments and applications that shared the same underlying settings.  The values in that file would be used to populate templates, or passed into modules that would use them to decide how to install and configure their service.  I could write a whole blog post about the architecture, and I might some day, but it was very flexible and saved us a lot of headaches.

I've since used Chef and Jenkins to do much of the same thing, and I have to say that for our needs, the push scripts were much easier than Chef, and they ran in a fraction of the time that Chef takes to do an update, so sometimes having something tailored just for your needs has benefits.
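
To illustrate the layering idea (the real push scripts were Perl with YAML config; this is just a toy Javascript sketch): each layer only declares what differs from the layers beneath it, and the final config is the merge of all applicable layers:
// later layers win, so broad defaults get declared once and
// environments/servers only spell out their differences
function merge_layers() {
    var result = {};
    for (var i = 0; i < arguments.length; i++) {
        var layer = arguments[i];
        for (var key in layer) {
            result[key] = layer[key];
        }
    }
    return result;
}
var application = { workers: 4, debug: false, db_host: 'localhost' };
var environment = { debug: true, db_host: 'staging-db' };  // staging-wide
var server      = { workers: 16 };                         // one beefy box
var config = merge_layers(application, environment, server);
// => { workers: 16, debug: true, db_host: 'staging-db' }
(A real system would probably want a deep merge rather than this shallow one, but you get the idea.)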

What I didn't do

The only area that I didn't work on much was the server provisioning system itself.  I would have liked to, but when we started the project, the other developer didn't have much frontend experience, so he ended up building the provisioning system and I handled the rest.  As the team expanded, we each took the lead on those areas separately and didn't cross-pollinate much.  I did contribute some bug fixes here and there, and helped diagnose many issues through my automated testing, but other than that I didn't get the opportunity.  Thankfully my current job affords me the opportunity to work on a provisioning engine, and while it's at a different level (provisioning clusters vs. servers; we rely on the cloud to provision the actual servers), many of the same concepts still apply.

I apologize if this comes across as grandstanding or bravado; that's not my intent.  I just get tired of the rhetoric.  I've heard and read many people saying that "full stack is impossible" and that people claiming to qualify are lying.  I can't vouch for anyone else, but I can vouch for myself.  I am a full-stack engineer.  I don't know everything about everything; I know very little about systems programming, mobile app development, or desktop application development.  I'm not the guy to build the next big datastore, write a new OS, come up with a new machine learning algorithm, or devise the next big protocol.  But for web development and something-as-a-service development, I do know and have worked on the entire stack, including devops.  I can build on the core concepts developed by those with more knowledge of theory than I have and create great software for actual users.

I've spent the hours fighting with antiquated browsers that cling to market share (*cough* IE6 *cough*).  I've dug deep on extremely complicated billing issues to figure out why we billed a customer incorrectly and how to correct it.  I've profiled and optimized frequently-used algorithms to speed them up by as much as 80x (* gains not typical).  I've built authorization and access control systems from scratch.  I don't personally find any of these things amazing.  I kind of expect other developers to just dive in and learn what they need in order to get done what they need to get done, but I'm learning that not everyone is this way.  For those that aren't: don't assume others are the same as you.  Some people do dive in head first and learn all of the disciplines around web development, enough to be called full-stack engineers.

My current position doesn't afford me the opportunity to work on the entire stack, sadly, and I've recently realized that as screwed up as web development can be, I do miss it.  It's always exciting to see your creations come to life in a browser, and not having control over that aspect weighs on me a bit.  Maybe some day.  In the meantime, I'm learning a lot about data stores and having fun finally working on a provisioning engine.

Some random thoughts on webapp performance

A recent discussion got me thinking back in terms of website development, so I thought I'd jot down a few ideas that I'd like to try to implement when I get some free tuits. Maybe they'll spur some creativity in others, or at least you can tell me why I'm insane and this couldn't possibly work.

Server-side vs client-side rendering

Back when node.js was springing up in popularity, a lot of the focus around it was to do with how it uses an asynchronous event loop to manage a lot of connections per process (since Javascript is single-threaded, concurrency is managed cooperatively, similar to how things worked back in the Windows 3.1 days).  I was confused by this for a few reasons.  Why not use process pooling to run multiple copies of the node server?  You'd be able to peg each core with a separate process, scale the number of requests you could serve simultaneously, and maximize your hardware use.  Have each listen on a different port, throw a loadbalancer in front, voila!  Maybe they're doing that now, but there was no talk of it at the time.
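
For what it's worth, node.js did later ship a built-in way to do roughly this: the cluster module forks one worker per core, and the workers share the listening socket, so you don't even need the loadbalancer.  A minimal sketch:
var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
    // fork one worker per core so every CPU stays busy
    os.cpus().forEach(function () {
        cluster.fork();
    });
} else {
    // each worker runs its own event loop; incoming connections
    // on the shared port are distributed among the workers
    http.createServer(function (req, res) {
        res.end('handled by pid ' + process.pid + '\n');
    }).listen(8080);
}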

Anyway, I've gone on a tangent.  For me, the killer use-case for having Javascript on the server is to share libraries on both server and client.  On the server side, your data source would be a database or an API or whatever you're using, on the client side, the server's REST interface would be the data source (or if you set up CORS, the original REST API could be the source on the client as well).  This would buy you a few things.

Pre-render the whole page server-side.  

One of the biggest problems with SPAs (Single Page Apps) is that they completely offload rendering to the client.  You send down a minimal set of javascript code and templates, then the javascript fires off a bunch of AJAX requests to pull down the data and render the templates.  This works well on modern laptops and desktops, but on phones, resources are much more constrained.  Not only are you doing all of the rendering on their CPU, you're hammering the network with all of the AJAX requests to pull in data.  Rather than do that, you could, using the exact same rendering path as the client side, pre-render the full page's initial state and send only that over the network.  Initial page load times will be much faster, and no more "the page is here, but wait while we load the data" frustration.

Render updates client-side.  

Now, here's the rub.  Once you have the page downloaded, you don't want to have to hit the server to re-render the whole page every time something changes.  That would just be silly.  Since you've got the same rendering paths available client-side, everything being Javascript and all, you could simply re-render the section of the page that was updated by the new data received by whatever AJAX request you sent.  Smaller, incremental updates done with a minimum of overhead.
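
A bare-bones sketch of the dual-rendering idea, with made-up template and page functions (a real app would use a proper template library, but the point is that both sides call the same code):
// shared.js - loaded by both the server and the browser
function render_server_row(server) {
    return '<li id="server-' + server.id + '">' +
           server.name + ': ' + server.status + '</li>';
}

// server-side: pre-render the whole initial page from your data source
function render_page(servers) {
    return '<ul id="servers">' +
           servers.map(render_server_row).join('') + '</ul>';
}

// client-side: when an AJAX update comes in, re-render only the
// affected row using the exact same template function
function on_server_updated(server) {
    var node = document.getElementById('server-' + server.id);
    node.outerHTML = render_server_row(server);
}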

It seems that my ideas were a bit ahead of their time.  I had a conversation with a coworker about this back in 2012 or so, and a recent poll suggests that this sort of dual-rendering path is finally becoming popular.

https://medium.com/javascript-scene/javascript-scene-tech-survey-d2449a529ed

The only advantage left for SPAs is that you can push out all of your content to the edges by using a CDN, but you can't push your REST interface to the edge, so all of those AJAX requests to grab data are still hitting the same limitation.  You could still put all of your Javascript and templates in a CDN and get some of the benefit there, I suppose.

Queued AJAX requests/multipart REST

Maybe something like this exists, but a quick googling hasn't found any results for the keywords I thought to look for.  I was thinking more about how to reduce the number of requests required by a website.

For context, at my previous job we had a UI that was broken down into a bunch of components, and each component was responsible for updating itself via AJAX.  Some did this on a timer; others did it based on a publish/subscribe framework that would alert them when their data source had potentially changed.  I wanted to expand this to use websockets to listen for updates coming from the server side as well, in case multiple users were accessing the same account at the same time, but that use-case wasn't deemed common enough to justify the development effort, and the idea sat.  Anyway, in this setup we were very often firing off a bunch of AJAX GET requests to pull the latest data and re-render sections of the page.

So, given a framework like that, which I imagine isn't all that uncommon, I was thinking: what if REST had a multipart GET request to grab a bunch of resources in a single request?  I haven't fully fleshed the idea out, but similar to how you upload files using multipart/form-data, you could do a multipart GET request.  The request would look something like:
GET /
Content-Type: multipart/rest; boundary=----------------------------12345
------------------------------12345
GET /resource/A
------------------------------12345
GET /resource/B
And the response would be something like:
Content-Type: multipart/rest; boundary=----------------------------12345
------------------------------12345
Location: /resource/A
Content-Type: application/json
Content-Length: 10
Status: 200 OK
{"a": "b"}
------------------------------12345
Location: /resource/B
Status: 404 Not Found
This would avoid the network and HTTP protocol overhead of all the additional requests and let you get a bunch of resources in a single round trip.  Since this would require modifying the HTTP spec, something a little easier would be an API endpoint that does the same job:
GET /multiple?resource=/resource/A&resource=/resource/B
And the response would just encapsulate all of the resources in a single body:
Content-type: application/json
Status: 200 OK
{ "/resource/A": {"a": "b"}, "/resource/B": null }
To take advantage of something like this and still keep your development sane, you'd need a construct to queue up your AJAX requests rather than firing them immediately; then you could fire off a single "multiple" request on a predefined interval, say 100ms.  Something like this (a rough sketch, assuming jquery for the AJAX bits):
var queue = [];

function get(url) {
    queue.push(url);
}

function parse_multiple(response) {
    for (var url in response) {
        // hand each resource to whatever component asked for it
        event.fire('data-received', url, response[url]);
    }
}

function multiget() {
    if (queue.length === 0) {
        return;
    }
    // serialize as resource=/resource/A&resource=/resource/B
    var query_params = $.param({ resource: queue }, true);
    queue = [];
    $.get('/multiple?' + query_params, parse_multiple);
}

setInterval(multiget, 100);

Now, every 100ms you fire off a single AJAX request to get everything that needs to be updated rather than firing off a ton of ad-hoc AJAX requests for every individual item on the page.  Seems like that could work, but I haven't yet put it to the test.

On a similar note, to reduce the amount of redrawing the browser does, you could queue up DOM changes and apply them all at once on a timer as well.  That's a little more difficult to pull off, but you could build the new DOM nodes in memory and map each one to the node it's replacing.  Because Javascript is single-threaded, the browser has to wait until it exits user code to do its own work, so redraws only happen in between function calls (unless something has changed recently).  That means you could swap in all the pending nodes in a single function and let the browser redraw the whole thing once after your function exits.  One redraw every 100ms is better than 100 redraws at random intervals, and fast enough that the user wouldn't notice the lag.  I dunno, maybe I'm taking crazy pills, since I haven't seen anyone attempt something like this in the major frameworks (or maybe they did while I wasn't looking).
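
Roughly what I have in mind, untested (queue the swaps, flush them all in one function call on a timer):
var pending = [];  // [old node, new node] pairs waiting to be swapped in

function queue_replacement(old_node, new_node) {
    pending.push([old_node, new_node]);
}

function flush_dom_changes() {
    if (pending.length === 0) {
        return;
    }
    // all swaps happen inside one function call, so the browser
    // repaints once when we return instead of once per change
    pending.forEach(function (pair) {
        pair[0].parentNode.replaceChild(pair[1], pair[0]);
    });
    pending = [];
}

setInterval(flush_dom_changes, 100);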

Ok, good, I wrote those down so I can stop thinking about them and get back to what I'm supposed to be working on.  Hopefully some day I'll find out if they're feasible ideas or not.

Monday, January 12, 2015

A programming workout buddy

Personal trainers and fitness gurus always recommend getting a workout buddy to help keep you motivated to work on your fitness goals.  It makes sense.  You inevitably hit a lull and want to slip back into old, bad habits, and give up.  If you have a buddy who is also motivated, hopefully you won't both hit a lull at the same point, and you can cross-motivate.  So, why can't we apply this same idea to other areas, like programming?  Programming takes a certain amount of rigor to learn, requires a lot of motivation to keep going, and has amazing benefits if learned well.

One of my goals this year, and every year for a while, is to learn a new programming language.  I guess I did learn Python for my job last year, but it didn't really stretch me much, coming from another dynamic, interpreted language like Perl.  So, this year, I've decided to seek out a programming workout buddy to help push me along, and vice-versa.  So, I pinged a good friend that I thought might be interested, and he liked the idea.

We're going to learn Rust, an up-and-coming systems language from Mozilla.  We debated a few different languages, but decided Rust would push us both to learn more of the fundamental CS concepts that are abstracted away in the other languages we were considering.  Maybe we'll go back to some of the others afterwards.

We haven't worked out all the specifics yet, but the first part will be a sort of book club where we choose a common set of learning material and meet at a regular interval to discuss it.  After that, we'll probably pick some algorithms or data structures to implement, then compare results.  After that, who knows, maybe team up on a side project or something.  It's a bit loose and undefined at the moment, but that's how I roll.  Wish us luck.

If you want to join us, let me know.  We're just starting up this week.  I'd like to avoid having too large a group, but we can probably accommodate a few folks.

Tuesday, November 18, 2014

Introducing python-ambariclient

Apache Ambari is an open-source project that configures and manages Hadoop clusters.  The product I work on also configures and manages Hadoop clusters... in the cloud (ooooh).  Now that Ambari has matured enough to be stable and have a fairly usable API, we've decided to discontinue the part of our product that overlaps with Ambari and use and contribute to that project instead.  Our product isn't going anywhere; just the ~50% of our codebase that does pretty much exactly what Ambari does will be replaced with Ambari, and the resources we would have spent building and maintaining our own code will go into improving Ambari.

Given that Ambari's primary audience is Java developers, their main client-library effort is written in Groovy, a JVM language that interoperates with Java seamlessly.  They also have a Python client, but it's fairly immature, incomplete, and buggy.  Efforts to contribute to it proved onerous, mostly due to concerns about breaking backwards-compatibility, so we decided that I would create a new client that we'd release to the public.  And so I'm here to announce our Python Ambari client library, aptly named python-ambariclient.

There were a few things I wanted out of the client that I felt we weren't able to accomplish easily with the existing one:
  1. An easy-to-intuit, consistent interface that mimicked the API structure.
  2. Native support for polling the long-running background operations that are common to working with Ambari.
  3. Easy to extend with new object types as the Ambari API adds new features.
  4. Minimize the number of actual HTTP requests executed.
  5. An ORM-style interface that felt natural to use coming from projects like SQLAlchemy and libcloud.
To accomplish all of those goals, I felt that a vaguely promises-style API would suit it best.  It would let us delay firing off HTTP requests until you actually need the response data to proceed, and it enables the method-chaining style reminiscent of Javascript projects like jquery.  I was able to accomplish both, and I think it turned out pretty well.  It's a good example of what I've always wanted in an API client.  So, let's dive in to some of the design decisions.

Delegation and Collections and Models, oh my

The main API client is just an entry point that delegates all of the actual logic to a set of collection objects, each of which represents a collection of resources on the Ambari server.  For those who are used to REST APIs, this might make sense already, but here are some examples to show what I mean:
# get all of the users in the system
users = ambari.users
for user in users:
    print user.user_name
# get all of the clusters in the system
clusters = ambari.clusters
for cluster in clusters:
    print cluster.identifier
The collections are iterable objects that contain a list of model objects, each representing a resource on the server.  There are some helper methods on the collections to do bulk operations, such as:
# delete all users (this will likely fail or break everything if it doesn't)
ambari.users.delete()
# update all users with a new password (bad idea, but hey)
ambari.users.update(password='new-password')
If you want to get a specific model out of a collection, that's easily accomplished by passing a single parameter into the accessor for the collection.
# get the admin user
admin_user = ambari.users('admin')
# get a specific cluster
cluster = ambari.clusters(cluster_name)
# get a specific host
host = ambari.hosts(host_name)
Additionally, you can get a subset of a collection by passing in multiple arguments.
# get a subset of all hosts
hosts = ambari.hosts([hostname1, hostname2, hostname3])
So far, that's just the basic entry-point collections.  In Ambari, there's a large hierarchy of related resources and sub-resources.  Users have privileges, clusters have hosts, services have components, etc.  To handle that, each model object can have a set of related collections for the objects it contains.  So, for example:
# get all hosts on a specific cluster
ambari.clusters(cluster_name).hosts
# get a specific host on that cluster
host = ambari.clusters(cluster_name).hosts(host_name)
Some of the hierarchies are very deep.  These are the deepest examples I can find so far:
# get a repo for a specific OS for a specific version of a specific stack
ambari.stacks(stack_name).versions(stack_version).operating_systems(os_type).repositories(repo_id)
# get a component for a specific service for a specific version of a specific stack
ambari.stacks(stack_name).versions(stack_version).services(service_name).components(component_name)
Obviously those are outliers; in general use, you only need to go one or two levels deep for most things, but it's good to know the pattern holds even for deep hierarchies.

When you get to the individual model objects, they behave much like a normal ORM.  They have CRUD methods like create, update, delete, and they use attribute-based accessors for the fields returned by the API for that resource.  For example:
cluster = ambari.clusters(cluster_name)
print cluster.cluster_id
print cluster.health_report
There's no fancy data validation or type coercion like in SQLAlchemy, just a list of field names that define which attributes are available, but really that's all that I think is necessary in an API client.  The server will do more robust validation, and I didn't see any places where automatic coercion made sense.  What I mean by automatic coercion is automatically converting datetime fields into datetime objects, or things of that nature.  I'm not doing that, and it's possible that that decision turns out to be shortsighted, but I'm guessing the simplicity of the current design will win out.

Wait for it...

Because the client is a promises-style API, it doesn't necessarily populate objects when you expect.  For the most part, if it can't accomplish what you're requesting without populating the object with data from the server, it will do that automatically for you.  Many operations are also fairly asynchronous, and what you as a user really care about is that you are safe to operate on a resource.  To that end, there is a method called wait() on each object.  Calling wait() will do whatever is required for that model or collection to be in a "ready" state for you to act on it.  Whether that's simply requesting data from the server, waiting for a background operation to complete, or waiting for a host to finish registering itself with the Ambari server, the method is the same: .wait()
# wait for a recently-added host to be available in Ambari
ambari.hosts(host_name).wait()
# wait for a bootstrap call to finish and all hosts to be available
ambari.bootstrap.create(hosts=[hostname1, hostname2], **other_params).wait()

I have a request

In the Ambari API, if your POST or PUT command triggers a background operation, a 'request' object is returned in the response body.  It will look something like this:
{
  "href" : "http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/requests/1",
  "Requests" : {
    "id" : 1,
    "status" : "InProgress"
  }
}
If any API call returns this information, the Ambari client will automatically recognize it and store the request information away.  Then, if you call .wait() on the object, it will poll the Ambari API until that request has completed.  At some point it will start throwing exceptions if the request doesn't complete successfully, but that logic hasn't been built in yet.
# install all registered components on a host and wait until that's done
ambari.clusters(cluster_name).hosts(host_name).components.install().wait()
And to be consistent and obey the principle of least surprise, you can chain off wait() calls to do further actions, so this also works:
# install and start all registered components on a host and wait until it's done
ambari.clusters(cluster_name).hosts(host_name).components.install().wait().start().wait()
It's not generally a great idea to have a huge, long method chain like that, but it's possible.  It would be better written as:
components = ambari.clusters(cluster_name).hosts(host_name).components
components.install().wait()
components.start().wait()

Wait, that's it?

I wanted it to be extremely easy to add new model classes to the client, because that was one of my biggest complaints with the existing client.  So most of the common logic is built into two base classes, called QueryableModel and DependentModel.  Now defining a model class is as simple as defining a few pieces of metadata, for example:
class Cluster(base.QueryableModel):
    path = 'clusters'
    data_key = 'Clusters'
    primary_key = 'cluster_name'
    fields = ('cluster_id', 'cluster_name', 'health_report', 'provisioning_state',
              'total_hosts', 'version', 'desired_configs',
              'desired_service_config_versions')
    relationships = {
        'hosts': ClusterHost,
        'requests': Request,
        'services': Service,
        'configurations': Configuration,
        'workflows': Workflow,
    }
  1. 'path' is the piece of the URL that should be appended to access this model.  i.e. /api/v1/clusters
  2. 'data_key' defines which part of the returned data structure contains the data for this particular model.  The Ambari API returns the main model's data in a subordinate structure because it also returns a lot of related objects.
  3. 'primary_key' is the field that is used to generate the URLs to a specific resource.  i.e. /api/v1/clusters/cluster_name
  4. 'fields' is a list of field names that should be returned in the model's data.
  5. 'relationships' is a list of accessors that should build related collection objects. i.e. ambari.clusters(cluster_name).hosts == collection of ClusterHost models
Some objects are not represented by actual URLs on the server and are only returned as related objects to other models.  These are called DependentModels in my client.  Here's a pretty simple one:
class BlueprintHostGroup(base.DependentModel):
    fields = ('name', 'configurations', 'components')
    primary_key = 'name'

class Blueprint(base.QueryableModel):
    path = 'blueprints'
    data_key = 'Blueprints'
    primary_key = 'blueprint_name'
    fields = ('blueprint_name', 'stack_name', 'stack_version')
    relationships = {
        'host_groups': BlueprintHostGroup,
    }
When you get a specific blueprint, it returns something like this:
{
  "href" : "http://c6401.ambari.apache.org:8080/api/v1/blueprints/blueprint-multinode-default",
  "configurations" : [
    {
      "nagios-env" : {
        "properties" : {
          "nagios_contact" : "greg.hill@rackspace.com"
        }
      }
    }
  ],
  "host_groups" : [
    {
      "name" : "namenode",
      "configurations" : [ ],
      "components" : [
        {
          "name" : "NAMENODE"
        }
      ],
      "cardinality" : "1"
    }
  ],
  "Blueprints" : {
    "blueprint_name" : "blueprint-multinode-default",
    "stack_name" : "HDP",
    "stack_version" : "2.1"
  }
}
As you can see, the 'Blueprints' key is the 'data_key', so that structure has the data related to the blueprint itself.  The 'host_groups' and 'configurations' structures are related objects that don't have URLs associated with them.  For those, we can define DependentModel classes to automatically expand them into usable objects.  So, now this works:
for host_group in ambari.blueprints(blueprint_name).host_groups:
    print host_group.name
    for component in host_group.components:
        print component['name']
I tried to make things act consistently even where they weren't consistent in the API.  It should be noted that objects that are backed by URLs are also returned in related collections like this, and the client will automatically use that data to prepopulate the related collections to avoid more HTTP requests.  For example, here is a very trimmed down cluster response:
{
  "href" : "http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster",
  "Clusters" : {
  },
  "requests" : [
    {
      "href" : "http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/requests/1",
      "Requests" : {
        "cluster_name" : "testcluster",
        "id" : 1
      }
    }
  ],
  "services" : [
    {
      "href" : "http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/services/GANGLIA",
      "ServiceInfo" : {
        "cluster_name" : "testcluster",
        "service_name" : "GANGLIA"
      }
    }
  ]
}
As you can see, both the 'requests' and 'services' related collections were returned here.  So, if you were then to do:
for service in ambari.clusters(cluster_name).services:
    print service.service_name
It would only have to do the single GET request to populate the cluster object, then use the data returned there to populate the service objects.  There is a caveat here.  When getting collections in the Ambari API, it generally only returns a minimal subset of information, usually just the primary_key and possibly the primary_key of its parent (in this case, service_name and cluster_name).  If you want to access any other fields on that object, it will have to do another GET call to populate the remaining fields.  It does this for you automatically:
for service in ambari.clusters(cluster_name).services:
    print service.maintenance_state
'maintenance_state' was not among the fields returned by the original call, so it will do a separate GET request for  http://c6401.ambari.apache.org:8080/api/v1/clusters/testcluster/services/GANGLIA to populate that information and then return it.

Smoothing out the rough edges

The Ambari API is mostly consistent, but there are some warts from old designs or one-off pieces.  The bootstrap API and the configurations are the worst offenders in this regard.  All efforts were made to make those areas behave like the other areas as much as possible.  I didn't want the user to have to know that, for example, bootstrap requests aren't the same as every other asynchronous task, or that even when a bootstrap finishes the hosts are not visible to Ambari until their agents have booted up and registered themselves.  So, I overloaded the wait() method on those objects so that it just does the needful.
# wait until these hosts are in a ready state
ambari.hosts([hostname1, hostname2]).wait()
Similarly, adding a host to a cluster normally involves manually assigning all of the components, but an upcoming Ambari feature will make it so you simply pass in a blueprint and host_group and it will do the assignments for you.  I pre-emptively smoothed this out in the client so you can do this now; it just involves a few more API requests being made automatically on your behalf.  Wherever things are inconsistent on the API server, my client makes them consistent for the user.
# add a new host to an existing host_group definition
ambari.clusters(cluster_name).hosts.create(host_name, blueprint=blueprint_name, host_group=host_group_name)
When the server-side is updated to include support for this, I can simply pass the information along and let it sort it out.  There are a few other cases where warts in the API were smoothed over, but for the most part the idioms in the client matched up with the API server pretty well.

Where do we go from here?

There was one feature that I really wanted to have that I wasn't able to wrap my head around sufficiently to implement in a clean, intuitive way.  That is the ability to act on collections of collections.  Wouldn't it be awesome if this worked?
# restart all components on all hosts on all clusters
ambari.clusters.hosts.components.restart().wait()
The .wait() would get a list of clusters, then get a list of hosts per cluster in parallel, then get a list of components for each host in parallel, then call the restart API method for each of those, gobble up all the request objects, and wait until all of them completed before returning.  This should be possible, but it will require a bit more thought into how to implement it sanely, and there wasn't enough bang for the buck for our use-cases to justify spending the time right now.  But maybe I'll get back to it later.

What's it to me?

I realize Ambari is a niche product, and that most of this post will be gobbledygook to most of you, but I think the general principles behind the client's design apply well to any REST-based API client.  I hope that people find them useful and maybe lift a few of them for their own projects.  Most of all, I think this is probably the best client library I've ever written, and it embodies pretty much everything I've wanted in a client library in the past.  We plan on rewriting the client library for our own API in a similar fashion and releasing that to the public in the near future*.

* Usual disclaimer about forward-looking statements and all that.  I make no guarantee that this will actually happen.