Monday, June 22, 2015

On being a full-stack web developer

One of the recent buzzwords in the industry is "full-stack engineer" or its many derivatives.  I've seen some backlash against the term to the effect that nobody can possibly excel in all the disciplines necessary to truly be a full-stack engineer, so we should stop using it.  That saddens me a bit because a) I am one and b) a lot of people who include the buzzword on their resume are not, and because of that, people don't believe me when I say I am.  I'm not prone to bragging about myself.  I tend to shy away from the spotlight.  But I'm going to put that aside for today.  I'm here to tell you what I think a good full-stack engineer can accomplish, based entirely on examples of my own experience in my career.

At my previous job at Liquid Web, I was hired on as the 3rd developer in the company (not the 3rd ever, just the 3rd concurrently; several had come and gone).  With such a small team, there's no room for specialization; you just do what needs to get done and learn what you need along the way.  I excelled in this environment because I had a strong foundation in many of the areas of responsibility and a keen desire to learn the rest.  I was there before we had any QA.  I was there before we had any ops for our internal software (the whole company was basically ops for the customers, so I have to make that distinction).  Here are some highlights of what I worked on there, from frontend to backend to ops to networking.  I do this to better explain what I mean when I say I'm a full-stack engineer and that being such a thing is possible.

Storm Management Console

We built a web interface for managing your account and servers.  Initially, the Storm product was a single page in our old PIMS web interface, but we quickly realized that the new system was going to be vastly more complex and decided to start from scratch.  Our sole designer in the marketing department delivered me some jpegs of mocked-up screenshots of how the site should look.  From that, I wrote the HTML, CSS, and Javascript required to bring it to life.  This was back before the Javascript renaissance that came after node.js got all hip and trendy, but was after things like jquery were established, so it wasn't the dark ages or anything.  We decided to use jquery for the DOM manipulation, but the rest of the codebase was entirely custom.  It was built first only for our Storm on Demand sub-brand, but eventually it was repurposed through some clever code-sharing to be used for both brands.  To accomplish what we needed, I built the following:

  • A publish/subscribe event system for objects to interact with each other cleanly (see the sketch below)
  • An AJAX caching system to avoid repeated queries for the same data
  • A CSS/Javascript minification/concatenation system that was cache-friendly (i.e. it would expire the caches whenever you made changes, so you never had stale code, but otherwise would serve from cache)
  • A component framework to allow reuse of self-contained sections of functionality among multiple different pages/views (what I wanted was full data-binding of the model/view layer, and I got close, but not quite)
Most of those things are easily found out of the box now in the new shiny frameworks, but in 2008 they weren't common.  
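
To give a flavor of the first item, here's roughly what a minimal publish/subscribe object looks like.  This is a from-memory sketch, not the actual Storm code:

var event = {
    subscribers: {},
    // Register a callback for a named event.
    subscribe: function (name, callback) {
        (this.subscribers[name] = this.subscribers[name] || []).push(callback);
    },
    // Notify every subscriber of a named event, passing along any extra arguments.
    fire: function (name) {
        var args = Array.prototype.slice.call(arguments, 1);
        (this.subscribers[name] || []).forEach(function (callback) {
            callback.apply(null, args);
        });
    }
};

// e.g. the server-list component reacts to a rename made in another component:
// event.subscribe('server-renamed', function (serverId, newName) { /* update the list */ });
// event.fire('server-renamed', 1234, 'web01');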

The Storm API

When we were initially discussing building out the Storm product, we all agreed that we wanted to have an API for customers to interact with our system.  Similarly, we wanted to shift towards a service-oriented architecture, because Amazon (and it being a good idea).  So we decided two things:
  1. We would build out our UI entirely from our API, ensuring that we exposed everything necessary for our customers in our API and giving us much better test coverage for our API.
  2. Our public API would wrap an internal-use API that we could expose to other departments to accomplish a true service-oriented architecture internally at the company.
Both of those decisions had performance implications, so we tried to optimize the stack as much as possible, but I was never happy with the overall performance.  I felt, and still do, that we should have done more to streamline it and improve the responsiveness of the UI.  But I was vetoed due to more pressing concerns.  Anyway, that's tangential to the discussion at hand.

I built the framework on which both the internal use and public APIs were built.  I extended the auth/access control system I had previously built to cover the needs for both APIs, including adding rate-limiting and making it reusable for customers as well as employees.  I built the framework that translated the input validation and code documentation into customer-facing public docs.  I built the versioning framework that allowed us to selectively update methods in new versions of the API and still keep the original versions intact for backwards-compatibility.  I came up with and wrote code to enforce our naming conventions that allowed us to provide a very consistent API to our customers.
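
To illustrate the versioning idea, here's a rough sketch in Javascript (made-up code for illustration; the real framework lived server-side and I'm not reproducing it here): each version layer only defines the methods that changed, and lookups fall back to earlier versions, so the original implementations stay intact.

// Made-up sketch: each API version layer only contains the methods that changed.
var apiVersions = {
    v1: {
        'server.create': function (params) { /* original behavior */ },
        'server.resize': function (params) { /* original behavior */ }
    },
    v2: {
        // only server.resize changed in v2; server.create falls through to v1
        'server.resize': function (params) { /* new behavior */ }
    }
};
var versionOrder = ['v1', 'v2'];

function resolveMethod(version, name) {
    // Walk backwards from the requested version until we find an implementation.
    for (var i = versionOrder.indexOf(version); i >= 0; i--) {
        var impl = apiVersions[versionOrder[i]][name];
        if (impl) { return impl; }
    }
    throw new Error('unknown method ' + name + ' in version ' + version);
}

// resolveMethod('v2', 'server.create') returns the untouched v1 handler, while
// resolveMethod('v2', 'server.resize') returns the v2 override.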

The Core

At the core of all of our systems at LiquidWeb was a set of libraries that began life in the LiquidWeb billing system.  They expanded to encompass most of the various areas of the company, including tracking things like accounts, IP assignments, server locations, credentials, and, of course, billing records.  The only thing that wasn't part of that system was HelpDesk, which was used to track customer support interactions.  The internal-use API wrapped around these libraries, which contained all of the core business logic for the company.  A few of the projects I worked on here included:

  • Automating the assignment of IP addresses from available pools of IPs (see the sketch below)
  • Automating configuration of routers and switches via SNMP and SSH for things like VLAN and ACL management, ARP cache flushing, etc.
  • Adding hourly billing to both the old billing system (a colossal hack on top of a cruft-ridden legacy system) and to our new billing system as well
  • Adding support for roles to our core framework
  • Profiling and optimizing our input validation code, which was hit quite frequently
  • Adding generic support for many types of relationships to our model classes in our custom framework
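
As a tiny illustration of the IP assignment item (made-up Javascript for consistency with the rest of this post; the real logic lived in the core libraries):

// Hypothetical helper: pick the next unassigned IP from an available pool.
function nextFreeIp(pool, assigned) {
    // pool: array of IP strings available in the subnet
    // assigned: array of IP strings already handed out
    for (var i = 0; i < pool.length; i++) {
        if (assigned.indexOf(pool[i]) === -1) {
            return pool[i];
        }
    }
    return null;  // pool exhausted; time to allocate a new block
}

// nextFreeIp(['10.1.2.3', '10.1.2.4'], ['10.1.2.3'])  =>  '10.1.2.4'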

Testing

As part of the rollout of Storm, we hired our first QA engineers via internal promotions.  At first, they mostly focused on manual testing, and we were running into a lot of bugs with the provisioning system, so I extended our unit-testing framework to do some automated functional/integration testing of it.  We already had good unit test coverage for everything in the API layer and a lot of the core code, but the provisioning system didn't have unit tests.  So I wrote tests that ran as part of our nightly automated test run and covered the following:
  • Create a server, wait until it's done (see the ping sketch below), verify that the state is correct in the core system, and verify that you can ping it and log into it via ssh
  • Resize that server to a larger size and a smaller size, verify that the state is correct in the database, that the billing adjustments were made as expected, and that both ping and ssh still work
  • Clone a server, verify that the clone is the same in all respects except for the name and any settings that were overridden as part of the clone process, and verify that both servers are still reachable via ping and ssh
  • Adjust firewall rules, verify that the specified ports are enabled/disabled as expected
  • Delete a server, make sure billing was adjusted, and make sure you can no longer ping it or log into it
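
Here's a rough sketch of the "wait until you can ping it" step from the first test.  This is Javascript with made-up helpers, for illustration only, not the framework we actually used:

// Rough sketch (hypothetical helper): poll until a newly-created server
// answers ping, or give up after a timeout.
var exec = require('child_process').exec;

function waitForPing(host, timeoutMs, callback) {
    var deadline = Date.now() + timeoutMs;
    (function attempt() {
        exec('ping -c 1 ' + host, function (err) {
            if (!err) { return callback(null); }                 // reachable
            if (Date.now() > deadline) {
                return callback(new Error('timed out waiting for ' + host));
            }
            setTimeout(attempt, 5000);                           // retry in 5s
        });
    })();
}

// e.g. after asking the API to create a server:
// waitForPing('10.0.0.42', 15 * 60 * 1000, function (err) { /* assert !err, then run the ssh checks */ });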

Devops

When I started, all of the developers had a single shared development server where we could push edits to test, but we had to manually coordinate that so we didn't overwrite each other's changes.  My boss had already begun trying to build individual development environments, but was too busy fighting fires, so one of the first tasks he gave me was to build my own development environment in a way that could be repeated for others.  So I did.  I had to learn a lot about the existing application stack, which was different from what I had used in the past (fastcgi vs mod_perl).

When we decided to build out a staging environment, I pointed out that we should take the opportunity to automate environment deployments so we wouldn't have to keep manually updating servers.  Both Chef and Puppet were in alpha/beta stages of their development, and cfengine seemed too complex for what we needed, so I ended up writing some scripts to automate the installation of our software and the configuration of the servers we deployed on.  I called them the push scripts, because they controlled our code pushing process.  I iterated on them as we built out the staging environment as a first step, and eventually we migrated all of our production and dev environments to use them as well.

The push scripts were built on a layered YAML configuration file, where you could paint broad strokes at the application, service, or environment level, then have environment- or server-specific overrides for the differences.  Some people found the layering confusing, so maybe it wasn't the best idea in retrospect, but it was a powerful concept once you wrapped your head around it.  It prevented a lot of repetition of the same values across environments or applications that shared the same underlying settings.  The values in that file would be used to populate templates or passed into modules that would then use the values to decide how to install and configure their service.  I could write a whole blog post about the architecture, and I might some day, but it was very flexible and saved us a lot of headaches.

I've since used Chef and Jenkins to do much of the same things, and I have to say that for our needs, the push scripts were much easier than Chef and ran in a fraction of the time Chef takes to do an update.  Sometimes having something that's tailored just for your needs has benefits.
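
To give a concrete flavor of the layered configuration idea, here's a made-up example (not our actual file):

defaults:
  apache:
    max_clients: 50
    ssl: off
environments:
  staging:
    apache:
      ssl: on
  production:
    apache:
      max_clients: 200
      ssl: on
servers:
  web03.production:
    apache:
      max_clients: 300    # this one box has more RAM

A given server's final settings came from merging those layers in order: the defaults first, then its environment, then any server-specific overrides.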

What I didn't do

The only area that I didn't work on much was the server provisioning system itself.  I would have liked to, but when we started the project, the other developer didn't have much frontend experience, so he ended up building the provisioning system and I handled the rest.  As the team expanded, we each kind of took the lead on those areas separately and didn't cross-pollinate much.  I did contribute some bug fixes here and there, and helped diagnose many issues through my automated testing, but other than that I didn't get the opportunity.  Thankfully my current job affords me the opportunity to work on a provisioning engine, and while it's at a different level (provisioning clusters vs servers - we rely on the cloud to provision the actual servers), many of the same concepts still apply.

I apologize if this comes across as grandstanding or bravado; that's not my intent.  I just get tired of the rhetoric.  I've heard/read many people saying that "full stack is impossible" and that people proclaiming to qualify are lying.  I can't vouch for anyone else, but I can vouch for myself.  I am a full-stack engineer.  I don't know everything about everything; I know very little about systems programming, mobile app development, or desktop application development.  I'm not the guy to build the next big datastore, write a new OS, come up with a new machine learning algorithm, or devise the next big protocol.  But for web-development and something-as-a-service development, I do know and have worked on the entire stack, including devops.  I can build on the core concepts developed by those with more knowledge of theory than I have and create great software for actual users.  I've spent the hours fighting with antiquated browsers that cling to market share (*cough* IE6 *cough*).  I've dug deep on extremely complicated billing issues to figure out why we billed a customer incorrectly and how to correct it.  I've profiled and optimized frequently-used algorithms to speed them up by as much as 80x (* gains not typical).  I've built authorization and access control systems from scratch.  I don't personally find any of these things amazing.  I kind of expect other developers to just dive in and learn what they need in order to get things done, but I'm learning that not everyone is this way.  For those who aren't, don't assume others are the same as you.  Some people do dive in head first and learn all of the disciplines around web-development well enough to be called full-stack engineers.

My current position doesn't afford me the opportunity to work on the entire stack, sadly, and I've recently realized that as screwed up as web development can be, I do miss it.  It's always exciting to see your creations come to life in a browser, and not having control over that aspect weighs on me a bit.  Maybe some day.  In the meantime, I'm learning a lot about data stores and having fun finally working on a provisioning engine.

Some random thoughts on webapp performance

A recent discussion got me thinking back in terms of website development, so I thought I'd jot down a few ideas that I'd like to try to implement when I get some free tuits. Maybe they'll spur some creativity in others, or at least you can tell me why I'm insane and this couldn't possibly work.

Server-side vs client-side rendering

Back when node.js was springing up in popularity, a lot of the focus was on how it uses an asynchronous event loop to manage a lot of connections per process (since Javascript is single-threaded, concurrency is managed by cooperative multitasking, similar to how things worked back in the Windows 3.1 days).  I was confused by this for a few reasons.  Why not use process pooling to run multiple copies of the node server then?  You'd be able to peg each core with a separate process, scale the number of requests you could serve simultaneously, and maximize your hardware use.  Have each listen on a different port, throw a loadbalancer in front, voila!  Maybe they're doing that now, but there was no talk of it at the time.
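
For what it's worth, node.js has since grown a built-in cluster module that does more or less this (the workers even share a single listening port, so you don't need the port-per-process loadbalancer trick).  A minimal sketch:

// Fork one worker per core; all workers share the same listening port.
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();                     // one worker process per core
    }
} else {
    http.createServer(function (req, res) {
        res.end('handled by pid ' + process.pid + '\n');
    }).listen(8000);
}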

Anyway, I've gone on a tangent.  For me, the killer use-case for having Javascript on the server is to share libraries on both server and client.  On the server side, your data source would be a database or an API or whatever you're using; on the client side, the server's REST interface would be the data source (or, if you set up CORS, the original REST API could be the source on the client as well).  This would buy you a few things.

Pre-render the whole page server-side.  

One of the biggest problems with SPAs (Single Page Apps) is that they completely offload the rendering to the client.  You send down a minimal set of javascript code and templates, then the javascript code fires off a bunch of AJAX requests to pull down the data and renders the templates.  This works well on modern laptops and desktops, but on phones, resources are much more constrained.  Not only are you doing all of the rendering on the phone's CPU, you're also hammering the network with all of the AJAX requests to pull in data.  Rather than do that, you could, using the exact same rendering path as the client side, pre-render the full page's initial state and send only that over the network.  Initial page load times will be much faster, and there's no more "the page is here, but wait while we load the data" frustration.
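
A minimal sketch of the shared-rendering idea, with a hypothetical renderUserCard() template function (all names made up for illustration):

// render.js: a hypothetical shared rendering module.  The same function runs
// under node.js (for the initial page) and in the browser (for later updates).
function renderUserCard(user) {
    return '<div class="user-card">' +
           '<h2>' + user.name + '</h2>' +
           '<p>' + user.email + '</p>' +
           '</div>';
}

if (typeof module !== 'undefined' && module.exports) {
    module.exports = { renderUserCard: renderUserCard };    // node.js
} else {
    window.renderUserCard = renderUserCard;                 // browser
}

// On the server, pull the data from your database/API and embed the
// pre-rendered markup directly in the initial HTML response, e.g.:
//   var render = require('./render');
//   html = '<div id="user-card">' + render.renderUserCard(user) + '</div>';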

Render updates client-side.  

Now, here's the rub.  Once you have the page downloaded, you don't want to have to hit the server to re-render the whole page every time something changes.  That would just be silly.  Since you've got the same rendering paths available client-side, everything being Javascript and all, you could simply re-render the section of the page that was updated by the new data received by whatever AJAX request you sent.  Smaller, incremental updates done with a minimum of overhead.
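
Then on the client, when new data arrives, you re-render just the affected chunk with the very same function (again assuming jQuery and the hypothetical renderUserCard() from the sketch above):

// Re-render only the section whose data changed, using the same template
// function the server used for the initial page load.
$.getJSON('/api/users/42', function (user) {
    $('#user-card').html(renderUserCard(user));
});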

It seems that my ideas were a bit ahead of their time.  I had a conversation with a coworker about this back in 2012 or so, and a recent poll suggests that this sort of dual-rendering path is finally becoming popular.

https://medium.com/javascript-scene/javascript-scene-tech-survey-d2449a529ed

The only advantage left for SPAs is that you can push out all of your content to the edges by using a CDN, but you can't push your REST interface to the edge, so all of those AJAX requests to grab data are still hitting the same limitation.  You could still put all of your Javascript and templates in a CDN and get some of the benefit there, I suppose.

Queued AJAX requests/multipart REST

Maybe something like this exists, but a quick googling hasn't found any results based on the keywords I thought to look for.  I was thinking more about how to reduce the number of requests required by a website.  For context, at my previous job, we had a UI that was broken down into a bunch of components, and each component was responsible for updating itself via AJAX.  Some did this based on a timer, others did it based on a publish/subscribe framework that would alert them when their data source had potentially changed.  I wanted to expand this to use websockets to listen for updates coming in from the server side as well, in case you had multiple users accessing the same account at the same time, but that use-case wasn't deemed common enough to justify the development effort, and the idea sat.

Anyway, in this case, we were very often firing off a bunch of AJAX GET requests to pull the latest data and re-render sections of the page.  So, given a framework like that, which I imagine isn't all that uncommon, I was thinking, "what if REST had a multipart GET request to grab a bunch of resources in a single request?"  I haven't fully fleshed the idea out, but similar to how you upload files using multipart/form-data, you could do a multipart GET request.  The request would look something like:
GET /
Content-Type: multipart/rest; boundary=----------------------------12345
------------------------------12345
GET /resource/A
------------------------------12345
GET /resource/B
And the response would be something like:
Content-Type: multipart/rest; boundary=----------------------------12345
------------------------------12345
Location: /resource/A
Content-Type: application/json
Content-Length: 10
Status: 200 OK
{"a": "b"}
------------------------------12345
Location: /resource/B
Status: 404 Not Found
This would avoid the per-request network and HTTP overhead and let you get a bunch of resources in a single round trip.  Since this would require modifying the HTTP spec, something a little easier would be to just have an API endpoint that does this for you:
GET /multiple?resource=/resource/A&resource=/resource/B
And the response would just encapsulate all of the resources in a single body:
Content-type: application/json
Status: 200 OK
{ "/resource/A": {"a": "b"}, "/resource/B": null }
To take advantage of something like this and still let your development be sane, you'd need a construct to queue up your AJAX requests rather than firing them immediately; then you could fire off a single "multiple" request on a predefined interval, say every 100ms.  Something like this (a rough sketch, assuming jQuery and the publish/subscribe framework mentioned above):
var queue = [];
function get(url) {
    queue.push(url);
}
function parse_multiple(response) {
    // Hand each resource's data to whichever components subscribed to it.
    for (var url in response) {
        event.fire('data-received', url, response[url]);
    }
}
function multiget() {
    if (queue.length === 0) { return; }
    var query_params = queue.map(function (url) {
        return 'resource=' + encodeURIComponent(url);
    }).join('&');
    queue = [];
    $.getJSON('/multiple?' + query_params, parse_multiple);
}
setInterval(multiget, 100);

Now, every 100ms you fire off a single AJAX request to get everything that needs to be updated rather than firing off a ton of ad-hoc AJAX requests for every individual item on the page.  Seems like that could work, but I haven't yet put it to the test.

On a similar note, to reduce the amount of redrawing you do in the browser, you could queue up DOM changes and do all of them at once on a timer as well.  That's a little more difficult to pull off, but you could build the new DOM nodes in memory, keep them in a queue pointing at the nodes they're meant to replace, and take advantage of the fact that redraws only happen in between function calls (due to the single-threaded nature of Javascript, the browser has to wait until your code exits before it repaints, unless something has changed recently).  So you could swap in all the pending nodes in a single function and then let the browser redraw the whole thing once after your function exited.  One redraw every 100ms is better than 100 redraws at random intervals, and fast enough that the user wouldn't notice the lag.  I dunno, maybe I'm taking crazy pills, since I haven't seen anyone attempt to do something like this in the major frameworks (or maybe they did while I wasn't looking).
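
A rough sketch of what I mean (hypothetical code, untested, same caveats as before):

var pendingSwaps = [];                      // [{ oldNode, newNode }, ...]

function queueReplace(oldNode, newNode) {
    pendingSwaps.push({ oldNode: oldNode, newNode: newNode });
}

function flushSwaps() {
    // Every replacement happens inside this one function call, so the browser
    // only repaints once, after we return.
    pendingSwaps.forEach(function (swap) {
        swap.oldNode.parentNode.replaceChild(swap.newNode, swap.oldNode);
    });
    pendingSwaps = [];
}

setInterval(flushSwaps, 100);

// e.g. build the fresh node off-DOM, then queue the swap instead of touching
// the live document immediately:
//   queueReplace(document.getElementById('user-card'), freshlyRenderedNode);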

Ok, good, I wrote those down so I can stop thinking about them and get back to what I'm supposed to be working on.  Hopefully some day I'll find out if they're feasible ideas or not.