Storm Management Console
We built a web interface for managing your account and servers. Initially, the Storm product was a single page in our old PIMS web interface, but we quickly realized the new system was going to be vastly more complex, so we decided to start from scratch. Our sole designer, who sat in the marketing department, delivered some JPEGs of mocked-up screenshots showing how the site should look. From those, I wrote the HTML, CSS, and JavaScript required to bring it to life. This was before the JavaScript renaissance that followed node.js getting all hip and trendy, but after things like jQuery were established, so it wasn't the dark ages or anything. We decided to use jQuery for the DOM manipulation, but the rest of the codebase was entirely custom. It was built first only for our Storm on Demand sub-brand, but eventually, through some clever code-sharing, it was repurposed to serve both brands. To accomplish what we needed, I built the following (a sketch of the first item follows the list):
- A publish/subscribe event system for objects to interact with each other cleanly
- An AJAX caching system to avoid repeated queries for the same data
- A CSS/Javascript minification/concatenation system that was cache-friendly (i.e. it would expire the caches whenever you made changes, so you never had stale code, but otherwise would serve from cache)
- A component framework to allow reuse of self-contained sections of functionality among multiple different pages/views (what I wanted was full data-binding of the model/view layer, and I got close, but not quite)
Most of those things come out of the box in today's shiny frameworks, but in 2008 they weren't common.
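The original event system was plain JavaScript sitting on top of jQuery; as a rough illustration of the idea (not the original API; names like `EventBus`, `subscribe`, and `publish` are mine), here's a minimal publish/subscribe sketch in Python:

```python
from collections import defaultdict

class EventBus:
    """Minimal publish/subscribe hub: objects communicate through named
    events instead of holding direct references to each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event, callback):
        self._subscribers[event].append(callback)

    def publish(self, event, **payload):
        # Iterate over a copy so callbacks can subscribe/unsubscribe safely.
        for callback in list(self._subscribers[event]):
            callback(**payload)

bus = EventBus()
bus.subscribe("server.created", lambda server_id: print(f"refresh view for {server_id}"))
bus.publish("server.created", server_id=42)  # -> refresh view for 42
```

The payoff is decoupling: the view component refreshing itself doesn't need to know which object created the server, only that the event fired.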
The Storm API
When we were initially discussing building out the Storm product, we all agreed that we wanted an API for customers to interact with our system. Similarly, we wanted to shift toward a service-oriented architecture, partly because Amazon had shown the way and partly because it was simply a good idea. So we decided two things:
- We would build our UI entirely on our API, ensuring that we exposed everything our customers needed and giving the API much better test coverage.
- Our public API would wrap an internal-use API that we could expose to other departments, giving us a true service-oriented architecture internally at the company.
Both of those decisions had performance implications, so we tried to optimize the stack as much as possible, but I was never happy with the overall performance. I felt, and still feel, that we should have done more to streamline it and improve the responsiveness of the UI, but I was vetoed due to more pressing concerns. Anyway, that's tangential to the discussion at hand.
I built the framework on which both the internal-use and public APIs were built. I extended the auth/access-control system I had previously built to cover the needs of both APIs, including adding rate limiting and making it reusable for customers as well as employees. I built the framework that translated our input validation and code documentation into customer-facing public docs. I built the versioning framework that allowed us to selectively update methods in new versions of the API while keeping the original versions intact for backwards compatibility (sketched below). And I came up with, and wrote code to enforce, the naming conventions that allowed us to present a very consistent API to our customers.
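The real versioning framework was Perl; as a hand-wavy Python sketch of the fallback idea (the registry layout and method names here are hypothetical, not our actual schema):

```python
# Hypothetical registry: methods registered per API version; a lookup for
# version N falls back to the newest implementation at or below N, so old
# versions keep working untouched while new versions override selectively.
METHODS = {
    ("server.create", 1): lambda params: {"id": 1, "zone": "default"},
    ("server.create", 2): lambda params: {"id": 1, "zone": params["zone"]},
    ("server.delete", 1): lambda params: {"deleted": True},
}

def resolve(method, requested_version):
    """Return the newest implementation whose version <= requested_version."""
    candidates = [v for (name, v) in METHODS if name == method and v <= requested_version]
    if not candidates:
        raise LookupError(f"{method} not available in v{requested_version}")
    return METHODS[(method, max(candidates))]

# v1 callers keep the original behavior; v2 callers get the updated method.
assert resolve("server.create", 1)({"zone": "us-east"})["zone"] == "default"
assert resolve("server.create", 2)({"zone": "us-east"})["zone"] == "us-east"
assert resolve("server.delete", 2)({})["deleted"]  # falls back to the v1 method
```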
The Core
At the core of all of our systems at LiquidWeb was a set of libraries that began life in the LiquidWeb billing system. They expanded to encompass most areas of the company, tracking things like accounts, IP assignments, server locations, credentials, and, of course, billing records. The only thing that wasn't part of that system was HelpDesk, which tracked customer support interactions. The internal-use API wrapped these libraries, which contained all of the core business logic for the company. A few of the projects I worked on here (the first of which is sketched after the list):
- Automating the assignment of IP addresses from available pools of IPs
- Automating configuration of routers and switches via SNMP and SSH for things like VLAN and ACL management, ARP cache flushing, etc.
- Adding hourly billing to both the old billing system (a colossal hack on top of a cruft-ridden legacy system) and to our new billing system as well
- Adding support for roles to our core framework
- Profiling and optimizing our input validation code, which was hit quite frequently
- Adding generic support for many types of relationships to the model classes in our custom framework
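The real IP assignment logic lived in those Perl core libraries and a database; as a toy Python illustration of pulling the next free address from a pool (the pool layout and function names are mine):

```python
import ipaddress

# Hypothetical pool state: each pool is a subnet plus the set of addresses
# already assigned (the gateway is reserved up front).
POOLS = {
    "pool-a": {
        "subnet": ipaddress.ip_network("10.10.0.0/29"),
        "assigned": {ipaddress.ip_address("10.10.0.1")},  # gateway
    },
}

def assign_ip(pool_name):
    """Reserve and return the lowest free host address in the pool."""
    pool = POOLS[pool_name]
    for addr in pool["subnet"].hosts():  # hosts() skips network/broadcast
        if addr not in pool["assigned"]:
            pool["assigned"].add(addr)   # real code would do this in a DB transaction
            return addr
    raise RuntimeError(f"{pool_name} is exhausted")

print(assign_ip("pool-a"))  # 10.10.0.2
print(assign_ip("pool-a"))  # 10.10.0.3
```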
Testing
As part of the rollout of Storm, we hired our first QA engineers via internal promotions. At first they focused mostly on manual testing, and we were running into a lot of bugs in the provisioning system, so I extended our unit-testing framework to do automated functional/integration tests of it. We already had good unit-test coverage for everything in the API layer and much of the core code, but the provisioning system had no unit tests. So I wrote tests, run as part of our nightly automated test run, that did the following (the first case is sketched after the list):
- Create a server, wait until it's done, and verify that the state is correct in the core system and that you can ping it and log into it via SSH.
- Resize that server to a larger size and a smaller size, and verify that the state is correct in the database, that the billing adjustments were made as expected, and that both ping and SSH still work.
- Clone a server, verify that the clone is identical in all respects except for the name and any settings overridden as part of the clone process, and verify that both servers are still reachable via ping and SSH.
- Adjust firewall rules and verify that the specified ports are enabled/disabled as expected.
- Delete a server, make sure billing was adjusted, and make sure you can no longer ping it or log into it.
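Those tests ran inside our Perl test framework; as a stripped-down Python sketch of the first case (the `storm` client and all its method names are invented for illustration):

```python
import subprocess
import time

def ping(host, timeout=5):
    """Return True if the host answers a single ICMP echo (Linux ping flags)."""
    return subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout), host],
        capture_output=True,
    ).returncode == 0

def test_create_server(storm):
    """storm is a hypothetical API client wrapping the provisioning system."""
    server = storm.create_server(config="2GB", zone="lan")
    # Poll until provisioning finishes, with a hard deadline so a hung
    # provision fails the test instead of hanging the nightly run.
    deadline = time.time() + 1800
    while storm.get_status(server["id"]) != "running":
        assert time.time() < deadline, "provisioning timed out"
        time.sleep(30)
    # Verify core-system state first, then actual network reachability.
    record = storm.get_server(server["id"])
    assert record["status"] == "running"
    assert ping(record["ip"]), "server not reachable via ping"
    assert storm.can_ssh(record["ip"]), "server not reachable via SSH"
```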
Devops
When I started, all of the developers shared a single development server where we could push edits to test, but we had to coordinate manually so we didn't overwrite each other's changes. My boss had already begun building individual development environments but was too busy fighting fires, so one of the first tasks he gave me was to build my own development environment in a way that could be repeated for others. So I did. I had to learn a lot about the existing application stack, which was different from what I had used in the past (FastCGI vs. mod_perl).

When we decided to build out a staging environment, I pointed out that we should take the opportunity to automate environment deployments so we wouldn't have to keep manually updating servers. Both Chef and Puppet were in alpha/beta stages of their development, and cfengine seemed too complex for what we needed, so I ended up writing some scripts to automate the installation of our software and the configuration of the servers we deployed on. I called them the push scripts, because they controlled our code-pushing process. I iterated on them as we built out the staging environment, and eventually we migrated all of our production and dev environments to them as well.

The push scripts were driven by a layered YAML configuration file: you could paint broad strokes at the application, service, or environment level, then add environment- or server-specific overrides for the differences (see the sketch below). Some people found the layering confusing, so maybe it wasn't the best idea in retrospect, but it was a powerful concept once you wrapped your head around it, and it prevented a lot of repetition of the same values across environments or applications that shared the same underlying settings. The values in that file were used to populate templates or passed into modules that decided how to install and configure their service. I could write a whole blog post about the architecture, and I might some day, but it was very flexible and saved us a lot of headaches. I've since used Chef and Jenkins to do much the same things, and for our needs the push scripts were much easier than Chef and ran in a fraction of the time Chef takes to do an update. Sometimes having something tailored just to your needs has benefits.
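The push scripts themselves were Perl reading one layered YAML file; here's a minimal Python sketch of just the layering idea (the layer names and keys are made up, not our actual schema):

```python
# Layers go from broadest to most specific; later layers override earlier
# ones. In the real system these came from sections of one layered YAML
# file rather than dict literals.
application = {"perl_version": "5.10", "workers": 4, "log_level": "warn"}
environment = {"log_level": "debug"}   # e.g. the "staging" layer
server      = {"workers": 16}          # one beefy box needs more workers

def merge(*layers):
    """Shallow merge: the most specific layer wins for each key."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

config = merge(application, environment, server)
print(config)  # {'perl_version': '5.10', 'workers': 16, 'log_level': 'debug'}
```

The merged dict is what would feed the templates and install modules, so shared defaults live in one place and each environment only states its differences.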
What I didn't do
The only area I didn't work on much was the server provisioning system itself. I would have liked to, but when we started the project the other developer didn't have much frontend experience, so he built the provisioning system and I handled the rest. As the team expanded, we each took the lead on our own areas and didn't cross-pollinate much. I did contribute some bug fixes here and there, and helped diagnose many issues through my automated testing, but otherwise I didn't get the opportunity. Thankfully, my current job lets me work on a provisioning engine, and while it's at a different level (provisioning clusters rather than servers; we rely on the cloud to provision the actual machines), many of the same concepts still apply.
I apologize if this comes across as grandstanding or bravado; that's not my intent. I just get tired of the rhetoric. I've heard and read many people saying that "full stack is impossible" and that people claiming the title are lying. I can't vouch for anyone else, but I can vouch for myself. I am a full-stack engineer. I don't know everything about everything; I know very little about systems programming, mobile app development, or desktop application development. I'm not the guy to build the next big datastore, write a new OS, come up with a new machine-learning algorithm, or devise the next big protocol. But for web development and something-as-a-service development, I know and have worked on the entire stack, including devops. I can build on the core concepts developed by those with more knowledge of theory than I have and create great software for actual users. I've spent the hours fighting with antiquated browsers that cling to market share (*cough* IE6 *cough*). I've dug deep into extremely complicated billing issues to figure out why we billed a customer incorrectly and how to correct it. I've profiled and optimized frequently-used algorithms to speed them up by as much as 80x (* gains not typical). I've built authorization and access control systems from scratch. I don't personally find any of these things amazing; I kind of expect other developers to just dive in and learn whatever they need in order to get the job done. But I'm learning that not everyone is this way. For those who aren't: don't assume others are the same as you. Some people do dive in head first and learn all of the disciplines around web development well enough to be called full-stack engineers.
My current position doesn't afford me the opportunity to work on the entire stack, sadly, and I've recently realized that as screwed up as web development can be, I do miss it. It's always exciting to see your creations come to life in a browser, and not having control over that aspect weighs on me a bit. Maybe some day. In the meantime, I'm learning a lot about data stores and having fun finally working on a provisioning engine.