Cloud hosting platforms like Amazon Web Services, Microsoft Azure, Rackspace Cloud, DigitalOcean, and Google Cloud Platform have become great equalizers in IT over the past several years. They’ve made complex n-tier system architectures with robust fault tolerance and failover contingencies, once accessible only to deep-pocketed enterprises, available for mere pennies per hour.
However, many small and medium-sized businesses overlook or resist these offerings, assuming they are too expensive or too complicated, or simply discounting them as over-engineering for their ‘simple’ web serving needs.
We couldn’t disagree more! Hosting in the cloud provides the opportunity to have a responsive, scalable, fault-tolerant web application on any budget. In this article, we’ll review the traditional dedicated server model, introduce some core concepts for cloud hosting, and discuss tactics for effectively migrating to a cloud platform. Let’s get to it!
The Dedicated Server Model
For a long time, small and medium-sized businesses have continued the tradition of building or leasing dedicated web servers sized for a particular application with “room to grow”, accepting that a scalable, bulletproof architecture would be cost-prohibitive.
While this approach offers a certain simplicity in both management and cost, it also exposes significant risk as a single point of failure. A hardware failure, system compromise, or unexpected traffic spike could mean significant downtime. Even with the (somewhat dubious) standard promise of “no more than a half day” to bring a replacement server online (or add more hardware), the optics and opportunity cost of having a web application offline could be disastrous for any business. Additionally, the dedicated server model forces the buyer to overpay for dedicated resources capable of handling anticipated peak traffic, only to let that capacity sit idle during off-peak hours.
Enter The Cloud
Beyond removing the single point of failure, hosting in a cloud platform provides several advantages over a dedicated server for web applications:
- Auto-scaling with ephemeral servers supports real-time demand without overpaying for computing resources.
- Database containers, untethered to fixed hardware, remove a common bottleneck by allowing vertical and horizontal scaling.
- Low-cost load balancing delineates pools of servers and can even spread load across multiple data centers.
- Globally accessible central messaging queues enable communication between decentralized application components.
Cloud platforms also provide performance metrics and dashboards that rival Nagios, Ganglia, and even New Relic, as well as APIs/SDKs for every popular application stack. Combined with expanding infrastructure capabilities and microservices, a competitive marketplace constantly pressuring prices downward, and rich communities of free and paid support, there’s never been a better time to consider making the transition.
Cloud Hosting Core Concepts
Before architecting a migration, it’s important to understand a few core concepts and design patterns that allow a web application to expand and contract as traffic ebbs and flows. Many of these concepts have been around since the early days of datacenters, and others arrived as virtualization evolved, but none of them were generally available to anyone with a credit card until cloud computing became a commodity.
Persistent vs. Ephemeral Servers
In the traditional IT infrastructure, servers are persistent physical assets. While it’s possible to have persistent servers within cloud hosting platforms (and certainly desirable for certain needs), simply virtualizing physical hardware in the cloud for serving web traffic is not usually the most effective way to harness the advantages these platforms provide.
Rather, cloud architecture is designed for ephemeral, disposable computing resources that come into and out of existence as needed, achieved by instantiating a machine image in virtualized hardware. A machine image is simply a snapshot of a virtual machine, containing a complete operating system and typically all of the baseline software, tools, and libraries needed for whatever purpose the server is designed to fulfill.
The cloud platform vendor handles provisioning resources for the newly instantiated machine, including allocating disk space, assigning IP addresses, joining private and/or public networks, and applying security policies. The provisioning process can also include hooks to automatically apply OS security patches, test communication to other computing resources, register with load balancers, and so on.
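To make this concrete, here’s a minimal sketch of launching an ephemeral server from a machine image on AWS using the boto3 SDK. The AMI ID, key pair, and security group below are hypothetical placeholders, not values from any real account:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch one ephemeral web server from a pre-baked machine image.
# The AMI ID, key pair, and security group are hypothetical placeholders.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # snapshot with OS + baseline software
    InstanceType="t3.small",
    MinCount=1,
    MaxCount=1,
    KeyName="web-deploy-key",
    SecurityGroupIds=["sg-0123456789abcdef0"],
)

instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched ephemeral instance {instance_id}")
```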
Because the typical web application’s codebase is perpetually evolving, it’s imperative to integrate a provisioning framework such as Puppet, Chef, Ansible, or SaltStack, or even a custom method (such as simple automated synchronization from a remote git repository), to ensure every running instance receives the freshest code and configuration when created, and whenever incremental releases are made to the application’s codebase.
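The custom method can be as simple as a boot-time hook that fast-forwards the application checkout before the instance reports itself ready. A minimal sketch, assuming a hypothetical /var/www/app checkout and a production release branch:

```python
import subprocess

APP_DIR = "/var/www/app"   # hypothetical application checkout
BRANCH = "production"      # hypothetical release branch

def sync_latest_code():
    """Fast-forward the local checkout to the freshest release on boot."""
    subprocess.run(["git", "-C", APP_DIR, "fetch", "origin"], check=True)
    subprocess.run(["git", "-C", APP_DIR, "checkout", BRANCH], check=True)
    subprocess.run(
        ["git", "-C", APP_DIR, "reset", "--hard", f"origin/{BRANCH}"],
        check=True,
    )

if __name__ == "__main__":
    sync_latest_code()
```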
Database Containers
Cloud platforms have introduced specialized database container services that are not only fine-tuned on bare metal specifically for the database engine, but also offer enticing features beyond dedicated processing power.
Most platforms, for example, incorporate facilities for elasticity (resizing/scaling the database instantly and transparently), automatic failover, automated incremental backups, instant rollbacks, and the ability to utilize read replicas (read-only concurrent copies) that can become instrumental in distributing an application across multiple datacenters, if needed.
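On AWS, for example, spinning up a read replica of an existing RDS database is a single API call. A sketch with boto3, where all identifiers are hypothetical placeholders:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Create a read-only concurrent copy of the primary database,
# optionally in another availability zone. Identifiers are
# hypothetical placeholders.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="appdb-replica-1",
    SourceDBInstanceIdentifier="appdb-primary",
    DBInstanceClass="db.t3.medium",
    AvailabilityZone="us-east-1b",
)
```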
Load Balancing
Load balancers serve as traffic cops, binding to an endpoint with an IP address and transparently distributing or proxying requests to and from one or many registered computing resources.
While load balancing can be used for many different kinds of services and protocols, web applications can utilize load balancers as endpoints to pools of web servers listening for HTTP and HTTPS requests. Smaller applications may have one load balancer to serve as the endpoint for www.example.com in DNS and distribute load across a few servers, while large applications may have load balancers for various tiers of web servers (public, API, CMS, extranet, etc.) and even for distributing load across datacenters for expansive redundancy and fault tolerance.
Like their hardware appliance forebears (F5, Cisco, Barracuda, Juniper, etc.), cloud load balancers have configurable strategies for traffic distribution, can transparently preserve session cookies across requests, are capable of terminating SSL (taking that overhead away from web servers), have models for determining the health of registered instances, and typically come with GUIs and/or rich APIs for coordinating with auto-scaling activities.
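As a taste of that coordination, here’s how a newly provisioned instance might be registered with a pool, sketched against the classic Elastic Load Balancing API in boto3. The load balancer name and instance ID are hypothetical:

```python
import boto3

elb = boto3.client("elb", region_name="us-east-1")

# Register a freshly provisioned web server with the pool behind
# www.example.com. Names and IDs are hypothetical placeholders.
elb.register_instances_with_load_balancer(
    LoadBalancerName="www-pool",
    Instances=[{"InstanceId": "i-0123456789abcdef0"}],
)
```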
Messaging Queues
A centralized messaging queue provides a ubiquitous channel of communication between decentralized computing resources. There are many different use cases for queues, but as this model applies to web applications, they are quite useful for deferring units of work or “jobs” from one resource to another.
A classic example of deferring work in a web application is the submission of a new user registration form. After the user clicks submit, the underlying application might typically create the user in the database, communicate with a CRM API, cut a thumbnail from a user-provided profile image, and finally send a welcome email to the new user. Deferring as much of this work as possible to a queue allows the HTTP POST request to complete much faster, getting the user to the obligatory thank-you page that much sooner.
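Here’s a minimal sketch of that deferral using Amazon SQS via boto3; the queue name and job payloads are hypothetical stand-ins for whatever an application actually needs:

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.get_queue_url(QueueName="post-registration-jobs")["QueueUrl"]

def defer_registration_tasks(user_id):
    """Queue the slow follow-up work so the HTTP request can return now."""
    for job in ("sync_crm", "cut_thumbnail", "send_welcome_email"):
        sqs.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps({"job": job, "user_id": user_id}),
        )
```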
Queues can also be used for broadcasting notifications to and from components of the application, such as “New software release; refresh now!”, “The CRM API keeps timing out, notify DevOps?”, “Hi, new web server here, can someone add me to the load balancer?”, or “Everyone go into maintenance mode!”
While implementing a queue in-house with a database might seem an easy enough task, there are already many fantastic queueing microservices out there, such as Amazon SQS, RabbitMQ, and IronMQ (most platforms have their own), that are worth strong consideration. These services provide robust SDKs in many languages, boast excellent uptime, and are extremely affordable. As the use of centralized queues in web applications has become more and more common, many application frameworks (such as Laravel and Zend, in the PHP world) have built facilities for the queueing lifecycle directly into their core architectures.
Auto-scaling
Auto-scaling is the simple but revolutionary concept of allowing an application footprint to expand and contract based on current load, or demand. As the demand for resources increases toward the limits of the running system, additional ephemeral servers are instantiated, made available to help meet demand, then subsequently destroyed when load subsides.
Most cloud platforms provide GUI tools and APIs/SDKs that allow their customers to define rules for triggering scaling actions based on different conditions. Some include templates or scripts for adding and removing servers, as well as controls to prevent over-spawning of servers.
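On AWS, for instance, a simple scale-out rule pairs an Auto Scaling policy with a CloudWatch alarm. A sketch with boto3, using hypothetical group and policy names:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Add two web servers whenever the alarm below fires.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="www-pool-asg",   # hypothetical group name
    PolicyName="scale-out-on-cpu",
    PolicyType="SimpleScaling",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,
    Cooldown=300,                          # seconds between scaling actions
)

# Trigger the policy when average CPU stays above 70% for 10 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="www-pool-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "www-pool-asg"}],
    AlarmActions=[policy["PolicyARN"]],
)
```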
For those platforms that do not (yet) have explicit tools and rules/templates for scaling built into their offering, it’s still possible (and relatively easy) to implement automated scaling by building a sort of “conductor” process to survey the health/load of each instance and add/remove resources.
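Such a conductor can be surprisingly small. The sketch below assumes hypothetical helper functions wrapping a platform’s API; they are placeholders, not part of any real SDK:

```python
import time

# Hypothetical helpers wrapping your platform's API: placeholders
# to illustrate the pattern, not a real SDK.
from mycloud import get_pool_load, spawn_server, destroy_idle_server

MAX_LOAD = 0.75   # scale out above this average utilization
MIN_LOAD = 0.25   # scale in below it

def conductor_loop(pool="www"):
    """Survey the pool and add/remove ephemeral servers as load dictates."""
    while True:
        load = get_pool_load(pool)      # average utilization across the pool
        if load > MAX_LOAD:
            spawn_server(pool)
        elif load < MIN_LOAD:
            destroy_idle_server(pool)
        time.sleep(60)                  # survey once per minute
```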
Putting it All Together
While there’s no one-size-fits-all approach for migrating from a dedicated server to a cloud hosting platform, a few techniques are instrumental in making a web application of any size scalable: untangle the application’s layers and redefine them as distinct components, or roles, each capable of scaling independently.
Databases
While there are certainly business cases for managing persistent database instance(s), for most web applications relying on commodity relational database platforms (MySQL/MariaDB, Oracle, SQL Server, PostgreSQL, etc.), platform-provided specialized database containers provide the sweet spot of elasticity, performance, fault tolerance, price, and ease of administration.
If an application has multiple databases, it may (or may not) make sense to break them into multiple containers, depending on their purpose and frequency of use, or to delineate access controls for improved security. Tables that contain summary data for internal reports, for example, might benefit from being isolated from the rest of the system and locked down tightly.
Architecting with read replicas is also a consideration. For example, a large CMS installation may want read/write access from web servers hosting the CMS control panel, but only read-only access from public-facing web servers. This tactic not only minimizes security risk, but, thanks to the speedy one-way synchronization of data through read replicas, also lays the groundwork for an application to scale broadly across multiple data centers.
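At the application layer, that delineation can be as simple as handing each server role a different connection string. A minimal sketch, with hypothetical endpoints and credentials:

```python
# Route queries by intent: writes go to the primary, reads to a replica.
# Endpoints and credentials below are hypothetical placeholders.
WRITER_DSN = "mysql://cms-admin@appdb-primary.internal.example.com/app"
READER_DSN = "mysql://cms-public@appdb-replica-1.internal.example.com/app"

def dsn_for(query_is_write: bool) -> str:
    """Public-facing servers are only ever handed the read-only DSN."""
    return WRITER_DSN if query_is_write else READER_DSN
```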
Also of note, many cloud vendors offer various low-cost proprietary database types (I’m looking at you, DynamoDB) and implementations of NoSQL and simple key/value stores, which are worth considering in situations where a traditional relational database is not really necessary.
Web Servers
To get the most out of auto-scaling, web server machine images should be lean and mean, containing only what is required to serve their application and offloading as much work as possible to back end computing resources and microservices.
However, a web server is not just a web server anymore. Modern web applications often have multiple presentation tiers with different userbases (marketing, application, CMS, etc.) and are more and more commonly developed to consume (and/or deliver) a unified API. As each of these tiers is capable of growing and evolving on its own (and may be worth isolating for security), creating a distinct web server image for each role (each load balanced independently) is worth considering instead of one web server image to rule them all.
Whether there is one web server image or many, configuring a load balancer as the endpoint for each role’s traffic (configured in DNS) creates transparent “pools” of application resources that can each scale up and down as demand dictates. All of this is of course managed through the magic of auto-scaling, rules for which can either be defined in the GUI of the cloud hosting provider, or managed through a custom solution that will trigger creation and destruction of ephemeral servers and their registration with the proper load balancer for the server role.
Back End Application Servers
The last piece of the monolithic dedicated server mess we’re untangling is where to run all of the various unseen housekeeping processes that keep the ship running. Such tasks might include cron scripts that synchronize with a remote CRM, resolve credit card billing discrepancies, update summary analytics tables, retrieve data from remote APIs, and send user and administrative email notifications, as well as backups and a myriad of other chores.
While spinning up a persistent instance exclusively for handling back end tasks is an option, auto-scaling isn’t just for web servers. A more elegant and scalable solution integrates a centralized messaging queue with listeners (background processes) that constantly poll for new jobs to complete. When jobs on the queue start to pile up, more application servers can be instantiated to carry out the work, then destroyed as the queue subsides, in a manner similar to the web server auto-scaling model.
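Continuing the SQS example from earlier, a listener on a back end application server might look something like this sketch (the queue name and job handlers are hypothetical):

```python
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.get_queue_url(QueueName="post-registration-jobs")["QueueUrl"]

def work_forever(handlers):
    """Poll the queue, dispatch each job to its handler, then delete it."""
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,          # long polling keeps costs down
        )
        for message in resp.get("Messages", []):
            job = json.loads(message["Body"])
            handlers[job["job"]](job)    # e.g. handlers["send_welcome_email"]
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=message["ReceiptHandle"],
            )
```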
Like web servers, there might be a need to delineate types of back end application servers as well. For example, some operations might require specialized tools and libraries, or have security needs or risk exposure that need to be contained. Architecting with these considerations up-front can save headaches later.
Additional Considerations for Optimization
Beyond creating machine images with scaling in mind, it’s important to consider ways to optimize them to maximize performance and availability for serving requests. This topic is worthy of its own discourse, as it can encompass everything in the application stack, but there are a few relatively easy-to-implement best practices that can help keep web servers light and nimble.
Move Assets to a CDN
CDNs free up web server workloads by offloading HTTP requests and bandwidth for static assets (images, video, PDFs, etc.) to a globally distributed network such as MaxCDN, CloudFiles, CloudFlare, CloudFront, or even S3.
In addition, many popular JavaScript libraries (like AngularJS and jQuery) are published on trusted public CDNs operated by Google, CDNJS, and others. Linking to these resources not only saves web servers a few requests; as more and more sites leverage them, the library may already be cached and ready in a user’s browser.
Application Caching
Most modern application frameworks and CMSs have some sort of caching mechanism, allowing dynamically generated web pages to be served as static resources and drastically reducing processor overhead per HTTP request. If caching capabilities are not already exposed to an application, implementing a custom solution with Redis, memcached, or even the venerable Varnish can have a dramatic impact.
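As a flavor of the custom approach, here’s a minimal page-caching sketch using the redis-py client; the TTL and key scheme are arbitrary choices for illustration:

```python
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
PAGE_TTL = 300  # seconds a rendered page stays fresh

def cached_page(path, render):
    """Serve a rendered page from Redis, regenerating it only on a miss."""
    key = f"page:{path}"
    html = cache.get(key)
    if html is None:
        html = render(path)              # expensive dynamic generation
        cache.setex(key, PAGE_TTL, html) # cache the result for PAGE_TTL
    return html
```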
Microservices, Microservices, Microservices!
Like the queueing services mentioned above, there are myriad SaaS microservices provided by each of the major cloud platforms and other third parties. Platform services exist for email delivery, logging, search, transcoding, notifications, and just about everything else.
Judicious use of these services and facilities can dramatically decrease computing resource consumption, improve performance, reduce operating costs, and save significant development time.
Final Thoughts
Shifting one’s thinking about server infrastructure, from an inventory of capacity-capped physical hardware to an infinite supply of disposable, low-cost computing resources, is quite liberating. It opens up opportunities for improved performance, unprecedented concurrency and availability, and significant cost reductions for web applications and businesses of every size.
Are you looking for a partner to help you navigate the transition to the cloud? Let’s make it rain.