Why you should run UI testing in Windows containers

Earlier this year, Microsoft officially released the “Windows Family” 1809 image, which they have touted in previous blog posts as being engineered specifically to support automated UI testing of Windows applications. For those of you familiar with containerization technology, this probably doesn’t sound like a great use case for containers, and when you are using containers as a development technology, that is largely correct. As a guess containing wild, near-baseless assumptions, I’d wager that 95% of the benefits you get from using containers are lost or ignored once you include enough of the OS to make use of GUI functionality. For one thing, these images get pretty large: the windows:1809 base image that I use for testing is already at 11GB as of its latest update, 2019/04/09.

Microsoft has done a really good job of reducing their base image size in recent releases. As shown in this blog post by Stefan Scherer, the servercore and nanoserver images have improved vastly in size. OK, this is coming from the perspective of a Windows fanboy, so you superior Linux diehards out there may disagree that getting an image “down” to 5GB in the case of servercore is an improvement, but that’s OK! Additionally, if you haven’t read that post before, it is a great starting point on this journey. It’s where I got started down this path, as it turns out!

What's the Debate?

The debate seems to center largely around the argument, “If you’re going to use a container image that beefy, why aren’t you just using VMs?” This argument does have some valid points, but my counter-argument (aside from the fact that I just really like learning new things, and playing with containers is fun, even if you’re doing it ‘wrong’) is that we are maximizing the benefit of one of my favorite behaviors of containers: they are deterministic by nature.

Being deterministic means that, barring catastrophic failures on the host system, a container given the same input will always produce the same output. Essentially, containers are intended to be ephemeral, and unless you go out of your way to violate this principle, the container couldn’t care less about state. This is super useful for developers: it helps them rely less on the crutch of “well, it worked on my machine…”, and it means they only need to troubleshoot system issues when those issues are actually causing problems for the application, and not just because the app is now running into a single, hidden system setting that was configured unintentionally that one time another app was installed. If we could just get people to stop using the machines where they install our applications, I bet I’d be out of a job, which is a statement that has the benefit of being true no matter how you decide to interpret it.

In the Scope of UI Testing, Deterministic Behavior has a Huge Benefit

Properly written tests should fail only when the component they are testing has actually failed, and not because of arbitrary environmental issues, like, oh, I don’t know, the fact that another test has ever run on this machine before and, depending on how recently, resources that it tied up may not have been released, or changes haven’t been undone, or… a lot, really. A lot of things can get broken or go unaccounted for simply because something else has been done in that environment before. Since this level of testing is inherently Integration Testing, where separate components of the application all act together to (hopefully) produce a correct output, there is still a non-zero chance that external factors (poor DNS, networks going down, external resources like file servers or lab machines being offline, etc.) can interfere with your test suite as a whole. Running these tests in a deterministic environment cannot remove all of these factors unless you create the entire testing environment inside that deterministic testing zone.

What it can, and does, improve for us is the rate of test failures caused by seemingly arbitrary issues. If you’re running into lots of fun issues where you resolve a failing test simply by re-running it, or you have logic set up around your UI test run engine to retry automatically up to a set number of times and report a failure only if none of the retries succeeds… yeah, this is a MUCH better option (assuming you have the infrastructure to support it, of course!).
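For anyone who hasn’t seen the retry workaround described above, here is a minimal sketch of what it tends to look like. It is not our actual test runner; `run_with_retries` and `make_flaky_test` are hypothetical names invented for illustration, and the “test” is just a function that returns True or False. The one thing this sketch does that many retry wrappers don’t is record how many attempts were needed, so a pass that only happened on a retry is flagged as flaky instead of quietly counted as green.

```python
from typing import Callable


def run_with_retries(test: Callable[[], bool], max_attempts: int = 3) -> dict:
    """Run a test until it passes or the attempts run out.

    Returns the final verdict plus the attempt count, so a pass that needed
    retries can be flagged as flaky rather than hidden.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts = 0
    passed = False
    while attempts < max_attempts and not passed:
        attempts += 1
        passed = test()
    return {"passed": passed, "attempts": attempts, "flaky": passed and attempts > 1}


def make_flaky_test(failures_before_pass: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a UI test that fails a fixed number of times."""
    remaining = {"count": failures_before_pass}

    def test() -> bool:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return False
        return True

    return test


if __name__ == "__main__":
    print(run_with_retries(make_flaky_test(0)))  # passes on the first attempt
    print(run_with_retries(make_flaky_test(2)))  # passes only on the third attempt
    print(run_with_retries(make_flaky_test(5)))  # uses up all retries and fails
```

The second case is the interesting one: the build goes green, but only because the environment happened to cooperate on the third try. That is the class of result a clean, throwaway container per test run is meant to make rare.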

I lightly touched on it before, but I cannot overstate the benefit of removing the “it works on my machine!” argument from the equation here. Being able to go to one of our more experienced developers (read: literally any of our developers who isn’t me), shamefully admit that I am not now, nor have I ever been, in a state where I can honestly say “I know what I am doing,” ask them to take a look, and then have this thing actually show them the same behavior it was showing me is an amazing benefit. Working on these issues locally on my workstation, I can be confident that once I get a problem solved here, it is also going to be solved on our production Docker hosts… it is as if a literal weight is lifted from my shoulders. I only have to solve a problem once, whether that is how to access a specific, troublesome control within the application, or how to let the application do something it normally has no problem doing, like talking to a DC to get a list of available computers.

Any changes that I need to apply to the host machine (in this case, the image that the container is created from) are captured in the Dockerfile that is used to create that image. Now that I have the proof-of-concept UI test created and we are about to begin scaling up this implementation with the rest of the Automation team’s support, getting them involved is going to be as easy as pointing them to the repo where I stored these files as I worked on them, asking them to read the readme on that landing page, and having them review the Dockerfile. The logic for the individual tests is still going to be stored in the tests’ PowerShell scripts, obviously, but the knowledge transfer we would normally need to provide to get someone from “I’ve never seen this project before” to “I can make meaningful contributions to this project!” has just gone from ‘most of a day, at least’ to ‘however long it takes them to read the short documentation’. This is an amazing problem to have solved for you, simply by virtue of employing a technology that embraces that behavior.

I have determined that this is likely enough discussion around the benefits of deterministic behavior, but thankfully that is not the only benefit to be gained from these precious little slices of process isolation technologies!

Another Aspect That is Definitely Worth Considering: Scaling

In the normal use cases for containerizing an application, running the app in containers makes it much easier to add more instances to handle processing as demand increases. For typical business applications, this usually means scaling up the number of replicas of a container under load in response to a spike in user traffic. You don’t want to have to run and pay for 50-200% more compute capacity than your application uses on average, but you also want to be able to handle spikes in customer traffic, right? You never know when the right Reddit post mentioning your product is gonna generate more traffic than you could ever have anticipated. Designing the application around being containerized, being stateless, and being able to scale at whim makes building it a bit more complicated, but gives you the benefit of being able to meet these demands easily, should the need ever arise.

Scaling, for the use case of UI Testing, should largely be exempt from that, right? I mean, there aren’t going to be external forces controlling how many tests you need to run at any given time. Well, this is largely true, but… as you move to having more builds of the product per day (whether those builds are internal-only or customer-facing, and whether they run the full test suite or just a subset of tests tagged to run on that type of build), you increase the chances that you’re going to have builds running simultaneously. When you only have a static number of testing targets available, your different build types start competing with each other for these resources. Addressing this means either giving certain build types priority access to the testing machines (which means you need to design, implement, and test this behavior on top of the code you’re already testing), or creating more machines to run the tests, which doesn’t address the issue itself so much as minimize the symptoms.

When Using Containers for Your Test Targets, New Options Open up

For starters, the containers where we’re going to be running our tests don’t exist when we’re not running tests. No disk space is reserved for them and no memory is pre-allocated; the resources they need are only reserved and in use during the brief timeframe that the container is alive. When you’re not running tests against your Docker swarm, the hosts are free to be used for other things. With well-designed container scheduling and orchestration software, you can have your full suite of tests run on whatever resources are available, as long as they meet the minimum requirements for running a single container. With this implementation, prioritizing your different build types is still important, but the orchestrator can handle it fairly easily when running in an “oversaturation” orchestration state.

Being able to scale in this manner allows you to reduce the overall amount of time it takes a build to get through all of its required tests, which is ridiculously important to getting feedback faster. The one-two punch of getting your test results faster, and only having tests fail when the product being tested is actually producing failing conditions, results in much more responsive build cycles. These build cycles also require a lot less manual intervention, so once we get everything up and running we have more hands available to grasp onto one of the eleventy million other things that constantly need attention here at PDQ.com. Pet projects, customer tickets, blog posts, keeping the webcast “talent” in a constant state of ‘inebriated enough to stand in front of a camera, but not too inebriated to stand’; all of these are part of a constant battle of resources vs. requirements that seemingly never ceases.
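To put rough numbers on the scaling argument above, here is a back-of-the-envelope sketch. It is not how Docker swarm (or any real orchestrator) schedules work; it just counts how many test containers would fit on whatever capacity is free and estimates how long the suite would take if the tests ran in waves. Every name and number here (`Host`, `container_slots`, `estimated_minutes`, the host sizes, 4GB and 2 CPUs per container, 5 minutes per test) is a made-up assumption, and it pretends every test takes the same amount of time.

```python
import math
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_memory_gb: float
    free_cpus: int


def container_slots(hosts, memory_gb_per_container, cpus_per_container):
    """Count how many test containers fit on the hosts right now.

    Each host contributes as many slots as its scarcer resource allows; a host
    that cannot fit even one container contributes zero.
    """
    slots = 0
    for host in hosts:
        by_memory = int(host.free_memory_gb // memory_gb_per_container)
        by_cpu = host.free_cpus // cpus_per_container
        slots += min(by_memory, by_cpu)
    return slots


def estimated_minutes(test_count, slots, minutes_per_test):
    """Estimate wall-clock time when tests run in waves of `slots` containers."""
    if slots < 1:
        raise ValueError("need capacity for at least one container")
    waves = math.ceil(test_count / slots)
    return waves * minutes_per_test


if __name__ == "__main__":
    # Hypothetical hosts with whatever happens to be free at the moment.
    hosts = [
        Host("host-a", free_memory_gb=32.0, free_cpus=8),
        Host("host-b", free_memory_gb=16.0, free_cpus=4),
        Host("host-c", free_memory_gb=3.0, free_cpus=2),  # too little memory for one container
    ]
    slots = container_slots(hosts, memory_gb_per_container=4.0, cpus_per_container=2)
    print("slots:", slots)
    print("minutes with current capacity:", estimated_minutes(60, slots, 5))
    print("minutes with a single slot:", estimated_minutes(60, 1, 5))
```

With these made-up figures, the three hosts offer 6 slots, so a 60-test suite finishes in roughly 50 minutes instead of the 300 it would take one container at a time. The same suite still completes if only one slot is free; it just takes longer, which is the “minimum requirements for running a single container” point.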

That's Pretty Much it

And, honestly, that is pretty much it with regard to the question of “why did you decide to pursue the potential nightmare that was getting a UI test running in a Windows container?” There are a ton of benefits to be gained from using a containerized development cycle for an application, and a subset of those benefits can apply to UI Testing as well. If you have been looking for a project to get you started playing around with Docker containers for Windows, keep an eye out for the companion blog to this one, where I’ll cover the “How” of getting the UI tests running inside of containers. I’ll cover the tools I used, suggest some other options I’m aware of that you might try, and go over a few “Gotcha!”s that I encountered along the way, to hopefully get others started down a bit of a smoother road than the one I traveled.

I’m sure there are nine and a half million topics that I didn’t cover here, and blog posts on topics of this depth are unfortunately likely to raise more questions than they answer. On the bright side, this is exactly why comments are allowed on these blogs! Please, I’d love to have a discussion on the relative merits of pursuing this path. I’m happy to discuss this in as much depth as I can go. I may have done a poor job of conveying it, but I genuinely love the concept of containers and I’m ridiculously excited about what we can accomplish with this technology!

To quote our own, great Kris Powell, “You bring nothing but shame to this company, and I seriously doubt you’ll succeed in anything that you do.” This quote doesn’t really provide relevant feedback into the article I wrote here, I just wanted you all to know how mean and abusive our beloved PowerShell presenter can be when he’s off camera.