Today, Alan Page wrote a pretty good post about his thoughts on testing and automation. His main argument is that automation has gotten out of control in testing, and that too much effort and time has been spent on automating things that don’t need to be automated. You should really read the whole thing.
One of his main gripes is with UI automation, and how many automation tools produce test code that is tough to maintain and run. Ask anyone who’s done any significant UI automation and they will tell you tests can be brittle and a pain to keep up. I myself have seen such test code that’s held together with hope and a prayer (and written some as well). I read through the post and nodded to myself with each additional point, since it was all so true.
Then I remembered I was a Selenium test developer.
For about half a second, I panicked. Is my job really just based on one big ball of lies about software testing and quality?
I don’t think so. In fact, I think UI testing can be quite beneficial for software projects if done well.
One meme that’s been gaining credibility is that “test code is code”, something that I agree with whole-heartedly. Test code should be considered at the same level as production code, with all the usual considerations: code review, proper source control (not just any old crap), testing and design considerations, etc. It’s not just something you should throw together blindly. In that case you’d end up with the horrible messes that Alan describes.
One corollary of the “test code as code” meme is that test code starts to look like production code, in that it is written, debugged and deployed along with the related production code. In some projects, test code is even released along with production code. These tests can be used to check local builds for errors, explore areas of the code that need work, and so on. In other words, tests are a feature of the underlying product, which happens to be true whether the tests are customer-facing or not.
I think this is where automated testing comes into play. I think the correct way to think about unit tests, automated integration tests and UI tests (such as Selenium tests) is not as a way to “find crazy bugs” that no one else would find but as a way to raise the lower bound on code quality and provide (fast) feedback to team members. Imagine making a change and knowing whether or not it caused any problems (and where!) within minutes or even seconds of a commit. Imagine making a change to your web app, committing that change and finding out within minutes that the change was fine on Firefox but had an issue on Chrome.
Further, automated testing can do things that are difficult for human beings to do manually. It’s not a stretch to turn a collection of unrelated unit tests into a fast test harness for verifying the overall condition of a codebase. It just takes a bit of organization and possibly some ingenuity. And a dedication to unit testing.
These are real possibilities with test automation. More importantly, these are possibilities that no longer need access to supercomputer-esque servers or expensive automation tools. These things can be put together using most of what your devs have lying around right now.
I won’t deny that a lot of automation code falls into the category of “shouldn’t have been written”. But automated testing might be hitting its stride as a software discipline and may start to show its true value.
In the spirit of my last post on what Selenium can and can’t do, I started thinking about what Selenium is good at, and what it’s not good at.
I realized that I might be on to something.
Selenium is an excellent tool, one that has grown significantly in utility and I think will be around for a while. It may also become a standard tool recognized by the W3C for web automation. Automated UI testing is looking more and more like unit testing: a standard testing approach when done well. Selenium is a big part of that, and it’s here to stay. Everyone and their brother wants automation with Selenium (or a similar tool).
What’s the first thing people usually look to when they start writing Selenium scripts? Login pages.
Why is this? They’re familiar, for one. Everyone has seen one; most people who work in tech see several daily. Login screens are also somewhat generic. In 99% of cases a login page has three things: a username text input, a password text input and a “login”-type button. In many cases, even the underlying source has items with explicit IDs or names like “username-input” and “password-input”. Most use cases are the same: provide a username and password, click the login button and observe the result. All very familiar, all very straightforward. Writing an automated test case for a login page is the “Hello World!” of UI testing.
And this is precisely why you shouldn’t be doing much automated testing with login pages.
I’m not saying never write a UI test for a login page, although that’s not an entirely terrible idea. I’m saying that a login page is an excellent place to see the distinction between what can be tested and what should be tested. It’s a subtle difference, but one that can be very beneficial to realize.
Let’s look at testing a login page. Why not extensively test a login page with Selenium?
- It’s A Slow Test. Like I said above, most (obvious) login test cases follow a pattern. An automated Selenium test would probably always have these steps: start the browser, navigate to the login page URL, wait for the page to load, enter a username, enter a password, click the login button, check for the expected result. Even using Selenium, these steps won’t really be much faster than a power user completing the same steps. This might not seem so bad until you consider code maintenance (and rot), which will likely make it slower to automate than to simply test by hand.
- It’ll Slow Down Your Other Tests. Once you’ve automated logging-in using Selenium, the temptation will be strong to use that as part of the setup method of other test methods. After all, you’ve got it, so why not use it? See the previous point. Now all of your tests are slower by a few seconds, and implicitly depend on the login page functionality. You also risk having tests fail in the setup before any test methods are even run. This is an example of where good testers can produce slow, brittle test code that is the scourge of automated testing.
- You’re Missing More Interesting Tests. A login page is really just a user-friendly interface for HTTP authentication in your app. While you could test login capabilities and limitations directly through this interface, automation can provide some interesting information about security. Can you log in successfully without a browser? Without a valid login? Using a different authentication scheme? Many of these approaches are not really well suited to using Selenium.
In light of these points, maybe Selenium should be given a miss for login testing.
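To make the “without a browser” question concrete, here is a minimal sketch in Java that builds a login request directly, with no Selenium involved. It assumes the app accepts a plain form POST; the class name, the URL passed in, and the “username” and “password” field names are all made up for illustration and would need to match whatever your login form really submits.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a form-login request without any browser.
class LoginRequests {

    // The "username" and "password" field names are assumptions.
    static String formBody(String username, String password) {
        return "username=" + URLEncoder.encode(username, StandardCharsets.UTF_8)
             + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    // Builds (but does not send) a POST carrying the encoded credentials.
    static HttpRequest loginRequest(String loginUrl, String username, String password) {
        return HttpRequest.newBuilder()
                .uri(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(username, password)))
                .build();
    }
}
```

Sending the request with java.net.http.HttpClient and checking the status code or redirect target would then answer the question in well under a second, and the same helper can be pointed at bad credentials or missing fields.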
Selenium is a pretty cool tool. It’s nicely documented and is fairly well designed as far as APIs go. Almost any web application can probably benefit from some Selenium UI testing one way or another.
But it can’t do everything. Unfortunately, the expectations for Selenium tests are quite high. Sometimes unrealistically so.
I think this comes down to thinking that Selenium can simulate “anything” in a browser. Loosely speaking, this is true if you define “anything” to be things that a real end-user (i.e. a live person) would do in a browser window. Strictly speaking, this statement is false. To understand why, you have to understand what exactly a major browser does and how it works.
A modern web browser like Firefox or Chrome performs many tasks, a good number of which are hidden away. Such tasks may include sending and receiving HTTP requests, rendering HTML, picking up regional settings on a host OS, managing user data such as preferences and saved content, and so on. All of these tasks work in concert to bring a user to a given web page or app, even if only a few of them are front-facing. In fact, modern browsers are so effective at doing this that most users are not even aware that many of these tasks are distinct and to a rather large extent independent. When testing a web app, this can cause some confusion, particularly with novices in web development (myself included).
How does Selenium fit into this? Selenium is a browser driver API. More specifically, it is an API for the DOM within various browsers. The DOM belongs to the part of the browser that contains rendered web pages, normally referred to as the browser window. It is where HTML elements on a page are found, along with all their HTML attributes. If you want to perform an action on an element in the browser window, such as clicking a button or finding all elements with a particular attribute, Selenium is the tool for the job.
And that’s pretty much it. If you want something else done, you will need another tool.
For example, Selenium does not handle HTTP requests. This is by design, and probably a very good move. Why doesn’t Selenium do this? Because handling HTTP requests is a distinct, separate task from traversing the DOM. These tasks can (and should) function independently of one another. Yes, these tasks are related in a natural way but they’re still distinct. One doesn’t require the other. There are also many good reasons to separate these tasks, from both development and testing perspectives. Furthermore, there are other great tools that do HTTP requests quite well.
Also, Selenium may be a DOM API, but it’s not magical or omniscient. It still has limitations that all UI automation tools have. It’s dumb and only does what it is told to do. That means that if a check-box is changed to a radio list, or if a hyperlink is moved from being a plain text element on a page to a link that only appears on rollover of some other element, tests will be broken. This is, in my opinion, the main drawback to automated UI testing: fragility. Selenium tests will likely need a fair bit of maintenance, such as regularly updating test logic and locators. In a lot of cases this is almost unavoidable even if good coding practices like the Page Object pattern are used.
Developers and managers sometimes overlook this part of UI automation, assuming it will be “easy” or “shouldn’t take too long” to write Selenium tests. Sure, these tests can be easy to write, but not quite so easy to debug and sometimes quite difficult to maintain. This also doesn’t even get into the problem of writing automated test scripts to replace other forms of testing so that testers can “forget about” certain aspects of testing. Tools should be part of a testing team, not its replacement.
A last aspect of Selenium that is misunderstood is that, despite being a browser driver API, it cannot actually test the browser application itself. Want to confirm that the options in the right-click context menu appear correctly on your webpage? Selenium can’t do that. Ditto with bookmarking or working with a PDF in Adobe in the browser. It may be in the “browser” but it’s not in the DOM.
So, there you have it, my quick rundown of Selenium’s limitations. It’s still a great tool that is quite helpful (particularly with entirely JS-only applications). It is still good to know the limits.
Some people in software development don’t “get” automation. They don’t see the value. They don’t see why so much “effort” needs to be put into automation. Or they simply think test automation is a hassle and waste of time.
What is the value of test automation? I think it’s summed up quite neatly in this tweet I saw a while ago:
That moment when you demo test automation for your client and they see what took days to do once now takes 25 seconds, every commit.
— Dave Haeffner (@TourDeDave)
I was working on tests today and discussing things with some colleagues, when I started thinking about some boundary cases. I faced a situation where I thought “What’s the craziest thing that could happen?”
A lightbulb went off. After several seconds, the idea solidified. Somewhere, a dog barked.
The idea is “-est” testing.
Take an adjective that could describe a user, a user action, an application or some other related concept (“crazy”). Then take it to the extreme (“craziest”) and use that as your base to direct how you test an application. It may give some insights into how to test an application, and perhaps processes around a given app.
Here are some first-draft, in-depth examples:
- Stupidest: It’s almost a cliche to test an application under the assumption that a user is stupid. This may include passing inputs that are incorrect or of the wrong type, forgetting steps in a defined workflow or using a piece of software for the “wrong” task. But what is the stupidest thing a user could do? This line of thinking might yield insights not just to the application but also to assumptions about a user. What if the stupidest scenario a tester comes up with is actually a common use-case for many users? What if it’s actually not that stupid of a scenario under different circumstances (that a tester may not be aware of)? This line of reasoning might also help find limitations in the usability of an application. If it’s not difficult to come up with the “stupidest” situations for users, it might be time to look at the big picture with your application.
- Slowest: Obviously you could look at performance in this case, but you could also look at things like usability. If two workflows take wildly different lengths of time to complete, perhaps you might want to eliminate the slower one, or promote the faster one. Also, how well the application holds up under the slowest possible circumstances might be an interesting line of inquiry for testing.
- Dangerous-est: More commonly known as “most dangerous”. This might touch on things like privacy, security, and overall harm. It can also help establish what exactly the limits are of what an application can do. It may also provide insights into how an application fits into a wider platform (OS, embedded device, etc). Can your application do harm to other applications, either knowingly or unknowingly?
Those are just some quick ideas that come to mind. I do think this idea of “-est” testing is good since it’s both simple and versatile. Like other good testing approaches, it can provoke conversation and discussion of the problems at hand. It also works well in a wide array of circumstances, positive and negative.
I’ll have to see how this plays out. After all, what’s the worst that could happen?
Here’s something that comes up quite a bit when I write automated tests: A web app has a field that looks like a text box. When the box is clicked or otherwise focused on, a grid appears with rows and columns that may be selected. I find that instead of entering text to complete the field, I need to select an element or row within the grid. Sounds easy, right?
It turns out it isn’t.
The elements within the grid sometimes do not have helpful id attributes or locators. In some cases the rows might have unique identifiers, but not the elements within a given row. Other times, attributes are random or at least not known ahead of time. Often this is due to the underlying framework of the web app and how the source is generated.
I don’t think it has to be like this, even though people have come up with some clever approaches to working with grids.
The most effective way to make a web app automation-ready is to include unique id attributes for any element that is required to be tested. This is usually a task for the web developers.
Sometimes, this is pretty straightforward. If the web app is built using a mature framework there are probably methods and tools available to set unique ids for testing or automation purposes. For example, GWT has the ensureDebugId() method for UIObjects, notably Widgets. Using this method, the developer can easily set a custom id attribute value prepended with the string “gwt-debug-”. This means that these testing ids are easily identifiable to everyone, and can be removed if necessary in production. In my experience, this approach mostly works well (so long as all developers involved get on board with the idea). Entire grids are often UIObjects, and thus are easily recognizable to things like Selenium.
However, elements within grids, such as table rows or input elements, can be tricky to work with. If these elements are dynamic, frameworks often generate unique ids as needed. These are often neither constant over time nor predictable, and so aren’t dependable for automation. Even identifying rows using XPath or other approaches might cause some problems if attributes aren’t stable or unique “enough”.
What can be done? Here’s an approach that’s worked for me and the web developers I’ve worked with:
- Set a unique id tag for the entire grid (e.g. gridElement.ensureDebugId(“client_grid”))
- Determine a “base” element id tag for each element of interest within a row (“client_name”, “client_address”, etc)
- Within each row that has been generated on a page, append the row number to the end of each id of each element
To do this, we did have to use a custom setId()-type method that was called within a render method like this:
element.setAttribute("id", "debug_id" + rowNumber);
In addition to generating id attributes that are suitable for automation, this approach should also provide id attributes that are guaranteed to be unique on a given page, unlike what some web frameworks generate by default. One possible downside is that you may have to expose automation tags in production, but there are likely ways around this, if it is in fact a problem.
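The naming scheme from the steps above can be sketched in a few lines of Java. This is only an illustration of the idea, not the code we actually used: the class and method names are hypothetical, and the base ids are the example ones from the list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper mirroring the steps above; all names are illustrative.
class GridDebugIds {

    // Base id plus row number: "client_name" and row 2 give "client_name2".
    static String rowElementId(String baseId, int rowNumber) {
        return baseId + rowNumber;
    }

    // Every id a grid with these base ids and this many rows would expose.
    static List<String> idsForGrid(List<String> baseIds, int rowCount) {
        List<String> ids = new ArrayList<>();
        for (int row = 0; row < rowCount; row++) {
            for (String baseId : baseIds) {
                ids.add(rowElementId(baseId, row));
            }
        }
        return ids;
    }

    // True when no id appears twice in the list.
    static boolean allUnique(List<String> ids) {
        Set<String> seen = new HashSet<>(ids);
        return seen.size() == ids.size();
    }
}
```

One design note: if a base id can itself end in a digit, plain concatenation could collide (“client1” in row 1 versus “client” in row 11), so a separator such as an underscore before the row number may be a safer choice.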
So far, it’s been working for me and my team, so I think this is a good approach overall.
Now if only something like this could be added as a feature in web frameworks…
Lately there’s been some hot talk around the Interwebs about “following your passion” as a career strategy. The idea is a fairly well-known one: to find a promising, fulfilling career, simply follow your passions. Do things you love, and happiness and success will be yours. This advice is usually given to younger folks, particularly those looking for a first job out of school or who are early in their careers.
I’m not convinced. In fact, I think it’s almost horrible advice. Some people agree with me. There are also some who realize that you are not your job, so stop thinking like that.
My experience is an excellent example of how not following your passions can work out very well.
I am currently an automated tester. I actually love my job, and enjoy it quite a bit. I’d also like to think I’m pretty good at it. I also didn’t even know it existed six years ago. In fact, I didn’t know much about anything in terms of software testing six years ago. Was software testing always my passion? It couldn’t have been, by definition.
Contrast that to what was my passion through high school and university: mathematics and education. I always wanted to be a high school math teacher, and all my life people have told me I’d make a good one. I still wonder what it would be like, but I’m not sure it’s still a passion of mine. In any case, I’m not following it as a passion. I still love math, but that’s not the whole story.
This is probably for the best. I’ve heard that in Toronto, the percentage of individuals chosen from the eligible-to-hire list that get tenure-track positions each year is around 5%. Those are “getting into Harvard numbers”, “I survived a deadly, rare form of cancer” numbers. All of these candidates have at least one year of teacher’s college and some supply experience. And it’s not exactly a shining time to be a teacher in this province.
Had I followed my passion, I would’ve gone right into this mess. Yes, I may have been happy, and I may have found a good career throughout all of this. Yes, I might have become fulfilled in what I get up and do every day.
But here’s the bottom line: I have done that. Not hypothetically. I actually have found a career path that makes me happy. Full stop.
And this is the problem with the “following your passion” line. It’s based on assumptions and hypotheticals. It assumes that if you love something you’re good at it, and that you can turn that into a career. Following this advice means imputing a bunch of details. It sounds good but falls over upon the slightest inspection. It doesn’t really take into account reality, nor your personal situation.
For example, if your passion is to play for the New York Yankees, well, just work really, really hard at playing baseball, and follow your passion. If you don’t become a Yankee, well, you just weren’t passionate enough, I guess. The fact that you were born in Yellowknife, or that you had to give up baseball to support your family, or that you lost a sports scholarship because of a single poor grade in calculus (or other possibilities) never gets taken into account. Nor does the advice take into account what else you could be working toward career-wise. While you may be working toward being an MLB player, you could find out you have other skills that are valuable but not necessarily useful in becoming an MLB player. But forget about them if they’re not involved in following your passion.
Nope. It’s simple: Passion implies success. If you’re not successful, well then, darn it, you just weren’t really passionate enough.
In reality, “following your passions” gets things almost exactly backwards. Most good careers start by someone being good at a particular activity, good enough to impress someone else enough to pay them money or support them otherwise. The more someone works at this activity, the better they get, and the more attention they receive. It’s why people get promoted in workplaces. It’s why freelancers can charge more than their peers per hour. It’s why some people can ask to work a six-hour day instead of seven or eight and get paid the same. Because they’re good at what they do, and because they’ve built a career doing so.
To be successful, figure out what you are good at, then get better at it. It doesn’t matter if you’re “passionate” about it, particularly if you see your job as a component of your life instead of the whole thing. If you’re good enough, people will pay attention. Even more so if you can solve a problem they have for them.
Following your passions means fantasizing about what your career should be like, instead of actually having one. In my experience, having a good career is far preferable to just thinking about it. Working with reality is usually the best way to go.
In continuing my series on Page Objects, I want to discuss the second property of POs from my first post, namely that
- Page Objects contain at least some class methods that return other Page Objects (possibly of the same class), meaning there is closure between Page Objects
This property is a bit obvious and a bit subtle. The idea is that POs truly represent the web app they model, without any obvious leaks (even if all abstractions are kind of bad), and that you don’t have to introduce any modelling hocus pocus to make them work.
How does this property accomplish this? Let’s see some examples.
Suppose your app has two pages, a home page and a contact page. Each page has a link to the other page that is always present (you can always directly click from one to the other). Using POs, you can model this relation in the following way (with code reuse from previous posts):
public class HomePage extends BasePage {
    public HomePage(driver) {
        super(driver);
    }
    public ContactPage goToContactPage() {
        // navigate to contact page
        return new ContactPage(this._driver);
    }
}
public class ContactPage extends BasePage {
    public ContactPage(driver) {
        super(driver);
    }
    public HomePage goToHomePage() {
        // navigate to home page
        return new HomePage(this._driver);
    }
}
Now, with these definitions in place, here’s part of a test script that includes navigating between the two pages:
HomePage home = new HomePage(driver);
// …
// go from the home page to the contact page
ContactPage contact = home.goToContactPage();
// do interesting stuff, then go back to the home page
home = contact.goToHomePage();
Great! You can now clearly see the relationships between your pages and how your tests trace through them. It resembles (at least partially) how a driver “walks” through the pages of a web app.
There are more benefits though. Notice that each PO’s constructor is called every time a goToPage()-type method is called. This means that any checks or conditions that the constructor contains get done each time such a method is called. It’s almost like “free” checking, in the sense that you don’t have to manually place these checks at the end of any given method. That’s a bit of a win when you start to look at the PO design pattern in practice, with many methods called and scripts run.
And of course, these points apply to methods that do more than directly link from one page to another. You could argue that any user action “navigates” from a page to a “new” page, even if that page is an identical copy of the same page, so this concept is applicable to a wide range of actions that take place on a given page.
It’s also clear that the driver member of a given script (or PO) is cleanly handled. No more mucking about keeping track of static drivers or objects that need messy initialization methods called periodically during tests.
As well, another main benefit of this property is that the capabilities of each page are clearly described while the actual implementation details are hidden. For example, in the goToHomePage() method, you could go to the home page by clicking a link or by redirecting using a URL. The actual details are irrelevant, as long as you end up on the expected page. Put differently, your test scripts are independent of the script/driver details, which is key. This applies doubly to less obvious methods such as clicking a button that takes the user to a new page, or a scenario where an invalid action leads to a different page.
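Here is a small runnable sketch of that hiding, in Java. The Driver interface and the in-memory FakeDriver are hypothetical stand-ins so the example runs without a browser, and the link ids and paths are invented. The point is only that goToHomePage() keeps the same signature whether its body clicks a link or opens a URL.

```java
import java.util.Map;

// Hypothetical minimal driver; a real browser driver would sit behind it.
interface Driver {
    void open(String path);
    void click(String linkId);
    String currentPath();
}

// In-memory fake: clicking a known link id changes the current path.
class FakeDriver implements Driver {
    private final Map<String, String> links =
            Map.of("home-link", "/", "contact-link", "/contact");
    private String path = "/";
    public void open(String path) { this.path = path; }
    public void click(String linkId) { this.path = links.get(linkId); }
    public String currentPath() { return path; }
}

class HomePage {
    private final Driver driver;
    HomePage(Driver driver) { this.driver = driver; }

    ContactPage goToContactPage() {
        driver.click("contact-link");
        return new ContactPage(driver);
    }
}

class ContactPage {
    private final Driver driver;
    ContactPage(Driver driver) { this.driver = driver; }

    // Today this clicks a link; swapping the body for driver.open("/")
    // would change neither the signature nor any calling test.
    HomePage goToHomePage() {
        driver.click("home-link");
        return new HomePage(driver);
    }
}
```

A test that calls new HomePage(driver).goToContactPage().goToHomePage() reads the same either way, which is exactly the independence described above.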
There are some lesser benefits too, such as method chaining. That means you can do things like
phoneNumber = home.goToContactPage().getPhoneNumber();
This can be a good thing, but it can also be a problem when it comes to debugging. It may also make your code less readable and more confusing. It’s there anyway.
I am aware that there is some criticism of this property, and some advocate that POs do not necessarily have to return other POs all the time or ever. I can see some possible downsides to my approach (garbage collection affecting performance is one) but I honestly think the way I’ve outlined is best, for now. You may also return the PO itself to preserve this approach (i.e. “return this” or “return self”) to help avoid memory management problems. When implemented properly, it can clear up many common automated testing issues such as erroneous stale elements and waiting for loading pages. It can almost be magical.
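The “return this” variant might look something like the following Java sketch. The page, its methods and its fields are hypothetical, and the driver is left out for brevity: a plain list stands in for whatever the driver would type into the page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical page object for a contact form; field names are made up.
class ContactFormPage {
    // Stand-in for driver interactions, recorded in order.
    private final List<String> typed = new ArrayList<>();

    // The action stays on the same page, so hand back this instance
    // rather than constructing a new page object.
    ContactFormPage enterName(String name) {
        typed.add("name=" + name);
        return this;
    }

    ContactFormPage enterEmail(String email) {
        typed.add("email=" + email);
        return this;
    }

    List<String> typedSoFar() {
        return typed;
    }
}
```

Chaining still works (page.enterName(...).enterEmail(...)), but no new objects are created along the way. The trade-off is that any constructor checks are not re-run for these same-page actions.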
That’s my description of Page Objects, hopefully with helpful examples. Let the testing begin.
Once upon a time, when I was an undergrad math student I took a course in abstract algebra. Among other reasons, I was interested in taking the course because I read somewhere that the famous Rubik’s Cube puzzle was based on advanced algebraic concepts and so maybe learning the theory would teach me how to solve one.
As it turns out, we did learn a bit about Rubik’s Cubes, since the moves of a cube form an example of a permutation group, acting on the 48 movable coloured facelets of the outer faces (the six centre facelets stay put). Permutation groups are an important class of groups, since a large class of groups can be represented as permutation groups. Essentially, since the moves of a Rubik’s Cube form a group, every sequence of turns can be undone, so you can always get from any arrangement of colours reached by turning the cube to any other such arrangement. Every Rubik’s Cube can be solved, regardless of how it is mixed up, as long as it was mixed up by turning it.
That’s the theory, which we covered in some detail. How to actually solve a Rubik’s Cube wasn’t touched on. At all.
In other words, I know why a Rubik’s Cube can be solved, but have no idea how to solve one. Which is a problem, since solving one is actually the fun part.
Meanwhile, at a cafe I frequented during undergrad, I saw a poster ad for “lifetime” Rubik’s Cube lessons on how to actually solve cubes from any configuration. The tutor was something like 16, a local high school student, and apparently a Rubik’s Cube expert. I’m certain he didn’t have any training in abstract algebra but he had the “right stuff”, as it were. He could get stuff done and was even a bit entrepreneurial about it.
Sometimes practice doesn’t follow theory. Sometimes they don’t even know the other exists.
In my last post, I began a series discussing Page Objects (PO), and what makes them so great. This post will continue on the theme, talking about what makes Page Objects Page Objects.
Previously I gave two defining properties that, I think, are requirements for Page Objects. The first one was that a PO has:
- A (private) driver object that is hidden away from public consumption, and is only handled via initialization/constructors
The key to this point is that the driver is a member of the PO class, and is mostly hidden from public PO calls. The driver should only really be controlled at the highest levels of the PO hierarchy (i.e. in as few places as possible). This also makes it easier to change the underlying driver type if required without a major rewrite, since the driver definitions only have to be changed in a few specific places.
Here’s a pseudo-code example of a very basic base class and how it contains a driver member:
public class BasePage {
protected _driver;
public BasePage(driver){
this._driver = driver;
}
}
And that’s it! It’s the most basic case but all the important bits are there and completely valid.
Let’s see how an actual PO class representing a home page of a web site or app would be implemented:
public class HomePage extends BasePage {
    public HomePage(driver) {
        super(driver);
    }
    // methods and fields that represent interactions with HomePage
}
And that’s it! The driver member is completely encapsulated by the class. It can be accessed throughout the class, but not publicly by just any old object or function.
Why is this great? Let’s look at a couple of benefits.
First, notice that I’ve just been using the term “driver” generically. In this post and the previous one I haven’t mentioned Selenium (or Watir, or SilkTest, or some homegrown Franken-driver). This means that there’s a clear level where the driver specifics are implemented. If you decide to change the particular browser driver, there’s a clear place to do so and not some patchwork across classes and files. This isn’t just a theoretical point: at work we are currently using Selenium and Silk4J, a Java binding of SilkTest that can drive browsers and desktop applications. It could happen that Selenium could be replaced with Silk4J (even if that would be widely regarded as a bad move). Yes, the method details would have to be re-written, but at least the Page Object structure stays the same, and you would almost certainly not have to re-write much of your test code, just the object/page side of things.
Incidentally, I also haven’t mentioned Java, Python, C++ or any other language. Page Objects illustrated here apply to any of those languages or others.
Second, using the PO pattern above simplifies writing method calls and signatures. Instead of having to pass a driver as an argument to all of HomePage’s class methods, you have an internal driver instead. This means methods that might look like
public bool isTitleCorrect(driver, expected_title)
now look like
public bool isTitleCorrect(expected_title)
This saves a lot of typing, as it’s a form of Don’t Repeat Yourself. Code looks cleaner and is easier to understand, in my opinion. It also saves error checking and handling directly of a driver outside of a PO instance, which can be a big win.
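Both points so far can be seen in one short sketch. Java is used here only for concreteness. BrowserDriver and its two implementations are hypothetical placeholders, not real Selenium or Silk4J APIs: each just returns a fixed title where a real driver call would go.

```java
// Hypothetical driver abstraction; real driver calls would live inside
// the implementations, never in page objects or tests.
interface BrowserDriver {
    String title();
}

// Placeholder standing in for a Selenium-backed driver.
class SeleniumBackedDriver implements BrowserDriver {
    public String title() { return "Home"; }
}

// Placeholder standing in for a Silk4J-backed driver.
class SilkBackedDriver implements BrowserDriver {
    public String title() { return "Home"; }
}

class HomePage {
    private final BrowserDriver driver;

    HomePage(BrowserDriver driver) { this.driver = driver; }

    // No driver parameter: the page uses the one it was constructed with.
    boolean isTitleCorrect(String expectedTitle) {
        return expectedTitle.equals(driver.title());
    }
}
```

Swapping new HomePage(new SeleniumBackedDriver()) for new HomePage(new SilkBackedDriver()) is the only change a test would see, and isTitleCorrect("Home") is called the same way in both cases.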
Lastly, and maybe most importantly, this pattern allows for checking right in the constructor! This is big; you can do some checking almost for free when you initialize new instances of pages (which will be discussed in Part 3).
Here’s an example. One thing you want to do often is to confirm an action takes you to the correct page, like clicking a link. One way to do this is to check the page title is correct. Suppose that isTitleCorrect() is already a method of the BasePage class. Then we can add the appropriate call to the class constructor, something like this:
public BasePage(driver) {
    this._driver = driver;
    this._title = this.getPageTitle();
    if (!this.isTitleCorrect(this._title)) {
        Error("Incorrect Page Title");
        return;
    }
}
Now, every PO automatically checks a page’s title whenever a PO is initialized during tests. As we shall see in the next post, this is very helpful. It’s almost like getting checks done for free, since every PO now has this capability automatically. You can also add specific checks for different PO classes on top of the base PO checking as well.
This post has gotten a bit long, so I’ll stop here. I’ve outlined some benefits of using a constructor/initialization approach to the Page Object pattern. If there are any downsides, I’d like to hear about them.
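For a runnable take on the same idea, here is a minimal Java sketch. It makes two assumptions that differ from the pseudo-code above: each subclass passes its expected title up to the base constructor, and a mismatch throws an exception instead of calling Error(). The Driver interface is a hypothetical one-method stand-in.

```java
// Hypothetical minimal driver exposing only the current page title.
interface Driver {
    String title();
}

class BasePage {
    protected final Driver driver;

    // Each subclass supplies the title it expects; a mismatch fails fast.
    BasePage(Driver driver, String expectedTitle) {
        this.driver = driver;
        if (!isTitleCorrect(expectedTitle)) {
            throw new IllegalStateException(
                "Incorrect page title: expected '" + expectedTitle
                + "' but was '" + driver.title() + "'");
        }
    }

    boolean isTitleCorrect(String expectedTitle) {
        return expectedTitle.equals(driver.title());
    }
}

class HomePage extends BasePage {
    HomePage(Driver driver) {
        super(driver, "Home");
    }
}
```

With this in place, new HomePage(driver) succeeds only when the driver really is on the home page, so a test that lands on the wrong page fails at the point of navigation rather than several steps later.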
In my next post, I will discuss the second point from my initial post in the series.