Selenium Tutorial: Create Tests by Utilizing the Selenium API
This is a post about the Selenium API. Selenium has quite a few components and is often misunderstood. So, first of all, here’s a quick overview of what Selenium is and is not.
Selenium is a tool for automating web browsers. It offers a GUI (in the form of a browser extension), APIs in several programming languages, and a server component that allows you to drive a web browser remotely. Plus, there’s Selenium Grid, which scales out your automation.
Selenium isn’t just for automating testing, though it’s often the go-to tool for that purpose. And why not? It can automate all the major browsers, including Firefox, Chrome, Safari, and Edge/IE.
Since Selenium is simply a tool for automating browsers, it’s not a method of testing, and it’s not a testing framework in and of itself. To use it for testing, you’ll need to pair it with a test framework such as Cucumber, JUnit, or NUnit. Selenium can also be driven by a GUI (there are official extensions for Firefox and Chrome), but that isn’t the focus here. And besides, it isn’t the recommended way to use Selenium.
In any case, this post is about using the Selenium API to drive browsers programmatically. So, without further ado, let’s get into some programming!
Python Sample
Using Selenium with Python, we can easily drive a browser with a few lines of code. But first, there are some prerequisites you’ll need before it’ll work.
Prerequisites
For the Python code to work, you’ll need to make sure you have Python and pip (the Python package manager) installed and ready to go. Then you’ll need to install the Selenium package.
pip install -U selenium
Then, you’ll need to install the WebDriver for whichever browser you use. Make sure to get the right version of the WebDriver for your browser. Since I’m using Chrome 77.0.3865, I’ll need the latest Chrome WebDriver version 77.0.3865. Here are the steps:
- Go to the download page: https://chromedriver.storage.googleapis.com/index.html
- Find and download the matching version for your operating system.
- Unzip the contents of the downloaded zip into your favorite tools directory.
- If your favorite tools directory isn’t already in your PATH, add it to the PATH variable.
If you’re lost at this point, you have some options. You can either message your favorite technical friend or search the web for “how to add to PATH variable” or “how to find PATH variable on <replace this with your OS>.” It’s totally OK to not know this, but it’s something you should learn, so why not now!?
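If you want to confirm the PATH step worked before moving on, a quick check like the following can help. This is only a sketch using Python’s standard library; the function name find_driver is made up for this example, and it assumes the executable is called chromedriver.

```python
import os
import shutil


def find_driver(name="chromedriver"):
    """Return the full path to the driver executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path:
        print(f"Found chromedriver at {path}")
    else:
        # List the directories that were searched to make the fix easier to spot.
        print("chromedriver is not on PATH. Directories searched:")
        for directory in os.environ.get("PATH", "").split(os.pathsep):
            print(f"  {directory}")
```

If the script reports that nothing was found, open a new terminal first; PATH changes usually don’t apply to shells that were already running.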
The Code
The first thing we need to do besides importing WebDriver from Selenium is to create a driver instance. Then, we’ll use the driver to open a website and perform some actions.
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
# launch Chrome through chromedriver
driver = webdriver.Chrome()
driver.get("https://aspetraining.com")
# do a search
search = driver.find_element(By.ID, "edit-keys")
search.send_keys("selenium")
search.submit()
# scroll to the course link
course_link = driver.find_element(By.LINK_TEXT, "Selenium for Test Automation")
ActionChains(driver).move_to_element(course_link).perform()
# scroll a bit more to put the whole course on screen
driver.execute_script("window.scrollBy(0, 150)")
You can save this code to a file named search_aspe.py and run it with the following command:
python search_aspe.py
Just make sure you’re in the same directory as the file you just saved and that you have proper permissions.
If you have everything installed and configured correctly, you should see the ASPE Selenium training course in a new browser window.
Anatomy
Let’s take a look at the anatomy of this sample to understand more about how Selenium actually works.
First of all, when you run the sample, it uses the Selenium API to launch chromedriver as a subprocess. You could launch chromedriver on its own and visit http://localhost:9515/status to see that chromedriver has an HTTP API. This API uses the W3C WebDriver standard to accept commands from clients and control instances of Chrome. When you first run chromedriver, you’ll see the following messages:
Starting ChromeDriver 77.0.3865.40 … on port 9515
Only local connections are allowed.
Please protect ports used by ChromeDriver and related test frameworks to prevent access by malicious code.
As the last message warns, you should be careful about leaving this running. Use Ctrl+C to stop it. Even though you could launch it yourself and interact using HTTP, you really wouldn’t want to do that. It’s a highly inefficient way to work with WebDriver. This is where Selenium comes in.
Selenium adds something valuable on top of WebDriver: it gives you APIs you can program to. You use Selenium APIs to manage the WebDriver process, manage sessions, and communicate with WebDriver. WebDriver itself does all the heavy lifting. In fact, each major browser has its own WebDriver.
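To see that HTTP API for yourself without opening a browser tab, you could poll the status endpoint from Python. The sketch below uses only the standard library and assumes chromedriver is listening on its default port, 9515. The function names driver_status and is_ready are invented for this example. The W3C standard wraps the response in a top-level "value" object that carries a "ready" flag.

```python
import json
import urllib.error
import urllib.request


def driver_status(base_url="http://localhost:9515", timeout=2.0):
    """Return the parsed /status response, or None if nothing usable answers."""
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None


def is_ready(status):
    """True if a /status payload says the server can accept new sessions."""
    if not isinstance(status, dict):
        return False
    value = status.get("value")
    return isinstance(value, dict) and value.get("ready") is True


if __name__ == "__main__":
    status = driver_status()
    if status is None:
        print("No WebDriver server answered on port 9515.")
    else:
        print("ready:", is_ready(status))
```

This is handy as a sanity check, but it’s still the raw HTTP route, so treat it as a diagnostic rather than a way to write automation.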
WebDriver
The World Wide Web Consortium (W3C) sets the standards for WebDriver. This standard describes what WebDriver is and what it should do, and it defines the HTTP endpoints that need to be implemented. Browser makers such as Google, Mozilla, Apple, and Microsoft each implement their own WebDriver. Safari’s WebDriver is built-in and only needs to be turned on. Others must be downloaded or installed.
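To make “HTTP endpoints” a little more concrete, here’s a sketch of a few requests from the W3C standard, written as plain Python functions that only build the method, path, and JSON body. Nothing is sent anywhere. The function names are made up for this example, and the session ID "abc123" is a placeholder; a real one comes back from the server when a session is created.

```python
import json


def new_session_request(browser_name):
    """Request that asks a WebDriver server to start a browser session."""
    body = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return "POST", "/session", json.dumps(body)


def navigate_request(session_id, url):
    """Request that points the session's browser at a URL."""
    return "POST", f"/session/{session_id}/url", json.dumps({"url": url})


def find_element_request(session_id, css_selector):
    """Request that looks up one element by CSS selector."""
    body = {"using": "css selector", "value": css_selector}
    return "POST", f"/session/{session_id}/element", json.dumps(body)


def delete_session_request(session_id):
    """Request that ends the session and closes its browser."""
    return "DELETE", f"/session/{session_id}", None


if __name__ == "__main__":
    requests_to_show = [
        new_session_request("chrome"),
        navigate_request("abc123", "https://aspetraining.com"),
        find_element_request("abc123", "#edit-keys"),
        delete_session_request("abc123"),
    ]
    for method, path, body in requests_to_show:
        print(method, path, body or "")
```

Every Selenium call in the samples in this post boils down to requests of this shape, which is why the same client code can drive different browsers.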
Remote Control
WebDriver endpoints are only available through localhost as a security precaution. In order to access WebDriver on a remote computer (for things like testing multiple browser versions and running tests at scale), you need Selenium Server, Selenium Grid, or a service that will run your tests at scale.
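With the Python API, pointing at a remote machine mostly means swapping webdriver.Chrome() for webdriver.Remote(). The sketch below assumes Selenium 4 and a Selenium Server or Grid listening on its default port, 4444. The host name grid.example.com and the helper names grid_url and connect_to_grid are placeholders invented for this example.

```python
def grid_url(host, port=4444):
    """Build the address of a Selenium Server or Grid (4444 is its default port)."""
    return f"http://{host}:{port}"


def connect_to_grid(host, port=4444):
    """Start a Chrome session on a remote machine instead of a local chromedriver."""
    # Imported here so the URL helper above works even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    return webdriver.Remote(command_executor=grid_url(host, port), options=options)


if __name__ == "__main__":
    # Hypothetical host; replace it with the machine running your server.
    print(grid_url("grid.example.com"))
    # driver = connect_to_grid("grid.example.com")
    # driver.get("https://aspetraining.com")
    # driver.quit()
```

Once the remote session exists, the driver object behaves like a local one, so the rest of a script doesn’t need to change.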
Subtle Differences
Since each browser maker uses its own JavaScript engine and implements its own WebDriver, there are subtle differences in the way they handle some commands. For example, the JavaScript Selenium API uses promises. You can use async/await functionality to resolve these promises. However, Firefox and Chrome behave differently when it comes to responding to certain async commands. And of course, Edge, Opera, and Safari WebDrivers have their own way of being finicky as well. Let’s look at a sample of the JavaScript Selenium API now.
JavaScript Sample
Selenium provides a JavaScript API via NodeJS. Use npm to install it. Of course, it still depends on you having a WebDriver installed just like any other Selenium API. Here are the prerequisites in a nutshell.
Prerequisites
This time, I’m going to drive Microsoft Edge with Selenium and NodeJS. So first, we have to make sure we have some things set up.
- NodeJS
- npm or your favorite package manager
- Edge browser
- The WebDriver (see: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/)
npm init
# fill in what you want
npm i --save selenium-webdriver
Alternatively, you could just install selenium-webdriver in your global npm as follows:
npm i -g selenium-webdriver
Once you’ve got everything set up, you can get on to some coding.
The Code
This is a sample of JavaScript code you can use to automate the Microsoft Edge browser.
const { By, Builder } = require('selenium-webdriver');
// build a driver for Microsoft Edge
const driver = new Builder()
  .forBrowser('MicrosoftEdge')
  .build();
driver.get('https://aspetraining.com')
  .then(async () => {
    // type the search term; \uE007 is the Enter key
    const searchBox = await driver.findElement(By.id('edit-keys'));
    await searchBox.sendKeys('selenium\uE007');
    const link = await driver.findElement(By.linkText('Selenium for Test Automation'));
    await link.click();
  })
  .catch(console.error);
Save the file as search_aspe.js in your favorite directory.
Simply run the code using the following command in a shell or command line:
node search_aspe.js
You might get an error when it comes to clicking the search result. I did! But let’s do a little experiment before we try to resolve this.
Multi-browser Runs
We’re going to run this automation in multiple browsers at once. We’ll run it in Edge, as we are now, and Chrome to see how it works there.
First, a little refactoring. We’ll extract the test code into its own function like this:
runTest();
function runTest() {
  const driver = new Builder()
    .forBrowser('MicrosoftEdge')
    .build();
  driver.get('https://aspetraining.com')
    .then(async () => {
      // type the search term; \uE007 is the Enter key
      const searchBox = await driver.findElement(By.id('edit-keys'));
      await searchBox.sendKeys('selenium\uE007');
      const link = await driver.findElement(By.linkText('Selenium for Test Automation'));
      await link.click();
    })
    .catch(console.error);
}
Next, we’ll have runTest take a parameter for the browser name.
function runTest(browserName) {
  const driver = new Builder()
    .forBrowser(browserName)
    …
Adding to the import at the top of the file, we can use built-in browser names and call runTest for each browser. This will run the automation in both browsers simultaneously!
const { By, Builder, Browser } = require('selenium-webdriver');
// function declarations are hoisted, so runTest can be called before its definition
runTest(Browser.EDGE);
runTest(Browser.CHROME);
function runTest(browser) {
  const driver = new Builder().forBrowser(browser).build();
  driver.get('https://aspetraining.com')
    .then(async () => {
      // type the search term; \uE007 is the Enter key
      const searchBox = await driver.findElement(By.id('edit-keys'));
      await searchBox.sendKeys('selenium\uE007');
      const link = await driver.findElement(By.linkText('Selenium for Test Automation'));
      await link.click();
    })
    .catch(console.error);
}
This works for me in Chrome but still fails in Edge. Frustrating at times! Using a bit of reason, we might realize that each browser has a different driver running it. Also, the way they handle certain actions is slightly different.
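When one browser passes and another fails, it helps to collect a result per browser instead of reading interleaved console errors. Here’s a rough Python sketch of that idea using threads from the standard library. The helper run_in_browsers is invented for this example, and the test function you pass in is assumed to take a browser name, create its own driver, and raise an exception on failure.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_browsers(test, browser_names):
    """Run test(browser_name) once per browser, concurrently.

    Returns a dict mapping each name to ("passed", result) or ("failed", error).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(browser_names))) as pool:
        futures = {name: pool.submit(test, name) for name in browser_names}
        for name, future in futures.items():
            try:
                results[name] = ("passed", future.result())
            except Exception as error:
                # One browser failing should not hide the others' results.
                results[name] = ("failed", error)
    return results


if __name__ == "__main__":
    # Stand-in test so the sketch runs without a browser; a real one would
    # create webdriver.Chrome() or webdriver.Edge() based on the name.
    def pretend_search(browser_name):
        if browser_name == "edge":
            raise RuntimeError("could not click the search result")
        return "clicked the course link"

    for name, (outcome, detail) in run_in_browsers(pretend_search, ["chrome", "edge"]).items():
        print(f"{name}: {outcome} ({detail})")
```

Each driver should be created inside the test function so that every thread gets its own browser session.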
Sometimes, it can take a bit of debugging to find the best way to solve this kind of problem. I’ll leave that as an exercise for you. Will it work with By.partialLinkText? By.xpath? Or can you get multiple elements by class then select a certain index? Give it a try and see what you come up with (there’s sure to be more than one solution).
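If you’d like a starting point for that experiment in Python, one option is to try several locator strategies in order and report which one matched. This is a sketch, not a definitive fix: the helper find_with_fallbacks and the specific XPath are assumptions made for this example. The strategy strings are the values behind By.LINK_TEXT, By.PARTIAL_LINK_TEXT, and By.XPATH in the Python API.

```python
# Candidate locators, from most specific to most forgiving.
LOCATORS = [
    ("link text", "Selenium for Test Automation"),
    ("partial link text", "Selenium for Test"),
    ("xpath", "//a[contains(., 'Selenium for Test Automation')]"),
]


def find_with_fallbacks(driver, locators):
    """Return (element, locator) for the first locator that finds something."""
    errors = []
    for by, value in locators:
        try:
            return driver.find_element(by, value), (by, value)
        except Exception as error:
            # Selenium raises NoSuchElementException here; any failure moves on.
            errors.append(f"{by}={value!r}: {error}")
    raise LookupError("no locator matched:\n" + "\n".join(errors))


if __name__ == "__main__":
    # Stand-in driver so the sketch runs without a browser.
    class PretendDriver:
        def find_element(self, by, value):
            if by == "partial link text":
                return f"<a> matched by {value!r}"
            raise ValueError("no such element")

    element, used = find_with_fallbacks(PretendDriver(), LOCATORS)
    print("matched with", used)
```

Printing which locator matched in each browser is a cheap way to see where the drivers disagree.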
In Closing
If you’ve been running these samples, don’t forget to close your browser windows. Normally, closing the browser with driver.close() (or driver.quit(), which also ends the WebDriver session) is part of test cleanup and belongs in a finally block at the end of each test. But sometimes the test run goes so quickly you can’t see what happened. So in these samples we let the window stay open.
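For real tests, the cleanup could look something like the sketch below: a small context manager that guarantees the browser is shut down even when a step fails. The name managed_driver is made up for this example. It calls driver.quit(), which ends the whole session, rather than driver.close(), which only closes the current window.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Create a driver, hand it to the with-block, and always quit it afterwards."""
    driver = create_driver()
    try:
        yield driver
    finally:
        # Runs whether the block finished normally or raised an exception.
        driver.quit()


if __name__ == "__main__":
    # Stand-in driver so the sketch runs without a browser. With Selenium
    # installed you would pass webdriver.Chrome instead of PretendDriver.
    class PretendDriver:
        def get(self, url):
            print("visiting", url)

        def quit(self):
            print("browser closed")

    with managed_driver(PretendDriver) as driver:
        driver.get("https://aspetraining.com")
```

Passing the constructor in, rather than a ready-made driver, keeps creation and cleanup in one place.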