Enter Selenium


Selenium’s tagline is “Selenium automates browsers. That’s it!” Automation works because modern web browsers implement a “WebDriver” set of APIs for this purpose. The main use case is automated testing of web applications, but it’s not limited to that.

Selenium is roughly three APIs in a trench coat. One of those APIs talks to the web browser through the WebDriver API. Another exposes a standard Selenium API to a host language. And in the middle, Selenium wires these two together.
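To make the browser-facing side a little more concrete: the W3C WebDriver protocol is HTTP with JSON bodies, so each command is a small request sent to a driver process. The sketch below only builds two such requests (start a session, navigate to a page) without sending them. The local address and the helper names are assumptions for illustration (port 4444 is geckodriver's default), and this is not how Selenium itself is structured internally.

```python
import json
from urllib.request import Request

# Assumption: a driver such as geckodriver is listening locally on its default port.
DRIVER_URL = "http://localhost:4444"


def webdriver_request(path, payload):
    """Build (but do not send) a WebDriver command: an HTTP POST with a JSON body."""
    return Request(
        DRIVER_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def new_session(browser_name="firefox"):
    """The New Session command: ask the driver to start a browser."""
    capabilities = {"alwaysMatch": {"browserName": browser_name}}
    return webdriver_request("/session", {"capabilities": capabilities})


def navigate_to(session_id, url):
    """The Navigate To command: load a URL in an existing session."""
    return webdriver_request(f"/session/{session_id}/url", {"url": url})
```

Passing one of these requests to `urllib.request.urlopen` would send it to a running driver. The host language bindings exist so that nobody has to write this by hand.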

There are host language APIs for Java, Python, C#, Ruby, JavaScript, and Kotlin. For an XProc implementation written in any of those languages, talking to Selenium is just a matter of loading the right language bindings.

In theory, we could start a web driver for our browser, start Selenium, direct Selenium to load the page, wait for the scripts running in the browser to populate the table, and ask Selenium to give us the data.
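Driven directly from a host language, that sequence might look something like the following Python sketch. It assumes the `selenium` package and a local Firefox are installed. The URL, the CSS selectors, and the function names `read_table` and `demo` are all hypothetical. A plain polling loop stands in for Selenium's own wait helpers so that the reading logic depends only on the `get` and `find_elements` methods of the driver.

```python
import time


def read_table(driver, url, row_selector="table tbody tr", timeout=10.0, poll=0.25):
    """Load url, wait for script-generated rows to appear, and return their cell text."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the locator strategy string behind By.CSS_SELECTOR.
        rows = driver.find_elements("css selector", row_selector)
        if rows:
            return [
                [cell.text for cell in row.find_elements("css selector", "td")]
                for row in rows
            ]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no rows matched {row_selector!r} within {timeout}s")
        time.sleep(poll)


def demo():
    # Imported here so that read_table stays usable without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()  # starts the browser and its driver
    try:
        # Hypothetical page whose table is filled in by JavaScript.
        for row in read_table(driver, "https://example.com/table.html"):
            print(row)
    finally:
        driver.quit()  # always close the browser, even on error
```

Calling `demo()` runs the whole sequence. A page that adds rows incrementally would need a stricter readiness check than "at least one row exists".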

All we need to do is implement a Selenium step.

The trouble with a Selenium step is that it doesn’t do anything all by itself. It exposes an API that you can drive from a host language. You’d need to be able to drive it from XProc.

To drive Selenium from XProc, the step would need to have some way to describe what the author wanted to do. XProc isn’t designed to support stateful, imperative programming, so mapping directly to the Selenium host language APIs isn’t really an option.

What’s needed is some way for the pipeline author to describe how they want Selenium to behave. They need some mechanism for scripting the interaction. Then we can imagine a Selenium step that takes that script as input and uses it to interact with the browser.