Control browsers in the cloud with Golang by using the Playwright library.
By Max Schmitt, Published on 9/10/2020
Automating browsers in the cloud, for example on Kubernetes, with a lightweight, type-safe, and easy-to-use programming language is especially useful for critical applications. In this blog post, we're going to focus on the Go version of Playwright: what its current state is and what its limits are.
Playwright is a Node.js library to automate Chromium, Firefox and WebKit with a single API. Playwright is built to enable cross-browser web automation that is ever-green, capable, reliable and fast. Headless execution is supported for all the browsers on all platforms.
Playwright for Go aims to be a 1:1 wrapper of the JavaScript version. Like Playwright for Python, it uses the official driver implementation internally. Go has key differences compared to JavaScript and Python which affected the Playwright implementation:
Values have to be converted from an interface{}-based type to the basic types, map[string]interface{}, or []interface{}.
The core concept of driver-based implementations like Go or Python is a subprocess that is started in the background. This is done with the Run() method and the corresponding Stop() method.
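Before the full example, a brief aside on the type conversions mentioned above. The following is a minimal sketch of what moving from interface{} to concrete types looks like in Go. It uses only the standard library; the message layout and the decodeMessage helper are assumptions made up for illustration and are not taken from Playwright for Go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeMessage pulls a width and a list of page names out of a loosely typed
// message. The message layout is hypothetical and exists only for this sketch.
func decodeMessage(raw []byte) (int, []string, error) {
	var msg map[string]interface{}
	if err := json.Unmarshal(raw, &msg); err != nil {
		return 0, nil, fmt.Errorf("decode message: %w", err)
	}

	// Nested objects arrive as map[string]interface{}.
	viewport, ok := msg["viewport"].(map[string]interface{})
	if !ok {
		return 0, nil, fmt.Errorf("viewport: unexpected type %T", msg["viewport"])
	}

	// encoding/json decodes every JSON number into float64.
	width, ok := viewport["width"].(float64)
	if !ok {
		return 0, nil, fmt.Errorf("width: unexpected type %T", viewport["width"])
	}

	// Arrays arrive as []interface{}; each element needs its own assertion.
	items, ok := msg["pages"].([]interface{})
	if !ok {
		return 0, nil, fmt.Errorf("pages: unexpected type %T", msg["pages"])
	}
	pages := make([]string, 0, len(items))
	for i, item := range items {
		s, ok := item.(string)
		if !ok {
			return 0, nil, fmt.Errorf("pages[%d]: unexpected type %T", i, item)
		}
		pages = append(pages, s)
	}
	return int(width), pages, nil
}

func main() {
	raw := []byte(`{"viewport": {"width": 1280}, "pages": ["home", "about"]}`)
	width, pages, err := decodeMessage(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(width, pages)
}
```

Using the two-value form of the type assertion (value, ok) turns an unexpected shape into an ordinary error instead of a panic, which fits Go's convention of returning errors as values.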
package main

import (
	"log"

	"github.com/mxschmitt/playwright-go"
)

func main() {
	// Launching the driver internally
	pw, err := playwright.Run()
	if err != nil {
		log.Fatalf("could not launch playwright: %v", err)
	}
	// Start the Chromium browser
	browser, err := pw.Chromium.Launch()
	if err != nil {
		log.Fatalf("could not launch Chromium: %v", err)
	}
	// Creates internally a context and a new page
	page, err := browser.NewPage()
	if err != nil {
		log.Fatalf("could not create page: %v", err)
	}
	// Visit the website and wait for a network idle for at least 500ms
	if _, err = page.Goto("http://whatsmyuseragent.org", playwright.PageGotoOptions{
		WaitUntil: playwright.String("networkidle"),
	}); err != nil {
		log.Fatalf("could not goto: %v", err)
	}
	if _, err = page.Screenshot(playwright.PageScreenshotOptions{
		Path: playwright.String("foo.png"),
	}); err != nil {
		log.Fatalf("could not create screenshot: %v", err)
	}
	if err = browser.Close(); err != nil {
		log.Fatalf("could not close browser: %v", err)
	}
	if err = pw.Stop(); err != nil {
		log.Fatalf("could not stop Playwright: %v", err)
	}
}
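The playwright.String(...) calls in the example above hint at a common Go pattern: since Go has no optional arguments, option structs often use pointer fields so that "not set" (nil) can be told apart from an empty value. The sketch below illustrates that pattern with the standard library only. The gotoOptions struct and the waitUntilOrDefault helper are hypothetical stand-ins, and the "load" fallback is an assumption of this sketch, not a statement about how the library resolves defaults.

```go
package main

import "fmt"

// String returns a pointer to v so it can be assigned to an optional field.
func String(v string) *string { return &v }

// gotoOptions is a hypothetical options struct for this sketch.
// A nil field means the caller did not set that option.
type gotoOptions struct {
	WaitUntil *string
	Referer   *string
}

// waitUntilOrDefault returns the requested wait condition, or the fallback
// assumed by this sketch when the option was left unset.
func waitUntilOrDefault(opts gotoOptions) string {
	if opts.WaitUntil == nil {
		return "load"
	}
	return *opts.WaitUntil
}

func main() {
	fmt.Println(waitUntilOrDefault(gotoOptions{}))
	fmt.Println(waitUntilOrDefault(gotoOptions{WaitUntil: String("networkidle")}))
}
```

A helper like String is needed because Go does not allow taking the address of a string literal directly.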
The core concept of Playwright is based on a browser that has multiple contexts. A single context is an isolated entity that has its own state for cookies and local storage. Each context can have multiple pages, and the pages of one context share that context's browser session with each other.
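To make the browser, context, and page relationship concrete, here is a toy model in plain Go. It is only a sketch of the isolation idea described above: the browser, browserContext, and page types are invented for illustration and do not reflect the library's real types or internals.

```go
package main

import "fmt"

// browserContext owns the state that is isolated from other contexts.
type browserContext struct{ cookies map[string]string }

// page only holds a reference to its context, so pages of one context share state.
type page struct{ ctx *browserContext }

// browser keeps track of every context it created.
type browser struct{ contexts []*browserContext }

func (b *browser) newContext() *browserContext {
	ctx := &browserContext{cookies: make(map[string]string)}
	b.contexts = append(b.contexts, ctx)
	return ctx
}

func (c *browserContext) newPage() *page { return &page{ctx: c} }

func (p *page) setCookie(name, value string) { p.ctx.cookies[name] = value }

func (p *page) cookie(name string) (string, bool) {
	v, ok := p.ctx.cookies[name]
	return v, ok
}

func main() {
	b := &browser{}
	shop, admin := b.newContext(), b.newContext()
	cart, checkout := shop.newPage(), shop.newPage()
	dashboard := admin.newPage()

	cart.setCookie("session", "abc123")

	_, shared := checkout.cookie("session")  // same context as cart
	_, leaked := dashboard.cookie("session") // different context
	fmt.Println(shared, leaked)              // prints "true false"
}
```

In this model, isolation falls out of ownership: state lives on the context, so two pages see the same cookies only if they point at the same context.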
We pass networkidle to the Page.Goto() call, which waits until there has been no network activity on the page for at least 500ms. You can then either pass a path to the Page.Screenshot() method or use the returned data ([]byte) directly.
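The "no activity for at least 500ms" idea can be sketched with a channel and a timer. This is not how Playwright implements networkidle; it is a standalone illustration of a quiet-window wait, where the waitForIdle function and the simulated activity events are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// waitForIdle blocks until no activity has been reported for the length of
// quiet and returns how many activity events it saw before that.
func waitForIdle(activity <-chan struct{}, quiet time.Duration) int {
	seen := 0
	for {
		select {
		case <-activity:
			// Any activity restarts the quiet window on the next iteration,
			// because time.After below creates a fresh timer each time.
			seen++
		case <-time.After(quiet):
			return seen
		}
	}
}

func main() {
	activity := make(chan struct{})
	go func() {
		// Simulate three requests finishing 100ms apart.
		for i := 0; i < 3; i++ {
			time.Sleep(100 * time.Millisecond)
			activity <- struct{}{}
		}
	}()

	seen := waitForIdle(activity, 500*time.Millisecond)
	fmt.Println("idle after events:", seen)
}
```

The point of the sketch is that the deadline keeps moving: the wait only ends once a full quiet window has passed without a single new event.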
In this example we're navigating to the Hacker News site (a popular tech news site), crawling the current entries, and printing them to standard output. In this case, the same could be done with an HTML scraper, since Hacker News returns server-rendered HTML, but our approach also works for sites that are generated dynamically with JavaScript, e.g. with React, Vue, or Angular as the frontend framework.
package main

import (
	"fmt"
	"log"

	"github.com/mxschmitt/playwright-go"
)

func main() {
	pw, err := playwright.Run()
	if err != nil {
		log.Fatalf("could not start playwright: %v", err)
	}
	browser, err := pw.Chromium.Launch()
	if err != nil {
		log.Fatalf("could not launch browser: %v", err)
	}
	page, err := browser.NewPage()
	if err != nil {
		log.Fatalf("could not create page: %v", err)
	}
	if _, err = page.Goto("https://news.ycombinator.com"); err != nil {
		log.Fatalf("could not goto: %v", err)
	}
	// Looping over the DOM elements
	entries, err := page.QuerySelectorAll(".athing")
	if err != nil {
		log.Fatalf("could not get entries: %v", err)
	}
	for i, entry := range entries {
		// Finding the next title link element
		titleElement, err := entry.QuerySelector("td.title > a")
		if err != nil {
			log.Fatalf("could not get title element: %v", err)
		}
		title, err := titleElement.TextContent()
		if err != nil {
			log.Fatalf("could not get text content: %v", err)
		}
		// Printing it to the console
		fmt.Printf("%d: %s\n", i+1, title)
	}
	if err = browser.Close(); err != nil {
		log.Fatalf("could not close browser: %v", err)
	}
	if err = pw.Stop(); err != nil {
		log.Fatalf("could not stop Playwright: %v", err)
	}
}
In this example, we use a selector to match the latest news entries. We then loop over them, get each entry's title element, and print its text to the console.
In this post, we gave an introduction to Playwright for Go and its current state and went through two examples. We are working to make it bulletproof and concurrency-safe, and we will add more tests in the next few weeks to ensure it can be used in production. For further reference, see the project on GitHub.