Headless browser - Puppeteer


About

Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium, which by default runs headless (i.e. headless Chrome).

Puppeteer communicates with the browser via the Chrome DevTools Protocol (CDP).
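Puppeteer also lets you talk to the protocol directly. The sketch below assumes an already opened Puppeteer page. It opens a CDP session for the page's target and sends the Browser.getVersion command as a raw protocol message. The helper name getBrowserVersion is hypothetical. Recent releases also expose page.createCDPSession() as a shortcut.

```javascript
// Sketch: send a raw DevTools Protocol command through Puppeteer.
// `page` is assumed to be a Puppeteer Page; getBrowserVersion is a hypothetical helper.
async function getBrowserVersion(page) {
  // A CDP session is a JSON-RPC channel attached to the page's target
  const client = await page.target().createCDPSession();
  try {
    // 'Browser.getVersion' is a CDP method; the reply includes a `product` string
    const { product } = await client.send('Browser.getVersion');
    return product;
  } finally {
    // Release the session once we are done with it
    await client.detach();
  }
}
```

Usage: `console.log(await getBrowserVersion(page));` prints something like "HeadlessChrome/…" depending on the installed browser.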

API

The Puppeteer API is hierarchical and mirrors the browser structure.

  • A Browser instance can own multiple browser contexts.
  • A BrowserContext instance defines a browsing session and can own multiple pages.
  • A Page has at least one frame: the main frame. Other frames may be created by iframe or frame tags.
  • A Frame has at least one execution context, the default execution context, where the frame's JavaScript is executed. A Frame might have additional execution contexts that are associated with extensions.
  • A Worker has a single execution context and facilitates interacting with WebWorkers.
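The hierarchy above can be walked from a running browser. A minimal sketch, assuming `browser` is a Puppeteer Browser: it lists every browser context, the pages each one owns, and the frame tree of each page, starting from the main frame. The function name describeBrowser and the output layout are hypothetical.

```javascript
// Sketch: walk Browser -> BrowserContext -> Page -> Frame and return an indented outline.
// describeBrowser is a hypothetical helper; `browser` is assumed to be a Puppeteer Browser.
async function describeBrowser(browser) {
  const lines = [];
  const contexts = browser.browserContexts(); // the default context is always included
  for (const [i, context] of contexts.entries()) {
    lines.push(`context #${i}`);
    for (const page of await context.pages()) {
      lines.push(`  page ${page.url()}`);
      // Recurse through iframe/frame children, indenting one level per nesting depth
      const walk = (frame, depth) => {
        lines.push(`${'  '.repeat(depth)}frame ${frame.url()}`);
        for (const child of frame.childFrames()) walk(child, depth + 1);
      };
      walk(page.mainFrame(), 2);
    }
  }
  return lines.join('\n');
}
```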

Puppeteer Architecture

Component

puppeteer-core

puppeteer-core is a library to help drive anything that supports the DevTools protocol. It doesn't download Chromium when installed. Being a library, puppeteer-core is fully driven through its programmatic interface and disregards all the PUPPETEER_* environment variables.

Usage:

  • build a PDF generator using puppeteer-core and write a custom install.js script that downloads headless_shell instead of Chromium to save disk space.
  • run Puppeteer inside a Chrome extension, or against any browser reachable through the DevTools protocol, where downloading an extra Chromium binary is unnecessary.

Code Usage:

const puppeteer = require('puppeteer-core');
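Because puppeteer-core ships no browser and ignores the PUPPETEER_* variables, the browser location must be given in code. A minimal sketch, assuming either a locally installed Chrome or a Chrome already started with remote debugging enabled. The variable names CHROME_PATH and CHROME_WS_ENDPOINT and the helper startBrowser are hypothetical; the puppeteer module is passed in as a parameter so the helper stays testable.

```javascript
// Sketch: start or attach to a browser with puppeteer-core.
// CHROME_PATH / CHROME_WS_ENDPOINT are hypothetical env variable names chosen by you,
// not PUPPETEER_* variables (puppeteer-core ignores those).
async function startBrowser(puppeteer, env = process.env) {
  if (env.CHROME_WS_ENDPOINT) {
    // Attach to a browser started elsewhere (e.g. with --remote-debugging-port)
    return puppeteer.connect({ browserWSEndpoint: env.CHROME_WS_ENDPOINT });
  }
  if (!env.CHROME_PATH) {
    throw new Error('puppeteer-core downloads no browser: set CHROME_PATH or CHROME_WS_ENDPOINT');
  }
  // Launch a locally installed Chrome/Chromium binary
  return puppeteer.launch({ executablePath: env.CHROME_PATH });
}
```

Usage: `const browser = await startBrowser(require('puppeteer-core'));`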

puppeteer

When installed, it downloads a version of Chromium, which it then drives using puppeteer-core. The download and launch can be customized with the PUPPETEER_* environment variables: https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#environment-variables

Example
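With the full puppeteer package no executable path is needed, since launch() uses the browser downloaded at install time. A minimal sketch of the usual launch, open, navigate, close cycle; fetchTitle is a hypothetical helper and the module is passed in as a parameter.

```javascript
// Sketch: open a page and read its title with the full `puppeteer` package.
// fetchTitle is a hypothetical helper; `puppeteer` is the required module.
async function fetchTitle(puppeteer, url) {
  const browser = await puppeteer.launch(); // uses the bundled browser, headless by default
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.title();
  } finally {
    // Always close the browser, even if navigation fails
    await browser.close();
  }
}
```

Usage: `fetchTitle(require('puppeteer'), 'https://example.com').then(console.log);`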

Integration

Javascript - Jest-puppeteer with typescript configuration

API / Doc

Launch

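const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    slowMo: 200, // slow down every Puppeteer operation by 200 ms
    devtools: true, // open DevTools for each tab
    defaultViewport: null, // use the window size instead of the default 800x600 viewport
    args: [
      '--disable-infobars', // removes the info ("butter") bar; may be ignored by recent Chrome versions
      '--start-maximized',
      // '--start-fullscreen',
      // '--window-size=1920,1080',
      // '--kiosk',
    ],
  });
  // ... drive the browser, then:
  await browser.close();
})();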

Snippet

Serialize and Deserialize a date

Puppeteer - How to pass back and forth a date (or a complex type) to the headless browser via the evaluate function
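Values passed to and returned from page.evaluate are serialized, and depending on the Puppeteer version a Date may not survive the trip as a Date. A minimal sketch of one workaround: send the date as an ISO-8601 string and rebuild it with new Date() on each side. addDaysInBrowser is a hypothetical helper, not the approach of the linked page.

```javascript
// Sketch: round-trip a Date through page.evaluate as an ISO-8601 string.
// addDaysInBrowser is hypothetical; `page` is assumed to be a Puppeteer Page.
async function addDaysInBrowser(page, date, days) {
  const iso = await page.evaluate(
    (isoIn, n) => {
      const d = new Date(isoIn);        // deserialize inside the browser
      d.setUTCDate(d.getUTCDate() + n); // work with a real Date there
      return d.toISOString();           // serialize the result back to a string
    },
    date.toISOString(), // serialize in Node before sending
    days
  );
  return new Date(iso); // deserialize back in Node
}
```

Usage: `await addDaysInBrowser(page, new Date('2024-01-30T00:00:00Z'), 3)` resolves to a Date for 2024-02-02T00:00:00.000Z.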

Execute JavaScript inside the page

Example with local storage and passing parameters

await page.evaluate(
  (storageKey) => { localStorage.removeItem(storageKey); }, 
  'theKey'
);
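Several parameters can be passed: each extra argument after the page function is serialized and handed to it in the browser, and its return value is serialized back to Node. A minimal sketch, assuming the page has already navigated to the site under test (localStorage is scoped to the page's origin); setAndReadStorage is a hypothetical helper.

```javascript
// Sketch: pass two parameters into the page and read a value back.
// setAndReadStorage is hypothetical; `page` is assumed to be a Puppeteer Page.
async function setAndReadStorage(page, storageKey, value) {
  return page.evaluate(
    (key, val) => {
      // Runs in the browser: both arguments arrive as plain serialized values
      localStorage.setItem(key, val);
      return localStorage.getItem(key); // returned to Node
    },
    storageKey,
    value
  );
}
```

Usage: `await setAndReadStorage(page, 'theKey', 'theValue')` resolves to 'theValue'.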

Add a breakpoint

There are two execution contexts:

  • Node.js (running the test code)
  • the browser (running the application code)
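Because of these two contexts, a console.log inside page.evaluate or the application shows up in the browser console, not in the Node terminal. A minimal sketch that forwards the page's 'console' and 'pageerror' events to Node so both contexts can be followed in one place; forwardBrowserConsole and its output prefix are hypothetical.

```javascript
// Sketch: mirror browser-side console output and uncaught page errors in Node.
// forwardBrowserConsole is hypothetical; `page` is assumed to be a Puppeteer Page.
function forwardBrowserConsole(page, log = console.log) {
  // 'console' fires for console.* calls in the page; msg is a ConsoleMessage
  page.on('console', (msg) => log(`[browser ${msg.type()}] ${msg.text()}`));
  // 'pageerror' fires for uncaught exceptions thrown in the page
  page.on('pageerror', (err) => log(`[browser pageerror] ${err.message}`));
}
```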

Timeout

If you are going to pause on breakpoints, increase the test timeout accordingly; otherwise the test fails while execution is paused.

In a Jest test file, the jest object is available as a global, so the timeout can be raised with:

jest.setTimeout(100000);

This sets the default timeout (in milliseconds) for every test and before/after hook in that test file.
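The timeout can also be set once for the whole project instead of in each file. A minimal jest.config.js sketch, assuming a standard Jest setup (merge it into your existing config if you use the jest-puppeteer preset):

```javascript
// jest.config.js (sketch): project-wide default timeout for tests and hooks
module.exports = {
  testTimeout: 100000, // in ms; Jest's default is 5000
};
```

A single test can also get its own limit through the third argument of test(name, fn, timeout).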

Node breakpoint

  • Start the browser with a GUI:
const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250, // slow down each operation by 250 ms
});
  • Set a breakpoint in your IDE and step over each Puppeteer step (open, click,…)
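The same can be written in code with a debugger statement on the Node side. A minimal sketch; it only pauses when Node runs under a debugger (an IDE debug session or node --inspect-brk), otherwise the statement is a no-op. debugFlow and its selector parameter are hypothetical.

```javascript
// Sketch: visible, slowed-down browser with a Node-side pause between steps.
// debugFlow and `selector` are hypothetical; `puppeteer` is the required module.
async function debugFlow(puppeteer, url, selector) {
  const browser = await puppeteer.launch({ headless: false, slowMo: 250 });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    debugger; // Node pauses here if a debugger is attached; inspect `page`, then step over
    await page.click(selector);
  } finally {
    await browser.close();
  }
}
```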

Browser breakpoint

  • Start the browser with DevTools open:
const browser = await puppeteer.launch({devtools: true});
  • Add a breakpoint in the page context:
await page.evaluate(() => {debugger;});
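Put together, the two steps give a helper that stops inside the page's own JavaScript. A minimal sketch: with devtools: true Chrome opens DevTools for each tab, which is what lets the debugger statement pause, and page.evaluate only resolves after you resume in DevTools (hence the larger test timeout). pauseInBrowser is hypothetical and returns the browser so the caller can close it.

```javascript
// Sketch: pause execution in the browser context, not in Node.
// pauseInBrowser is hypothetical; `puppeteer` is the required module.
async function pauseInBrowser(puppeteer, url) {
  const browser = await puppeteer.launch({ devtools: true }); // DevTools open per tab
  const page = await browser.newPage();
  await page.goto(url);
  await page.evaluate(() => {
    debugger; // stops here while DevTools is attached; resume there to continue
  });
  return { browser, page };
}
```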

Select

<div class="tweet">
    <div class="retweet">10</div>
</div>
/**
 * @type {import("puppeteer").ElementHandle<HTMLDivElement>}
 */
const tweetHandle = await page.$('.tweet .retweet');
expect(tweetHandle).not.toBeNull(); // page.$ resolves to null when nothing matches
expect(await tweetHandle.evaluate(node => node.innerText)).toBe('10');
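The handle step can be skipped: page.$eval runs a function on the first element matching a selector (and rejects if none matches), while page.$$eval runs it once on the array of all matches. A minimal sketch against the markup above; readRetweets is a hypothetical helper.

```javascript
// Sketch: select and read elements without keeping an ElementHandle around.
// readRetweets is hypothetical; `page` is assumed to be a Puppeteer Page.
async function readRetweets(page) {
  // First match only; the callback runs in the browser on the DOM node
  const first = await page.$eval('.tweet .retweet', (node) => node.innerText);
  // All matches; the callback receives an array of DOM nodes
  const all = await page.$$eval('.tweet .retweet', (nodes) => nodes.map((n) => n.innerText));
  return { first, all };
}
```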

Debug

https://developers.google.com/web/tools/puppeteer/debugging
