Updated on

April 24, 2024

Get Text From Element in Puppeteer: Examples

Puppeteer is a Node.js library for browser automation. You can use it for automated testing and for scraping web data, even from dynamically loaded sites. In web scraping, you often need to pull text out of elements such as paragraphs (<p>), spans (<span>), and divisions (<div>). Puppeteer provides several methods for this, two of which are described below.

First, page.evaluate() is a method that runs custom JavaScript within the browser context. This means that whatever code you could run in the browser's console, you can execute it on the page through this function. When extracting data from web elements, page.evaluate() can be used to access any element's textContent or innerText properties directly.

Second, page.$eval() is a more specialized method that combines querying for an element and executing a function against that element. It's a shorthand for selecting an element with a CSS selector and then extracting its text content in one go.

Both these functions are invaluable for web scraping and automation. page.evaluate() offers broad JavaScript execution capabilities, while page.$eval() streamlines the process of targeting and extracting text from specific elements.

In this article, we will discuss several scenarios where you would need to extract text and how to do it with Puppeteer using the methods mentioned above.

Extract text by element’s class

To extract the text of a particular element, we need a way to distinguish it from other elements, such as its ID or class. The class attribute usually groups elements that share a similar style, so using a class to select a group of related elements is common practice in web scraping. In this section, we will show you how to extract text by an element's class.

Extracting Text from <span> Elements

<span> and <div> are among the most common elements on a web page. They are often used as containers to hold other elements or text. As our first example, we will show you how to extract text from <span> elements.

Using Puppeteer's page.$eval() function, we can easily extract the text of a single <span> element on a webpage. However, it's important to note that page.$eval() targets only the first element that matches the specified selector. Here's a brief example to illustrate this behavior.

Note: To understand this code, you might want to read more about Get Element in Puppeteer.


const puppeteer = require('puppeteer');

(async () => {
  // Launch a browser instance
  const browser = await puppeteer.launch();

  // Open a new page (tab)
  const page = await browser.newPage();

  // Navigate to the Puppeteer website
  await page.goto('https://pptr.dev/');

  // Store the innerText of the first element with the class "token"
  const text = await page.$eval('.token', span => span.innerText);
  console.log('Text from the first .token:', text);

  // Close the browser
  await browser.close();
})();

This code snippet highlights that page.$eval() returns the inner text of the first element matching the class (.token). It's a quick way to access specific text content when you're interested in the first occurrence of an element with a particular class.

If you need to grab text from every element with the "token" class on a webpage, Puppeteer's page.$$eval() is the tool for the job. Unlike page.$eval() which only fetches text from the first match, page.$$eval() lets us collect text from all matching elements. Check out this example to see it in action.


const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://pptr.dev/');

  // Get text from all spans with the class "token"
  const textsFromTokens = await page.$$eval('.token', spans => spans.map(span => span.innerText));
  console.log('Texts from all .token:', textsFromTokens);

  await browser.close();
})();

Let’s look at another example where we extract data from <p> tags, which are commonly used to wrap text. For this example, we will consider the <p> tags on the webshare.io/blog webpage.


const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://webshare.io/blog');

  // Using page.$eval to get text from the first <p> with class "caps_menu"
  const textFromFirstP = await page.$eval('p.caps_menu', p => p.innerText);
  console.log('Text from the first p.caps_menu:', textFromFirstP);

  // Using page.$$eval to get text from all <p> elements with class "caps_menu"
  const textsFromAllP = await page.$$eval('p.caps_menu', ps => ps.map(p => p.innerText));
  console.log('Texts from all p.caps_menu:', textsFromAllP);

  // Using page.evaluate to run a custom script for <p> elements with class "caps_menu"
  const customTextExtraction = await page.evaluate(() => {
    const paragraphs = Array.from(document.querySelectorAll('p.caps_menu'));
    return paragraphs.map(p => p.innerText);
  });
  console.log('Custom text extraction for p.caps_menu:', customTextExtraction);

  await browser.close();
})();

In this script, three methods pull text from <p> elements with the class caps_menu. page.$eval grabs the text from the first matching element, page.$$eval retrieves the text of all matching elements, and page.evaluate runs a custom script that does the same while offering more control over the JavaScript executed in the page context. As you can see, each method has its use.

Note: If you are scraping, we highly recommend using a Puppeteer proxy to prevent getting blocked.

Comparison of page.evaluate() and page.$eval() methods

In Puppeteer, page.evaluate() and page.$eval() do share the purpose of executing code within the context of the page. However, the scope of their operation and their use cases differ. The page.evaluate() method is a general-purpose tool for running scripts in the context of the page itself, allowing interaction with any accessible elements or variables defined in the page's environment. It's not necessarily faster than page.$eval(), as performance depends on the nature of the task rather than the method itself.

page.$eval(), on the other hand, is a more focused function. It requires a CSS selector and executes the provided function on the first element that matches this selector. It's particularly useful when you need to perform an action or retrieve information from a specific element, as it automatically handles the query selection.
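To make the comparison concrete, here is a minimal sketch of the same extraction written both ways. The helper names textViaEvaluate and textViaEval are hypothetical, and page is assumed to be a Puppeteer page that has already navigated to the target URL.

```javascript
// Hypothetical helper: general-purpose page.evaluate() with a manual query.
// The selector is passed as an argument because the callback runs in the
// browser and cannot see Node-side variables.
async function textViaEvaluate(page, selector) {
  return page.evaluate((sel) => {
    const el = document.querySelector(sel);
    return el ? el.textContent.trim() : null; // null when nothing matches
  }, selector);
}

// Hypothetical helper: page.$eval() performs the query itself and passes the
// first matching element to the callback. It rejects if nothing matches.
async function textViaEval(page, selector) {
  return page.$eval(selector, (el) => el.textContent.trim());
}
```

When the selector matches, both helpers should return the same string. They differ when nothing matches: the page.evaluate() version returns null because the callback checks for a missing element, while the page.$eval() version rejects with an error.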

Extract all heading elements

Let's work through a few more examples to practice extracting text from various elements. In this example, we will extract all heading elements.


const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.webshare.io/blog');

  // Extract text from all heading elements (h1-h6)
  const headings = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('h1, h2, h3, h4, h5, h6'))
      .map(heading => heading.innerText);
  });
  console.log('All Headings:', headings);

  await browser.close();
})();

Here is the output of the console.

In this Puppeteer script, we're focusing on extracting text from all heading elements (h1 through h6) on the "webshare.io/blog" page. The page.evaluate() method is used to run JavaScript within the page's context, selecting all heading elements and mapping their inner text to an array. This array is then logged to the console, showing all the headings found on the page.
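If you also need to know which level each heading has, the same selector can be used with page.$$eval(). The following is a minimal sketch rather than part of the original script: the helper name getHeadingsWithLevels is hypothetical, and page is assumed to be a Puppeteer page that has already loaded the target URL.

```javascript
// Hypothetical helper: return each heading's level alongside its text.
async function getHeadingsWithLevels(page) {
  return page.$$eval('h1, h2, h3, h4, h5, h6', (headings) =>
    headings
      .map((h) => ({
        level: Number(h.tagName.slice(1)), // tagName is e.g. 'H2', so this gives 2
        text: h.innerText.trim(),
      }))
      .filter((h) => h.text.length > 0) // skip headings with no visible text
  );
}
```

The callback returns plain objects made of numbers and strings, which Puppeteer can serialize and pass back from the browser to Node.js.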

Extract product titles

Extracting product titles from e-commerce websites like Etsy is important for market research, competitive analysis, and price monitoring. It helps businesses understand market trends, compare products, and strategize pricing. In this example, we'll focus on extracting product titles from the "Weddings" category on Etsy, where product titles are displayed in h3 headings with the class v2-listing-card__title.


const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.etsy.com/c/weddings');

  // Extract product titles from the page
  const productTitles = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('h3.v2-listing-card__title'))
      .map(title => title.innerText.trim());
  });
  console.log('Product Titles:', productTitles);

  await browser.close();
})();

Extract product prices

After extracting product titles, the next logical step for a market research analysis is retrieving product prices. For this purpose, we will continue our previous example on the Etsy "Weddings" category page. Here's how you can extract product prices on Etsy using Puppeteer:


const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.etsy.com/c/weddings');

  // Extract product prices from the page
  const productPrices = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('span.currency-value'))
      .map(price => price.innerText.trim());
  });
  console.log('Product Prices:', productPrices);

  await browser.close();
})();

Once you run the above script, you will see a list of prices on your console output.
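The extracted prices are strings, so a price analysis usually needs them converted to numbers first. Here is a minimal sketch of one way to do that. The helper name parsePrice is hypothetical, and it assumes the text uses "," as a thousands separator and "." as the decimal point, which may not hold for every locale or listing.

```javascript
// Hypothetical helper: convert price text such as "1,299.00" to a number.
// Assumes "," separates thousands and "." marks decimals.
function parsePrice(text) {
  // Drop currency symbols, commas, spaces, and anything else non-numeric
  const cleaned = text.replace(/[^0-9.]/g, '');
  const value = Number.parseFloat(cleaned);
  return Number.isNaN(value) ? null : value; // null when no number is found
}
```

With the productPrices array from the script above, productPrices.map(parsePrice).filter(value => value !== null) would give an array of numbers that can be sorted or averaged.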

However, when you try to scrape popular sites like Etsy, you might be blocked by anti-scraping measures. To work around them, you might want to read about Puppeteer Stealth mode.

Common errors when getting text from an element in Puppeteer

Element Not Found: This error occurs when Puppeteer attempts to extract text from an element that does not exist or has not yet loaded on the page.

Solution: Use Puppeteer's page.waitForSelector() to wait for the element to be present before attempting to extract text.

Timeout Error: A timeout error may happen if Puppeteer takes too long to find the element or if the page takes too long to load.

Solution: Increase the timeout beyond the 30,000 ms default, for example with page.waitForSelector(selector, {timeout: 60000}), to give elements more time to appear.

Incorrect Use of Asynchronous Operations: Puppeteer operations are asynchronous, and incorrect handling of promises or async/await can lead to issues.

Solution: Ensure all Puppeteer commands, especially those that return promises (like page.evaluate(), page.$eval(), etc.), are properly awaited or handled with .then().
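The three solutions can be combined in one small helper. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper name getTextOrNull and the 60000 ms default are illustrative, page is assumed to be a Puppeteer page that has already navigated to the target URL, and the check on err.name assumes Puppeteer's timeout errors are named 'TimeoutError'.

```javascript
// Hypothetical helper: wait for an element, then return its trimmed text.
// Returns null if the element does not appear within `timeout` milliseconds.
async function getTextOrNull(page, selector, timeout = 60000) {
  try {
    // Wait until the element is present, and await every Puppeteer call
    await page.waitForSelector(selector, { timeout });
    return await page.$eval(selector, (el) => el.innerText.trim());
  } catch (err) {
    // Assumption: Puppeteer's timeout errors carry the name 'TimeoutError'
    if (err.name === 'TimeoutError') {
      return null;
    }
    throw err; // anything else is unexpected, so surface it
  }
}
```

A call such as await getTextOrNull(page, 'h1') then yields either the heading text or null, so the calling code can decide whether a missing element should be skipped or treated as a failure.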

Conclusion

This article explored how to extract text from elements with Puppeteer. We covered specific tasks such as extracting text by an element’s class, retrieving all heading elements, and extracting product titles and prices. We also compared Puppeteer's page.evaluate() and page.$eval() functions and addressed common challenges encountered in web scraping, including difficulties in locating elements and handling slow page loads. With an understanding of these tools and potential issues, you can effectively navigate and extract data from web pages using Puppeteer.
