Updated on March 25, 2024

Convert HTML to PDF Using Puppeteer

Converting web pages into PDFs is an important task, especially for invoices, detailed reports and tables. The goal is to make sure the content you see in a web page looks exactly right once it is turned into a PDF.

Puppeteer is a Node.js library for controlling a headless Chrome or Chromium browser, which makes it well suited to this task. It lets developers navigate web pages programmatically and convert HTML into PDFs. In this article, we focus on Puppeteer and Node.js and show how this combination simplifies PDF generation for invoices, reports and tables.


Choosing Puppeteer as a PDF generation package

When it comes to converting HTML to PDF using Node.js, Puppeteer is a reasonable choice. Dedicated tools such as the HTML to PDF plugin for Node.js may be better suited to the job, but Puppeteer lets you limit the number of dependencies by handling PDF generation directly.

The main reason to use Puppeteer for PDF generation is its headless browser automation. Puppeteer is built on the Chrome DevTools Protocol, which allows it to render and interact with web pages just as a regular browser would. This makes it well suited for tasks like converting HTML to PDF directly from a target URL. However, its native support for manipulating HTML and PDF documents, beyond rendering and printing them, is limited.

HTML to PDF conversion methods

In this section, we’ll explore methods of converting HTML to PDF using Puppeteer and Node.js. Each method offers unique features and advantages, allowing you to choose the most suitable method based on your specific requirements.

Method 1: Generating PDF from a web page using URL

One practical use case of Puppeteer is generating a PDF directly from a live web page using its URL. This method is particularly useful when you want to capture a webpage as the browser renders it and save it as a PDF, without first downloading or saving its HTML. The advantages of this method are:

  • Direct Extraction: Generates the PDF straight from the specified URL. Puppeteer loads and renders the page in headless Chrome, so there is no intermediate HTML file to manage.
  • Time Efficiency: Saves time because you do not need to fetch, store or template the HTML yourself before converting it.
  • Ideal for Static Pages: Works best for static web pages, or for pages whose dynamic content has finished loading by the time the PDF is generated.

Here’s a simple code example:


const puppeteer = require("puppeteer");

async function downloadPdfFromUrl(url, outputPath) {
  const browser = await puppeteer.launch({ headless: "new" });

  try {
    const page = await browser.newPage();

    // Navigate to the specified URL and wait until the network is idle
    await page.goto(url, { waitUntil: "networkidle0" });

    // Generate PDF from the page content
    await page.pdf({ path: outputPath, format: "A4" });
  } finally {
    // Close the browser even if navigation or PDF generation fails
    await browser.close();
  }
}

const targetUrl = "https://www.webshare.io/blog/what-are-datacenter-proxies";
const outputFile = "downloaded_page.pdf";

downloadPdfFromUrl(targetUrl, outputFile)
  .then(() => console.log(`PDF downloaded successfully at: ${outputFile}`))
  .catch((error) => console.error("Error:", error));

Here's what the code does:

  • Navigate to URL: The page.goto() function is used to navigate to the specified URL. The { waitUntil: 'networkidle0' } option ensures the page is considered loaded when there are no network connections for at least 500 milliseconds.
  • Generate PDF: The page.pdf() function is used to generate a PDF from the page content. The PDF is saved at the specified output path outputPath in A4 format.
  • Close the Browser: The browser.close() function is called to close the Puppeteer browser instance.
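The three steps above can also be factored into a helper that receives an already-open page, which makes it easier to reuse one browser for several URLs and to check the logic without launching Chrome. This is a minimal sketch under our own assumptions: the helper name `pdfFromUrl` is hypothetical, and the rule that only http: and https: URLs are accepted is our choice, not a Puppeteer requirement. The `page` argument is expected to be a Puppeteer Page, such as the one returned by `browser.newPage()`.

```javascript
// Hypothetical helper: the navigate-then-print steps from the example,
// applied to a page that the caller has already opened.
// The http/https-only rule is an illustrative assumption.
async function pdfFromUrl(page, url, outputPath) {
  // Validate the URL before touching the browser.
  // new URL() throws a TypeError for malformed input.
  const parsed = new URL(url);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }

  // Consider navigation finished once there have been no network
  // connections for at least 500 ms.
  await page.goto(parsed.href, { waitUntil: "networkidle0" });

  // Write the rendered page to disk as an A4 PDF.
  await page.pdf({ path: outputPath, format: "A4" });
  return outputPath;
}
```

With Puppeteer, you would call it as `await pdfFromUrl(page, targetUrl, outputFile)` after `browser.newPage()`, and still close the browser when you are done.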

You can run the code and see the result as shown below:

Here's the generated PDF file:

A common use case for this method is converting an invoice URL into a PDF. Invoices are crucial documents in business transactions, and converting them to PDFs is a common requirement for archiving, sharing or printing. The PDF format ensures that the invoice looks the same across different devices and platforms.
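Invoices often rely on background colors and shaded table rows, and Puppeteer's `page.pdf()` leaves background graphics out unless `printBackground` is set to `true`. The sketch below collects invoice-friendly options in one place. `printBackground`, `landscape`, `format` and `margin` are documented `page.pdf()` options; the helper name `invoicePdfOptions` and the 15mm margins are illustrative assumptions.

```javascript
// Hypothetical helper: builds page.pdf() options suited to invoices.
// The 15mm margin is an arbitrary example value.
function invoicePdfOptions(outputPath, { landscape = false } = {}) {
  return {
    path: outputPath,
    format: "A4",
    landscape,
    // Background colors and images are omitted from the PDF unless this is true.
    printBackground: true,
    // Leave some room at the edges so printed invoices are not clipped.
    margin: { top: "15mm", right: "15mm", bottom: "15mm", left: "15mm" },
  };
}
```

You would pass the result straight to Puppeteer, for example `await page.pdf(invoicePdfOptions("invoice.pdf"))`.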

Suppose you have an invoice hosted on a website like Stripe, and you want to generate a PDF from its URL using Puppeteer. Below is a code example demonstrating this scenario:


const puppeteer = require("puppeteer");

async function generateInvoicePdfFromUrl(invoiceUrl, outputPath) {
  const browser = await puppeteer.launch({ headless: "new" });

  try {
    const page = await browser.newPage();

    // Navigate to the specified invoice URL
    await page.goto(invoiceUrl, { waitUntil: "networkidle0" });

    // Generate PDF from the invoice page content
    await page.pdf({ path: outputPath, format: "A4" });
  } finally {
    // Close the browser even if navigation or PDF generation fails
    await browser.close();
  }
}

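// This sample URL points to a PNG image from Stripe's documentation;
// replace it with the URL of your own hosted invoice page
const invoiceUrl = "https://b.stripecdn.com/docs-statics-srv/assets/hosted-invoice-page.46a27a6f0e9fee330cde9bdb884dce68.png";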
const outputFile = "invoice.pdf";

generateInvoicePdfFromUrl(invoiceUrl, outputFile)
  .then(() =>
    console.log(`Invoice PDF generated successfully at: ${outputFile}`)
  )
  .catch((error) => console.error("Error:", error));

Here’s the generated PDF:

Method 2: Generating PDF from an HTML file

In this method, Puppeteer generates a PDF directly from an HTML file. This is useful when you have an existing HTML file that you want to convert into a PDF without navigating to a live web page. Puppeteer renders the HTML file and generates a PDF from its content. The advantages of this method are:

  • Offline Processing: Ideal for scenarios where the HTML content is available locally, eliminating the need to fetch content from a live URL.
  • Batch Processing: Suitable for batch processing multiple HTML files to generate corresponding PDFs.
  • Custom Styling: Allows customization of the PDF output based on the styling and structure of the provided HTML file.

Suppose you have an HTML file named template.html with the following content:


<!DOCTYPE html>
<html>
<head>
    <title>HTML content</title>
</head>
<body>
    <h1>Sample</h1>
    <div>
        <ul>
            <li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
            <li>Integer interdum felis nec orci mattis, ac dignissim mauris commodo.</li>
        </ul>
        <ul>
            <li>In et augue non turpis faucibus tincidunt a et lectus.</li>
            <li>Nulla congue nisi vel diam hendrerit, at pulvinar massa aliquam.</li>
        </ul>
    </div>

    <h1>Ipsum Paragraphs</h1>
    <div>
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sit amet magna turpis. Donec a tellus in mi pharetra volutpat at et nulla. Aenean porttitor fringilla diam et pretium. Fusce id velit mauris. Aenean ultrices orci dolor, sed tristique eros molestie eget. Fusce non ultrices odio. Sed nisi ex, porttitor non fermentum eu, rutrum quis mauris. Morbi scelerisque sollicitudin semper. Nunc vitae pharetra tortor, vel gravida ante. Integer euismod velit nisi, quis sollicitudin neque dictum nec. Morbi magna nulla, scelerisque a malesuada at, scelerisque at quam. Aliquam sit amet lorem congue, pellentesque metus non, aliquet purus. Integer a metus augue. Ut venenatis cursus ante, sed venenatis quam consequat id. Fusce rhoncus elementum felis, eu volutpat magna lacinia id. Proin ac sagittis nulla, a molestie turpis.
        </p>
    </div>
</body>

</html>

Here’s a code example that uses Puppeteer to convert this HTML file into a PDF:


const puppeteer = require("puppeteer");
const path = require("path");
const { pathToFileURL } = require("url");

async function downloadPdfFromHtmlFile(htmlFilePath, outputPath) {
  const browser = await puppeteer.launch({ headless: "new" });

  try {
    const page = await browser.newPage();

    // Load HTML content from the file; pathToFileURL builds a valid
    // file:// URL on every platform, including Windows paths
    const absolutePath = path.resolve(htmlFilePath);
    await page.goto(pathToFileURL(absolutePath).href, { waitUntil: "networkidle0" });

    // Generate PDF from the page content
    await page.pdf({ path: outputPath, format: "A4" });
  } finally {
    // Close the browser even if loading or PDF generation fails
    await browser.close();
  }
}

const inputHtmlFile = "template.html";
const outputFile = "downloaded_from_html.pdf";

downloadPdfFromHtmlFile(inputHtmlFile, outputFile)
  .then(() => console.log(`PDF downloaded successfully at: ${outputFile}`))
  .catch((error) => console.error("Error:", error));

Here's what the code does:

  • Load HTML from File: path.resolve() converts the file name into an absolute path, and page.goto() opens that local file through a file:// URL.
  • Generate PDF: The page.pdf() function is used to generate a PDF from the loaded HTML content. The PDF is saved at the specified output path in A4 format.
  • Close the Browser: The browser.close() function is called to close the Puppeteer browser instance.
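The batch-processing advantage listed for this method can be sketched by reusing a single page for every file instead of launching a browser per file. This is a minimal sketch under our own assumptions: the helper names `pdfPathFor` and `convertHtmlFiles` are hypothetical, each PDF is written next to its source file with a .pdf extension, and files are converted one at a time. `pathToFileURL` comes from Node's built-in url module.

```javascript
const path = require("path");
const { pathToFileURL } = require("url");

// Hypothetical helper: "reports/jan.html" -> "reports/jan.pdf".
function pdfPathFor(htmlFilePath) {
  const { dir, name } = path.parse(htmlFilePath);
  return path.join(dir, `${name}.pdf`);
}

// Hypothetical helper: converts each HTML file using one already-open page
// (e.g. a Puppeteer Page). Files are processed sequentially, so only one
// document renders at a time.
async function convertHtmlFiles(page, htmlFilePaths) {
  const outputs = [];
  for (const htmlFilePath of htmlFilePaths) {
    // Build a file:// URL that is valid on both POSIX and Windows.
    const fileUrl = pathToFileURL(path.resolve(htmlFilePath)).href;
    await page.goto(fileUrl, { waitUntil: "networkidle0" });

    const outputPath = pdfPathFor(htmlFilePath);
    await page.pdf({ path: outputPath, format: "A4" });
    outputs.push(outputPath);
  }
  return outputs;
}
```

Running files sequentially keeps memory use predictable; if you need more throughput, you could open several pages and split the list between them.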

Output

You can run the code and see the output:

Here's the sample PDF output from our HTML:

A common use case for this method is report generation. If you do not have an HTML file ready, keep in mind that reports are often kept as Excel tables; you can use a tool like Table Convert to generate HTML from your Excel table. Once you have it, replace the <html> content in the reportHtml string below with your generated table HTML. This example passes the HTML string to page.setContent() instead of loading a file. Alternatively, instead of pasting HTML directly into the code, you can set up a placeholder HTML file and reference it, as in the previous code example.


const puppeteer = require("puppeteer");

async function generateReportPdf() {
  const browser = await puppeteer.launch({ headless: "new" });
  const page = await browser.newPage();

  // Load HTML content for the report
  const reportHtml = `
<!DOCTYPE html>
<html>
<head>
  <title>Sample Report</title>
  <style>
    body {
      font-family: Arial, sans-serif;
      margin: 20px;
    }
    h1 {
      color: #333;
    }
    p {
      color: #555;
      margin-bottom: 10px;
    }
    table {
      width: 100%;
      border-collapse: collapse;
      margin-top: 20px;
    }
    th, td {
      border: 1px solid #ddd;
      padding: 8px;
      text-align: left;
    }
    th {
      background-color: #f2f2f2;
    }
  </style>
</head>
<body>

  <h1>Monthly Sales Report</h1>

  <p>Date: January 10, 2024</p>

  <table>
    <thead>
      <tr>
        <th>Product</th>
        <th>Units Sold</th>
        <th>Revenue</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Product A</td>
        <td>150</td>
        <td>$5,000</td>
      </tr>
      <tr>
        <td>Product B</td>
        <td>120</td>
        <td>$4,000</td>
      </tr>
      <tr>
        <td>Product C</td>
        <td>200</td>
        <td>$6,500</td>
      </tr>
    </tbody>
  </table>

  <p>Total Revenue: $15,500</p>

</body>
</html>

  `;
  await page.setContent(reportHtml);

  // Generate PDF for the report
  await page.pdf({ path: "report.pdf", format: "A4" });

  await browser.close();
}

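// Call the function to generate a report PDF and report any errors
generateReportPdf()
  .then(() => console.log("Report PDF generated successfully at: report.pdf"))
  .catch((error) => console.error("Error:", error));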

Here’s the generated report:
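Instead of hard-coding each table row as in the report example, the rows can be generated from data. The sketch below is one possible approach under our own assumptions: the row shape `{ product, units, revenue }`, the whole-dollar revenue values and the helper names `escapeHtml` and `buildReportTable` are hypothetical, while the sample figures are taken from the report above. Cell values are HTML-escaped because they end up inside markup passed to page.setContent().

```javascript
// Hypothetical helper: escape characters that have special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: turns [{ product, units, revenue }] into table markup
// like the sample report, followed by a total revenue line.
function buildReportTable(rows) {
  // Format whole-dollar amounts like the sample report, e.g. 5000 -> "$5,000".
  const formatUsd = (amount) => `$${amount.toLocaleString("en-US")}`;

  const bodyRows = rows
    .map((row) => {
      const cells = [row.product, row.units, formatUsd(row.revenue)];
      return `<tr>${cells.map((cell) => `<td>${escapeHtml(cell)}</td>`).join("")}</tr>`;
    })
    .join("\n");

  const total = rows.reduce((sum, row) => sum + row.revenue, 0);

  return [
    "<table>",
    "<thead><tr><th>Product</th><th>Units Sold</th><th>Revenue</th></tr></thead>",
    `<tbody>\n${bodyRows}\n</tbody>`,
    "</table>",
    `<p>Total Revenue: ${formatUsd(total)}</p>`,
  ].join("\n");
}
```

You could interpolate the result of `buildReportTable(rows)` into the `<body>` of the report HTML before calling page.setContent(), so the total is always computed from the same rows that are displayed.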

Styling tips

Let's explore styling tips and enhancements for HTML to PDF conversion using Puppeteer.

CSS styling considerations

Styling plays a crucial role in ensuring that the PDF output looks polished and meets specific design requirements. Here are some considerations:

While Puppeteer supports both inline styles and external stylesheets, inline styles are often more straightforward for PDF generation.

Consider embedding styles directly within the HTML using the <style> tag for simplicity.


<style>
  body {
    font-family: 'Arial', sans-serif;
  }
  h1 {
    color: #333;
  }
  /* ... (additional styles) ... */
</style>
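If you prefer to keep your CSS in a separate file but still want the simplicity of embedded styles, one option is to read the stylesheet and insert it into the document as a <style> block before rendering. This is a sketch under our own assumptions: the helper names `embedStyles` and `embedStylesFromFile` are hypothetical, and the styles are placed just before </head>. Puppeteer's page.addStyleTag() is another way to inject CSS into a page that is already loaded.

```javascript
const fs = require("fs");

// Hypothetical helper: embeds CSS into an HTML string as a <style> block
// just before </head>, so the markup passed to page.setContent() carries
// its own styles instead of linking to an external file.
function embedStyles(html, css) {
  const styleTag = `<style>\n${css}\n</style>`;
  const headClose = html.search(/<\/head>/i);
  if (headClose === -1) {
    // No </head> found: prepend the styles so they still apply.
    return `${styleTag}\n${html}`;
  }
  return `${html.slice(0, headClose)}${styleTag}\n${html.slice(headClose)}`;
}

// Hypothetical convenience wrapper: read the stylesheet from disk first.
function embedStylesFromFile(html, cssFilePath) {
  return embedStyles(html, fs.readFileSync(cssFilePath, "utf8"));
}
```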

For more advanced styling techniques, consider exploring the following resources:

  • CSS-Tricks: CSS-Tricks is a comprehensive resource with articles and guides on various CSS techniques. You can explore their tips on responsive design, flexbox and grid layout.
  • MDN Web Docs - CSS: The MDN Web Docs is an excellent reference for CSS properties and values. Dive into their CSS documentation for in-depth information.
  • Google Fonts: Google Fonts offers a wide selection of free and open-source fonts. Choose fonts that are not only pleasing but also supported in PDF rendering.
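Web fonts such as Google Fonts load asynchronously, so a PDF captured too early may fall back to a default font. A hedged sketch of one way to handle this: set the content, wait for the browser's `document.fonts.ready` promise inside `page.evaluate()`, and only then print. `page.setContent()`, `page.evaluate()` and `page.pdf()` are Puppeteer Page methods; the helper name `renderHtmlWithFonts` is hypothetical.

```javascript
// Hypothetical helper: render an HTML string and print it only after web
// fonts have finished loading. `page` is expected to be a Puppeteer Page.
async function renderHtmlWithFonts(page, html, outputPath) {
  // networkidle0 gives linked resources, such as a Google Fonts
  // stylesheet, time to download.
  await page.setContent(html, { waitUntil: "networkidle0" });

  // Runs in the browser: resolves once document.fonts reports that font
  // loading is done. Nothing is returned, so no serialization is needed.
  await page.evaluate(async () => {
    await document.fonts.ready;
  });

  await page.pdf({ path: outputPath, format: "A4" });
  return outputPath;
}
```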

Conclusion

In this article, we covered two methods for converting HTML to PDF with Puppeteer and Node.js: generating a PDF from a URL and generating one from an HTML file. Both are useful for producing polished documents such as invoices, reports and tables. We also covered styling tips that help you control how your PDFs look.
