# How to Scrape Dynamic Websites with Python and Playwright (2025 Guide)

*BeautifulSoup failing to grab data? Learn how to use Python and Playwright to scrape dynamic JavaScript-heavy websites like a pro.*

Md. Rony Ahmed · 3 min read
## Why BeautifulSoup Isn't Enough
If you've followed my [Web Scraping Basics guide](/posts/web-scraping-basics), you know how to grab static HTML. But modern sites built with frameworks like React and Vue.js render their content with JavaScript *after* the initial page load, so the raw HTML you download is often an empty shell.

In this tutorial, we will fix that using Playwright, which drives a real browser and sees the page exactly as a user does.
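To see the problem concretely, here is a tiny self-contained illustration (the HTML and the `products` container are invented for the example): the items a user sees are created by JavaScript, so a plain HTTP fetch never receives them.

```python
# The HTML a plain HTTP client (e.g. requests) receives: the product
# list is empty, because the items are fetched and rendered by
# JavaScript -- and JavaScript only runs in a real browser.
static_html = """
<html><body>
  <ul id="products"></ul>
  <script src="/static/app.js"></script>
</body></html>
"""

# However cleverly you parse this, the data simply is not there yet.
print("<li>" in static_html)  # False -- a browser would add the items
```

BeautifulSoup can only parse what is in that string; Playwright, by contrast, runs the page's JavaScript first and then lets you query the fully rendered DOM.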
## Prerequisites
First, install the Playwright package, then download its browser binaries:

```bash
pip install playwright
playwright install
```
## The Basic Script: Loading a Dynamic Page
Here is a simple script to launch a browser, navigate to a site, wait for the content to load, and grab the page title.
*Note: we set `headless=False` so you can watch the browser work.*
```python
from playwright.sync_api import sync_playwright

def run():
    with sync_playwright() as p:
        # Launch browser (headless=False lets you see the action)
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()

        # Go to the target website
        page.goto("https://example.com")

        # Critical: wait for a specific element (like an h1 or a data
        # table) to ensure the page is ready
        page.wait_for_selector("h1")

        print(f"Page Title: {page.title()}")

        # Close the browser
        browser.close()

if __name__ == "__main__":
    run()
```
## Handling Infinite Scroll
Many modern sites (like Twitter/X or Instagram) use "Infinite Scroll," where new content loads only as you scroll down. If you just load the page and scrape immediately, you'll miss most of the data.
Here are two ways to handle this.
### Method 1: The Quick Scroll (Fixed Distance)
This method is useful when you just need to trigger a few load events quickly. Note that we need to use `page.evaluate` to run JavaScript inside the browser.
```python
# ... inside your run() function ...

# Scroll down 3000 pixels
page.evaluate("window.scrollBy(0, 3000)")

# Optional: wait a moment for new content to load
page.wait_for_timeout(2000)

print("Simple scroll completed")
```
### Method 2: The "Human-Like" Scroll (Best for Anti-Bot Protection)
If a website has strict anti-bot measures, scrolling instantly to the bottom is a red flag. It is better to scroll in small, random chunks with slight pauses, mimicking a real human user.
You will need to import `random` and `time` for this to work.
```python
import time
import random

def random_scroll(page, base_scroll=300):
    """
    Scrolls down the page in random increments to mimic human behavior.
    """
    total_height = 0

    # Get the total height of the page
    max_scroll_height = page.evaluate("document.body.scrollHeight")

    # Continue scrolling until we reach the bottom
    while total_height < max_scroll_height:
        # Randomize scroll amount (e.g., between 300 and 450 pixels)
        scroll_step = random.randint(base_scroll, base_scroll + 150)

        # Scroll down by the calculated step
        page.evaluate(f"window.scrollBy(0, {scroll_step})")
        total_height += scroll_step

        # Add a small random delay (0.5 to 1.5 seconds) to act like a human
        sleep_time = random.uniform(0.5, 1.5)
        time.sleep(sleep_time)

    print("Reached the bottom of the page.")
```
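If you want to sanity-check the randomized bounds without launching a browser, the step-and-delay logic can be isolated into a pure function. This `scroll_schedule` helper is only an illustration of the same math, not part of the scraper itself:

```python
import random

def scroll_schedule(page_height, base_scroll=300, seed=None):
    """Return the (step_px, delay_s) pairs a human-like scroll would use
    for a page of the given height -- the browser-free core of the logic."""
    rng = random.Random(seed)
    schedule = []
    total = 0
    while total < page_height:
        step = rng.randint(base_scroll, base_scroll + 150)  # 300-450 px
        delay = rng.uniform(0.5, 1.5)                       # 0.5-1.5 s
        schedule.append((step, delay))
        total += step
    return schedule

steps = scroll_schedule(3000, seed=42)
print(all(300 <= s <= 450 for s, _ in steps))  # True
print(all(0.5 <= d <= 1.5 for _, d in steps))  # True
```

Because every step is at most 450 px and every pause at least half a second, a 3000 px page takes several seconds to traverse, which is exactly the slow, irregular pattern that looks human to anti-bot systems.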
## Conclusion
By combining Playwright's browser automation with smart scrolling techniques, you can now scrape data that BeautifulSoup can't even see. This opens up the ability to scrape complex Single Page Applications (SPAs) and social media feeds.