# How to Scrape Dynamic Websites with Python and Playwright (2025 Guide)

*BeautifulSoup failing to grab data? Learn how to use Python and Playwright to scrape dynamic JavaScript-heavy websites like a pro.*

Md. Rony Ahmed · 3 min read
## Why BeautifulSoup Isn't Enough
If you've followed my [Web Scraping Basics guide](/posts/web-scraping-basics), you know how to grab static HTML. But modern sites built with frameworks like React and Vue.js render their content with JavaScript *after* the initial page load, so the raw HTML you download is often an empty shell.

In this tutorial, we will fix that using Playwright, which drives a real browser and sees the page exactly as a user does.
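To see the problem concretely, here is a tiny self-contained illustration (the HTML and the `products` container are invented for the example): the items a user sees are created by JavaScript, so a plain HTTP fetch never receives them.

```python
# The HTML a plain HTTP client (e.g. requests) receives: the product
# list is empty, because the items are fetched and rendered by
# JavaScript -- and JavaScript only runs in a real browser.
static_html = """
<html><body>
  <ul id="products"></ul>
  <script src="/static/app.js"></script>
</body></html>
"""

# However cleverly you parse this, the data simply is not there yet.
print("<li>" in static_html)  # False -- a browser would add the items
```

BeautifulSoup can only parse what is in that string; Playwright, by contrast, runs the page's JavaScript first and then lets you query the fully rendered DOM.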
## Prerequisites
First, install the Playwright package, then download its browser binaries:

```bash
pip install playwright
playwright install
```
## The Basic Script: Loading a Dynamic Page
Here is a simple script to launch a browser, navigate to a site, wait for the content to load, and grab the page title.
*Note: we set `headless=False` so you can watch the browser work.*
```python
from playwright.sync_api import sync_playwright

def run():
    with sync_playwright() as p:
        # Launch browser (headless=False lets you see the action)
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()

        # Go to the target website
        page.goto("https://example.com")

        # Critical: wait for a specific element (like an h1 or a data
        # table) to ensure the page is ready
        page.wait_for_selector("h1")

        print(f"Page Title: {page.title()}")

        # Close the browser
        browser.close()

if __name__ == "__main__":
    run()
```
## Handling Infinite Scroll
Many modern sites (like Twitter/X or Instagram) use "Infinite Scroll," where new content loads only as you scroll down. If you just load the page and scrape immediately, you'll miss most of the data.
Here are two ways to handle this.
### Method 1: The Quick Scroll (Fixed Distance)
This method is useful when you just need to trigger a few load events quickly. Note that we need to use `page.evaluate` to run JavaScript inside the browser.
```python
# ... inside your run() function ...

# Scroll down 3000 pixels
page.evaluate("window.scrollBy(0, 3000)")

# Optional: wait a moment for new content to load
page.wait_for_timeout(2000)

print("Simple scroll completed")
```
### Method 2: The "Human-Like" Scroll (Best for Anti-Bot Protection)
If a website has strict anti-bot measures, scrolling instantly to the bottom is a red flag. It is better to scroll in small, random chunks with slight pauses, mimicking a real human user.
You will need to import `random` and `time` for this to work.
```python
import time
import random

def random_scroll(page, base_scroll=300):
    """
    Scrolls down the page in random increments to mimic human behavior.
    """
    total_height = 0

    # Get the total height of the page
    max_scroll_height = page.evaluate("document.body.scrollHeight")

    # Continue scrolling until we reach the bottom
    while total_height < max_scroll_height:
        # Randomize scroll amount (e.g., between 300 and 450 pixels)
        scroll_step = random.randint(base_scroll, base_scroll + 150)

        # Scroll down by the calculated step
        page.evaluate(f"window.scrollBy(0, {scroll_step})")
        total_height += scroll_step

        # Add a small random delay (0.5 to 1.5 seconds) to act like a human
        sleep_time = random.uniform(0.5, 1.5)
        time.sleep(sleep_time)

    print("Reached the bottom of the page.")
```
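If you want to sanity-check the randomized bounds without launching a browser, the step-and-delay logic can be isolated into a pure function. This `scroll_schedule` helper is only an illustration of the same math, not part of the scraper itself:

```python
import random

def scroll_schedule(page_height, base_scroll=300, seed=None):
    """Return the (step_px, delay_s) pairs a human-like scroll would use
    for a page of the given height -- the browser-free core of the logic."""
    rng = random.Random(seed)
    schedule = []
    total = 0
    while total < page_height:
        step = rng.randint(base_scroll, base_scroll + 150)  # 300-450 px
        delay = rng.uniform(0.5, 1.5)                       # 0.5-1.5 s
        schedule.append((step, delay))
        total += step
    return schedule

steps = scroll_schedule(3000, seed=42)
print(all(300 <= s <= 450 for s, _ in steps))  # True
print(all(0.5 <= d <= 1.5 for _, d in steps))  # True
```

Because every step is at most 450 px and every pause at least half a second, a 3000 px page takes several seconds to traverse, which is exactly the slow, irregular pattern that looks human to anti-bot systems.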
## Conclusion
By combining Playwright's browser automation with smart scrolling techniques, you can now scrape data that BeautifulSoup can't even see. This opens up the ability to scrape complex Single Page Applications (SPAs) and social media feeds.