Automating Word Count Display on a Static Blog with Python + Git Hooks

May 15, 2025 • 5 min read

Why I Wanted This

While working on my personal blog hosted on GitHub Pages, I had a simple idea: what if each post could show its word count automatically? I didn’t want to do this manually or hard-code anything — it felt like something a script should handle. And more importantly, I wanted it to update automatically every time I made a commit.

This turned out to be a fun mini-project that taught me more about Git hooks, Python scripting, encoding quirks, and working around GitHub Pages’ static limitations.

The Setup I Ended Up With

portfolio/
├── blogs/                        # My HTML blog posts
│   ├── post1.html
│   └── post2.html
├── data/
│   └── blog-word-count.json      # Auto-generated word count file
├── scripts/
│   └── pre-commit.ps1            # PowerShell script to update the word count
├── generate_word_counts.py       # Python script to scan all blog files
└── .git/
    └── hooks/
        └── pre-commit            # Git hook that triggers pre-commit.ps1

Writing the Python Word Counter

The heart of the setup was a Python script that loops through all HTML files in my blogs/ folder, extracts the visible text using BeautifulSoup, counts the words, and writes the result to a JSON file. I made sure it was flexible enough to work for any file name.
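A minimal sketch of that kind of script is below. It is not the exact script from this project: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs, and the names VisibleTextParser, count_words, build_word_counts, and write_word_counts are made up for illustration. It also assumes the JSON keys are the file names without the .html extension, so that they can line up with element ids like post1 on the page.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside script/style blocks.
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html):
    """Count whitespace-separated words in the visible text of an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def build_word_counts(blog_dir):
    """Map each HTML file's stem (e.g. 'post1') to its word count."""
    counts = {}
    for path in sorted(Path(blog_dir).glob("*.html")):
        counts[path.stem] = count_words(path.read_text(encoding="utf-8"))
    return counts


def write_word_counts(blog_dir="blogs", out_file="data/blog-word-count.json"):
    """Scan blog_dir and write the counts to out_file as JSON."""
    counts = build_word_counts(blog_dir)
    out_path = Path(out_file)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(counts, indent=2), encoding="utf-8")
    return counts
```

Calling write_word_counts() from the repo root would scan blogs/ and regenerate data/blog-word-count.json. Because the keys come from the file names, adding a new post needs no change to the script.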

I ran into a surprising bug early on: I had added emoji icons like 📘 to the terminal output to make the logs cuter. It turns out the default Windows encoding Python used for that output (cp1252) can't represent those characters, so print() raised a UnicodeEncodeError and the script crashed whenever it ran from the pre-commit hook. That was my first lesson: keep hook output ASCII-safe.

Once that was fixed, the script worked flawlessly.
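For anyone who would rather keep the emojis in the source, one possible alternative to stripping them is a small logging helper that degrades unencodable characters instead of crashing. This is only a sketch under that assumption; the log function is a hypothetical name, not something from my script.

```python
import sys


def log(message, stream=None):
    """Print a message, replacing characters the stream's encoding can't represent."""
    if stream is None:
        stream = sys.stdout
    # Fall back to plain ASCII when the stream doesn't report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the target encoding; unencodable characters become "?".
    safe = message.encode(encoding, errors="replace").decode(encoding)
    print(safe, file=stream)
```

On a cp1252 stream, log("📘 Scanning post1.html") would print a question mark in place of the emoji rather than raising UnicodeEncodeError. Setting the PYTHONIOENCODING environment variable to utf-8 before the hook runs is another route, though I stuck with plain ASCII tags.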

Automating with PowerShell + Git Hook

Next, I wanted to automate running the script before every commit. I initially tried putting the script directly in .git/hooks/, but realized that nothing inside .git/ is version-controlled, so hooks placed there are not copied when someone clones the repo. So I moved the logic to a tracked scripts/pre-commit.ps1 file and added a simple shell shim in .git/hooks/pre-commit to call it (the shim itself still has to be created once per clone):

#!/bin/sh
powershell -ExecutionPolicy Bypass -File "scripts/pre-commit.ps1"

The PowerShell script looks like this:

Write-Host "Running generate_word_counts.py..."

$process = Start-Process -FilePath "python" -ArgumentList "generate_word_counts.py" -NoNewWindow -Wait -PassThru

if ($process.ExitCode -ne 0) {
    Write-Host "Python script failed. Aborting commit."
    exit 1
}

git add data/blog-word-count.json

Write-Host "Word count JSON updated and staged."

Once this was hooked up correctly, committing felt magical — Git ran my script, updated the JSON, and staged it for me.
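The same three steps (run the counter, abort on failure, stage the JSON) could also be written in Python instead of PowerShell, which might help on machines without PowerShell. This is a sketch, not what I actually run: the run_pre_commit name and the injectable run parameter are assumptions added so the logic can be exercised without touching a real repo.

```python
import subprocess
import sys


def run_pre_commit(run=subprocess.run):
    """Regenerate the word counts, then stage the JSON file.

    Returns 0 on success, or 1 to tell Git to abort the commit.
    """
    print("Running generate_word_counts.py...")
    result = run([sys.executable, "generate_word_counts.py"])
    if result.returncode != 0:
        print("Python script failed. Aborting commit.")
        return 1

    # Stage the regenerated file so it becomes part of this commit.
    staged = run(["git", "add", "data/blog-word-count.json"])
    if staged.returncode != 0:
        print("Could not stage the word count file. Aborting commit.")
        return 1

    print("Word count JSON updated and staged.")
    return 0
```

Saved as a hypothetical scripts/pre_commit.py ending with sys.exit(run_pre_commit()), the shim in .git/hooks/pre-commit would call python on that file instead of powershell. A nonzero exit code from the hook is what makes Git stop the commit.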

Displaying the Word Count on My Site

To actually use the data, I added this simple JS snippet to my HTML pages:

<p>This post has <span id="post1">...</span> words.</p>

<script>
fetch('data/blog-word-count.json')
  .then(res => res.json())
  .then(data => {
    for (const key in data) {
      const el = document.getElementById(key);
      if (el) {
        // The surrounding <p> already ends with "words", so insert only the number.
        el.textContent = data[key];
      }
    }
  });
</script>

Now each page fills in its word count from the JSON file when it loads, and the numbers refresh with every deploy.

A Few Gotchas Along the Way

Unicode Crashes in Terminal Output: Using emojis in Python print() statements caused UnicodeEncodeError. The fix was replacing them with ASCII-safe tags.
Git Couldn't Find My Script: I initially used a relative path that assumed the hook ran from a different directory. Git runs hooks from the root of the working tree, so the fix was to reference scripts/pre-commit.ps1 relative to the repo root.
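A related safeguard on the Python side is to resolve paths from the script's own location rather than from whatever directory the caller happens to be in. The helper below is a hypothetical sketch of that idea; project_paths is not part of my script, and it assumes the script sits in the repo root next to blogs/ and data/, as in the layout above.

```python
from pathlib import Path


def project_paths(script_file):
    """Resolve the blog folder and JSON output relative to the script itself."""
    root = Path(script_file).resolve().parent
    return root / "blogs", root / "data" / "blog-word-count.json"
```

Inside generate_word_counts.py this would be called as project_paths(__file__), so the script finds the same folders whether it is started by the hook or by hand from a subdirectory.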

What I Learned

Git hooks are powerful, but picky about paths and encoding
Don’t use emojis in anything that runs through Git hooks on Windows
You can still build dynamic features on static sites — if you pre-process the data
It's way more satisfying to automate something small and have it just work every time