Automating Word Count Display on a Static Blog with Python + Git Hooks

May 15, 2025 · 5 min read

Why I Wanted This

While working on my personal blog hosted on GitHub Pages, I had a simple idea: what if each post could show its word count automatically? I didn’t want to do this manually or hard-code anything — it felt like something a script should handle. And more importantly, I wanted it to update automatically every time I made a commit.

This turned out to be a fun mini-project that taught me more about Git hooks, Python scripting, encoding quirks, and working around GitHub Pages’ static limitations.

The Setup I Ended Up With

portfolio/
├── blogs/                        # My HTML blog posts
│   ├── post1.html
│   └── post2.html
├── data/
│   └── blog-word-count.json      # Auto-generated word count file
├── scripts/
│   └── pre-commit.ps1            # PowerShell script to update the word count
├── generate_word_counts.py       # Python script to scan all blog files
└── .git/
    └── hooks/
        └── pre-commit            # Git hook that triggers pre-commit.ps1

Writing the Python Word Counter

The heart of the setup was a Python script that loops through all HTML files in my blogs/ folder, extracts the visible text using BeautifulSoup, counts the words, and writes the result to a JSON file. I made sure it was flexible enough to work for any file name.
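
The script itself isn't reproduced in this post, so here is a minimal sketch of the idea rather than the real generate_word_counts.py. To stay dependency-free it swaps BeautifulSoup for the standard library's html.parser, and it assumes the JSON keys are the file names without the extension (so blogs/post1.html becomes "post1"). The names VisibleTextParser, count_words, build_word_counts, and write_word_counts are hypothetical.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html):
    """Rough word count: whitespace-separated tokens in the collected text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def build_word_counts(blog_dir):
    """Map each file stem (assumed key format) to its word count."""
    counts = {}
    for path in sorted(Path(blog_dir).glob("*.html")):
        counts[path.stem] = count_words(path.read_text(encoding="utf-8"))
    return counts


def write_word_counts(blog_dir="blogs", out_file="data/blog-word-count.json"):
    """Write the counts as JSON, creating the output folder if needed."""
    out_path = Path(out_file)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    counts = build_word_counts(blog_dir)
    out_path.write_text(json.dumps(counts, indent=2), encoding="utf-8")
    return counts
```

Calling write_word_counts() from the repository root would produce a flat JSON object with one entry per post. The count is approximate: it includes text such as the page title and treats every whitespace-separated token as a word.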

I ran into a surprising bug early on: I had added emoji icons like 📘 to the terminal output to make the logs cuter. It turns out Windows’ default encoding for that output (cp1252) can't represent those characters, so print() raised a UnicodeEncodeError and the script failed whenever it ran in the Git pre-commit context. That was my first lesson: keep hooks ASCII-safe.
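
One way to make that lesson hard to forget is to route every log line through a helper that cannot emit non-ASCII characters. This is a sketch under that assumption, not the fix the original script used; ascii_safe and log are hypothetical names.

```python
def ascii_safe(message):
    """Replace every non-ASCII character with '?' so printing cannot fail on cp1252."""
    return message.encode("ascii", errors="replace").decode("ascii")


def log(tag, message):
    """Print a plain-text tag such as [INFO] or [OK] in place of an emoji icon."""
    print(f"[{tag}] {ascii_safe(message)}")


# The escape below is the blue book emoji; it is printed as a single '?'.
log("INFO", "\U0001F4D8 Scanning post1.html")
```

An alternative is to keep the emojis and call sys.stdout.reconfigure(encoding="utf-8") at the top of the script (available since Python 3.7), though whether the characters then render correctly depends on the terminal.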

Once that was fixed, the script worked flawlessly.

Automating with PowerShell + Git Hook

Next, I wanted to run the script automatically before every commit. I initially tried putting the logic directly in .git/hooks/, but realized that the .git/ directory is not version-controlled, so hooks placed there are never committed, pushed, or copied on clone. So I moved the logic to a tracked scripts/pre-commit.ps1 file and added a simple shell shim in .git/hooks/pre-commit to call it:

#!/bin/sh
powershell -ExecutionPolicy Bypass -File "scripts/pre-commit.ps1"
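
Since the shim lives outside version control, every fresh clone has to recreate it. A small tracked installer can do that; the sketch below is a hypothetical helper (install_hook is not part of the original setup) and assumes it is run from the repository root.

```python
import stat
from pathlib import Path

# Same two lines as the shim shown above.
SHIM = '#!/bin/sh\npowershell -ExecutionPolicy Bypass -File "scripts/pre-commit.ps1"\n'


def install_hook(repo_root="."):
    """Write the pre-commit shim into .git/hooks and mark it executable."""
    hooks_dir = Path(repo_root) / ".git" / "hooks"
    if not hooks_dir.is_dir():
        raise FileNotFoundError(f"{hooks_dir} not found; run this from the repository root")
    hook_path = hooks_dir / "pre-commit"
    # newline="\n" keeps LF line endings even when this runs on Windows.
    with open(hook_path, "w", encoding="ascii", newline="\n") as f:
        f.write(SHIM)
    # Add execute permission bits; this has little effect on Windows.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Another option is to keep the hook in a tracked folder and point Git at it with the core.hooksPath setting, which still has to be configured once per clone.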

The PowerShell script looks like this:

Write-Host "Running generate_word_counts.py..."

$process = Start-Process -FilePath "python" -ArgumentList "generate_word_counts.py" -NoNewWindow -Wait -PassThru

if ($process.ExitCode -ne 0) {
    Write-Host "Python script failed. Aborting commit."
    exit 1
}

git add data/blog-word-count.json

Write-Host "Word count JSON updated and staged."

Once this was hooked up correctly, committing felt magical — Git ran my script, updated the JSON, and staged it for me.
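
The same flow (run the generator, abort on failure, stage the JSON) can also be written in Python, which may help if the hook ever needs to run on a machine without PowerShell. This is a sketch of an alternative, not the setup described above; run_hook is a hypothetical name, and the run parameter exists only so the logic can be exercised without touching a real repository.

```python
import subprocess
import sys


def run_hook(run=subprocess.run):
    """Return 0 to let the commit proceed, 1 to abort it."""
    print("Running generate_word_counts.py...")
    # sys.executable reuses the interpreter that is running this hook.
    result = run([sys.executable, "generate_word_counts.py"])
    if result.returncode != 0:
        print("Python script failed. Aborting commit.")
        return 1

    staged = run(["git", "add", "data/blog-word-count.json"])
    if staged.returncode != 0:
        print("Could not stage the word count JSON. Aborting commit.")
        return 1

    print("Word count JSON updated and staged.")
    return 0


# In an actual hook script, finish with: sys.exit(run_hook())
```

Git treats any non-zero exit status from pre-commit as a request to abort the commit, which is why the function maps both failure cases to 1.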

Displaying the Word Count on My Site

To actually use the data, I added this simple JS snippet to my HTML pages:

<p>This post has <span id="post1">...</span> words.</p>

<script>
fetch('data/blog-word-count.json')
  .then(res => res.json())
  .then(data => {
    for (const key in data) {
      const el = document.getElementById(key);
      if (el) {
        // The surrounding <p> already ends with "words", so insert only the number.
        el.textContent = data[key];
      }
    }
  });
</script>

Now each post fetches its current word count from the JSON file when the page loads, so the number stays in sync with whatever was last committed and deployed.

A Few Gotchas Along the Way

  1. Unicode Crashes in Terminal Output: Using emojis in Python print() statements caused UnicodeEncodeError. The fix was replacing them with ASCII-safe tags.
  2. Git Couldn't Find My Script: I initially used a relative path that assumed the hook ran from a different directory. Git runs pre-commit from the root of the working tree, so referencing scripts/pre-commit.ps1 directly fixed it.
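
The path gotcha can also be sidestepped inside the Python script by building paths from the script's own location instead of the current working directory. A minimal sketch, assuming generate_word_counts.py sits at the repository root as in the tree above; project_paths is a hypothetical helper.

```python
from pathlib import Path


def project_paths(script_file):
    """Return (blogs folder, JSON output file) relative to the script's folder."""
    root = Path(script_file).resolve().parent
    return root / "blogs", root / "data" / "blog-word-count.json"


# Inside generate_word_counts.py this would be called as:
#     blogs_dir, out_file = project_paths(__file__)
```

With this in place the script finds blogs/ and data/ no matter which directory it is launched from.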

What I Learned

  • Git hooks are powerful, but picky about paths and encoding
  • Don’t use emojis in anything that runs through Git hooks on Windows
  • You can still build dynamic features on static sites — if you pre-process the data
  • It's way more satisfying to automate something small and have it just work every time
