Automating Word Count Display on a Static Blog with Python + Git Hooks
Why I Wanted This
While working on my personal blog hosted on GitHub Pages, I had a simple idea: what if each post could show its word count automatically? I didn’t want to do this manually or hard-code anything; it felt like something a script should handle. More importantly, I wanted the count to refresh every time I made a commit.
This turned out to be a fun mini-project that taught me more about Git hooks, Python scripting, encoding quirks, and working around GitHub Pages’ static limitations.
The Setup I Ended Up With
portfolio/
├── blogs/                      # My HTML blog posts
│   ├── post1.html
│   └── post2.html
├── data/
│   └── blog-word-count.json    # Auto-generated word count file
├── scripts/
│   └── pre-commit.ps1          # PowerShell script to update the word count
├── generate_word_counts.py     # Python script to scan all blog files
└── .git/
    └── hooks/
        └── pre-commit          # Git hook that triggers pre-commit.ps1
Writing the Python Word Counter
The heart of the setup was a Python script that loops through all HTML files in my blogs/ folder, extracts the visible text using BeautifulSoup, counts the words, and writes the result to a JSON file. I made sure it was flexible enough to work for any file name.
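For illustration, here is a minimal sketch of that kind of counter. To stay dependency-free it swaps BeautifulSoup for Python's built-in html.parser, so it is not the exact script described above. The names VisibleTextParser, count_words, build_word_counts, and write_word_counts are made up for this example, and keying the JSON by file name without the extension is an assumption based on the id="post1" used in the display snippet later in the post.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}
    # Tags that should separate words even when the markup has no whitespace.
    BREAKING_TAGS = {"p", "div", "br", "li", "tr", "td", "th", "title",
                     "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in self.BREAKING_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BREAKING_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html):
    """Count whitespace-separated words in the text content of an HTML string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return len("".join(parser.chunks).split())


def build_word_counts(blog_dir):
    """Map each HTML file's stem (assumed key format) to its word count."""
    counts = {}
    for path in sorted(Path(blog_dir).glob("*.html")):
        counts[path.stem] = count_words(path.read_text(encoding="utf-8"))
    return counts


def write_word_counts(blog_dir="blogs", output_file="data/blog-word-count.json"):
    """Write the counts as JSON, creating the output folder if needed."""
    counts = build_word_counts(blog_dir)
    output = Path(output_file)
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text(json.dumps(counts, indent=2), encoding="utf-8")
    return counts
```

Calling write_word_counts() at the bottom of the script would regenerate the JSON for every file in blogs/. Note that this sketch also counts text inside the title element, which may or may not be what you want.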
I ran into a surprising bug early on: I had added emoji icons like 📘 to the terminal output to make the logs cuter. It turns out Windows’ default terminal encoding (cp1252) can’t represent those characters, so print() raised a UnicodeEncodeError and the script crashed whenever it ran as a Git pre-commit hook. That was my first lesson: keep hook output ASCII-safe.
Once that was fixed, the script worked flawlessly.
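If you would rather keep decorative characters in the source and degrade gracefully, one option is a small logging helper that replaces anything the console encoding cannot represent. This is only a sketch of that idea, not the fix used here (which was simply removing the emojis); to_console_safe and safe_print are hypothetical names.

```python
import sys


def to_console_safe(message, encoding):
    """Replace characters the target encoding cannot represent with '?'."""
    return message.encode(encoding, errors="replace").decode(encoding)


def safe_print(message, stream=None):
    """Write a log line without raising UnicodeEncodeError."""
    stream = stream if stream is not None else sys.stdout
    # Fall back to plain ASCII when the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    stream.write(to_console_safe(message, encoding) + "\n")
```

Setting the PYTHONIOENCODING environment variable to utf-8 is another common workaround, though whether the glyphs actually render still depends on the terminal.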
Automating with PowerShell + Git Hook
Next, I wanted to run the script automatically before every commit. I initially tried putting the script directly in .git/hooks/, but realized that the contents of .git/ are never committed, so hooks placed there aren’t shared when someone clones the repo. So I moved the logic to a tracked scripts/pre-commit.ps1 file and added a simple shell shim in .git/hooks/pre-commit to call it:
#!/bin/sh
powershell -ExecutionPolicy Bypass -File "scripts/pre-commit.ps1"
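Because the shim itself lives in the untracked .git/hooks/ folder, every fresh clone needs it recreated. One way to handle that is a small tracked installer script. The sketch below is an assumption about how that could look rather than part of the original setup; install_hook and SHIM are hypothetical names, and it assumes .git is a regular directory (not a worktree or submodule pointer file).

```python
import stat
from pathlib import Path

# Same two-line shim as above, with explicit LF line endings.
SHIM = '#!/bin/sh\npowershell -ExecutionPolicy Bypass -File "scripts/pre-commit.ps1"\n'


def install_hook(repo_root="."):
    """Write the pre-commit shim into .git/hooks/ and mark it executable."""
    git_dir = Path(repo_root) / ".git"
    if not git_dir.is_dir():
        raise FileNotFoundError(f"No .git directory found in {repo_root!r}")
    hook_path = git_dir / "hooks" / "pre-commit"
    hook_path.parent.mkdir(exist_ok=True)
    # newline="\n" prevents Windows from writing CRLF, which can break the shebang line.
    with open(hook_path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(SHIM)
    # Add execute permission for user, group, and others.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Running install_hook() once from the repository root after cloning would put the shim back in place.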
The PowerShell script looks like this:
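# Run the word counter and wait for it so the exit code is available.
Write-Host "Running generate_word_counts.py..."
$process = Start-Process -FilePath "python" -ArgumentList "generate_word_counts.py" -NoNewWindow -Wait -PassThru
# A non-zero exit code from the hook makes Git abort the commit.
if ($process.ExitCode -ne 0) {
    Write-Host "Python script failed. Aborting commit."
    exit 1
}
# Stage the regenerated JSON so it is included in this commit.
git add data/blog-word-count.json
Write-Host "Word count JSON updated and staged."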
Once this was hooked up correctly, committing felt magical — Git ran my script, updated the JSON, and staged it for me.
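For anyone not on Windows, the same three steps (run the counter, abort on failure, stage the JSON) can be expressed in Python as well. This is a hypothetical equivalent of the PowerShell script, not something the setup above uses; run_pre_commit is an invented name, and the run parameter exists only so the logic can be exercised without touching a real repository.

```python
import subprocess
import sys


def run_pre_commit(run=subprocess.run):
    """Return 0 to let the commit proceed, 1 to abort it."""
    print("Running generate_word_counts.py...")
    # Use the interpreter that is running this hook to launch the counter.
    result = run([sys.executable, "generate_word_counts.py"])
    if result.returncode != 0:
        print("Python script failed. Aborting commit.")
        return 1
    # Stage the regenerated file so it becomes part of this commit.
    staged = run(["git", "add", "data/blog-word-count.json"])
    if staged.returncode != 0:
        print("Could not stage the word count file. Aborting commit.")
        return 1
    print("Word count JSON updated and staged.")
    return 0
```

Used as a hook, the script would end with sys.exit(run_pre_commit()), since Git treats any non-zero exit status from pre-commit as a reason to stop the commit.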
Displaying the Word Count on My Site
To actually use the data, I added this simple JS snippet to my HTML pages:
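<p>This post has <span id="post1">...</span> words.</p>
<script>
  // Load the generated counts and fill in any matching <span> on this page.
  // Note: this path is resolved relative to the page's URL.
  fetch('data/blog-word-count.json')
    .then(res => res.json())
    .then(data => {
      for (const key in data) {
        const el = document.getElementById(key);
        if (el) {
          // The surrounding sentence already ends in "words", so insert only the number.
          el.textContent = data[key];
        }
      }
    });
</script>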
Now each post fills in its word count when the page loads, using the JSON that was regenerated at commit time.
A Few Gotchas Along the Way
- Unicode Crashes in Terminal Output: Using emojis in Python print() statements caused UnicodeEncodeError. The fix was replacing them with ASCII-safe tags.
- Git Couldn't Find My Script: Initially used a relative path that assumed the hook was executing from a different location. Git runs the pre-commit hook from the root of the working tree, so the fix was referencing scripts/pre-commit.ps1 directly.
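On the Python side, a related safeguard is to build paths from the script's own location instead of the current working directory, so the counter behaves the same whether it is started by the hook or by hand from another folder. A minimal sketch, with resolve_from_script as a hypothetical helper name:

```python
from pathlib import Path


def resolve_from_script(script_file, *parts):
    """Build an absolute path relative to the folder containing script_file."""
    return Path(script_file).resolve().parent.joinpath(*parts)
```

Inside generate_word_counts.py this could be used as resolve_from_script(__file__, "data", "blog-word-count.json"), assuming the script sits at the repository root as in the layout above.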
What I Learned
- Git hooks are powerful, but picky about paths and encoding
- Don’t use emojis in anything that runs through Git hooks on Windows
- You can still build dynamic features on static sites — if you pre-process the data
- It's way more satisfying to automate something small and have it just work every time