Site Reliability Engineering Audiobook
3 Oct 2025
Recently I wanted to revisit some chapters from the great SRE Book while on the go. Unfortunately, I could not find an audio version of it. Below is a description of how one could make it oneself.
Read Aloud
The easiest way to listen to this book on a mobile phone would be to open it in Chrome and then click "Listen to this article". That works, but has some drawbacks:
- The book is published as multiple separate pages, so you need to switch between them manually
- No bookmarks: the listening position gets reset when you have an interruption of a couple of days
- Audio stops after some time once you turn off the screen, so the screen has to stay on
- An internet connection is required
Those drawbacks could be fixed by @Voice Aloud Reader. Unfortunately, its voice quality is worse than Chrome's.
Other options would be Elevenreader and Speechify:
- Quality is much better than in Chrome
- Limited to 2h/week, and an internet connection is still required
OK, let’s see how hard it is to get “good old” mp3 files for an audiobook player.
Model
At the time of this writing, the top TTS model on Hugging Face is Kokoro-82M.

It does not have “voice cloning” like XTTS2 does, but it is fully self-hosted and has better audio quality, as shown by TTS-Spaces-Arena:

Also, it has a Docker distribution via remsky/Kokoro-FastAPI, so it can be started as simply as:
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu
Then you can open the web UI at http://localhost:8880/web/ and check the output quality on a few text samples. It sounds pretty great!
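If you would rather test from a script than from the web UI, the container also serves an OpenAI-compatible speech endpoint (the same one the final script below calls). A minimal sanity check, assuming the default port from the docker command above:

import requests

# Synthesize one sample sentence and save it as an mp3
response = requests.post(
    "http://localhost:8880/v1/audio/speech",
    json={
        "model": "kokoro",
        "input": "Site Reliability Engineering. Chapter 1.",
        "voice": "am_michael",
        "response_format": "mp3",
    },
)
with open("sample.mp3", "wb") as f:
    f.write(response.content)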
But how do we get the input, i.e. the text from a bunch of HTML pages?
HTML to Text
Let’s start with the Table of Contents:
https://sre.google/workbook/table-of-contents/
We need to parse this page and get the title and url for each chapter, for example with something like this:
def getLinks(url: str) -> List[tuple]:
    res = []
    base = '/'.join(url.split('/')[0:3])
    page = requests.get(url).content
    content = BeautifulSoup(page, 'html.parser').find(id="content")
    for link in content.find_all('a'):
        ref = link.get('href', '')
        if ref.startswith('/'):
            ref = base + ref  # make absolute link
        if ref == url:
            continue  # link to TOC itself
        res.append((link.text.strip(), ref))
    return res
Now, for each URL, we need to get the page content as text. Let’s also drop the first <h1> tag with the chapter name, because we already know it from the Table of Contents. (And I prefer an announcement like “Chapter 6 - Eliminating Toil” instead of just “Eliminating Toil” from the page itself.)
def getText(url: str) -> str:
    page = requests.get(url).content
    content = BeautifulSoup(page, 'html.parser').find(id="content")
    content.find('h1').extract()  # remove first H1, as it is the same as link title
    return content.text
And that’s it!
Now let’s stitch that together and save each chapter as a separate file named {num}-{title}.mp3.
Another thing I want to add is a chapter title announcement at the beginning, followed by a short pause. Unfortunately, Kokoro has no support for pause tags. A quick solution is to use ;-, characters, while the proper solution would be to actually generate silence (sketched below).
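As a sketch of that proper approach (not what the script below does; the join_with_pause helper and the pydub dependency are my own assumptions, and pydub needs ffmpeg installed), one could generate the announcement and the chapter body as two separate mp3 files and glue them together with real silence:

from pydub import AudioSegment

def join_with_pause(announcement_mp3: str, chapter_mp3: str, out_mp3: str, pause_ms: int = 2000):
    # Concatenate announcement + real silence + chapter into a single mp3
    announcement = AudioSegment.from_mp3(announcement_mp3)
    chapter = AudioSegment.from_mp3(chapter_mp3)
    silence = AudioSegment.silent(duration=pause_ms)
    (announcement + silence + chapter).export(out_mp3, format="mp3")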
So, here is the full version of a quick scraper:
import requests
from typing import List
from bs4 import BeautifulSoup


def getLinks(url: str) -> List[tuple]:
    res = []
    base = '/'.join(url.split('/')[0:3])
    page = requests.get(url).content
    content = BeautifulSoup(page, 'html.parser').find(id="content")
    for link in content.find_all('a'):
        if (ref := link.get('href', '')) == '':
            continue
        if ref.startswith('/'):
            ref = base + ref
        if ref == url:
            continue
        res.append((link.text.strip(), ref))
    return res


def getText(url: str) -> str:
    page = requests.get(url).content
    soup = BeautifulSoup(page, 'html.parser').find(id="content")
    soup.find('h1').extract()  # remove first H1, as it is the same as link title
    return soup.text


def generate(input, filename: str):
    response = requests.post(
        "http://localhost:8880/v1/audio/speech",
        json={
            "model": "kokoro",
            "input": input,
            "voice": "am_michael(2)+am_santa(1)",
            "response_format": "mp3",  # Supported: mp3, wav, opus, flac
            "speed": 1.0,
            "normalization_options": {
                "normalize": False
            }
        }
    )
    with open(filename, "wb") as f:
        f.write(response.content)


if __name__ == "__main__":
    i = 0
    for link in getLinks("https://sre.google/workbook/table-of-contents/"):
        print(link[0])
        text = f"{link[0]}.\n;-,;-,;-,;-,;-,;-\n\n"  # chapter announcement
        text += getText(link[1])
        generate(text, f'{i:03}-{link[0]}.mp3')
        i += 1
Performance
Let’s take this page as an example: https://sre.google/workbook/foreword-II/
The audio version of it has a duration of 12m18s. Generation times:
- 2m42s (~5x of playback speed) on Apple M3 CPU
- 4m43s (~2.6x) on AMD Ryzen 5500, just for comparison
- 1m15s (~10x) on Apple M3, by moving from FastAPI to locally running kokoro on CPU. (There is even an option to enable GPU acceleration for Mac via PYTORCH_ENABLE_MPS_FALLBACK=1, but it does not work; it needs mlx.)
- 27s (~27x) on Apple M3 GPU, by running FastAPI locally via ./start-gpu_mac.sh
- 15s (~50x) on RTX 5080, by running the FastAPI GPU docker container in WSL2 (needs sm-120 workaround)
- 11s (~67x) on RTX 5080, by running kokoro directly in WSL2 (sketched below)
Using the GPU, the resulting times for the whole books are:
- “SRE book”: total duration is 22h02m, audio generation time is 19m03s
- “SRE workbook”: total duration is 17h53m, audio generation time is 16m30s
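For reference, “running kokoro directly” in the timings above means skipping the HTTP layer and calling the kokoro Python package itself. A minimal sketch, roughly following the package’s published usage (the voice name and the 24 kHz sample rate are my assumptions here):

import numpy as np
import soundfile as sf
from kokoro import KPipeline  # pip install kokoro soundfile

text = "Site Reliability Engineering. Foreword."  # or the output of getText(...)

# 'a' selects American English; the pipeline yields audio chunk by chunk
pipeline = KPipeline(lang_code='a')
chunks = [audio for _, _, audio in pipeline(text, voice='am_michael')]
sf.write('chapter.wav', np.concatenate(chunks), 24000)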
Alternatives
https://github.com/yl4579/StyleTTS2
https://github.com/matatonic/openedai-speech
https://github.com/p0n1/epub_to_audiobook
https://eamag.me/2025/Voice-Cloning
An even better solution would be to convert the HTML to Markdown and then use some paid service like https://www.openai.fm, where you can fine-tune tone and emotions. But let’s leave that for non-technical literature)