A.I. Audiobook Experiment

Why did I create a free short story audiobook?

Click here to hear the story on YouTube: ‘Clocking Time’

Generative art tools like Midjourney and Stable Diffusion caught my eye a while back. Then, after listening to various podcasts for authors about the exciting developments in text-to-speech technology, I just had to try it out.

With a firehose of content being uploaded to ebook retailers daily, discovery is perhaps the toughest marketing challenge for many unknown authors.

After all, people can’t read stories if they can’t find them.

I chose YouTube for my audiobook experiment because the platform welcomes long-form content, hosts BookTubers, and has many people looking for audiobooks. (Of course, I know the YT algorithm will not look kindly on my solitary short story audiobook, but that’s a tale for another day.)

I signed up with murf.ai on their basic plan: 2 hours of audio for $19 monthly. That looked good enough to try out their offering and make a call on whether to cancel before the month is up (see the end of this post).

Creating a murf.ai project took only a few clicks. Next, I uploaded my short story (6,890 words) as a .docx file.
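For anyone weighing up the same plan, here is a rough back-of-the-envelope sketch in Python. It uses only figures from this post (the 43-minute running time is reported further down). It assumes the narration pace stays the same for other stories and that the plan is metered on finished audio length, neither of which I have verified.

```python
# Figures from this post; the 43-minute running time is reported further down.
STORY_WORDS = 6890
STORY_MINUTES = 43
PLAN_MINUTES = 120    # basic plan: 2 hours of audio per month
PLAN_PRICE_USD = 19


def words_per_minute(words, minutes):
    """Average narration pace for a finished recording."""
    return words / minutes


def plan_capacity_words(plan_minutes, wpm):
    """Rough word count that fits in a month, ASSUMING the pace stays constant."""
    return plan_minutes * wpm


wpm = words_per_minute(STORY_WORDS, STORY_MINUTES)
capacity = plan_capacity_words(PLAN_MINUTES, wpm)
minutes_left = PLAN_MINUTES - STORY_MINUTES
cost_per_minute = PLAN_PRICE_USD / PLAN_MINUTES

print(f"Narration pace: about {wpm:.0f} words per minute")
print(f"Monthly capacity at that pace: about {capacity:.0f} words")
print(f"Minutes left after this story: {minutes_left}")
print(f"Cost per plan minute: ${cost_per_minute:.2f}")
```

Treat the output as a ballpark only, since a different voice or narration speed would change the pace.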


I selected a licensed music track, Dystopia, as a short lead-in for the story because it reflected the darker backdrop of totalitarian rule in a Britain fractured by anthropogenic climate disruption and socio-economic upheaval.

The audio track is embedded in the released .mp3 file and cannot be used separately from this file.

As a complete newbie to text-to-speech, I took my time checking each paragraph for pronunciation errors.

For example, the system struggled with the correct pronunciation of my surname. I fixed the problem by changing the text spelling from McClure to Macklure and selecting from a list of phonetic approximations. It’s not perfect, but close enough.

Homographs (words spelled the same but pronounced differently) can also trip up the text-to-speech software. I found one where it made the ‘wrong’ choice between tears, as in “tears rolling down her cheek”, and tears in a piece of clothing. (I wanted the former.)
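I made these fixes by hand in the murf.ai editor, but the same idea could be scripted before uploading a manuscript. The Python sketch below is only an illustration and is not part of murf.ai. The function names, the respelling table, and the watchlist are all my own hypothetical choices. It respells known problem words and flags paragraphs containing words like "tears" that are spelled one way but can be spoken two ways, so they can be checked by ear.

```python
import re

# Hypothetical respelling table: manuscript spelling -> spelling sent to the voice engine.
RESPELLINGS = {
    "McClure": "Macklure",
}

# Hypothetical watchlist of words with two pronunciations for one spelling.
WATCHLIST = ["tears", "lead", "wind", "bow", "read"]


def apply_respellings(text, respellings):
    """Return a copy of the text with whole-word respellings applied."""
    for original, spoken in respellings.items():
        pattern = r"\b" + re.escape(original) + r"\b"
        # A function replacement avoids any backslash handling in the new spelling.
        text = re.sub(pattern, lambda match, s=spoken: s, text)
    return text


def find_ambiguous_words(text, watchlist):
    """Return (word, paragraph_number) pairs that deserve a manual listen."""
    hits = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        for word in watchlist:
            pattern = r"\b" + re.escape(word) + r"\b"
            if re.search(pattern, paragraph, flags=re.IGNORECASE):
                hits.append((word, number))
    return hits


sample = "A story by McClure.\n\nTears rolling down her cheek."
print(apply_respellings(sample, RESPELLINGS))
print(find_ambiguous_words(sample, WATCHLIST))
```

The script cannot tell which pronunciation is right; it only narrows down where to listen.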

I downloaded the generated .mp3 audio file (52 MB for a 43-minute recording) and then looked up how to convert it into an MPEG video file that I could upload to YouTube. I first checked some commercial offerings but settled on shotcut.org, an open-source, cross-platform video editor. After a few attempts, I figured out how to align the .mp3 audio track with a static book cover image (created using canva.com according to YouTube’s size guidelines). Finally, I exported the resulting 115 MB MPEG file and uploaded it to YouTube.
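I did this step in Shotcut, but for anyone who prefers a script, the same result (one still image shown for the length of the audio) can probably be had from the ffmpeg command-line tool. This is a different route from the one I took and I have not run it on my own files. The Python sketch below only builds and prints the command. The file names are placeholders, the 192k audio bitrate is an arbitrary choice, and ffmpeg would need to be installed separately.

```python
import shlex


def build_ffmpeg_command(image_path, audio_path, output_path):
    """Build an ffmpeg command that pairs one still image with an audio track."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the single cover image as the video stream
        "-i", image_path,
        "-i", audio_path,
        "-c:v", "libx264",       # H.264 video
        "-tune", "stillimage",   # encoder tuning for static pictures
        "-c:a", "aac",           # re-encode the audio as AAC
        "-b:a", "192k",          # arbitrary bitrate, chosen for illustration
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        "-shortest",             # stop when the audio ends
        output_path,
    ]


# Placeholder file names, not the ones I actually used.
command = build_ffmpeg_command("cover.png", "story.mp3", "story.mp4")
print(shlex.join(command))
```

To actually render the video, the printed command can be pasted into a terminal, or the list passed to subprocess.run(command, check=True). The cover image should have even pixel dimensions (for example 1920 by 1080), since the yuv420p format requires them.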

And what did I learn from this experiment?

Well, it takes time! I spent about six hours from start to finish (in one day). Now that I know what to do, I could probably reduce that to around three hours (for an audiobook of only one hour, with a single narrator).

Overall, I’m pleased with the final product. Although the character Nina doesn’t have many lines of dialogue, in hindsight I would’ve chosen a female voice for her part. And a music fade-out would’ve been nice, but maybe next time…

Is this technology “better” than using a professional voice actor? No, I don’t think so. Not yet. An experienced human narrator would bring a flow and voice to this story that my ‘A.I.’ voice talent missed. Still, for a free audiobook, I think it worked out well, and if the technology continues to improve, I foresee it also being used for commercial audiobook products.

Will I continue with my murf.ai subscription? Probably not. I have just over one hour of credit remaining this month and may use that to make one more short story audiobook. However, in my case, the WIBBOW test (Would I Be Better Off Writing?) wins out, and writing is where I intend to spend most of my available creative time in 2023.

