January 2025
Artificial Intelligence (AI) has driven much discussion lately, so it shouldn't come as a surprise that AI voice generation has found its way into audiobook creation. Amazon (at the time of this writing) has a beta program, Google Play Books offers a free service (more on this later), and a plethora of companies provide easy-to-use (relatively speaking) software for content producers.
So of course I had to give it a try.
As long as it wasn't expensive. Or better yet, free. The Holy Grail for the cash-conscious self-published writer.
My original interest in AI voice generation was to create a "Podcast" short story as part of my At Trails End series. My books tend to be dialogue-driven, so I thought it would be fun to get the main character, Paul Matthews, behind the mic. That became the free podcast story The Dirt At Trails End. While exploring AI tools, I also experimented with AI voice narration for the three books in the At Trails End series, to be available, initially, on Google Play Books.
It was an interesting, thought-provoking, and at times frustrating journey, and one I wanted to share. This is by no means an in-depth evaluation of the myriad products available, since that would take months to accomplish. Instead, it's a discussion of the approach I took.
The first challenge was sifting through the impressive variety of AI generative software, ranging from free to darn expensive. When selecting a product, whether a dishwasher, car, home, or piece of software, you'll be most happy if you understand what you need, and don't need. Every feature has a cost, or learning curve, or both. To create my podcast audio files and audiobooks I came up with this short list:
Free, or relatively low cost (everything created for less than a few hundred dollars)
Text-to-speech capability
Multiple voices in a single audio file
Tools to create and manage long-form content
A wide array of voices to choose from
No programming required
I started my research with the low-hanging fruit of the software market: the free stuff. Right in my price range. There are several free products, accessed either through a web browser or as a downloaded software package. I have a Windows-based machine but prefer using my Chromebook, which means sticking with something accessed through the Google Chrome web browser. Installed software typically runs faster, but it also means dealing with software incompatibilities. Yech. I dealt enough with that during my tech consulting days. I'll stick with web-based. On to the next requirement.
The need for text-to-speech seems obvious since I want an audio file of my written content. However, other products also offered speech-to-text, converting audio files to text for things like captioning, or providing advanced video editing. I don't need that, don't want to learn it, and certainly don't want to pay for it. Searching for products marketed for creating audiobooks helped narrow down the list. And I still had a couple of free ones to consider.
But I quickly ran afoul of the next two requirements.
Multiple voices. For my podcast story, I wanted to replicate a host/interviewee dialogue which meant having at least two distinct voices in the same audio file. That quickly narrowed down the list. Many products, free and paid, only allow selecting one voice for the entire recording. They have many voices to choose from, but you can only select one. That still left a few free options for a narrated audiobook (single voice) but then came the next item.
Long-form content. I didn't even know that was a thing until I started working with generative AI. Some products are restricted to what I consider "sound bites": content such as announcements, short instructions, captions, or whatever. Not the 50,000+ words expected in a book. And if you have long content, you need tools to manage and edit it. That requirement dropped most of the free stuff. In fact, most of the paid software works on a subscription model (pay by the month or year), and this feature often requires a higher subscription level (more money).
Next came voices. This is the fascinating part of AI. I think of the early days of generated voices, those robotic monotone personas that lacked emotion and distinction. There was little chance listeners would confuse these robotic voices with the actual thing. AI is changing that, creating human-like expressive voices in multiple styles, languages, genders, ethnicities, and age characteristics. They can originate from voice actors, your recorded voice, audio clips, or be completely computer generated. The more sophisticated/pricey the software package, the more unique voices provided, as well as the ability to modify or create your own AI voice. The key aspect here is that the AI voices convey emotions by understanding the actual content of your text. Or at least they try. The random nature of AI voice generation is both interesting and frustrating.
My last requirement was that the process had to be simple. No programming. I wanted the ability to create audio files without resorting to code snippets of SSML (Speech Synthesis Markup Language). A few products had advanced capabilities more suited to video game creators with embedded audio. Way too complicated. I wanted to bring my content into something that looked like a text editor, select voices, experiment, tweak if necessary, save my stuff, and produce audio files. Easy peasy.
The Results
After trying software such as NaturalReader, Murf.AI, and Speechify, I settled on ElevenLabs. It met all my requirements and the price was reasonable. They offer a subscription plan, and based on the features I wanted, the Creator package at around $22/month fit the bill. This gave me the tools I wanted, the license to create commercial products, and the ability to generate about 100 minutes of audio (100,000 characters, referred to as credits) per month. Plus the promise that I can cancel at any time. As with most AI generative products, they have a free plan that will at least allow you to try it out.
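For a rough sense of what that monthly allowance means for a full-length book, here's a back-of-the-envelope sketch in Python. The 100,000 characters and the 50,000-word book come from this post; the figure of six characters per word (about five letters plus a space) is purely my assumption, and the function name is made up for illustration. It also ignores any credits spent on retakes, so treat the answer as a floor rather than a quote.

```python
import math

MONTHLY_CREDITS = 100_000    # characters per month on the plan described above
ASSUMED_CHARS_PER_WORD = 6   # assumption: ~5 letters plus a space; real text varies


def months_needed(word_count,
                  chars_per_word=ASSUMED_CHARS_PER_WORD,
                  monthly_credits=MONTHLY_CREDITS):
    """Estimate how many monthly allowances a manuscript would use up."""
    total_chars = word_count * chars_per_word
    # Round up: a partly used month still has to be paid for.
    return math.ceil(total_chars / monthly_credits)


print(months_needed(50_000))  # a 50,000-word book under the assumptions above
```

Under those assumptions, a 50,000-word book works out to about three months of credits. Change the characters-per-word guess and the answer moves with it.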
For the moment I'm only using it for my fictional At Trails End podcast. The Projects feature was the big selling point, which ironically, was the one thing I couldn't try out with the free version. It handles long-form content, allowing you to upload a document and break it into chapters (or they can be added manually), which you then generate into separate audio files or one single file. In my case, the chapters are episodes that are edited and then generated individually.
The interface is simple, maybe a tad too simple. I wish it had more text editing features, such as search and replace. But assigning voices is a breeze. You can pick a default voice for the entire chapter or, since the software identifies paragraphs, a specific voice for each paragraph.
And they offer a ton of voices to choose from, with filters based on style, language, gender, and age. Some of these voices come from ElevenLabs themselves; others are community contributions. One thing to be mindful of is that some of the latter have interesting restrictions. They may only be available for a certain period of time. They may consume more credits when used, meaning it'll cost more to use them. And they may be Moderated, which I still don't quite understand. Don't like the ones offered? You can always clone a voice based on a supplied audio clip, including your own. I do like hearing myself talk, just not enough to put in the work to make it happen.
The challenge with selecting a voice for my podcast was that I knew what it should sound like. When writing I hear those voices in my head (versus hearing voices in my head. Different problem altogether) and it drove me nuts trying to find a good match. It came down to selecting voices for each episode that differed enough to be distinctive, if not perfect.
Again, the fascinating aspect of AI generation is the interpretive nature of the technology. I found myself editing text based on how the particular voices I selected "reacted" to my script, which was not always as expected. ElevenLabs advises users to include descriptive language to assist the AI engine, such as "he said angrily". I'll try that when converting my ebook to an audiobook, but since my podcast doesn't contain any "he said/she said", I'm still in the discovery phase of influencing the AI engine.
If you'd like more information on AI audiobooks, author Derek Murphy offers great advice and insight at Free AI Audiobooks Narration: ElevenLabs vs Audible Virtual Voice.
And then there's Google Play Books with their AI auto-narrated audiobooks offering. For free. And it checks all my boxes. You can alter the voice type, pronunciation, and narration speed. Their text editor has more of the features we'd expect, including search and replace functionality, and it's all web-based. So what's not to love?
There are two rather important caveats to using Google Play Books when creating an audiobook:
Your ebook must be published on Google Play Books and you must own the audio rights
It must be published in the EPUB format
That first issue is a kicker for me. I've published my books on Amazon and am very happy with the platform, process, and ease of use. I wish the Kindle Create program were web-based rather than an installed software package, but I've never had any issues converting my book to the Kindle format, even considering I'm starting with a Google Doc. But I haven't offered my books on other platforms because they're also enrolled in the Amazon KDP Select Program (aka Kindle Unlimited), which grants Amazon exclusive rights to my books. I have no problem giving away that right, it's my decision after all, and it's only for 90 days, renewable with my agreement. But I can't publish them on Google Play Books until the 90-day period expires. Of course, I can always decide later to unpublish them from Google Play Books and re-enroll them on KDP Select.
The second point about the EPUB format seemed, at first, to be a non-issue. Unlike other document formats such as docx (Microsoft), gdoc (Google Docs), or PDF (Adobe), EPUB is an open standard that's been around for years and is supported by a wide range of e-readers. I've never had a reason to use it, but converting my Google Doc looked as simple as clicking Download and choosing EPUB. But, alas, it was not that simple. I was not happy to find my new EPUB document without a table of contents, and the title format was garbled. A quick internet search informed me that such issues were common. I fixed the format issue, removed any traces of a TOC, and prayed there weren't other errors. You'd think the process would be error-free since I'm on a Google Chromebook, using Google Docs, and loading the EPUB file Google generated into Google Play Books (notice all the Googles?).
For now, I'm still experimenting with Google's AI product. As my books age out of the KDP Select program, I'll publish them on Google Play Books to try out the auto-narrated audiobooks, and ultimately decide whether to keep them on Google or return to KDP Select.
I'll keep my website updated with the progress.
February 2025
Two of my books have now been converted to auto-narrated audio files using Google Play's AI tool. Not surprisingly, it was easier than ElevenLabs, simply because there were far fewer options to choose from. ElevenLabs offers a vast pool of voices, including the ability to quickly create a new voice based on characteristics you supply, such as gender and age, while Google provides only a short list of narration voices to work with. But it does make for a fast process. One aspect of the interface I found particularly helpful was the ability to right-click on a word and make any necessary changes to pronunciation. As I mentioned, most of the work required to create an auto-narrated book on Google Play was getting my file into a clean EPUB format.
But I'm not sure when I'll get to the last book. I received feedback from readers disappointed that my books weren't available through the Kindle Unlimited option, so I waffled on the decision to pull them out of the KDP Select program. They're back in, which means they're no longer available on Google Play. Decisions, decisions, decisions.
Once again, after my books finish their 90-day KDP enrollment period, I'll reconsider publishing them on Google Play. The AI landscape is changing rapidly, so there's no telling what new features and capabilities will bubble up. Perhaps even Amazon will make auto-narration widely available.
April 2025
I had been waiting for an opportunity to try Amazon's AI-narrated offering, and it finally happened. A few weeks ago, I was invited to participate in the Beta program for their Audible audiobooks with virtual voice.
Once again, my use of Google Play's auto-narration is on the back burner since I'll keep my books in the KDP Select program. Doing so allows both the ebook and audiobook to be available for Kindle Unlimited subscribers and Audible's Premium and Premium Plus subscribers.
The process for creating an audiobook was a breeze -- but that's all I'll mention for now. Amazon is still in the Beta stage and has asked authors to refrain from discussing specifics. As time goes on, I'll update this page with my experience.
May 2025
So, Google is back on the front burner after interesting enhancements were made to their auto-narrated audiobooks. Features so good that I decided to let my books age out of Kindle Unlimited and publish them on Google Play in addition to Amazon. The one big improvement I was excited about? The ability to add multiple voices to a single audiobook. I didn't go nuts with it, only adding three or four voices for particular passages that, in the ebook version, were designated using a different font. Such as the reading of a newspaper article. Easy to do, and fun.
Which also meant revisiting issues with the EPUB versions of my books. To add free auto-narration to your ebook, it must be published on Google Play AND have a version in the EPUB format. Saving my Google Doc to an EPUB format has been problematic and requires editing. But the solution turned out to be an easy one. While making changes to my Amazon KDP versions using Kindle Create, I had forgotten that I could save them in both KDP and EPUB formats. And that EPUB version came out clean without any format issues. And it was nice to know that my Amazon and Google Play editions came from the same source (yes, I did manage to confuse myself at one point, having multiple copies of the same book in different formats).
But if you do want to generate an EPUB file from a Google Doc, I found the following edits necessary:
Remove the Table of Contents (TOC). It has something to do with the EPUB version being used. I'd expect that at some point this won't be an issue. For me, not a big concern, since this is a fiction novel, and most readers don't jump from chapter to chapter. Still don't like it, and it would be a real problem for non-fiction books.
Location of page breaks. For some reason, some of my page breaks went missing, squashing pages together. I finally realized that page breaks at the end of a line were being ignored. I placed them on a new line, and everyone was happy. If you have problems, go to View, and toggle "Show non-printing characters" to locate the offenders.
Watch out for Title and Subtitle. I'm still not sure what happened here, and I lost interest trying to figure it out. Sometimes, though not always, text using Title and Subtitle styles came out wonky. I'd switch them to normal style, and all was good.
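If you don't mind a little scripting, an EPUB file is really just a ZIP archive, so you can peek inside one to see what table-of-contents pieces it carries before uploading. The Python sketch below is only that, a sketch: the function name is made up, and the check for an EPUB 3 navigation document is a crude text search of the package (.opf) file rather than proper XML parsing. It reports on the file; it doesn't change anything.

```python
import re
import zipfile


def epub_toc_report(path_or_file):
    """Describe which table-of-contents pieces an EPUB archive contains."""
    with zipfile.ZipFile(path_or_file) as book:
        names = book.namelist()
        # Older EPUB 2 books keep their TOC in a file ending in .ncx.
        ncx_files = [n for n in names if n.lower().endswith(".ncx")]
        # EPUB 3 books flag a navigation document in the .opf package file.
        nav_declared = False
        for name in names:
            if name.lower().endswith(".opf"):
                package = book.read(name).decode("utf-8", errors="replace")
                # Crude text match; assumes double-quoted attributes.
                if re.search(r'properties="[^"]*\bnav\b', package):
                    nav_declared = True
    return {"files": names, "ncx_files": ncx_files, "nav_declared": nav_declared}
```

Calling epub_toc_report("my_book.epub"), where the file name is a placeholder for your own, returns the list of files inside the book plus whether either style of TOC turned up, which at least tells you what you're dealing with before you start deleting things in the Google Doc.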