Speechki's Approach to Audiobook Recording

New Member Spotlight – Speechki

Have you heard anything about Siberia? Many people think it is ice, snow, bears, prisons, and nothing else. That is not the case. Dima Abramov, co-founder and CEO of Speechki, is Siberian born and bred. He grew up there, got his education there, and, together with a partner, created Speechki, an innovative startup that records audiobooks using artificial intelligence. Speechki is now based in the US and has recently joined the APA, so we want to share what Speechki is and what its artificial intelligence brings to the market. Just as he has always known that there is much more to Siberia than its desolate reputation, Dima has now set out to shatter some of the equally inaccurate myths that surround artificial voices.

APA: What is Speechki, and how does it work?

Dima: Speechki is an audiobook recording platform that lets publishers scale up their audiobook inventories several times over using AI synthetic voices. We provide over one hundred ultra-realistic voices in 33 languages. A book is recorded in 15 minutes and can be fine-tuned by a proof-listener in a few hours.

Many will probably say, "Oh, here are some more programmers who want to get rid of people and destroy the profession of the audiobook narrator." But we're talking about something completely different.

Today, publishers spend thousands of dollars and at least a few weeks to produce a single audiobook. The process is slow, expensive, and highly complex. Publishers' unit economics don't work with traditional methods of audiobook production. As a result, publishers miss out on 95% of the unique titles they could produce each year. Speechki can cut the cost to just $400 and the turnaround to a couple of days.

APA: Will that put human voice actors out of work?

Dima: Absolutely not! And that's not our goal. Speechki aims to create a new revenue stream for publishers by converting all their idle copyrights into profitable audiobooks using AI synthetic voices. We also want to give readers the opportunity to find all the books they want and consume them in the audio format. Before Speechki, only five percent of published works had been released as audiobooks.

We don't want to replace human voices. Nothing can replace the experience of listening to a fantastic actor reading a book. And nothing will stop publishers from hiring them to do it, either!

But we want to open up opportunities for listeners. If you are vision-impaired or disabled, or if you simply prefer listening to audiobooks over reading text, your content options are currently severely limited. We aim to fix that.

APA: Isn't that a difficult task? How do you do it?

Dima: In pursuit of this goal, Speechki is solving the problems that come with working efficiently with synthetic voices. In 2019, it took Speechki 120 hours of human labor to record one 8-hour audiobook. Now it takes just 12 hours. The next goal is to bring that number down to 40 minutes soon.
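
Expressed as human labor per finished hour of audio, those figures can be worked out with a few lines of Python. This is only a restatement of the numbers quoted above, assuming the 40-minute goal refers to the same 8-hour audiobook; the function name is illustrative.

```python
def labor_minutes_per_audio_hour(labor_minutes, audio_hours):
    """Minutes of human labor spent per finished hour of audio."""
    return labor_minutes / audio_hours


AUDIOBOOK_HOURS = 8  # length of the example audiobook from the interview

# Human labor per audiobook, in minutes, as quoted in the interview.
stages = {
    "2019": 120 * 60,   # 120 hours
    "now": 12 * 60,     # 12 hours
    "next goal": 40,    # 40 minutes (assumed to apply to the same 8-hour book)
}

for label, minutes in stages.items():
    per_hour = labor_minutes_per_audio_hour(minutes, AUDIOBOOK_HOURS)
    print(f"{label}: {per_hour:.0f} min of labor per audio hour")
```

In other words, the move from 120 to 12 hours is a tenfold reduction, and reaching 40 minutes would be a further eighteenfold reduction.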

Speechki supports over 30 languages, including English, Spanish, German, French, Portuguese, and more. The catalog for recording audiobooks has over 100 neural voices, suitable for different genres. We have already produced over 700 audiobooks in 9 languages over the past year.

The value is in the automatic text formatting, processing, and voice diversity. The formatting differs from voice to voice because each neural voice runs on its own model and has its own requirements. All other text-to-speech platforms require professional developers to write speech synthesis markup and work with APIs. Speechki instead provides a simple interface, like Google Docs or Microsoft Word. The system makes the changes itself without showing the source code to the proofer.
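
To give a sense of the kind of markup such an interface hides from the proofer, here is a minimal Python sketch that wraps plain paragraphs in Speech Synthesis Markup Language (SSML), the W3C standard that many text-to-speech engines accept. It is an illustration only, not Speechki's implementation: the function name `paragraphs_to_ssml`, the pause length, and the speaking rate are assumptions, and individual engines may require extra attributes or support different tags.

```python
from xml.sax.saxutils import escape, quoteattr


def paragraphs_to_ssml(paragraphs, pause_ms=500, rate="medium"):
    """Wrap plain-text paragraphs in a basic SSML document.

    Hypothetical helper for illustration; the defaults are arbitrary.
    """
    parts = ["<speak>"]
    for text in paragraphs:
        # Escape &, < and > so the book text cannot break the XML.
        body = escape(text.strip())
        parts.append(f"<p><prosody rate={quoteattr(rate)}>{body}</prosody></p>")
        # Add a short pause after each paragraph.
        parts.append(f'<break time="{int(pause_ms)}ms"/>')
    parts.append("</speak>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(paragraphs_to_ssml(["Chapter One.", "Tom & Anna left at dawn."]))
```

A proofer working in a document-style editor would only see the two sentences; tags such as `<prosody>` and `<break>` are the part that normally needs a developer.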


APA: Tell us why you decided to start Speechki.

Dima: The Speechki service was created in 2019 in Omsk, one of the biggest cities in Siberia, by Sergey Baranov and me. We have been working together for the last 12 years and had previously built a software development company. Being audiobook fans, we often couldn't find the books we needed in audio format.

We realized that we did not need our own neural voice engine. Every year the most common speech solutions make up to four jumps in speech quality, as measured by focus groups. Each update of their text-to-speech engines automatically improves our results. Speechki is sticking to a partnership strategy with Microsoft, Amazon, Google, IBM, Yandex, and others. We take these guys' R&D results and make them much better for audiobook production. Speechki is focused on controlling the best available voices.

APA: Any interesting/fun facts we should know about you/your company?

Dima: As I said, we've recorded more than 700 audiobooks over the past year. Yeah, 700 audiobooks is not that many, but what if I said they were recorded by only three employees, without any professional recording equipment or speakers, just using a browser, a keyboard, and a mouse?

APA: What are you hoping to gain from your APA membership?

Dima: The main reason we applied to join the APA is that we, as a service provider, want to stay very close to publishers and remain in full contact with them. We want to build a conversation and understand what they need. The better we understand the publishing industry's needs, the better we can tailor our service to meet them.

To learn more about audiobook recording with AI synthetic voices and Speechki, feel free to contact Dima directly: dima@speechki.org or visit the Speechki website.
