I've seen demand for a guide on changing in-game voices with AI, so I decided to describe the process I used to create my mods.
First, I want to apologize to English-speaking readers for any mistakes in this article, as English isn't my native language.

Intro

First of all, there are two fundamentally different approaches to using AI here: speech-to-speech (STS) and text-to-speech (TTS). With the first, the AI takes an existing voice recording and alters it with a voice model, usually trained on some other voice. With the second, the AI takes text and voices it using a voice model. In this article I'll cover both approaches as applied to my Female Vasco Voice mods, which come in several voice types created with both techniques.
There are also many different AI TTS and STS tools you can use. I'll cover only the few I used in my mods; progress doesn't stop, and sooner or later these will become obsolete.

Tools mentioned in this guide:



Method 1. STS

TL;DR:
1. Extract .wem files with BAE
2. Transform them with RVC
3. Convert everything to .wem and put it into \Data


This method is the easiest one and lets you partially keep many traits of the original voice, such as intonation and accent. In a sense, it melds two voices together: the original one and the one from the voice model.

Prerequisites:

  • Register on the Audiokinetic site and download Audiokinetic Launcher
  • Start Audiokinetic Launcher (named Wwise Launcher after installation) and install Wwise from the Wwise tab; you can uncheck all plugins

  • Install RVC; I recommend downloading and extracting a complete package from the releases page
  • Download RVC models from Hugging Face, the AI Hub Discord channel (find it via Google) or any other place. You can also train your own, but that's for another guide.
  • Place the .pth voice model in the RVC/weights folder and the .index file in the RVC/logs folder.

  • Install PowerToys; we'll need its PowerRename tool.

Steps:

1. Unpack the necessary audio files from the "sound/voice/starfield.esm" folders inside the "Starfield - Voices01.ba2", "Starfield - Voices02.ba2" and "Starfield - VoicesPatch.ba2" archives; you can use Bethesda Archive Extractor for this. For example, Vasco's voice lines are located in the "sound\voice\starfield.esm\robotmodelavasco" directory, and Sarah's in "sound\voice\starfield.esm\npcfsarahmorgan". All game audio is stored in .wem format.

2. Convert the unpacked .wem files to .ogg using Wwise Audio Unpacker (put all .wem files in the "Game Files" folder and launch "WEM to OGG.bat"; the resulting files will be placed in the "Result" folder).

3. Launch RVC-WebUI, select the voice model in the "Inferencing voice" list on the "Model Inference" tab, then scroll to the bottom and adjust the settings:
• Transpose: leave at 0 in most cases, or change the number to change the pitch. I don't recommend going beyond the -6 to 6 range; pick a different voice model instead.
• Specify output folder: self-explanatory.
• Extraction algorithm: I recommend using rmvpe.
• Median filtering: the default is OK, but test it for yourself.
• Index path: select the .index file that was bundled with the voice model.
• Search feature ratio: for Vasco I set this to 0 and for human characters to 0.75, but you can find your own sweet spot.
• Resample: 0
• Volume envelope scaling: higher values help mask oddities when altering quiet speech, whispering, sighs, breathing, etc. Set this to 0.25 if the original voice lines contain few of those, or to 0.75 otherwise.
• Protect voiceless consonants: set to 0.
• Audio folder to be processed: the folder with your .ogg files from the previous step.
• Export file format: wav
Then click the Convert button at the bottom of the page and wait. The speed depends on your GPU.

4. Convert the .wav files to .wem format. This is kinda complicated to set up the first time, so I'll break it down:
• Launch Wwise through Wwise Launcher, create a new project with any name, and untick everything in "Import Factory Assets".
• Go to Project > Project Settings > Source Settings and set "Default Conversion Settings" to "Vorbis Quality High".
This is outside the scope of this guide, but please note that although all dialogue files seem to be Vorbis, some sound files must be encoded as PCM.
The rule is simple: keep the same encoding the original .wem file had.

• Go to Project > Import Audio files > Add Folders, add the folder with your .wav files from step 3 and click Import. If the Import Conflict Manager pops up, go to "%USERPROFILE%\Documents\WwiseProjects\your-project-name\Originals\SFX", delete everything inside it, then repeat the step.
• Go to Project > Convert All Audio Files and click "Convert". The resulting files will be in "%USERPROFILE%\Documents\WwiseProjects\your-project-name\.cache\Windows\SFX\name-of-import-folder".

5. Go to the output folder in Explorer, right-click on empty space and launch PowerRename, then rename all files so their names are identical to the original ones, e.g. "0047cc70.wem".

6. Place all the files in the same directory structure as the original ones in .ba2 archives, for example "Starfield\Data\sound\voice\starfield.esm\<your-modded-npc-name>".
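
Steps 5 and 6 can also be scripted. The following is only a minimal Python sketch, not the PowerRename workflow described above. It assumes that Wwise appends an underscore and eight hexadecimal characters to each converted file name (check your own output folder first), and the function names and folder arguments are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Assumption: converted files are named "<original>_<8 hex digits>.wem".
# Verify this against your own Wwise output folder before relying on it.
HASH_SUFFIX = re.compile(r"^(?P<stem>.+)_[0-9A-Fa-f]{8}$")


def original_name(converted: str) -> str:
    """Strip the assumed hash suffix; names without it are returned unchanged."""
    path = Path(converted)
    match = HASH_SUFFIX.match(path.stem)
    stem = match.group("stem") if match else path.stem
    return stem + path.suffix


def deploy(converted_dir: Path, target_dir: Path) -> list[Path]:
    """Copy every .wem from converted_dir into target_dir under its original name."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(converted_dir.glob("*.wem")):
        dst = target_dir / original_name(src.name)
        shutil.copy2(src, dst)  # copy, so the Wwise cache stays untouched
        copied.append(dst)
    return copied
```

Here target_dir would be the NPC folder under Starfield\Data\sound\voice\starfield.esm, and converted_dir the Wwise output folder from step 4.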

That's all, you just created your own AI voice mod. Congratulations!


Method 2. TTS

TL;DR:
1. Extract dialogues with xTranslator
2. Voice them with AI tool of your choice
3. Find missing voice files and transcribe vanilla versions with OpenAI Whisper
4. Repeat step 2
5. Convert everything to .wem and put it into \Data



This one is trickier. The advantage of this approach is also its disadvantage: you get a completely new voice with its own intonation, speaking speed, accent and everything else. In some cases, like my mod for Vasco, the benefits may outweigh the drawbacks, as many people didn't like original Vasco's retrofuturistic way of speaking. At the same time, we lose all of the original voice actor's performance, which an AI can only rarely outperform, at least for now.

Also, the process isn't as straightforward as the STS method, as you will see, so I'll have to skip some minor details. You may need some basic coding knowledge to follow this method.

So, let's start:

  • Launch xTranslator and go to Options > Dictionaries and languages > Source Language: en, Destination Language: en
  • Go to File > Load Esp/Esm and open Starfield.esm
  • Click NPC/Fuz Map.
  • Go to File > Export Translation > XML files, check Export Fuz Data and Everything and click OK

Now you have an XML file with all (not really) dialogue lines in the game and the corresponding .wem files. Next, I suggest transforming it to JSON with a structure like this:
{
"audio.wem": "Blah-blah-blah.",
"someotheraudio.wem": "Whatever."
}

You can do this with the Python script kindly provided by ice9000, who also included already converted JSON files for some NPCs. My own ready-to-use JSON with Vasco's dialogue is available here (it includes the Whisper transcription described later).

The next step is to voice the dialogue with the TTS tool of your choice. I recommend ElevenLabs, as it produces the best results at the moment, even though generating more than 10000 characters requires a paid subscription. Some free alternatives that can run locally on your PC are Tortoise, Bark and Silero. Edge-tts is not bad either; it uses Microsoft's online TTS services, but I don't know its free usage limits.
If you're sticking with ElevenLabs, let's continue.

  • Install Python from official site
  • Install the ElevenLabs Python API by executing pip install elevenlabs in a terminal.

Here you may use my Python scripts. If you do, open ElevenLabs.py in Notepad, fill in your API key and the name of the voice model you plan to use, and optionally adjust the generation settings.

  • If you're happy with your JSON dictionary and the script, run it with python ElevenLabs.py and wait. By default my script puts the generated .wav files in the "elevanlabs-output" folder. If you get connection problems, try increasing the wait time between requests on the very last line of the script.
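
For readers using a different TTS tool, the overall shape of such a batch run can be sketched as below. This is not the ElevenLabs.py script mentioned above and it calls no real TTS API: synthesize is a hypothetical callable that you would write yourself around the tool of your choice, taking a line of text and returning the bytes of a finished audio file. The function name, arguments and default wait time are also assumptions.

```python
import json
import time
from pathlib import Path
from typing import Callable


def generate_all(lines_json: Path, out_dir: Path,
                 synthesize: Callable[[str], bytes],
                 wait_seconds: float = 1.0) -> list[Path]:
    """Voice every entry of a {"name.wem": "text"} dictionary into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = json.loads(lines_json.read_text(encoding="utf-8"))
    written = []
    for wem_name, text in lines.items():
        # Keep the original file stem so the result can be matched to the .wem later.
        target = out_dir / Path(wem_name).with_suffix(".wav").name
        if target.exists():
            continue  # already generated, so an interrupted run can be resumed
        target.write_bytes(synthesize(text))  # hypothetical TTS call
        written.append(target)
        time.sleep(wait_seconds)  # pause between requests to avoid connection problems
    return written
```

Skipping files that already exist means a run that fails halfway does not spend characters on lines it has already voiced.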

TA-DAH! All voice lines are generated! Or are they?

While making the Female Vasco Voice 2 (TTS) mod I noticed that xTranslator exported 933 lines for Vasco, while there are actually 2052 of them in the game files... Until the Creation Kit comes out I don't know how to check whether all of them are really used in the game, but at least a bunch of them, the ones where Vasco addresses you by your character's name, are absent from xTranslator's export. So here I had to use OpenAI Whisper to transcribe the remaining .wem files. Of course, the ones that contain only "Captain Sexy" (sexy.wem) or "Captain Boobies" (boobies.wem; real filenames, by the way!) don't need this: just create a JSON with the filenames, prepend "Captain" to the names and you're good. You can get a list of all files in a folder by running the dir /b /a-d command in a terminal.
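
The bookkeeping in this paragraph can be sketched in Python as follows. It is a rough illustration under two assumptions: the dictionary uses the {"file.wem": "text"} structure shown earlier, and a name file's stem is simply the name in lowercase. Both function names are hypothetical.

```python
from pathlib import Path


def missing_files(voice_dir: Path, lines: dict[str, str]) -> list[str]:
    """List the .wem files in voice_dir that have no text in the dictionary."""
    known = {name.lower() for name in lines}
    return sorted(p.name for p in voice_dir.glob("*.wem")
                  if p.name.lower() not in known)


def captain_lines(filenames: list[str]) -> dict[str, str]:
    """Build dictionary entries for name-only files, e.g. sexy.wem -> "Captain Sexy"."""
    return {name: f"Captain {Path(name).stem.capitalize()}" for name in filenames}
```

Whatever missing_files returns, minus the name-only files, is what still needs to be transcribed.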

Anyway, you'll have to convert the .wem files to .ogg as described in the previous section before going on.

  • Install OpenAI Whisper by executing pip install openai-whisper in a terminal
  • Put the files that need to be transcribed into the "whisper-input" folder next to Whisper.py
  • Run the transcription by executing python Whisper.py in a terminal; this generates the whisper_transcription.json file
  • Repeat the previous steps to generate voices, but don't forget to adjust ElevenLabs.py to use whisper_transcription.json
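
For reference, a transcription pass of this kind can be sketched with the openai-whisper Python package as below. This is not the Whisper.py script mentioned above; the "base" model size, the function names and the choice to key the result by .wem name are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    """Return a function that transcribes one audio file with OpenAI Whisper."""
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # assumption: "base" is accurate enough

    def transcribe(path: str) -> str:
        return model.transcribe(path)["text"].strip()

    return transcribe


def transcribe_folder(in_dir: Path, out_json: Path,
                      transcribe: Callable[[str], str]) -> dict[str, str]:
    """Transcribe every .ogg in in_dir and save the result as {"name.wem": "text"}."""
    result = {}
    for audio in sorted(in_dir.glob("*.ogg")):
        # Key by the original .wem name so the dictionary matches the game files.
        result[audio.with_suffix(".wem").name] = transcribe(str(audio))
    out_json.write_text(json.dumps(result, ensure_ascii=False, indent=2),
                        encoding="utf-8")
    return result
```

It would be called as transcribe_folder(Path("whisper-input"), Path("whisper_transcription.json"), whisper_transcriber()). Automatic transcripts are worth proofreading before voicing them.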

Once you've generated all the voice files, just convert them to .wem as described in the first method, and that's pretty much it. Hope you like the result you got after these 6 hours of struggling.

Extra

After transforming or generating voices with AI and before converting them to .wem, you might want to do some post-processing; I recommend the free Audacity or Adobe Audition for this. A robotic voice can be achieved with some echo/chorus/flanger, and muffling by EQing down the upper and lower frequencies. This isn't an audio processing guide, so look for help on YouTube.

Protip: you can use the vgmstream plugin for foobar2000 to listen to .wem files without converting them.
Also, vgmstream-cli is a good alternative for converting from .wem files, as it can convert both Vorbis and PCM data. You can find my wrapper Python script for it here.
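
A batch conversion with vgmstream-cli can be sketched as below. This is not the wrapper script linked above, just a minimal illustration that assumes vgmstream-cli is on your PATH and uses its -o option to name the output file; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def vgmstream_command(src: Path, dst: Path, exe: str = "vgmstream-cli") -> list[str]:
    """Build the command line that decodes one file to .wav."""
    return [exe, "-o", str(dst), str(src)]


def convert_folder(in_dir: Path, out_dir: Path,
                   exe: str = "vgmstream-cli") -> list[Path]:
    """Decode every .wem in in_dir to a .wav of the same name in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(in_dir.glob("*.wem")):
        dst = out_dir / src.with_suffix(".wav").name
        # check=True stops the batch at the first file vgmstream cannot decode.
        subprocess.run(vgmstream_command(src, dst, exe), check=True)
        outputs.append(dst)
    return outputs
```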

Don't forget to ask voice actors for permission to use their voice! Well, technically it isn't entirely "their" voice, and of course there's a grey area in copyright law regarding AI, but asking would be good manners. Also, Nexus policy is that your mod will be deleted if the owner of the voice files a complaint, so keep that in mind.

Big thanks to Nojioh for his assistance with extracting dialogues, check his Vasco Japanese Female Voice mod.

Also, check my profile for my own AI voice mods done with the described techniques.

And of course, go make some awesome mod!



Please, link this article if it helped you to make your mod. Not necessary, but highly appreciated.



If you want to support my work you can send me a tip on boosty.to

Article information

Written by

63OR63

25 comments

  1. shibbyy05
    • premium
    • 0 kudos
    Appreciate this guide man!

    My Starfield playthrough now has the following cast :D

    Sarah - Jennifer English (Shadowheart from BG3)
    Barret - Morgan Freeman
    Andreja - Ariana Grande
    Sam Coe - Brad Pitt
    Mateo - Jack Black
    Walter - Troy Baker (Joel Miller from Last of Us)
    VASCO - Steve Carell

    What a time to be alive lol

    1. JChristopherson
      • supporter
      • 1 kudos
      ya mind putting those up? lol it sounds amazing! If not totally get it but the jack black one has me sold!
    2. Sebelleun
      • member
      • 19 kudos
      Would love to see it in action lol
    3. atatassault
      • premium
      • 3 kudos
      Honestly, you should've left Walter Stroud as Armin Shimerman. Having already played a greedy capitalist, Quark, he's perfect for the role.
    4. IchigoMait
      • member
      • 11 kudos
      They can't publicly share it. Like this guide creator had to take down two mods, because the voice actor didn't like her voice likeness being used.
    5. scottyus1
      • premium
      • 114 kudos
      LMAO what a cast. I'd also love to see a video of this.
  2. seshperankh
    • member
    • 0 kudos
    Thank you for this. I am going to give it a try. I REALLY want Sarah to lose the British accent. Ideally I'd prefer she either have no accent or a slight Asian one. Hopefully I can make this work.
  3. sethdurden21
    • supporter
    • 0 kudos
    Thank you so much for this ! I also wonder how could we add voices to the player's lines of dialogue ? I know it's been done in skyrim but how could we do this ? 
    1. nikdbeli
      • supporter
      • 0 kudos
      I'd very much like to know too!
  4. nikdbeli
    • supporter
    • 0 kudos
    Hey have you tried this TTS method with the PROTAGONIST ?

    That would be my biggest and only need for this? To AI voice the text responses in dialogue
  5. Delhmur
    • premium
    • 2 kudos
    Is it normal for the conversion process to take 24+ hours to only convert 300-ish out of 4.6K files? Asking for a friend.
  6. ZhateckyGus
    • member
    • 11 kudos
    Hey, great Tutorial!

    I`ve created an alternative one on how to train your model using XVATrainer.

    What do you think?
  7. nikish3
    • member
    • 0 kudos
    So, let's start:
    Launch xTranslator and go to Options > Dictionaries and languages > Source Language: en, Destination Language: en
    Go to File > Load Esp/Esm and open Starfield.esm
    Click NPC/Fuz Map.
    Go to File > Export Translation > XML files, check Export Fuz Data and Everything and click OK
    Now you got an XML file with all (not really) dialogues in the game and corresponding .wem files. Next I suggest you to transform it to JSON with similar structure:
    {
    "audio.wem": "Blah-blah-blah.",
    "someotheraudio.wem": "Whatever."
    }


    help with this point, it doesn’t work, I receive a file but there is no data on the NPC so that I can run it through the script, post a video on how to do it, I’m already getting hysterical with this, please help, I can’t sleep because of this
  8. JordZord
    • member
    • 9 kudos
    My question is how do you batch generate the converted voices and make it work in the game? You essentially need  thousands of files and with the right naming as well, but modders already did this in Cyberpunk even though V has over ten thousands of voice files. So there's  a way i dont know about here.
  9. Brandoman
    • premium
    • 482 kudos
    How do you even launch RVC?  Which exe launches it?  There's literally 74 exe files in the full package.
    1. santichrist
      • supporter
      • 2 kudos
      if you youtube any videos on rvc they explain you have to click on go-web.bat 
  10. santichrist
    • supporter
    • 2 kudos
    Thank you for this guide, I was able to convert Alejandra to Wattson from Apex Legends and Sam Coe to Arthur Morgan, been having a lot of fun messing with the voice mods and what seemed to be daunting and out of my abilities wasn't really that hard, only time consuming, with this guide helping me out

    edit: I forgot to mention that its worth pointing out you will need a free license to use wwise otherwise you need to pay for it, without a license you can only convert 200 files and all the main characters have thousands of lines that need to be converted. you can get a free license by applying on their site for one