I get this error, I think it is the configuration of xtts-api-server-mantella.exe 12:30:48.533 TTS: Connecting to XTTS... 12:30:50.575 TTS: Could not connect to XTTS. Attempting to run headless server... “E:/SteamLibrary/steamapps/common/The” is not recognized as an internal or external command, program or executable batch file.
i think is cause you left space in the names of the folders, try naming your folders like this "E:/SteamLibrary/steamapps/common/The_Elder..." instead of "The Elder", or changing the folder where xtts is to something without empty spaces, that cant be recognized by xtts.
This is working perfectly as of today...generating a new voice type right now...
One thing I should mention though: I spend a good amount of time trying to figure out how to get this working with my batch. I know you mentioned in the description:
Your batch files should not be changed, they should still be .wav files. This plugin will take care of the .lip and .fuz conversion externally.
While this is true..certainly...it didn't click to me what you meant. For my fellow boneheads that may be running into an error that mentions that a suitable format can't be found...the above statement refers to the out_path entries in your spreadsheet. Make sure your out_path filenames end in .wav.....NOT .fuz.
Makes sense now....the secondary lip/fuz process requires the .lip AND .wav files in order to create the .fuz...duh.
Other than that little hurdle, this seems to be working fantastically. Really a HUGE timesaver.
Here's a passing suggestion: Many times, I'm making a voice type as a secondary/replacement for existing ones. I've noticed that using my recorded voice usually provides good inflection...so I was thinking that it would be great to be able to use the original dialogue file as the reference for inflection the same way my voice is used when recorded. That scenario would make for much better results, I'd think...at least for dialogue that already exists. Just a cool option that could be used in batch processes for cloning dialogues with different voices. Maybe just an inf_ref added to the spreadsheet...then run it through the same process as if using a recording...that is, if the respective cell isn't empty in the spreadsheet record.
Thanks a million for your work, Dan! xVASynth really is pretty remarkable!
.Fuz files don't work for me. Sample set to 44100hz and 16-bit audio file, creates the .Fuz files but nothing happens in game. Using the latest version of both xVASynth and this mod. Any advice?
You need a VERY specific MS Visual C++ Redistributable from 2010 to get the XWMAencoder to work. The later ones (2015-2022) won't have the MSVCR100.dll needed for the encoder. Look for "Microsoft Visual C++ 2010 Service Pack 1 Redistributable Package". Installing that got the .fuz file generation working when all I was getting was .wav and .lip files.
How i can fix this problem? It gives me this message when i trying to use batch generation
Traceback (most recent call last): File "server.py", line 335, in do_POST File "python\xvapitch\model.py", line 326, in infer_batch File "D:\Modding\XVASynth\.\resources\app\python\xvapitch\xvapitch_model.py", line 223, in infer_advanced return self.infer_using_vals(logger, plugin_manager, cleaned_text, text, lang_embs, speaker_embs, pace, None, None, None, None, None, None, pitch_amp=pitch_amp) File "D:\Modding\XVASynth\.\resources\app\python\xvapitch\xvapitch_model.py", line 292, in infer_using_vals x, x_emb, x_mask = self.text_encoder(input_symbols, x_lengths, lang_emb=None, stats=False, lang_emb_full=lang_emb_full) File "torch\nn\modules\module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "D:\Modding\XVASynth\.\resources\app\python\xvapitch\xvapitch_model.py", line 602, in forward x = torch.cat((x_emb, lang_emb_full), dim=-1) RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 27 but got size 1 for tensor number 1 in the list.
63 comments
12:30:48.533 TTS: Connecting to XTTS...
12:30:50.575 TTS: Could not connect to XTTS. Attempting to run headless server...
“E:/SteamLibrary/steamapps/common/The” is not recognized as an internal or external command,
program or executable batch file.
(I FORGOT TO CHECK MAKE FUZ FILES)
One thing I should mention though: I spend a good amount of time trying to figure out how to get this working with my batch. I know you mentioned in the description:
While this is true..certainly...it didn't click to me what you meant. For my fellow boneheads that may be running into an error that mentions that a suitable format can't be found...the above statement refers to the out_path entries in your spreadsheet. Make sure your out_path filenames end in .wav.....NOT .fuz.
Makes sense now....the secondary lip/fuz process requires the .lip AND .wav files in order to create the .fuz...duh.
Other than that little hurdle, this seems to be working fantastically. Really a HUGE timesaver.
Here's a passing suggestion: Many times, I'm making a voice type as a secondary/replacement for existing ones. I've noticed that using my recorded voice usually provides good inflection...so I was thinking that it would be great to be able to use the original dialogue file as the reference for inflection the same way my voice is used when recorded. That scenario would make for much better results, I'd think...at least for dialogue that already exists. Just a cool option that could be used in batch processes for cloning dialogues with different voices. Maybe just an inf_ref added to the spreadsheet...then run it through the same process as if using a recording...that is, if the respective cell isn't empty in the spreadsheet record.
Thanks a million for your work, Dan! xVASynth really is pretty remarkable!
Look for "Microsoft Visual C++ 2010 Service Pack 1 Redistributable Package". Installing that got the .fuz file generation working when all I was getting was .wav and .lip files.
Any suggestions?
EDIT: I found a solution by adding both FaceWrapper and FonixData file to the plugins folder in Xsynth.
Traceback (most recent call last):
File "server.py", line 335, in do_POST
File "python\xvapitch\model.py", line 326, in infer_batch
File "D:\Modding\XVASynth\.\resources\app\python\xvapitch\xvapitch_model.py", line 223, in infer_advanced
return self.infer_using_vals(logger, plugin_manager, cleaned_text, text, lang_embs, speaker_embs, pace, None, None, None, None, None, None, pitch_amp=pitch_amp)
File "D:\Modding\XVASynth\.\resources\app\python\xvapitch\xvapitch_model.py", line 292, in infer_using_vals
x, x_emb, x_mask = self.text_encoder(input_symbols, x_lengths, lang_emb=None, stats=False, lang_emb_full=lang_emb_full)
File "torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\Modding\XVASynth\.\resources\app\python\xvapitch\xvapitch_model.py", line 602, in forward
x = torch.cat((x_emb, lang_emb_full), dim=-1)
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 27 but got size 1 for tensor number 1 in the list.
Also, some tutorial steb-by-step on how to use it would be welcome :)