File information

Created by

NightDocsYT

Uploaded by

NightDocsYT

135 comments

  1. NightDocsYT
    • supporter
    • 12 kudos
    Locked
    Sticky
    Achieving the Best Results Possible When Using AI Voice Cloning

    I've been getting some questions here and there about my method for achieving high accuracy and believability in my voice mods, so if you'd like to try it yourself, this is for you. This guide is meant to supplement the linked documentation, not replace it. If you're trying to replicate my results, it will only help as far as you follow the instructions in that documentation alongside my personal best practices. Your mileage may vary, and a few specific settings I mention (or don't mention) are worth experimenting with to get the best results for your specific project.

    First and foremost: TRAIN YOUR OWN MODELS. There are lots of ready-to-go models on Hugging Face, but almost all of them suck. They're built from really small training datasets and trained for far too little time, which leads to a really subpar voice clone.

    I CANNOT emphasize this enough...

    Train 👏 your 👏 own 👏 models!!!!!

    Installing Mangio and Python
    Training your own models is by far the biggest time sink, but I promise it pays off if you do it right.

    First, install the latest version of Python and make sure it's added to your PATH in the Windows environment settings (Google it). Don't worry, you don't need to know Python; it's just a prerequisite for running the program.

    For voice training, I'm currently using the Mangio fork of RVC v2 [GitHub]. Click "Releases" on the right side of the repo page to get the latest version.

    Download the installer for the version that lets you both TRAIN and INFER. It's a web GUI for RVC v2, the model most people are currently using. Instead of making you manually type a series of complicated Python commands into an intimidating terminal you don't understand, it gives you a really handy interface in your web browser, split into sections for each step of training a voice model and generating audio with it.

    They have a very thorough guide on how to install and navigate it. Become familiar with the documentation and follow its instructions precisely. Or, if you're as impatient as I am: install, fumble around, make a bunch of mistakes because you're so damn smart you can just figure it out, and then come back to the documentation (and here) when things aren't working the way you thought they would.

    Preparing training data
    Use the highest and most consistent quality audio you can: no echo, no background noise, as close to podcast quality as possible. What goes in is what comes out, and if the quality and sound vary, your model will produce extremely inconsistent results that sound f*#@ing terrible.
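
    As a rough way to catch clips that don't match the rest of a dataset before training, here is a minimal Python sketch. It assumes the clips have already been decoded into lists of float samples in [-1.0, 1.0]. The 3 dB tolerance and the clip names are arbitrary assumptions for illustration, not settings from RVC or any plugin. It only compares RMS loudness, so it won't detect echo or background noise.

```python
import math
import statistics


def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], expressed in dBFS."""
    if not samples:
        raise ValueError("empty clip")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Digital silence has no finite dB level.
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def flag_inconsistent_clips(clips, tolerance_db=3.0):
    """Return names of clips whose RMS level strays more than tolerance_db from the median clip."""
    levels = {name: rms_dbfs(samples) for name, samples in clips.items()}
    median_level = statistics.median(levels.values())
    return sorted(name for name, level in levels.items()
                  if abs(level - median_level) > tolerance_db)


# Hypothetical clips: two at the same level, one about 20 dB quieter.
clips = {
    "rant": [0.5, -0.5] * 100,
    "episode": [0.5, -0.5] * 100,
    "quiet_interview": [0.05, -0.05] * 100,
}
print(flag_inconsistent_clips(clips))  # ['quiet_interview']
```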

    For the Alex Jones mod, I trained my model on ~45 minutes of audio, collected and carefully selected to cover as broad a tonal and energy range as I could. That included a video that's just a montage of him screaming (I cut out the parts that were clipped/distorted), a video of him going on a bunch of his famous high-energy rants, and, for a baseline, one of the recent InfoWars episodes. I ran the episode through Descript and transcribed it so I could identify the parts where it's just him speaking and cut out the rest. I also used Descript's feature that shortens word gaps. Descript can also remove filler words like "uhh" and "you know", but for training a voice model I leave all of those in, because they're part of his natural speaking cadence and useful for training.
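
    The two cleanup steps above, cutting clipped sections and shortening word gaps while keeping filler words, can be approximated in code. The minimal Python sketch below is not how Descript or any DAW does it. It assumes decoded float samples, and every threshold (0.999 for clipping, 0.01 for silence, 0.25 s as the longest gap) is an illustrative assumption. The first function only reports clipped ranges, so you still decide what to cut. The second function caps runs of near-silence and leaves everything louder untouched.

```python
def find_clipped_runs(samples, threshold=0.999, min_run=3):
    """Return (start, end) index ranges where min_run or more consecutive samples
    sit at or above threshold in magnitude, a common sign of clipping."""
    runs, start = [], None
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Handle a clipped run that continues to the end of the clip.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples)))
    return runs


def shorten_gaps(samples, silence_threshold=0.01, max_gap=11025):
    """Cap every run of near-silent samples at max_gap samples.

    Everything louder than silence_threshold (including filler words) is kept.
    max_gap must be far longer than a waveform's zero crossings, e.g.
    11025 samples = 0.25 s at 44.1 kHz (an arbitrary choice).
    """
    out, quiet_run = [], 0
    for s in samples:
        if abs(s) < silence_threshold:
            quiet_run += 1
            if quiet_run > max_gap:
                continue  # drop the excess silence
        else:
            quiet_run = 0
        out.append(s)
    return out


print(find_clipped_runs([0.1, 1.0, 1.0, 1.0, 0.2, -1.0, 0.3]))  # [(1, 4)]
print(len(shorten_gaps([0.5] + [0.0] * 10 + [0.5], max_gap=4)))  # 6
```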

    I then took all the samples intended for training and compressed, EQ'd, and mastered them in my DAW (I use FL Studio), using plugins like iZotope Neutron Pro and Ozone 11 Pro to keep a constant gain level. It might also help to apply a very small amount of noise gate with a long release time, even if you think your training audio has no noise floor.
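
    To show what "a very small amount of noise gate with a long release" means in practice, here is a minimal Python sketch of a basic gate. It is not how Neutron's gate works internally. The threshold, the 0.8 floor gain, and the 250 ms release are assumptions chosen for illustration. The gate opens immediately when the signal crosses the threshold. Below the threshold, the gain glides exponentially toward the floor over the release time. Because the floor gain is close to 1.0, quiet passages are only lightly turned down instead of muted.

```python
import math


def noise_gate(samples, sample_rate, threshold=0.02, floor_gain=0.8, release_ms=250.0):
    """Light noise gate: instant open above threshold, exponential release toward floor_gain below it.

    floor_gain close to 1.0 means only gentle attenuation (a 'small amount' of gating);
    a larger release_ms means a slower, smoother fade into the gated state.
    """
    # Per-sample smoothing coefficient for an exponential release with time constant release_ms.
    release_coeff = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    gain = 1.0
    out = []
    for s in samples:
        if abs(s) >= threshold:
            gain = 1.0  # signal present: open the gate immediately
        else:
            # Glide from the current gain toward floor_gain.
            gain = floor_gain + (gain - floor_gain) * release_coeff
        out.append(s * gain)
    return out


sample_rate = 48000
audio = [0.5] + [0.001] * sample_rate  # one loud sample, then 1 s of low-level hiss
gated = noise_gate(audio, sample_rate)
print(gated[0], gated[-1])  # loud sample untouched; hiss ends up close to 0.8x its level
```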

    When following the steps in the interface to bring in your training data, make sure to enable pitch guidance. Even if you're not using it for music, this is the secret sauce that lets the model reproduce the target voice while also matching the actual performance (the pitch and intonation) of the audio you'll eventually run through your finished model.

    For the pitch extraction algorithm, you can choose between pm, harvest, dio, crepe, mangio-crepe, and rmvpe. RMVPE is BY FAR the best one I've used for both training AND inferring. For good results, the longer you let the model train, the better it will be. The documentation suggests that running it for up to about 50 epochs is enough to get a good result.
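
    All of these pitch extraction options do the same job: they estimate the fundamental frequency (f0) of each short frame of audio, and pitch guidance then uses that value. RMVPE does this with a neural network, which is much too complex to show here. Instead, the minimal Python sketch below uses a classic autocorrelation method, swapped in only to show what the output looks like, which is one f0 value in Hz per frame. The 60 to 500 Hz search range is an assumption for speech.

```python
import math


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate one frame's fundamental frequency (Hz) by autocorrelation.

    Returns 0.0 when no positive correlation is found (treated as unvoiced).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized sum: longer lags have fewer overlapping terms, which
        # biases the pick toward the true period rather than multiples of it.
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0


sr = 16000
tone = [math.sin(2 * math.pi * 200.0 * n / sr) for n in range(1024)]
print(estimate_f0(tone, sr))  # should be close to 200 Hz
```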

    It's not enough. 

    For Alex Jones, I took the 45 minutes of training data and ran it for about 300 epochs. This took approximately 8 hours on my computer with an NVIDIA RTX 3060. It takes a long-ass time to train these models, so hit "Train" when you're about to go to bed and it may be done by the time you wake up. Trust me, the extra time pays off. It gives the training enough time to work out as many kinks as possible, bringing the loss as low as possible (basically meaning higher accuracy, I think), and actually using the model takes nowhere near as much time once it's all over.
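
    You can use the numbers above to estimate how long your own run will take. The only real data point is ~300 epochs in ~8 hours on an RTX 3060, which works out to about 96 seconds per epoch. The minimal Python sketch below calibrates from that run. It also assumes, as a hypothetical, that epoch time scales linearly with dataset length. That is a simplification, and your GPU and settings will change the result.

```python
def seconds_per_epoch(total_hours, epochs):
    """Average wall-clock time per epoch from a finished training run."""
    return total_hours * 3600.0 / epochs


def estimate_training_hours(epochs, sec_per_epoch, dataset_minutes=None, reference_minutes=None):
    """Estimate training time for a new run.

    If dataset sizes are given, epoch time is scaled linearly with dataset
    length. This is an assumption, not something RVC guarantees.
    """
    if dataset_minutes is not None and reference_minutes:
        sec_per_epoch *= dataset_minutes / reference_minutes
    return epochs * sec_per_epoch / 3600.0


# Calibrated from the run described above: ~300 epochs of ~45 min of audio in ~8 hours.
per_epoch = seconds_per_epoch(8, 300)                    # 96.0 s per epoch
print(estimate_training_hours(50, per_epoch))            # about 1.33 h
print(estimate_training_hours(300, per_epoch, 20, 45))   # about 3.56 h for a 20-minute dataset
```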

    Inferring steps
    I downshifted the original voice lines by 4 semitones to match the model's average speaking pitch, and adjusted the formant shift to compensate. If you're adapting a voice with a significant difference in timbre (high vs. deep, male vs. female), you need to raise or lower the original samples by a few semitones (probably more like 6 on average for something like male-to-female inference, or vice versa).
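
    Semitone shifts are multiplicative. Each semitone scales frequency by 2^(1/12), so a 4-semitone drop multiplies pitch by about 0.794. The minimal Python sketch below converts between semitones and frequency ratios. It can also suggest a shift from two average pitches, but the 150 Hz and 119 Hz values are made-up examples, not measurements of any real voice. Formant shift is a separate control and is not modeled here.

```python
import math


def semitones_to_ratio(semitones):
    """Frequency multiplier for an equal-tempered pitch shift (negative = down)."""
    return 2.0 ** (semitones / 12.0)


def semitones_between(source_hz, target_hz):
    """Shift in semitones needed to move source_hz onto target_hz."""
    return 12.0 * math.log2(target_hz / source_hz)


print(round(semitones_to_ratio(-4), 4))           # 0.7937, i.e. pitch drops about 21%
print(round(semitones_between(150.0, 119.0), 1))  # -4.0 for these hypothetical averages
```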

    When you're using the model to process your actual audio, I've personally found that setting the feature retrieval ratio to 0.2 gives the smoothest result with very few artifacts. At any setting higher than that I almost always hear artifacts, which make the AI-ness immediately apparent and break immersion. There are still a few even in the Alex Jones mod, but far fewer than I'd get with a poorly trained model off Hugging Face or without a finely tuned training and inference method.
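
    As far as I understand RVC's inference pipeline, the feature retrieval ratio linearly mixes features looked up in the model's training index with the features extracted from your input audio. At 0.2, about 80% of the input's own features are kept. The minimal Python sketch below only illustrates that idea. The real pipeline searches a proper nearest-neighbour index and averages several neighbours, while this sketch takes the single nearest one. The 2D vectors are made-up stand-ins for real feature vectors.

```python
import math


def nearest_feature(query, index_vectors):
    """Return the stored training-set feature vector closest to query (Euclidean distance)."""
    return min(index_vectors, key=lambda v: math.dist(query, v))


def blend_features(input_features, index_vectors, retrieval_ratio=0.2):
    """Mix each input frame's features with its nearest training-set feature.

    retrieval_ratio=0.0 keeps the input features unchanged; 1.0 replaces them entirely.
    """
    if not 0.0 <= retrieval_ratio <= 1.0:
        raise ValueError("retrieval_ratio must be between 0 and 1")
    blended = []
    for frame in input_features:
        retrieved = nearest_feature(frame, index_vectors)
        blended.append([retrieval_ratio * r + (1.0 - retrieval_ratio) * x
                        for r, x in zip(retrieved, frame)])
    return blended


index = [[1.0, 1.0], [10.0, 10.0]]  # hypothetical "training" features
print(blend_features([[0.0, 0.0]], index, 0.2))  # pulled 20% of the way toward [1.0, 1.0]
```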

    Whew!  This ended up way longer than I anticipated, but I hope this helps!

    If you have any questions, consult the documentation, and if there's anything else, feel free to reach out.
  2. NightDocsYT
    • supporter
    • 12 kudos
    Locked
    Sticky
    LMFAAOOOO Alex saw it I'm screaming

    Article on InfoWars

  3. nuclearmonke
    • member
    • 0 kudos
    Arasaka is putting chemicals in the Real Water to make all my chooms gay. This is a fun little mod that adds a layer of magic realism to the game I adore.
  5. JtheENiGM4
    • supporter
    • 0 kudos
    Now the frogs in Night City are gay too.  Don't drink the corpo water. 
  6. Scrappy172
    • premium
    • 0 kudos
    This adds to the immersion in ways I cannot express.
  7. jubba76
    • member
    • 0 kudos
    A conspiracy character based on a real-life conspiracy theorist, only for someone to make a mod cloning that real person's voice, is probably the most cyberpunk thing you could possibly do.
  8. THERKP
    • member
    • 0 kudos
    I've been modding video games for a little while now, from Fallout 3, New Vegas, and 4, to Oblivion and Skyrim, and now finally Cyberpunk 2077, and I have to say, without a shadow of a doubt, this IS one of THE GREATEST mods I have encountered over the last ten years. You deserve THE BIGGEST pat on the back. Sir, if I could buy you a beer, I absolutely would!
  9. namii87
    • member
    • 0 kudos
    Bro I never endorse let alone leave comments but you are a legend. Thanks for the write up.
  10. NwahWitAttitude
    • premium
    • 2 kudos
    I f*#@ing love Alex Jones. Awesome guy. Thank you for this. 
  11. Syynx
    • premium
    • 101 kudos
    I really like this mod, but his voice always sounded unnatural to me, so I made some edits to the audio files and IMO it sounds much better now.

    Here's the new version

    OP, if you like the edits you're more than welcome to upload it here
  12. Tankyz
    • member
    • 1 kudos
    This mod is fire, better than sex mods.