Morrowind

I don't usually write articles around my mods. In fact, I *never* write articles around my mods. Does anyone?

I work in tech for my day job and, having spent a lot of time with ElevenAI specifically, am in an interesting position to share my thoughts on AI - specifically voice/sound AI - and on the criticisms and concerns I see coming up in the comments section and in the discourse around this mod and others like it.

I've written the below not to attack those people, or defend the ethics of voice AI mods like this one - but just to put my own thoughts down in writing, as people clearly have an interest in the subject.


Concern 1: But didya get permission to steal the voice from Jeff Baker's mouth?

Mods have always existed in a muddy legal territory. 

The game ships with a load of assets - meshes, textures, sounds, music.

People have come to accept the remixing of those original assets, even if you don't make anything 'original' yourself (we'll get to whether this mod has anything original in it later). Think of a modder who kitbashes a new armour together from existing meshes and textures and releases it as 'Redoran dust warrior armour' or something. Or you do an acoustic guitar cover of the Morrowind theme song for a bard to play in a tavern. Or you re-use the bug texture from a robe for a creature.

No one worries about getting permission from the original modelling team to make that armour. No one worries that they're putting them out of a job.

Morrowind is a game that's over 20 years old now - no one is going to be missing out on royalty checks from this content. 

If anything, if this mod were to become incredibly popular (be still my beating heart!) it could actually push people to purchase Morrowind, either to install again or for the first time - actually making Bethesda and crew money, and sustaining fan interest in the series until the next Elder Scrolls game.


Concern 2: Even if it's not unethical, the results are creepy and unnerving. 

This is a really interesting one. In part, I think it's because it's new technology, and so we're perhaps a little *too* impressed by the results. Voices are just made of composite elements - pitch, volume, reverb, etc - with some patterns for how we enunciate or pause in our speech. It's perhaps no more complex than the disparate elements of, say, an armour mod - it's just less well understood by players as a whole. When we first saw normal maps and diffuse maps and all these cool texture variants that simulate slime or metal and light hitting surfaces, we thought it was witchcraft. Now we just see being able to simulate cloth physics and hair whipping in the wind as par for the course for our games.

Perhaps the other problem is the fact the voice is such an intimate thing. A texture or a model is one thing, but a voice - mimicking it, replicating it - feels akin to copying someone's soul. When you create an armour that looks realistic, people clap. But when you create a voice that fools people into thinking it's real, they get creeped out. I definitely did when first making this mod. But it quickly passed, and was replaced with a sense of wonder, and a feeling of 'wow, I can really make Morrowind feel like a more immersive, reactive place for players to have fun in with this tech.'

Once you get over the fact that it's possible, and passes for the real thing, I think the feeling passes. 


Concern 3: It's too easy to make this stuff. You just press a button and the AI makes a voice. 

This is both true and not true. It is true that you can get 70% decent results from ElevenAI with little effort. But in order to get those 95% immersive results I'm trying to do, it's much harder. This includes:

Endless 'regenerating' of the lines for it to put emphasis in the right place. The number of times I've had a take that has 'You'd better watch yourself' delivered in a threatening baritone, just to have the second part 'or I'll call the guards' delivered in a cabaret sing-song that ruins the whole thing, I can't count.

Re-writing of the original lines. Morrowind wasn't written in a way people actually speak. The greetings were written to have a Wikipedia-style hyperlinking of topics and phrases, e.g. 'I'm Von Django, the modder for Bethesda's award-winning game Morrowind.' Many of these lines have been re-written, shortened, and humanised to work as delivered speech.

Re-writing the names of Dwemer ruins, local alcohols, or Dunmer words in phonetic spelling. 'Lahloo, Must-ay,' etc.

Originally, the scope of this mod was just to make quest-givers acknowledge the quest you were on. But this mod has expanded to include so many new things, like custom idle thoughts for some NPCs, voices for guild guides who you can travel with, rumours for cities and towns, and even some wild stuff like Galbedir actually having a conversation downstairs in the Mages Guild while she's distracted and you're on a mission for Ajira.

This is where the 'original' part of the mod comes from, beyond just pressing the button in ElevenAI and getting a sound file.


Concern 4: The mod isn't the problem. The problem is the implication of the tech for voice actors, and humanity as whole. 

I'm definitely not qualified to talk about the impact of AI on humanity as a whole.

It's certainly true that some technologies put people out of business. Photography, although it didn't kill conventional art, probably put a lot of portrait painters out of business. Or perhaps it gave people who could never afford to have their portrait painted a taste of a similar thing.

Having used ElevenAI extensively, I don't think AI voice models will put voice actors out of business.

There will probably be a Shutterstock-style bank of AI voices for games that can't, or won't, pay for original voice acting.

But this will never be able to replace the impact of the real thing. 

As someone who has spent too much time with it - the only reason ElevenAI works here is that this is a very limited application - short, sharp single lines of dialogue. ElevenAI is incapable of delivering a stirring speech, much less a realistic exchange of dialogue between two actors. ElevenAI doesn't understand the real emotional impact of what it's mimicking, and certainly can't capture the nuance of two actors bouncing off each other. It can only accidentally hit the right notes, in the right order, for a very short period of time - if you make it reroll enough. It's the equivalent of 5000 monkeys on typewriters eventually writing Shakespeare.

Players will know when they are in an AI voice acted game (when you have full, deep back and forth conversations) - and will always prefer the impact of the real thing.

I'm sure there will be AI addendums to actors' contracts so the studio can use AI to record additional lines for new content as the game is developed, or additional responses to dialogue trees if they come up with additional ideas after the recording session. But it'll be, in the main, a backup plan to give studios extra leeway in development rather than a replacement for the real thing.

Of course I could be wrong. But as someone who has used this technology a huge amount to make this mod, I think people have less reason to be concerned about this technology than this mod may make them feel.

It's a cool trick. But it's not a game-changer. 



Article information

Written by

vonwolfe

7 comments

  1. miketheguy
    • supporter
    • 1 kudos
    There is no ethical problem, voice AI is the future for video gaming. It will lower costs and allow for more expansive and engaging games. 
  2. jahmon808
    • premium
    • 70 kudos
    I don't know anything about anything, but I think I wouldn't be entirely incorrect in pointing out for those who understand even less than I do that this isn't actual "Artificial Intelligence (AI)", it's just a program capable of learning and then firing up the 5000 monkeys to hopefully give you that near-perfect result you're looking for.
    1. vonwolfe
      • premium
      • 143 kudos
      It's impossible to know what's under the hood of the tech. It can be guided tonally by emotive words like 'kill' and by punctuation. It can mimic pauses and intonation well. But yes, the need for constant rerolls suggests it's a well-designed tool rather than AI in the proper sense.
  3. Confessoru
    • member
    • 13 kudos
    Your exposition perfectly outlines the mod maker's intent to improve on the gaming experience, but it lacks finer tact and is problematic for the reason that your entire argumentation is based on a flawed assumption that using voice assets equates to using any other assets.


    Once you get over the fact that it's possible, and passes for the real thing, I think the feeling passes. 

    For a moment you considered, in a Daniel Day-Lewis fashion, that voice belongs to the domain of a soul. And yet, you do not seem to understand the difference between making armor and synthesizing a voice.
    Voice has an inherent imprint of a person behind it. It is wholly unique. Can you tell me which developer modeled a chitin armor? Will you be able to differentiate between a person who modeled said armor, and tell if this person also made a building asset? You are making a synthetic product to reproduce the likeness of a person. When you use your new tech toy to reduce the voice to mere physical components to give the masses the taste of a real thing, you only stop to consider how many times you need to push the churn-out button, saying that having the end result be close enough will make it believable in people's eyes. It doesn't matter how good the end result is. If tomorrow someone unbeknownst to you makes a deepfake of you and puts it up on display, wouldn't you want the person to ask permission to use your image beforehand? You may not be infringing on legal rights, but if you don't have explicit permission, you should not use it due to the risk of infringing on personal rights. It's a common courtesy.


    No one worries about getting permission from the original modelling team to make that armour

    People accept it because you are given explicit permission from Bethesda, and even then, they don't want you to port game assets from one of their games to another. In your view, since voice assets fall into the same category as model assets we do not need to ask the permission of the voice actors.
    Why? Because there is no explicit statement forbidding you to use that content? Does this give a readily apparent implicit permission? Do you think we should only be bound by a legal framework and it's okay to do since it's in the grey area? When you want to use another modder's work on Nexus to include it in your mod, you are required to ask that modder's permission. Why do you think that is? It is based on what we as human beings agree is fair: to have the green light of the original creator before using or modifying their work. It's not an arbitrary decision; we recognize (hopefully) that it's the right thing to do. In any legitimate modding circle, there is an underlying assumption that you shouldn't use something for which you have no permission. It would be fair to establish guidelines for the use of voice assets, simply as an alternative to a missing legal framework, and because we understand that it's the respectful thing to do, as some of Morrowind's original voice actors have already asked the community not to train on their voices.


    Morrowind is a game that's over 20 years old now - no one is going to be missing out on royalty checks from this content. 

    This is laughable. Morrowind is an old product, so such abstract concepts like copyright do not apply? Intellectual property rights do not have an arbitrary expiration date. 

    The way of moving forward with this in mind would be: as a modding community, establish the ethical guidelines and assume that using voice assets implies infringing on personal rights associated with an inherent likeness of the actor. Do not use the voice without an explicit permission of the owner. It won't be convenient, yes. But it will be fair.
    1. vonwolfe
      • premium
      • 143 kudos
      That's a strong rebuttal! You will forgive me if I reply with equal vim (chim).

      "For a moment you considered, in a Daniel Day-Lewis fashion, that voice belongs to the domain of a soul. And yet, you do not seem to understand the difference between making armor and synthesizing a voice."

      I'm guessing you mean Daniel Plainview, the character he plays in There Will Be Blood? Unless you're equating me with the greatest actor of our generation - in which case, I'll take the compliment.

      If you do mean Daniel Plainview, it's a little over-emotive to compare me to a murdering, psychopathic oil baron in your opening gambit. Unlike Daniel Plainview, I make no money from this, and the only power I have is to entertain fans of a 20 year old RPG.

      My opening rebuttal would be - you fail to establish the meaningful difference between the two in terms of permissions, and resort to literary flourishes instead of any kind of evidence to back up your thesis.

      'You are making a synthetic product to reproduce the likeness of a person.'

      I'm making a mod that makes Caius Cosades, a fictional character in a fictional video game, recognise the quest you're on. No one believes I've created life with this mod - just something you can believe in if you choose to suspend your disbelief. This is a really important distinction for an argument you make later.

      'People accept it because you are given explicit permission from Bethesda, and even then, they don't want you to port game assets from one of their games to another.'


      Same assets, same game for this mod.

      "Will you be able to differentiate between a person who modeled said armor, and tell if this person also made a building asset? "

      Uniqueness of the asset seems to be the crux of your argument. You wouldn't know it was Jeff Baker's voice unless I told you or you looked it up. Voice actors tend to be just as anonymous in terms of their reputation as modellers or texturers. If you mean the ability to say 'that's by the same person' - then when it comes to artists, I can often tell who they are by their work. Vincent Dutrait's art on board game boxes, for example - or the fact Michael Kirkbride's concept art was missing from Oblivion, creating a notable lack of the weird, strange world I love Morrowind for. Many people have voices that sound near identical, especially to untuned ears - just as there are actors whose faces people mix up.

      You want to believe in a uniqueness to these things because it invokes some kind of humanity or soul to you. It makes you feel uncomfortable that they're not - but you feeling uncomfortable isn't a strong argument.

      'If tomorrow someone unbeknownst to you makes a deepfake of you and put it up on display, wouldn't you want the person to ask permission to use your image beforehand? '

      If they told everyone it was a deepfake, and it was being used for a purpose that was an extension of something I already did and agreed with (voice acting characters in the same video game) - and it wasn't losing me money, or making them money - yes, I would. Genuinely. I don't know what it would be - perhaps a comedy meme of me talking to Dagoth Ur that went viral - I would be okay with it.

      Your line implies that in this case, someone would create a deepfake of me to try and fool people into thinking I really said it, and put words in my mouth I didn't agree with - which isn't the case here.

      The only argument you could make, from a moral point of view, is making their voices say things they would have been uncomfortable recording originally. An explicit sex mod, for example. There I could see an argument for being morally shady.

      "This is laughable. Morrowind is an old product, so such abstract concepts like copyright do not apply? Intellectual property rights do not have an arbitrary expiration date. "

      I mean, they literally do, it's 70 years after the death of the original author. Which obviously hasn't elapsed yet - but if you're going to write something down, check your facts first.

      My point about Morrowind being old wasn't that copyright had elapsed, my point was you couldn't argue this mod was putting anyone out of pocket.

      "The way of moving forward with this in mind would be: as a modding community, establish the ethical guidelines and assume that using voice assets implies infringing on personal rights associated with an inherent likeness of the actor."

      Why? It's a conclusion unsupported by your argument.

      Should modders who build armour from Michael Kirkbride's concept art take it down unless he gives them permission? Or those face mods that use George Clooney's face for guards?

      I can recognise the original source they're copying in both cases, but I wouldn't get up on my soapbox and tell them to take them down.

      Modding exists in a legal grey area - and that's part of the creativity and freedom that allows hundreds of thousands of mods for this game to flourish.

      If anything, the stupid Dagoth Ur memes and mods like this one are among the few pure outputs for AI voice work that will be morally uncomplicated moving forwards - making no one any profit, clearly marked as AI-produced, and done simply for the love of a game.
  4. deleted175703511
    • account closed
    • 0 kudos
    There's a lot of this going on in our Stardew Nexus too. Infighting over the ramifications of AI generating villagers' character portraits. On the one hand I empathize with young up and coming artists feeling threatened competing with new tech. On the other hand, I write a lot of it off as technophobia. Very complex issues that humanity will just have to work through.
  5. Sonja
    • member
    • 181 kudos
    It's a fascinating area of discussion, and while my most recent post on the topic illustrates some of my unease, I also can't help but feel that sense of wonder you describe.

    This is both true and not true. It is true that you can get 70% decent results from ElevenAI with little effort. But in order to get those 95% immersive results I'm trying to do, it's much harder. This includes:

    Endless 'regenerating' of the lines for it to put emphasis in the right place. The number of times I've had a take that has 'You'd better watch yourself' delivered in a threatening baritone, just to have the second part 'or I'll call the guards' delivered in a cabaret sing-song that ruins the whole thing, I can't count.


    Re-writing of the original lines. Morrowind wasn't written in a way people actually speak. The greetings were written to have a Wikipedia-style hyperlinking of topics and phrases, e.g. 'I'm Von Django, the modder for Bethesda's award-winning game Morrowind.' Many of these lines have been re-written, shortened, and humanised to work as delivered speech.


    Re-writing the names of Dwemer ruins, local alcohols, or Dunmer words in phonetic spelling. 'Lahloo, Must-ay,' etc.


    Originally, the scope of this mod was just to make quest-givers acknowledge the quest you were on. But this mod has expanded to include so many new things, like custom idle thoughts for some NPCs, voices for guild guides who you can travel with, rumours for cities and towns, and even some wild stuff like Galbedir actually having a conversation downstairs in the Mages Guild while she's distracted and you're on a mission for Ajira.


    The above, I think, highlights why I prefer your mod over the alternatives. Aside from the fact that partial voicing just fits old games like this better (at least in my personal opinion), I've really noticed how much care you've taken so that everything sounds right. This is, of course, subjective, but I think that in mods such as this, tone and nuance are vital. I'm reminded of films where an incredibly attractive actor couldn't deliver their lines well, even if their very soul depended on it…when you lack good, natural-sounding delivery, everything else is a waste. This is also, incidentally, one of the main reasons why I've never been deeply pulled into Oblivion… the voice acting is horrendous on every level, and it's something that, at least historically, has been impossible to address through mods.