Thread #43127073
File: altOP.jpg (1.3 MB)
Welcome to the Pony Voice Preservation Project!
youtu.be/730zGRwbQuE
The Pony Preservation Project is a collaborative effort by /mlp/ to build and curate pony datasets for as many applications in AI as possible.
Technology has progressed such that a trained neural network can generate convincing voice clips, drawings and text for any person or character using existing audio recordings, artwork and fanfics as a reference. As you can surely imagine, AI pony voices, drawings and text have endless applications for pony content creation.
AI is incredibly versatile, basically anything that can be boiled down to a simple dataset can be used for training to create more of it. AI-generated images, fanfics, wAIfu chatbots and even animation are possible, and are being worked on here.
Any anon is free to join, and there are many active tasks that would suit any level of technical expertise. If you’re interested in helping out, take a look at the quick start guide linked below and ask in the thread for any further detail you need.
EQG and G5 are not welcome.
>Quick start guide:
docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Introduction to the PPP, links to text-to-speech tools, and how (You) can help with active tasks.
>The main Doc:
docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
An in-depth repository of tutorials, resources and archives.
>Online speech generation
haysay.ai
alpha.15.dev
>Active tasks:
Research into animation AI
Research into pony image generation
>Latest developments:
pastebin.com/4p00iUZM
>The PoneAI drive, an archive for AI pony voice content:
drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
>Clipper’s Master Files, the central location for MLP voice data:
mega.nz/folder/jkwimSTa#_xk0VnR30C8Ljsy4RCGSig
mega.nz/folder/gVYUEZrI#6dQHH3P2cFYWm3UkQveHxQ
drive.google.com/drive/folders/1MuM9Nb_LwnVxInIPFNvzD_hv3zOZhpwx
>Cool, where is the discord/forum/whatever unifying place for this project?
You're looking at it.
Last Thread: https://desuarchive.org/mlp/thread/43073987/#43073987
>>
FAQs:
If your question isn’t listed here, take a look in the quick start guide and main doc to see if it’s already answered there. Use the tabs on the left for easy navigation.
Quick: docs.google.com/document/d/1PDkSrKKiHzzpUTKzBldZeKngvjeBUjyTtGCOv2GWwa0/edit
Main: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit
>Where can I find the AI text-to-speech tools and how do I use them?
A list of TTS tools: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.yuhl8zjiwmwq
How to get the best out of them: docs.google.com/document/d/1y1pfS0LCrwbbvxdn3ZksH25BKaf0LaO13uYppxIQnac/edit#heading=h.mnnpknmj1hcy
>Where can I find content made with the voice AI?
In the PoneAI drive: drive.google.com/drive/folders/1E21zJQWC5XVQWy2mt42bUiJ_XbqTJXCp
And the PPP Mega Compilation: docs.google.com/spreadsheets/d/1T2TE3OBs681Vphfas7Jgi5rvugdH6wnXVtUVYiZyJF8/edit
>I want to know more about the PPP, but I can’t be arsed to read the doc.
See the live PPP panel shows presented at /mlp/con for a more condensed overview.
2020 pony.tube/w/5fUkuT3245pL8ZoWXUnXJ4
2021 pony.tube/w/a5yfTV4Ynq7tRveZH7AA8f
2022 pony.tube/w/mV3xgbdtrXqjoPAwEXZCw5
2023 pony.tube/w/fVZShksjBbu6uT51DtvWWz
>How can I help with the PPP?
Build datasets, train AIs, and use the AI to make more pony content. Take a look at the quick start guide for current active tasks, or start your own in the thread if you have an idea. There’s always more data to collect and more AIs to train.
>Did you know that such and such voiced this other thing that could be used for voice data?
It is best to keep to official audio only unless there is very little of it available. If you know of a good source of audio for characters with few (or just fewer) lines, please post it in the thread. A 5.1 surround mix is generally required unless the source is already clean of background noise. Preferably post a sample or link. The easier you make it, the more likely it will be done.
>What about fan-imitations of official voices?
No.
>Will you guys be doing a [insert language here] version of the AI?
Probably not, but you're welcome to. You can however get most of the way there by using phonetic transcriptions of other languages as input for the AI.
>What about [insert OC here]'s voice?
It is often quite difficult to find good quality audio data for OCs. If you happen to know any, post them in the thread and we’ll take a look.
>I have an idea!
Great. Post it in the thread and we'll discuss it.
>Do you have a Code of Conduct?
Of course: 15.ai/code
>Is this project open source? Who is in charge of this?
pony.tube/w/mqJyvdgrpbWgZduz2cs1Cm
PPP Redubs:
pony.tube/w/p/aR2dpAFn5KhnqPYiRxFQ97
Stream Premieres:
pony.tube/w/6cKnjJEZSCi3gsvrbATXnC
pony.tube/w/oNeBFMPiQKh93ePqTz1ns8
>>
>>43127156
Well, I'm kind of doing something, but like I mentioned last thread, things are going super slow.
At least the Chinese AI video stuff Anons showed a month or two ago can make some really nice quality ponies (even if the movement itself is still derped)
>>
>https://u.pone.rs/tyfahhli.mp3
>https://u.pone.rs/wzzujnzi.mp3
some Vul stuff found in the wild
>https://u.pone.rs/bedglhrn.mp3
/create/ cover of Mrs Robinson
>>
>https://u.pone.rs/dfezfvrk.mp4
right now it seems all of the cooler video AIs are limited to making 5 seconds (or 20 seconds at best) of footage. Technically, someone with insane patience could generate a bazillion clips and stitch them all together to create a coherent AI episode.
However, seeing how this stuff improves on a yearly basis, I feel like once we get an open-source model that can make a whole minute of decent-quality AI video, interest in making one should come back among Anons (but only if GPUs stop costing an arm, a leg and both kidneys).
>>
>>43128959
Oh, and I also meant to crosspost this new green screen AI model from the AI art thread (a guy trained a custom Stable Diffusion model to take in a green screen image sequence and output true-transparency PNGs, where even wearing green clothes and holding reflective glass still results in almost industry-standard footage masking)
>>43098665
>>43102567
https://www.youtube.com/watch?v=3Ploi723hg4
https://github.com/nikopueringer/CorridorKey
https://github.com/edenaion/EZ-CorridorKey
>>
>>43131240
Which GitHub repo are you trying to install? Also, what's your GPU? If you're trying to get the newest stuff, the requirements.txt will fuck your stuff up by trying to pull the newest modules, which aren't always compatible with each other and/or the hardware you have.
>>
File: Terri Softmare 2964722.png (580.5 KB)
>>43131240
>https://u.pone.rs/hegtqssw.txt
hey bud, I was looking at my own conda installation instructions and it does look like an absolute clusterfuck of patch notes written on top of each other (as seen in the pip freeze above, it's a mess and then some).
One of the things that seems to be highlighted is making sure the environment is set up as follows:
conda create -n "_name_of_your_env_" python=3.10.3 ipython==9.11.0 --yes
Followed by this sequence:
pip install typing-extensions==4.5.0
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1+cu116 "tensorflow[and-cuda]" --extra-index-url https://download.pytorch.org/whl/ --no-cache-dir --force-reinstall
conda install curl=8.9.1
pip install omegaconf==2.0.6
pip install abc
pip install omegaconf==2.0.6 --force-reinstall
#(I think installing abc module breaks the omegaconf module?)
pip install --upgrade setuptools==49.6.0 pip --user --force-reinstall
C:\Users\User001\anaconda3\envs\RVC_vul_5\python.exe -m pip install --upgrade setuptools==49.6.0 pip --user --force-reinstall
pip install PyYAML==5.1.2
conda install --file requirements.txt -c conda-forge
And for some reason there is also this fucking note, because apparently getting PyYAML to work is an adventure all on its own:
-------------------------
#requirement error PyYAML (>=5.1.*)
git clone https://github.com/omry/omegaconf/ --branch v2.0.6 --depth 1
## follow these instructions
cd omegaconf
#go to the file \omegaconf\requirements\base.txt
#change the PyYAML requirement from PyYAML (>=5.1.*) to PyYAML (>=5.1).
#create and install the module
python setup.py sdist
pip install dist/omegaconf-2.0.6.tar.gz
#exit directory
cd ..
-------------------------
Sorry if the above terminal installation steps are confusing; I got RVC working a few years ago and dare not touch that part of the console with a ten-foot pole in case it breaks. Please do post if you need more help, I will lurk here and in the AI art thread mostly
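By the way, the manual base.txt edit from that note can also be done as a one-liner; a sketch assuming GNU sed, demonstrated on a scratch copy of the file so nothing real gets touched:

```shell
# Same fix as the manual edit in the note: relax the malformed
# "PyYAML (>=5.1.*)" pin that newer pip rejects.
printf 'PyYAML (>=5.1.*)\n' > /tmp/base.txt   # stand-in for omegaconf/requirements/base.txt
sed -i 's/PyYAML (>=5\.1\.\*)/PyYAML (>=5.1)/' /tmp/base.txt
cat /tmp/base.txt   # -> PyYAML (>=5.1)
```

Run the sed against the real requirements/base.txt inside the omegaconf checkout, then do the sdist/pip install steps as in the note.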
>>
>>43131366
>>43132275
Thanks so much for being willing to help out. I’m trying to install the webUI from https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI on a Dell Poweredge R730 server (cpu only, no gpu) to see how well it performs for voice conversion.
I’ll take another stab at it tonight and let you know how it goes. I did eventually get the webui running and I could connect to it from another machine, but the UI immediately displays an error whenever I click any buttons or dropdowns. As far as I can tell, the callbacks defined in the gradio components are not getting called at all (I can insert a print statement in them and it doesn’t print). Something is broken with gradio but I have no stack trace to work with, so it’s very hard to troubleshoot. I’ll try reinstalling it from scratch using the pip freeze; maybe some other module I currently have installed is incompatible with the version of gradio I got.
>>
>>43132364
>(cpu only, no gpu)
hmm, I'm not sure if the RVC code has a default switch to CPU when no GPU is present; it may be worth looking into the main startup Python code, commenting out the GPU device lines, and putting something like this in their place:
import torch
# Monkeypatch CUDA detection so everything below falls through to CPU.
torch.cuda.is_available = lambda: False
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
No idea if this will work, I just took it from the first semi-decent looking link on the front page: https://community.esri.com/t5/arcgis-image-analyst-questions/how-force-pytorch-to-use-cpu-instead-of-gpu/td-p/1046738
Also, the only RVC I've used was the one from Vul's GitHub, plus something called RVC1006Nvidia for doing the voice training.
>>
>>43131085
>What the fuck happened to this thread?
It achieved everything it set out to do, vastly exceeding expectations in many cases. Now AI progress has generally plateaued and will probably stay that way until the next big breakthrough, at which time the datasets will be there and ready to go.
>>
>>43129728
ngl I actually like this thread as an occasionally recurring general, I miss it when it's not around. Shame it's a bit slow to stay on the catalogue until bump limit yeah, but I imagine with enough advancements in AI tech it could swing back around eventually. Local video gen becoming way more accessible will probably give PPP a good boost.
>>
File: gradio the bane of my existence.png (14.5 KB)
>>43132275
>>43132364
>>43132480
Welp, I reinstalled everything from scratch using a different python version, going through python dependency hell again, but I still got this same nonsensical "connection" error. There are no logs and there is no stack trace, even if I set debug=True in the app.queue call in infer-web.py. The webapp and my browser can talk to each other over the default port 7865 and both have internet access, so I don't see how this can actually be a connection error, unless gradio is trying to connect to a nonexistent website for some silly reason. Upgrading fastapi did not help, and upgrading gradio created a package incompatibility.
I looked into the cpu vs cuda thing, and it looks like it gracefully switches to cpu if you don't have a gpu. See line ~175 in configs/config.py.
I'm not going to try to get the RVC web client working anymore, unless someone points out a quick fix that I've somehow completely missed. Instead, I will tear out all the gradio stuff and get the model loaded on a simple flask server that I can make API calls to, for the simple testing I wanted to do. I would have tried Vul's older RVC GUI, but I see pyqt code in there, so I think it's a natively-running UI; I require a web UI... unless I install an X server, which I suppose I could do.
>>
>>43131768
>>43131867
I'd be excited to see whatever gets pulled up for it. I dunno if there's anything super notable that's been done lately, but even a panel showing off pony AI voice covers for music or something would be fun. No pressure though, PPP's kind of in slow mode at the moment, at least to some extent.
>>
>>43133460
>>43133540
Success! A simple Flask server worked for my purposes. The server takes ~6 seconds to convert a 1-second audio file that it hasn't seen before (if it *has* seen it before, then it's about 3 seconds).
I've also noticed that the RVC codebase does not keep the index file in memory after converting a file; it dumps and reloads it on the next conversion. That's significant, because those index files can be hundreds of megabytes. I bet I could cut the conversion time down even more by keeping the index file cached.
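Just to sketch what I mean (hypothetical, not RVC's actual code; a pluggable loader stands in for whatever RVC calls, e.g. faiss.read_index): a small LRU-ish cache keyed by index path, so back-to-back conversions with the same voice skip the disk reload:

```python
class IndexCache:
    """Keep recently used index files in memory instead of reloading
    them from disk on every conversion (sketch; loader is a stand-in)."""

    def __init__(self, loader, max_entries=2):
        self._loader = loader      # path -> index object
        self._cache = {}           # path -> cached index
        self._order = []           # least recently used first
        self._max = max_entries    # keep small: each index can be 100s of MB

    def get(self, path):
        if path in self._cache:
            # Cache hit: refresh LRU position, skip the disk read entirely.
            self._order.remove(path)
            self._order.append(path)
            return self._cache[path]
        index = self._loader(path)
        self._cache[path] = index
        self._order.append(path)
        if len(self._order) > self._max:
            # Evict the least recently used index to bound memory use.
            del self._cache[self._order.pop(0)]
        return index
```

A module-level instance of something like this, checked before the existing load call, would probably be enough for the quick testing I'm doing.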
>>
>>43136753
Oh, I actually thought about that; it was more so for EquestrAI being shown off to everyone, but you could totally do one with text and then have another Anon try to make images based on whatever's going on to go with the stream. That could be really fun, just a giant AI ensemble of all the tech we've got, to showcase it all at once. Could even throw in voice gen for dialogue as well.
>>
>>43144405
Seeing Anons come up with stuff like the /chag/ AI light novel and the bonziPONY desktop pony ideas brings me joy; despite lots of people pushing the idea that anything involving AI is slop, there are still plenty of people out there bringing ponies to life in their own way.
>>
>>43144988
Oh absolutely. Was the AI light novel shown off at Marecon like Bonzipony was? And if so, what block was it at? I'm not familiar with the light novel thing. Unless you mean the VN as in EquestrAI then yeah I absolutely agree.
>>43144405
For sure! And if we're talking EquestrAI, the new Godot version should be out soon. They'll be able to do a panel for it there easy, and since it's fully open source there could even be a collab with adding voice gen or something to it. It'd be really neat to have some more rep for the AI side of the board there, not that we haven't had plenty of panels for it already.
>>
File: UI preview.png (345.8 KB)
Making progress on the IPA Translator app. To be honest, though, I've lost some motivation for this little project because I've come to learn that there are many nontrivial issues with phonemic/phonetic transcription, so no solution I come up with will be perfect. There are literally infinitely many sounds the human voice can create, so you have to pick a level of acceptable granularity. Accents have a huge impact on phonetic transcription, and you have to pick a level of granularity on accent, too. For this project, I'm following Wiktionary's English pronunciation appendix for "General American",
https://en.wiktionary.org/wiki/Appendix:English_pronunciation
though I'm not at all sure that's the right choice. Maybe I should assume Canadian English, since many of the voice actresses are Canadian? Or maybe let users switch between American and Canadian modes? And then Rarity, of course, speaks with more of a Trans-Atlantic accent. Auto-translating this stuff is going to be sketchy, and you almost need to be trained in phonetics to do a correct transcription. At the same time, though, all this messiness highlights the need for cleaner training data. I'm sure that most of the training the PPP has done so far has involved some degree of auto-transcription from plaintext to IPA or ARPABET, and who knows how good a job those tools did.
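To make the accent problem concrete, here's a toy sketch of what an accent-switchable lookup might look like; the lexicon entries are illustrative only, not real Wiktionary data:

```python
# Toy accent-parameterised pronunciation lookup (illustrative entries only).
LEXICON = {
    "sorry": {"GenAm": "ˈsɑɹi", "CanE": "ˈsɔɹi"},
    "about": {"GenAm": "əˈbaʊt", "CanE": "əˈbʌʊt"},  # Canadian raising
}

def to_ipa(word, accent="GenAm"):
    entry = LEXICON.get(word.lower())
    if entry is None:
        raise KeyError(f"no transcription for {word!r}")
    # Fall back to General American when an accent has no entry yet.
    return entry.get(accent, entry["GenAm"])
```

Even with a scheme this simple, every word needs per-accent curation, which is exactly the granularity problem I'm talking about.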
>>
>>43145658
>For sure! And if we're talking EquestrAI, the new Godot version should be out soon.
Nonny...
Last update (download here): >>43144346
And what was added at first (don't download here): >>43140874, >>43140879
>>
>>43146737
I feel like a lot of intellectual force got pushed into LLMs and other text bots, as well as whatever other shiny things look nice to investors (like "this AI can replace your admin/lawyer/doctor wagies" sound bites).
I have a gut feeling that RVC and the other stuff from the past few years peaked because it reached the "good enough" level, where the clips already sound like the source 90% of the time, but getting the extra 10% to make the lines truly sound like they're spoken by a human is just a bit too difficult without some miraculous eureka moment.
>>43145865
Speaking of innovation, it may sound dumb, but isn't there a way to train a separate speech-to-IPA model on a smaller dataset, whose only job is learning to convert spoken words into IPA, and then use that new model to create a proper IPA dataset from all the available audio datasets from the past, for the creation of this new-new TTS model?
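To sketch the shape of what I mean (everything here is a stand-in, no real model implied): train the small speech-to-IPA model on a hand-checked seed set, then let it auto-label the big archive, with a confidence gate so only trustworthy transcriptions go into the new dataset:

```python
def auto_label(archive, s2ipa_model, threshold=0.9):
    """Label an audio archive with a small speech->IPA model trained on a
    hand-checked seed set (hypothetical: model returns (ipa, confidence))."""
    dataset, needs_review = {}, []
    for clip_id, audio in archive.items():
        ipa, confidence = s2ipa_model(audio)
        if confidence >= threshold:
            dataset[clip_id] = ipa          # good enough for the TTS dataset
        else:
            needs_review.append(clip_id)    # route to a human pass
    return dataset, needs_review
```

The hard part would be the seed set: somebody still has to hand-verify enough IPA to train the small model in the first place.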
>>
>>43145865
i do look forward to seeing what you cook up, as using IPA has the potential to make a semi-universal TTS that can actually use the original character voices, and isn't just converting somebody else's voice to kind of sound like /int/ ponies
>>
>>43149017
it used to be all AI-related things, however at one point the AI art & text/LLM models spun off into their own threads, which in turn made the PPP semi-redundant, leaving it with the other AI stuff that sadly doesn't get as much spotlight (like population simulation, or the ability of TTS models trained on a single-language voice dataset to talk perfectly in multiple other languages)
Also, on why the PPP threads are a bit dead now: using art/LLMs for personal stuff is much easier, while making stuff with voices always requires the extra steps of making the recording, converting it, then editing it into a larger script.
So you already have a situation where several steps discourage people from making stuff before they even look into anything voice-related, BUT then you have another layer of dealing with Python bullshit beforehand (and as seen in the talks above in this thread, most people do not have enough autism to deal with that bullshit).
tldr its not good in the hood
>>
>>43149621
>it used to be all ai related things
Sort of, but we already hyper-focused on voice way back then.
>It's dedicated to save our beloved pony's voices
>This project is the first part of the "Pony Preservation Project" dealing with the voice.
Then with 15, which is what we are mainly known for in general by those who know of us, we sort of pigeonholed into that even more.
>>
>>43149621
nta, but i'm glad to have been here when this thread was /the/ cutting edge of ai development.
>cake
>science fiction is here
watching it all slowly come to life thanks to a bunch of anonymous strangers on the internet working >for free is something I will never forget
>>
>>43145865
https://arxiv.org/abs/2603.29217
>Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
Here's something related to your work
>>
>>43152191
>>43152229
Different types of negro.
>>
File: neuralblender - Lego My Little Pony cover art.png (1.3 MB)
remember where you came from pony man
>>
File: 1418305233869.png (129.8 KB)
>>43151111
>>
File: 1329778572192.png (71.5 KB)
>>43153464
>>
>https://www.youtube.com/watch?v=kaXOUZJPurc
man, this guy is picking some nice songs, but at the same time he just doesn't do anything to "neutralize" the pitch levels when they go outside the usual singing octave.
>>
>https://www.youtube.com/watch?v=H4FgCiEDnPM
this zigga is cooking (looks like it's another RVC-converted song, but this time the guy edited in segments with multiple lines overlapping).
>>43172308
such is the life of Chinamare
>>
>>43162590
>>43169221
>>43172489
I honestly don't understand the point of just converting the voices without changing anything else. It won't sound as good as the original. At least change the lyrics to make it more pony or something. But I guess that requires way more effort and you'd have to sing everything yourself.
>>
File: awh.png (460.5 KB)
>>43178242
no
>>
Do you love Starlight? Love Communism too!
>>
>>43180647
>show glimglam https://youtu.be/ywVHF6Lltac?si=i7-KwIM7HyekJG9a
does she double down or get better?