Reading Voice / Audio from Voice Channel? [For Voice Recognition AI Bot] #444
Comments
Not yet. Voice receive has been planned for ages. PRs welcome. |
https://github.com/Rapptz/discord.py/blob/master/discord/voice_client.py#L266 Seems to be already reading/polling from the voice channel though? |
Sure, just design the API, document it fully and submit a pull request. |
Just to settle this: it has been tried again and again, and almost everyone has failed. Danny wants it done in a nice way, but really, it isn't worth the time and effort, so this will not be coming anytime soon. |
Who's Danny? And why is it hard? Isn't it just connecting to whatever socket carries the audio and reading that data? |
Danny made this library (Danny = Rapptz). Next, it is very easy to read the data from the websocket, but presenting data that is usable, in a decent manner, is hard. In essence, you have to chunk streamed data, and I don't think anybody wants to go to the trouble of doing that yet. The hard part is designing the API in a way that is useful and usable, not a quick throw-together solution. |
Just to let you know, Danny has planned voice receive for the rewrite. |
oh ok thanks for the information! |
I wanted my bot to play a 15 second replay on-demand, just for laughs basically, so I needed basic recording capability to start with. I built a setup that mixes together all incoming audio and makes available a single stream of ~50 packets a second. There's no fancy synchronization or stretching; it's just in-out as fast as possible, with a latency of a few frames so there's time to get everything in order. You need to call a function to fetch a new frame 50 times a second. Each speaker can be "re-synchronized" when they don't speak, so the stream remains live and stable in the long term even if there's minor drifting. Otherwise you could drop or duplicate packets, I'm sure. The code is shit, but if I could make it a little less shit, would that kind of basic "just feed me data" API be worthy, if only as a starting point? |
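A minimal sketch of the mixing step described above, not Ruuttu's actual code. It assumes each speaker contributes one decoded frame of 16-bit signed PCM per tick and that a frame is 20 ms of 48 kHz stereo audio. `mix_frames` and `FRAME_SAMPLES` are hypothetical names, and the `array` module is used here rather than anything from discord.py.

```python
import array

# Assumption: 20 ms of 48 kHz stereo audio, i.e. 960 samples per channel.
FRAME_SAMPLES = 960 * 2


def mix_frames(frames):
    """Mix 16-bit signed PCM frames (bytes) into one frame, clipping on overflow."""
    mixed = [0] * FRAME_SAMPLES
    for frame in frames:
        samples = array.array("h")
        samples.frombytes(frame)
        # Sum sample by sample; frames longer than expected are truncated.
        for i, sample in enumerate(samples[:FRAME_SAMPLES]):
            mixed[i] += sample
    # Clamp to the int16 range so loud overlapping speakers do not wrap around.
    clipped = array.array("h", (max(-32768, min(32767, s)) for s in mixed))
    return clipped.tobytes()
```

Calling `mix_frames` with an empty list yields a frame of silence, which is one way to keep the output stream at a steady 50 frames a second while nobody is speaking.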
You can always pull request, but keep in mind, there have been more than a few failed attempts, since Danny is very strict when it comes to pull requests. |
@Ruuttu Would you be able to share that code? I'm curious because I'm trying to add some voice recording (save to file). Thanks |
Let's see. This was all done against version 0.13.0 at the time. I started by copying the work from #333 for receiving decrypted opus voice packets. I wrote a "Decoder" class in opus.py, which I've only confirmed to work on Windows. In your bot (inherited from discord.Client) you need to call enable_voice_events() for your VoiceClient after joining a channel. After that you can receive opus packets in the on_speak() method, which you'll add. I wrote a "Recorder" class that takes the packets from on_speak(), converts them to PCM and maintains continuous per-speaker audio "streams" that sync together. There's a get_replay() method for retrieving the last n seconds of audio. You get lists of tuples because the audio is still separated by speaker, plus there's some extra data. Once you figure out what's what, you can mix together the speakers using Python's audioop module. You'll need to make some edits, but this should have all you need. I added a commented-out example of how you might write a mixed-down PCM stream to a file. Sorry some of the code is kinda silly and poorly commented. |
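A rough sketch of the "last n seconds" idea behind get_replay(), assuming fixed-size frames arriving 50 times a second. This is a simplified, single-stream stand-in: the Recorder described above keeps audio separated by speaker and returns lists of tuples, which this does not attempt. `ReplayBuffer` and `push` are hypothetical names.

```python
from collections import deque

FRAMES_PER_SECOND = 50  # assumption: one 20 ms frame per tick


class ReplayBuffer:
    """Keep only the most recent `seconds` of fixed-size PCM frames."""

    def __init__(self, seconds):
        self._frames = deque(maxlen=seconds * FRAMES_PER_SECOND)

    def push(self, frame):
        # Once the deque is full, appending silently drops the oldest frame.
        self._frames.append(frame)

    def get_replay(self, seconds):
        """Return up to the last `seconds` of audio as one bytes object."""
        wanted = seconds * FRAMES_PER_SECOND
        recent = list(self._frames)[-wanted:] if wanted > 0 else []
        return b"".join(recent)
```

A bounded deque keeps memory use constant however long the bot stays in the channel, at the cost of never being able to replay more than the capacity chosen up front.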
@Ruuttu Thanks for this! This is very helpful. However, after making these modifications against the latest discord.py version, the decoder doesn't seem to be working: it raises an access violation error.
The only thing that has changed between these versions (of discord.py) in opus.py is that it sets the signal type to auto when encoding:
(in opus.py) I just recently started with Python, so I don't have any idea how this could be fixed. I already got decoding working before using python-opus (with some editing), but it would be nice to get this working since it doesn't need another library. EDIT: I think I got it working; at least it doesn't error anymore. I was just messing around in opus.py and somehow got it working. Here is my opus.py that seems to be working. |
I've been needing voice receive for some stuff. I've had a poke around, and I think it should be possible to knock together a jitter buffer to handle receiving audio when I get home. |
@Bottersnike Ruuttu's initial code seems to no longer work: it fails decrypting the voice packets with some ciphertext error. If you get your code working, could you share at least the voice packet decrypting part? Thanks! |
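For readers unfamiliar with the term, a jitter buffer holds a few packets back so that ones arriving out of order can be put back in sequence before playback. The following is a minimal sketch under simplifying assumptions, not code from this thread: `JitterBuffer` is a hypothetical name, sequence-number wraparound at 16 bits is ignored, and lost packets are simply skipped rather than concealed.

```python
import heapq


class JitterBuffer:
    """Reorder packets by sequence number, releasing them after a short delay."""

    def __init__(self, depth=3):
        self.depth = depth      # packets to hold back before releasing any
        self._heap = []         # min-heap of (sequence, payload)
        self._next_seq = None   # lowest sequence number still acceptable

    def push(self, sequence, payload):
        # Drop packets that arrive after their slot has already been released.
        if self._next_seq is not None and sequence < self._next_seq:
            return
        heapq.heappush(self._heap, (sequence, payload))

    def pop(self):
        """Return the oldest buffered payload, or None while still filling."""
        if len(self._heap) <= self.depth:
            return None
        sequence, payload = heapq.heappop(self._heap)
        self._next_seq = sequence + 1
        return payload
```

A larger `depth` tolerates more reordering but adds 20 ms of latency per held packet, assuming 20 ms frames.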
I implemented it in node the other day because that was the only language I could find a good lib for receiving. It shouldn't be too hard to port it over and then make it conform to d.py. |
Did you guys ever end up figuring out a reliable solution for audio receive? I would be happy to use someone's fork in the meantime if it's not good enough to be merged upstream. My use case: I want to set up a Raspberry Pi running discord.py that will operate as a passthrough audio device to both transmit to and receive from a discord channel using the microphone and headphone jack of a USB audio adapter connected to the Pi. Then I plan to connect the mic jack to a feed coming from my Playstation 4, and the headphone jack to a line in adapter for the PS4... connect the PS4 to Party Chat and leave both it and the Pi running, and suddenly I have an official PSN Party that will allow PS4 players to chat with Discord users (who are playing the same cross-platform MMO on PC). It's for my Final Fantasy XIV group... But I imagine the 2-way Discord audio on the Pi might be useful for others too. |
Looks like I might have better luck using https://discord.js.org instead. |
Indeed. The packet parsing that I was using was relying on the fact that Discord was not using the most up-to-date structure. Because of that, the entire RFC wasn't implemented. Due to my lack of motivation, I'm unlikely to ever fix it. |
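To illustrate what "the entire RFC" can involve, here is a sketch of parsing a full RTP header, assuming the RFC in question is RFC 3550 (the comment does not name it). It handles the optional CSRC list and header extension that a shortcut parser would skip. `parse_rtp_header` is a hypothetical helper; it ignores the padding bit and does nothing about Discord's encryption, so it is not a drop-in fix for the code discussed here.

```python
import struct


def parse_rtp_header(packet):
    """Split an RTP packet into header fields and the remaining payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack_from(">BBHII", packet)
    csrc_count = first & 0x0F
    has_extension = bool(first & 0x10)
    offset = 12 + 4 * csrc_count  # skip the optional CSRC identifiers
    if has_extension:
        # Extension header: 16-bit profile id, then 16-bit length in 32-bit words.
        _profile, ext_words = struct.unpack_from(">HH", packet, offset)
        offset += 4 + 4 * ext_words
    return {
        "version": first >> 6,
        "payload_type": second & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:],
    }
```

A parser that assumes the payload always starts at byte 12 works only while the sender never sets the extension bit or includes CSRC entries, which is the kind of assumption that can break when the sender's packet structure changes.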
Sorry if I'm not up to date on this, but has there been any work on this? |
See #1094 |
Thanks for this :) |
discord.py doesn't yet let us simply read/listen to the audio present in a voice channel. See Rapptz/discord.py#1094 and Rapptz/discord.py#444. It probably needs more work than I intend to do: websockets, ack, rec, audio conversion, async, etc. Nothing impossible, but I expected to just use play and record functions, not to have to implement one.
This feature is useful, for example, to transcribe audio from a channel and translate it in real time. |
Hi guys, I'm wondering if the library has the capability to read audio bytes from the voice channels? I'm building a bot that will read the voice and try to convert it to text commands.
Can anyone enlighten me?
Thanks!