Profanity filter for Nova #771

robfig · 2024-05-21T15:30:12Z

We use Deepgram for Closed Captioning in meetings. I got feedback from a customer today:

Hi, I just wanted to let you know about something that happened yesterday. You know how Roam's captions feature tends to self-correct? Like it'd type out something but then when it "realizes" that the person said something else, it'd instantly change that word to the correct word. Well, I was in a meeting yesterday and one of the wrong words was a racist slur that shouldn't be in anybody's vocabulary. It was corrected after a couple of seconds, but I still saw it. I was shocked that the captions feature even "knew" that word. I thought you'd want to know!

I see that Deepgram supports a Profanity filter, but that only applies to Base models. We use Nova-2. Do you have any plans to add support, or is there an alternative approach for handling this that you'd recommend?

Thanks for your help
Rob

team-deepgram · 2024-05-21T15:30:22Z

Maintainer

Thanks for asking your question about Deepgram! If you didn't already include it in your post, please be sure to add as much detail as possible so we can assist you efficiently, such as:

The request_id if you have a question about your requests or transcription responses.
The features you used or the full api.deepgram.com URL you sent your request to, including parameters.
Any code snippets you can share.

jpvajda · 2024-05-21T22:47:57Z

Maintainer

@robfig I can pass this need on to our product team for feedback.

jkroll-deepgram · 2024-05-24T21:10:17Z

Collaborator

@robfig, thank you for sharing this feedback, and I apologize that you and your customer were put in this situation.

As you noted, Deepgram only supports profanity filtering for our Base model, which we wouldn't recommend as Nova-2 is newer and more accurate.

Our models are trained on very large volumes of data, which may include instances of profane language. We don't apply any cleaning to our training data or models that prevents certain words from being predicted.

For other customers who have wanted to implement their own profanity post-processing, we have compiled a list of 1200 terms covering a range of English profanity (cursing, offensive language, etc.). Let me know if you would be interested in using this list in your own post-processing, and I can share it with you privately. Alternatively, you could censor a shorter list of terms that you never want displayed in captions.
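As a rough illustration of the shorter-list idea, here is a minimal Python sketch that masks blocked terms in a caption string before it is displayed. The `BLOCKED_TERMS` list and the `censor` function are hypothetical names with placeholder content, not part of any Deepgram SDK. It assumes whole-word, case-insensitive matching is sufficient.

```python
import re

# Hypothetical short blocklist; the real terms would come from your own policy.
BLOCKED_TERMS = ["darn", "heck"]


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    # Longest terms first so multi-word phrases win over their sub-words.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


PATTERN = build_pattern(BLOCKED_TERMS)


def censor(text, mask_char="*"):
    """Replace each blocked term with mask characters of the same length."""
    return PATTERN.sub(lambda m: mask_char * len(m.group(0)), text)


print(censor("Well, DARN it"))   # blocked word is masked
print(censor("darning socks"))   # longer words containing a term are untouched
```

Because the original report concerned a word that appeared briefly and was then corrected, a filter like this would presumably need to run on every caption update, interim as well as final, before anything is rendered.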

Please be assured that I've raised this broader product question internally as well for further discussion.

6 replies

jkroll-deepgram Jun 4, 2024
Collaborator

Hi @nccscott, it is a txt file with each line being a word to be redacted. You could consume it in any preferred post-processing logic to search for each of those words, and replace it with an empty string or other symbol.
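For anyone wiring this up, a minimal sketch of consuming such a file in Python might look like the following. The file name, `load_terms`, and `redact` are assumptions made for illustration. The only detail taken from the description above is the format: one term per line.

```python
import re
from pathlib import Path


def load_terms(path):
    """Read one term per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def redact(text, terms, replacement=""):
    """Replace every listed term (whole word, any case) with `replacement`."""
    if not terms:
        return text
    # Longest terms first so multi-word entries win over their sub-words.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE
    )
    redacted = pattern.sub(lambda _m: replacement, text)
    # Collapse the extra whitespace left behind when terms are removed.
    return re.sub(r"\s{2,}", " ", redacted).strip()


# Usage, assuming a hypothetical file named "blocked_terms.txt":
# terms = load_terms("blocked_terms.txt")
# caption = redact(transcript_text, terms, replacement="[redacted]")
```

Compiling the pattern once and reusing it would be preferable for live captions. It is rebuilt on each call here only to keep the sketch short.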

If you can share an email address, I can reach out to you directly (and you could delete it here afterward). Also if you provide your Deepgram project ID or a request ID, I can look up the associated email.

jkroll-deepgram Jun 4, 2024
Collaborator

@nccscott Sent to you 👍

nccscott Jun 4, 2024

Thank you

robfig Jun 4, 2024
Author

I'd be interested in implementing this as well. Please send it to [email protected]
Thank you Julia!

ian-oz Jun 21, 2024

Hi, my strong preference would be for optional profanity post-processing to be added on the server side in nova-2, but in the meantime my project ID is 5f685c03-1a7b-4161-b6d8-b362228e96c6. Please email your list to me as well. Thanks Julia. (For others, see here as well: https://www.freewebheaders.com/full-list-of-bad-words-banned-by-google/)
