We would all love to have better audio for our podcasts, videos, and other content, and there are already plenty of AI tools that can help. ClearerVoice-Studio by Tongyi Lab is an open-source voice processing framework that covers speech enhancement, speech separation, and audio-video speaker extraction.
This tool can remove background noise and separate target speech from complex audio mixes. It comes with pre-trained models fine-tuned on high-quality datasets. ClearVoice supports audio output at sampling rates from 16 kHz to 48 kHz. As the above video shows, it can even extract voices from multi-speaker videos. You can already try this tool on Hugging Face.