Is there a way to obscure a speech recording so that it cannot be played back as anything intelligible, but is still useful for machine learning? For my project I have to collect data in an uncontrolled environment, and I would like to do it without accidentally storing sensitive information.

It seems to be an uncommon problem, and I haven’t found much. I am currently using spectrograms to extract features. From what I have found, a spectrogram is computed from the waveform with an STFT and does not store phase information, so there is not enough information to perform the inverse transform. Do I understand this correctly? What other ways are there to do this?
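
To make the phase point concrete, here is a minimal sketch. It assumes a single frame and a plain DFT rather than a full STFT, and the frame values are made up for illustration. It shows two different frames that share the same magnitude spectrum, so the magnitudes of one isolated frame do not pin down the waveform.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(n^2), fine for a demo)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def magnitude_spectrum(frame):
    """Keep only |X[k]|, i.e. throw the phase away."""
    return [abs(c) for c in dft(frame)]


# Hypothetical 8-sample frame; the values are arbitrary.
frame = [0.0, 1.0, 0.5, -0.3, -1.0, 0.2, 0.7, -0.4]
# A circular shift changes only the phase of each DFT bin, not its magnitude.
shifted = frame[3:] + frame[:3]

mag_a = magnitude_spectrum(frame)
mag_b = magnitude_spectrum(shifted)
same_magnitude = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(mag_a, mag_b))

print("frames differ:", frame != shifted)
print("magnitudes match:", same_magnitude)
```

This only covers one isolated frame. In a real STFT the frames overlap, which constrains the phase, and iterative phase-estimation methods such as Griffin-Lim can often rebuild intelligible audio from magnitudes alone. Dropping phase is therefore probably not a privacy guarantee by itself.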

  • ginger_turmeric@alien.topB
    1 year ago

    Maybe define an audio noising function, apply it to your training data, and train your network to output the denoised version?
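
A rough sketch of what such a noising function could look like. The reply does not say which kind of noise to use, so additive Gaussian noise at a fixed signal-to-noise ratio is an assumption here, and `add_noise`, the 440 Hz test tone, and the 16 kHz sample rate are all hypothetical.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the requested SNR (in dB)."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


rng = random.Random(0)  # seeded so the example is repeatable

# Stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_noise(clean, snr_db=0.0, rng=rng)

# A denoising setup would use (noisy, clean) as an (input, target) pair.
training_pair = (noisy, clean)
```

Whether a given noise level actually makes speech unintelligible would need to be checked by listening, since speech can remain understandable at fairly low SNR.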