Is there a way to obscure a speech recording so that it cannot be played back as anything intelligible, but is still useful for machine learning? For my project I have to collect data in an uncontrolled environment, and I would like to do it without accidentally storing sensitive information.

It seems to be an uncommon problem, and I haven’t found much on it. I am currently using spectrograms to extract features. From what I have found, a spectrogram is computed from the waveform with the STFT and keeps only the magnitudes, not the phase, so there should not be enough information to perform the inverse transformation. Do I understand this correctly? What other ways are there to do it?
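
One way to check the phase question empirically: discarding phase rules out an exact inverse, but phase can be estimated iteratively from the magnitudes alone (the Griffin-Lim algorithm), and the result is often intelligible. The following is a minimal NumPy-only sketch of that idea, not a definitive implementation. The helper names (`stft`, `istft`, `griffin_lim`, `spectral_error`), the Hann window, the frame sizes, and the synthetic chirp standing in for speech are all assumptions made for illustration.

```python
import numpy as np

N_FFT = 512  # frame length in samples (assumed)
HOP = 128    # hop between frames (assumed, 75% overlap)


def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex STFT with a Hann window, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Windowed overlap-add inverse (least-squares estimate) of stft()."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    audio = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        audio[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2
    # Guard against division by zero where the window is (near) zero.
    return audio / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_iter=50, seed=0):
    """Estimate audio from STFT magnitudes only, starting from random phase."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase)
        # Keep the known magnitudes, take the phase of the re-analysed audio.
        phase = np.exp(1j * np.angle(stft(audio)))
    return istft(magnitude * phase)


def spectral_error(audio, magnitude):
    """Relative distance between the audio's STFT magnitudes and the target."""
    return (np.linalg.norm(np.abs(stft(audio)) - magnitude)
            / np.linalg.norm(magnitude))


# Synthetic stand-in for speech: a one-second chirp sampled at 16 kHz.
t = np.arange(16000) / 16000
signal = np.sin(2 * np.pi * (200 + 400 * t) * t)
magnitude = np.abs(stft(signal))   # what a spectrogram pipeline would store
rebuilt = griffin_lim(magnitude)   # audio estimated without the original phase
print(f"relative spectral error: {spectral_error(rebuilt, magnitude):.3f}")
```

Running the same loop on a stored spectrogram of real speech and listening to `rebuilt` would show how much survives at a given frame size and frequency resolution.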

  • RedwoodsCool@alien.topB
    1 year ago

    Is there a way to obscure speech recording so there is no way to play it and get something intelligible, but still keep it useful for machine learning?

    Depends on how you define “obscure”, “useful”, etc.

    Maybe feed it directly to your machine and then trash it? The data will still be vulnerable while in transit or in memory, though.

    • vladisser@alien.topOPB
      1 year ago

      I would like to store the data for experiments. Unfortunately, using it once and immediately discarding it defeats the purpose of collecting it.

      • RedwoodsCool@alien.topB
        1 year ago

        The only data transformation that wouldn’t be trivial to reverse is encryption. But you still need to trust yourself, the machine, the network, and everything in between not to leak the key or the data.

        If you “obscure” the data enough, then it won’t be useful. There’s no solution to your problem as far as I can tell.

  • ginger_turmeric@alien.topB
    1 year ago

    Maybe define an audio noising function, apply it to your training data, and train your network to output the denoised version?
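
The suggestion above could be sketched roughly as follows. The comment does not say which noising function to use, so white Gaussian noise at a fixed signal-to-noise ratio is an assumption here, as are the names `add_noise` and `make_training_pairs` and the sine waves standing in for recordings. Note that the denoising targets are the clean recordings, so they still have to be available at training time.

```python
import numpy as np


def add_noise(clean, snr_db, rng):
    """Return `clean` plus white Gaussian noise at roughly `snr_db` dB SNR."""
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise


def make_training_pairs(recordings, snr_db=0.0, seed=0):
    """Build (noisy input, clean target) pairs for a denoising model."""
    rng = np.random.default_rng(seed)
    return [(add_noise(clean, snr_db, rng), clean) for clean in recordings]


# Placeholder "recordings": two one-second tones sampled at 16 kHz.
t = np.arange(16000) / 16000
recordings = [np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 330 * t)]
pairs = make_training_pairs(recordings, snr_db=0.0)
```

Each pair would then be fed to the network as input and target; the model itself is left out because the thread does not specify one.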