You will probably get better answers at r/LearnMachineLearning. To begin with, you need a basic neural networks course. After that, try e.g. HuggingFace’s audio processing course: it’s short and high-level, but it makes a nice intro. In general, you will focus on convolutional networks (CNNs) and on processing audio either as 1D signals or as 2D images (spectrograms).
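To make the "1D signal vs. 2D image" distinction concrete, here is a rough sketch using only NumPy. The function names (`synth_tone`, `spectrogram`) and the parameters (16 kHz sample rate, 512-sample window, 256-sample hop) are my own picks for illustration, not from any course. In practice you would probably load real audio and use a library such as librosa or torchaudio instead of hand-rolling this.

```python
import numpy as np

def synth_tone(freq_hz=440.0, sr=16000, duration_s=1.0):
    """Generate a sine tone as a stand-in for real audio (hypothetical helper)."""
    t = np.arange(int(sr * duration_s)) / sr
    return np.sin(2 * np.pi * freq_hz * t).astype(np.float32)

def spectrogram(signal, n_fft=512, hop=256):
    """Log-magnitude spectrogram via a simple short-time Fourier transform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping windowed frames: (n_frames, n_fft)
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Magnitude of the real FFT of each frame: (n_frames, n_fft // 2 + 1)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression, then transpose to (freq_bins, time_frames)
    return np.log1p(mag).T

wave = synth_tone()       # 1D signal: input for 1D convolutions
spec = spectrogram(wave)  # 2D "image": input for 2D convolutions

print(wave.shape)  # (16000,)
print(spec.shape)  # (257, 61)
```

The same recording ends up either as one long vector of samples or as a frequency-by-time grid, and the second form is what lets you reuse image-style CNN architectures.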
Try r/datasets. That said, I would be very surprised if such a dataset has ever been gathered and made publicly available. This sounds like a typical long-term project funded by EU humanities research grants (I have personally worked on a similar one).