I'm building a four-class audio classifier using an LSTM-based RNN. The model classifies audio well when it is recorded in quiet environments, but it struggles in noisy conditions such as crowds or other background disturbances. How can I train it to be more robust so that it predicts the correct class in these conditions?
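
For context, one commonly suggested direction for this kind of problem is noise augmentation: mixing recorded background noise into the clean training clips at a range of signal-to-noise ratios (SNRs), so the model sees noisy examples during training. Below is a minimal sketch of that idea, not something taken from my actual pipeline. The function names (`mean_power`, `mix_at_snr`, `random_noisy_copy`), the plain-list sample format, and the 0 to 20 dB SNR range are all assumptions made for illustration.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mix has the requested SNR in dB.

    `clean` and `noise` are equal-length lists of float samples (hypothetical
    inputs; in practice they would come from whatever audio loader is in use).
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = mean_power(clean)
    noise_power = mean_power(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("clean and noise must both be non-silent")
    # SNR_dB = 10 * log10(P_clean / P_scaled_noise)
    # => P_scaled_noise = P_clean / 10^(SNR_dB / 10)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [c + scale * n for c, n in zip(clean, noise)]


def random_noisy_copy(clean, noise, rng, snr_range_db=(0.0, 20.0)):
    """Augment one clip with noise at an SNR drawn uniformly from a range.

    The default range is an assumed starting point, not a tuned value.
    """
    snr_db = rng.uniform(*snr_range_db)
    return mix_at_snr(clean, noise, snr_db), snr_db
```

In a setup like this, each clean training clip would be paired with a randomly chosen noise segment (crowd recordings, for example) and passed through `random_noisy_copy` before feature extraction, while the original clean clips stay in the training set as well.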