vocalpy.segment.ava
- vocalpy.segment.ava(sound: Sound, nperseg: int = 1024, noverlap: int = 512, min_freq: int = 30000.0, max_freq: int = 110000.0, spect_min_val: float | None = None, spect_max_val: float | None = None, thresh_lowest: float = 0.1, thresh_min: float = 0.2, thresh_max: float = 0.3, min_dur: float = 0.03, max_dur: float = 0.2, min_isi_dur: float | None = None, use_softmax_amp: bool = True, temperature: float = 0.5, smoothing_timescale: float = 0.007, scale: bool = True, scale_val: int | float = 32768, scale_dtype: npt.DTypeLike = <class 'numpy.int16'>) -> Segments
Find segments in audio, using the algorithm from the ava package.
Segments audio by generating a spectrogram from it, summing power across frequencies, and then thresholding this summed spectral power as if it were an amplitude trace.
The spectral power is segmented with three thresholds, thresh_lowest, thresh_min, and thresh_max, where thresh_lowest <= thresh_min <= thresh_max. The segmenting algorithm works as follows: first, detect all local maxima that exceed thresh_max. Then, for each local maximum, find onsets and offsets. An offset is detected wherever a local maximum is followed by a local minimum in the summed spectral power that is less than thresh_min, or wherever the power falls below thresh_lowest. Onsets are located in the same way, by looking for a preceding local minimum less than thresh_min, or any value less than thresh_lowest.
- Parameters:
- sound : vocalpy.Sound
Sound loaded from an audio file.
- nperseg : int
Number of samples per segment for Short-Time Fourier Transform. Default is 1024.
- noverlap : int
Number of samples to overlap per segment for Short-Time Fourier Transform. Default is 512.
- min_freq : int
Minimum frequency. Spectrogram is “cropped” below this frequency (instead of, e.g., bandpass filtering). Default is 30e3.
- max_freq : int
Maximum frequency. Spectrogram is “cropped” above this frequency (instead of, e.g., bandpass filtering). Default is 110e3.
- spect_min_val : float, optional
Expected minimum value of spectrogram after transforming to the log of the magnitude. Used for a min-max scaling: \((s - s_{min}) / (s_{max} - s_{min})\), where spect_min_val is \(s_{min}\). Default is None, in which case the minimum value of the spectrogram is used.
- spect_max_val : float, optional
Expected maximum value of spectrogram after transforming to the log of the magnitude. Used for a min-max scaling: \((s - s_{min}) / (s_{max} - s_{min})\), where spect_max_val is \(s_{max}\). Default is None, in which case the maximum value of the spectrogram is used.
- thresh_max : float
Threshold used to find local maxima.
- thresh_min : float
Threshold used to find local minima, in relation to local maxima. Used to find onsets and offsets of segments.
- thresh_lowest : float
Lowest threshold used to find onsets and offsets of segments.
- min_dur : float
Minimum duration of a segment, in seconds.
- max_dur : float
Maximum duration of a segment, in seconds.
- min_isi_dur : float, optional
Minimum duration of inter-segment intervals, in seconds. If specified, any inter-segment intervals shorter than this value will be removed, and the adjacent segments merged. Default is None.
- use_softmax_amp : bool
If True, compute summed spectral power from spectrogram with a softmax operation on each column. Default is True.
- temperature : float
Temperature for softmax. Only used if use_softmax_amp is True.
- smoothing_timescale : float
Timescale to use when smoothing summed spectral power with a gaussian filter. The window size will be dt - smoothing_timescale / samplerate, where dt is the size of a time bin in the spectrogram.
- scale : bool
If True, scale the sound.data. Default is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers. Since the default for vocalpy.Sound is to load sounds with a numpy dtype of float64, this function defaults to multiplying the sound.data by 2**15, and then casting to the int16 dtype. This replicates the behavior of the ava function, given data with dtype float64. If you have loaded a sound with a dtype of int16, then set this to False.
- scale_val : int or float
Value to multiply the sound.data by, to scale the data. Default is 2**15. Only used if scale is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers.
- scale_dtype : numpy.dtype
Numpy dtype to cast sound.data to, after scaling. Default is np.int16. Only used if scale is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers.
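As a rough illustration of what several of the parameters above do numerically, the sketch below implements the int16 scaling, the min-max scaling, and the summed spectral power in plain NumPy. The helper names (scale_audio, minmax_scale, summed_power) are hypothetical and are not part of vocalpy. Reading "softmax operation on each column" as a softmax-weighted sum over frequency bins is an assumption of this sketch, not a statement about the library's implementation.

```python
import numpy as np


def scale_audio(data, scale_val=2**15, scale_dtype=np.int16):
    """Multiply float audio by scale_val, then cast to scale_dtype.

    Hypothetical helper. Samples at exactly +1.0 would not fit in int16
    after multiplying by 2**15, so keep the input strictly below that.
    """
    return (np.asarray(data) * scale_val).astype(scale_dtype)


def minmax_scale(spect, spect_min_val=None, spect_max_val=None):
    """Apply (s - s_min) / (s_max - s_min) to a log-magnitude spectrogram.

    When a bound is None, the spectrogram's own minimum or maximum is used.
    """
    spect = np.asarray(spect, dtype=float)
    s_min = spect.min() if spect_min_val is None else spect_min_val
    s_max = spect.max() if spect_max_val is None else spect_max_val
    return (spect - s_min) / (s_max - s_min)


def summed_power(spect, use_softmax_amp=True, temperature=0.5):
    """Collapse a (frequencies, time bins) spectrogram to one value per time bin.

    Assumption: with use_softmax_amp, each column is weighted by a softmax
    over its own values before summing; otherwise the column is summed as is.
    """
    spect = np.asarray(spect, dtype=float)
    if use_softmax_amp:
        weights = np.exp(spect / temperature)
        weights /= weights.sum(axis=0, keepdims=True)
        return (spect * weights).sum(axis=0)
    return spect.sum(axis=0)
```

A lower temperature concentrates the softmax weights on the strongest frequency bins in each column, so the resulting trace follows the loudest component more closely than a plain sum would.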
- Returns:
- segments : vocalpy.Segments
Instance of vocalpy.Segments representing the segments found.
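To make the three-threshold procedure described above more concrete, here is a simplified sketch in plain NumPy. It is not the code used by vocalpy or ava: the function name segment_amplitude is hypothetical, it takes an already-smoothed amplitude trace amps sampled every dt seconds, and its handling of plateaus and of peaks near the edges of the recording is an assumption.

```python
import numpy as np


def segment_amplitude(amps, dt, thresh_lowest=0.1, thresh_min=0.2,
                      thresh_max=0.3, min_dur=0.03, max_dur=0.2):
    """Return a list of (onset, offset) times in seconds (hypothetical helper)."""
    amps = np.asarray(amps, dtype=float)
    n = amps.size

    # Interior local maxima and minima of the trace.
    is_max = np.zeros(n, dtype=bool)
    is_min = np.zeros(n, dtype=bool)
    is_max[1:-1] = (amps[1:-1] > amps[:-2]) & (amps[1:-1] >= amps[2:])
    is_min[1:-1] = (amps[1:-1] < amps[:-2]) & (amps[1:-1] <= amps[2:])

    def is_boundary(i):
        # A boundary is any value below thresh_lowest,
        # or a local minimum below thresh_min.
        return amps[i] < thresh_lowest or (is_min[i] and amps[i] < thresh_min)

    segments = []
    last_offset = -1
    for peak in np.flatnonzero(is_max & (amps > thresh_max)):
        if peak <= last_offset:
            continue  # this peak already lies inside the previous segment
        onset = peak
        while onset > 0 and not is_boundary(onset):
            onset -= 1  # walk backward to the preceding boundary
        offset = peak
        while offset < n - 1 and not is_boundary(offset):
            offset += 1  # walk forward to the following boundary
        last_offset = offset
        # Keep only segments whose duration is within [min_dur, max_dur].
        if min_dur <= (offset - onset) * dt <= max_dur:
            segments.append((onset * dt, offset * dt))
    return segments
```

In this sketch a peak whose boundary search reaches the start or end of the trace is simply clipped there. An implementation might instead discard such peaks.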
Notes
Code is adapted from [2]. Default parameters are taken from the example script here: pearsonlab/autoencoded-vocal-analysis. Note that the example script suggests tuning these parameters using functionality built into it, which we do not replicate here.
Versions of this algorithm were also used to segment rodent vocalizations in [4] (see code in [5]) and [6] (see code in [7]).
References
[1] Goffinet, J., Brudner, S., Mooney, R., & Pearson, J. (2021). Low-dimensional learned feature spaces quantify individual and group differences in vocal repertoires. eLife, 10:e67855. https://doi.org/10.7554/eLife.67855
[3] Goffinet, J., Brudner, S., Mooney, R., & Pearson, J. (2021). Data from: Low-dimensional learned feature spaces quantify individual and group differences in vocal repertoires. Duke Research Data Repository. https://doi.org/10.7924/r4gq6zn8w
[4] Nicholas Jourjine, Maya L. Woolfolk, Juan I. Sanguinetti-Scheck, John E. Sabatini, Sade McFadden, Anna K. Lindholm, Hopi E. Hoekstra. Two pup vocalization types are genetically and functionally separable in deer mice. Current Biology, 2023. https://doi.org/10.1016/j.cub.2023.02.045
[6] Peterson, Ralph Emilio, Aman Choudhri, Catalin Mitelut, Aramis Tanelus, Athena Capo-Battaglia, Alex H. Williams, David M. Schneider, and Dan H. Sanes. “Unsupervised discovery of family specific vocal usage in the Mongolian gerbil.” bioRxiv (2023): 2023-03.
Examples
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> import vocalpy as voc
>>> jourjineetal2023paths = voc.example('jourjine-et-al-2023')
>>> wav_path = jourjineetal2023paths[0]
>>> sound = voc.Sound.read(wav_path)
>>> params = {**voc.segment.ava.JOURJINEETAL2023}
>>> del params['min_isi_dur']
>>> segments = voc.segment.ava(sound, **params)
>>> spect = voc.spectrogram(sound)
>>> rows = 3; cols = 4
>>> fig, ax_arr = plt.subplots(rows, cols)
>>> start_inds, stop_inds = segments.start_inds, segments.stop_inds
>>> ax_to_use = ax_arr.ravel()[:start_inds.shape[0]]
>>> for start_ind, stop_ind, ax in zip(start_inds, stop_inds, ax_to_use):
...     data = sound.data[:, start_ind:stop_ind]
...     newsound = voc.Sound(data=data, samplerate=sound.samplerate)
...     spect = voc.spectrogram(newsound)
...     ax.pcolormesh(spect.times, spect.frequencies, np.squeeze(spect.data))
>>> for ax in ax_arr.ravel()[:start_inds.shape[0]]:
...     ax.set_axis_off()
>>> for ax in ax_arr.ravel()[start_inds.shape[0]:]:
...     ax.remove()
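The example above removes min_isi_dur from the parameters before calling the function. As a rough illustration of what that parameter does when it is specified, the sketch below merges adjacent segments separated by a gap shorter than min_isi_dur. The helper merge_short_gaps is hypothetical and operates on plain arrays of onset and offset times in seconds, sorted by onset. It is not the vocalpy implementation.

```python
import numpy as np


def merge_short_gaps(onsets, offsets, min_isi_dur):
    """Merge segments whose inter-segment interval is shorter than min_isi_dur.

    Hypothetical helper: onsets and offsets are equal-length sequences of
    times in seconds, sorted by onset.
    """
    onsets = np.asarray(onsets, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    if onsets.size == 0:
        return onsets, offsets

    merged_on = [onsets[0]]
    merged_off = [offsets[0]]
    for on, off in zip(onsets[1:], offsets[1:]):
        if on - merged_off[-1] < min_isi_dur:
            # Gap is too short: extend the previous segment to this offset.
            merged_off[-1] = off
        else:
            merged_on.append(on)
            merged_off.append(off)
    return np.array(merged_on), np.array(merged_off)
```

For instance, with onsets [0.0, 0.10, 0.50], offsets [0.08, 0.20, 0.60], and min_isi_dur=0.05, the first two segments are separated by only 0.02 s and are merged, while the third is left alone.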