vocalpy.segment.ava

vocalpy.segment.ava(sound: Sound, nperseg: int = 1024, noverlap: int = 512, min_freq: int = 30000.0, max_freq: int = 110000.0, spect_min_val: float | None = None, spect_max_val: float | None = None, thresh_lowest: float = 0.1, thresh_min: float = 0.2, thresh_max: float = 0.3, min_dur: float = 0.03, max_dur: float = 0.2, min_isi_dur: float | None = None, use_softmax_amp: bool = True, temperature: float = 0.5, smoothing_timescale: float = 0.007, scale: bool = True, scale_val: int | float = 32768, scale_dtype: npt.DTypeLike = <class 'numpy.int16'>) → Segments

Find segments in audio, using algorithm from ava package.

Segments audio by generating a spectrogram from it, summing power across frequencies, and then thresholding this summed spectral power as if it were an amplitude trace.

The spectral power is segmented with three thresholds, thresh_lowest, thresh_min, and thresh_max, where thresh_lowest <= thresh_min <= thresh_max. The segmenting algorithm works as follows: first, detect all local maxima that exceed thresh_max. Then, for each local maximum, find the onset and offset. An offset is detected wherever a local maximum is followed by a local minimum in the summed spectral power that is less than thresh_min, or by any value less than thresh_lowest. Onsets are located in the same way, by looking for a preceding local minimum less than thresh_min, or any preceding value less than thresh_lowest.
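To make the three-threshold rule concrete, below is a minimal sketch of the onset/offset search, assuming a smoothed, summed spectral power trace power with time bins of duration dt. It illustrates the rule as described above rather than reproducing vocalpy's implementation; the function name is hypothetical, and de-duplication of overlapping segments and the min_dur/max_dur filtering are omitted.

import numpy as np

def three_threshold_segments(power, dt, thresh_lowest, thresh_min, thresh_max):
    # Sketch only: segments are seeded at local maxima above thresh_max;
    # onsets/offsets are the nearest preceding/following local minima
    # below thresh_min, or any value below thresh_lowest.
    onsets, offsets = [], []
    n = len(power)
    for i in range(1, n - 1):
        if not (power[i] > thresh_max
                and power[i] >= power[i - 1] and power[i] >= power[i + 1]):
            continue
        # walk left from the peak to find the onset
        on = i
        while on > 0:
            local_min = power[on] <= power[on - 1] and power[on] <= power[on + 1]
            if (local_min and power[on] < thresh_min) or power[on] < thresh_lowest:
                break
            on -= 1
        # walk right from the peak to find the offset
        off = i
        while off < n - 1:
            local_min = power[off] <= power[off - 1] and power[off] <= power[off + 1]
            if (local_min and power[off] < thresh_min) or power[off] < thresh_lowest:
                break
            off += 1
        onsets.append(on * dt)
        offsets.append(off * dt)
    return np.array(onsets), np.array(offsets)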

Parameters:
sound : vocalpy.Sound

Sound loaded from an audio file.

nperseg : int

Number of samples per segment for Short-Time Fourier Transform. Default is 1024.

noverlap : int

Number of samples to overlap per segment for Short-Time Fourier Transform. Default is 512.

min_freq : int

Minimum frequency. Spectrogram is “cropped” below this frequency (instead of, e.g., bandpass filtering). Default is 30e3.

max_freq : int

Maximum frequency. Spectrogram is “cropped” above this frequency (instead of, e.g., bandpass filtering). Default is 110e3.

spect_min_val : float, optional

Expected minimum value of spectrogram after transforming to the log of the magnitude. Used for a min-max scaling: \((s - s_{min}) / (s_{max} - s_{min})\), where spect_min_val is \(s_{min}\). Default is None, in which case the minimum value of the spectrogram is used.

spect_max_val : float, optional

Expected maximum value of spectrogram after transforming to the log of the magnitude. Used for a min-max scaling: \((s - s_{min}) / (s_{max} - s_{min})\), where spect_max_val is \(s_{max}\). Default is None, in which case the maximum value of the spectrogram is used. (See the sketch following this parameter list.)

thresh_max : float

Threshold used to find local maxima.

thresh_min : float

Threshold used to find the local minima that precede and follow local maxima; used to find onsets and offsets of segments.

thresh_lowest : float

Lowest threshold used to find onsets and offsets of segments.

min_dur : float

Minimum duration of a segment, in seconds.

max_dur : float

Maximum duration of a segment, in seconds.

min_isi_dur : float, optional

Minimum duration of inter-segment intervals, in seconds. If specified, any inter-segment intervals shorter than this value will be removed, and the adjacent segments merged. Default is None.

use_softmax_amp : bool

If True, compute summed spectral power from spectrogram with a softmax operation on each column. Default is True.

temperature : float

Temperature for softmax. Only used if use_softmax_amp is True.

smoothing_timescale : float

Timescale, in seconds, used when smoothing the summed spectral power with a Gaussian filter. The width of the filter is determined by smoothing_timescale relative to dt, the duration of a time bin in the spectrogram. Default is 0.007.

scale : bool

If True, scale sound.data. Default is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers. Since the default for vocalpy.Sound is to load sounds with a numpy dtype of float64, this function defaults to multiplying sound.data by 2**15 and then casting to the int16 dtype, replicating ava's behavior for data with dtype float64. If you have loaded a sound with a dtype of int16, set this to False. (See the sketch following this parameter list.)

scale_val : int or float

Value to multiply sound.data by, to scale the data. Default is 2**15. Only used if scale is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers.

scale_dtype : numpy.dtype

numpy dtype to cast sound.data to, after scaling. Default is np.int16. Only used if scale is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers.
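The sketch below strings together the pre-processing these parameters control: scaling the audio as if it were 16-bit integers, computing a spectrogram cropped to [min_freq, max_freq], taking the log of the magnitude, min-max scaling, and reducing to a summed (optionally softmax-weighted) power trace that is then smoothed. It is an approximation written against this docstring, not vocalpy's code; the epsilon guarding log(0), the clipping to [0, 1], the exact form of the softmax weighting, and the Gaussian filter width are assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import stft

def summed_spectral_power(data, samplerate, nperseg=1024, noverlap=512,
                          min_freq=30e3, max_freq=110e3,
                          spect_min_val=None, spect_max_val=None,
                          use_softmax_amp=True, temperature=0.5,
                          smoothing_timescale=0.007,
                          scale=True, scale_val=32768, scale_dtype=np.int16):
    if scale:
        # replicate ava's assumption that audio is 16-bit integers
        data = (np.asarray(data) * scale_val).astype(scale_dtype)
    f, t, s = stft(data, fs=samplerate, nperseg=nperseg, noverlap=noverlap)
    # "crop" the spectrogram to [min_freq, max_freq] instead of filtering
    keep = (f >= min_freq) & (f <= max_freq)
    s = s[keep]
    # log of the magnitude; the epsilon guarding log(0) is an assumption
    log_s = np.log(np.abs(s) + 1e-12)
    # min-max scaling (s - s_min) / (s_max - s_min), falling back to the
    # observed extrema when spect_min_val / spect_max_val are None
    s_min = spect_min_val if spect_min_val is not None else log_s.min()
    s_max = spect_max_val if spect_max_val is not None else log_s.max()
    log_s = np.clip((log_s - s_min) / (s_max - s_min), 0.0, 1.0)
    if use_softmax_amp:
        # softmax-weighted sum over frequencies per column (an assumption
        # about the exact form of the softmax operation)
        w = np.exp(log_s / temperature)
        w /= w.sum(axis=0, keepdims=True)
        amps = (w * log_s).sum(axis=0)
    else:
        amps = log_s.sum(axis=0)
    # smooth with a Gaussian filter; the width is set by smoothing_timescale
    # relative to the spectrogram time-bin duration dt
    dt = t[1] - t[0]
    return gaussian_filter1d(amps, smoothing_timescale / dt)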

Returns:
segments : vocalpy.Segments

Instance of vocalpy.Segments representing the segments found.

Notes

Code is adapted from [2]. Default parameters are taken from the example script in the pearsonlab/autoencoded-vocal-analysis repository. Note that the example script suggests tuning these parameters using functionality built into ava that we do not replicate here.

Versions of this algorithm were also used to segment rodent vocalizations in [4] (see code in [5]) and [6] (see code in [7]).

References

[1]

Goffinet, J., Brudner, S., Mooney, R., & Pearson, J. (2021). Low-dimensional learned feature spaces quantify individual and group differences in vocal repertoires. eLife, 10:e67855. https://doi.org/10.7554/eLife.67855

[2]

Code for [1]: https://github.com/pearsonlab/autoencoded-vocal-analysis

[3]

Goffinet, J., Brudner, S., Mooney, R., & Pearson, J. (2021). Data from: Low-dimensional learned feature spaces quantify individual and group differences in vocal repertoires. Duke Research Data Repository. https://doi.org/10.7924/r4gq6zn8w

[4]

Jourjine, N., Woolfolk, M. L., Sanguinetti-Scheck, J. I., Sabatini, J. E., McFadden, S., Lindholm, A. K., & Hoekstra, H. E. (2023). Two pup vocalization types are genetically and functionally separable in deer mice. Current Biology. https://doi.org/10.1016/j.cub.2023.02.045

[5]

Code for [4]: https://github.com/nickjourjine/peromyscus-pup-vocal-evolution

[6]

Peterson, R. E., Choudhri, A., Mitelut, C., Tanelus, A., Capo-Battaglia, A., Williams, A. H., Schneider, D. M., & Sanes, D. H. (2023). Unsupervised discovery of family specific vocal usage in the Mongolian gerbil. bioRxiv.

[7]

Code for [6]: https://github.com/ralphpeterson/gerbil-vocal-dialects

Examples

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> import vocalpy as voc
>>> jourjineetal2023paths = voc.example('jourjine-et-al-2023')
>>> wav_path = jourjineetal2023paths[0]
>>> sound = voc.Sound.read(wav_path)
>>> # use default parameters from Jourjine et al. 2023, without a minimum inter-segment interval
>>> params = {**voc.segment.ava.JOURJINEETAL2023}
>>> del params['min_isi_dur']
>>> segments = voc.segment.ava(sound, **params)
>>> # plot a spectrogram of each segment in its own panel
>>> rows, cols = 3, 4
>>> fig, ax_arr = plt.subplots(rows, cols)
>>> start_inds, stop_inds = segments.start_inds, segments.stop_inds
>>> ax_to_use = ax_arr.ravel()[:start_inds.shape[0]]
>>> for start_ind, stop_ind, ax in zip(start_inds, stop_inds, ax_to_use):
...     data = sound.data[:, start_ind:stop_ind]
...     segment_sound = voc.Sound(data=data, samplerate=sound.samplerate)
...     spect = voc.spectrogram(segment_sound)
...     ax.pcolormesh(spect.times, spect.frequencies, np.squeeze(spect.data))
>>> # hide the axes of panels that show a segment, and remove the panels left over
>>> for ax in ax_arr.ravel()[:start_inds.shape[0]]:
...     ax.set_axis_off()
>>> for ax in ax_arr.ravel()[start_inds.shape[0]:]:
...     ax.remove()