vocalpy.segment.meansquared

vocalpy.segment.meansquared(sound: Sound, threshold: int = 5000, min_dur: float = 0.02, min_silent_dur: float = 0.002, freq_cutoffs: Iterable = (500, 10000), smooth_win: int = 2, scale: bool = True, scale_val: int | float = 32768, scale_dtype: npt.DTypeLike = numpy.int16) → Segments

Segment audio by thresholding the mean squared signal.

Converts audio to the mean squared of the signal (using vocalpy.signal.audio.meansquared()). Then finds all continuous periods in the mean squared signal above threshold; these periods are considered candidate segments. Candidates with a duration less than min_dur are removed; then, any two segments separated by a silent gap shorter than min_silent_dur are merged into a single segment. The segments remaining after this post-processing are returned as onset and offset times in two NumPy arrays.
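The post-processing described above (threshold, drop short candidates, merge across short gaps) can be sketched as follows. This is a simplified illustration of the described logic, not vocalpy's implementation; the function name is hypothetical.

```python
import numpy as np

def threshold_segments(meansquared, samplerate, threshold=5000,
                       min_dur=0.02, min_silent_dur=0.002):
    """Sketch of threshold-based segmentation of a mean squared signal.

    Returns onset and offset times in seconds as two NumPy arrays.
    """
    above = meansquared > threshold
    # find rising (+1) and falling (-1) edges of the boolean mask
    edges = np.diff(above.astype(int))
    onsets = np.nonzero(edges == 1)[0] + 1
    offsets = np.nonzero(edges == -1)[0] + 1
    if above[0]:   # signal starts above threshold
        onsets = np.insert(onsets, 0, 0)
    if above[-1]:  # signal ends above threshold
        offsets = np.append(offsets, above.size)
    onsets_s = onsets / samplerate
    offsets_s = offsets / samplerate
    # remove candidate segments shorter than min_dur
    keep = (offsets_s - onsets_s) >= min_dur
    onsets_s, offsets_s = onsets_s[keep], offsets_s[keep]
    if onsets_s.size == 0:
        return onsets_s, offsets_s
    # merge segments separated by a silent gap shorter than min_silent_dur
    merged_on, merged_off = [onsets_s[0]], []
    for prev_off, next_on in zip(offsets_s[:-1], onsets_s[1:]):
        if next_on - prev_off >= min_silent_dur:
            merged_off.append(prev_off)
            merged_on.append(next_on)
    merged_off.append(offsets_s[-1])
    return np.array(merged_on), np.array(merged_off)
```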

Note that vocalpy.signal.audio.meansquared() first filters the audio, with vocalpy.signal.audio.bandpass_filtfilt(), using freq_cutoffs, and then computes a running average of the squared signal by convolving with a window of size smooth_win milliseconds.
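A sketch of that mean-squared computation is shown below. It assumes a standard Butterworth bandpass filter applied with scipy.signal.filtfilt; vocalpy's bandpass_filtfilt() may differ in filter design, and the function name here is hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def meansquared_sketch(data, samplerate, freq_cutoffs=(500, 10000), smooth_win=2):
    """Sketch: bandpass filter, square, then smooth with a running average."""
    nyq = samplerate / 2
    # zero-phase bandpass filter between the two cutoff frequencies
    b, a = butter(5, [freq_cutoffs[0] / nyq, freq_cutoffs[1] / nyq], btype='bandpass')
    filtered = filtfilt(b, a, data)
    squared = filtered ** 2
    # running average over a window of smooth_win milliseconds
    win_samples = int(samplerate * smooth_win / 1000)
    window = np.ones(win_samples) / win_samples
    return np.convolve(squared, window, mode='same')
```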

Parameters:
sound : vocalpy.Sound

An audio signal.

threshold : int

Value above which mean squared signal is considered part of a segment. Default is 5000.

min_dur : float

Minimum duration of a segment, in seconds. Default is 0.02, i.e. 20 ms.

min_silent_dur : float

Minimum duration of silent gap between segments, in seconds. Default is 0.002, i.e. 2 ms.

freq_cutoffs : Iterable

Cutoff frequencies for bandpass filter. List or tuple with two elements, default is (500, 10000).

smooth_win : int

Size of smoothing window in milliseconds. Default is 2.

scale : bool

If True, scale the sound.data. Default is True. This replicates the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers. Since vocalpy.Sound loads audio with a NumPy dtype of float64 by default, this function multiplies sound.data by 2**15 and then casts it to the int16 dtype. If you have loaded a sound with a dtype of int16, set this to False.

scale_val : int or float

Value to multiply the sound.data by, to scale the data. Default is 2**15. Only used if scale is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers.

scale_dtype : numpy.dtype

NumPy dtype that sound.data is cast to after scaling. Default is numpy.int16. Only used if scale is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers.
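Taken together, scale, scale_val, and scale_dtype amount to the transform below. This is a minimal sketch of the documented defaults, not the library's code; it assumes float64 audio in the range [-1.0, 1.0].

```python
import numpy as np

# assumed float64 audio in [-1.0, 1.0], as vocalpy.Sound loads by default
data = np.array([0.0, 0.5, -0.25], dtype=np.float64)
scale_val = 32768       # default scale_val, i.e. 2 ** 15
scale_dtype = np.int16  # default scale_dtype
scaled = (data * scale_val).astype(scale_dtype)
```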

Returns:
segments : vocalpy.Segments

Instance of vocalpy.Segments representing the segments found.

Examples

>>> import vocalpy as voc
>>> sounds = voc.examples('bfsongrepo', return_type='sound')
>>> segments = voc.segment.meansquared(sounds[0])
>>> print(segments)