vocalpy.segment.meansquared

vocalpy.segment.meansquared(sound: Sound, threshold: int = 5000, min_dur: float = 0.02, min_silent_dur: float = 0.002, freq_cutoffs: Iterable = (500, 10000), smooth_win: int = 2, scale: bool = True, scale_val: int | float = 32768, scale_dtype: npt.DTypeLike = numpy.int16) → Segments

Segment audio by thresholding the mean squared signal.

Converts audio to the mean squared of the signal (using vocalpy.signal.audio.meansquared()). Then finds all continuous periods in the mean squared signal above threshold; these periods are considered candidate segments. Candidates with a duration less than min_dur are removed; then, any two segments separated by a silent gap shorter than min_silent_dur are merged into a single segment. The segments remaining after this post-processing are returned as onset and offset times in two NumPy arrays.
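The post-processing described above (threshold, drop short candidates, merge across short gaps) can be sketched as follows. This is a simplified illustration of the described logic, not vocalpy's implementation; the function name is hypothetical.

```python
import numpy as np

def threshold_segments(meansquared, samplerate, threshold=5000,
                       min_dur=0.02, min_silent_dur=0.002):
    """Sketch of threshold-based segmentation of a mean squared signal.

    Returns onset and offset times in seconds as two NumPy arrays.
    """
    above = meansquared > threshold
    # find rising (+1) and falling (-1) edges of the boolean mask
    edges = np.diff(above.astype(int))
    onsets = np.nonzero(edges == 1)[0] + 1
    offsets = np.nonzero(edges == -1)[0] + 1
    if above[0]:   # signal starts above threshold
        onsets = np.insert(onsets, 0, 0)
    if above[-1]:  # signal ends above threshold
        offsets = np.append(offsets, above.size)
    onsets_s = onsets / samplerate
    offsets_s = offsets / samplerate
    # remove candidate segments shorter than min_dur
    keep = (offsets_s - onsets_s) >= min_dur
    onsets_s, offsets_s = onsets_s[keep], offsets_s[keep]
    if onsets_s.size == 0:
        return onsets_s, offsets_s
    # merge segments separated by a silent gap shorter than min_silent_dur
    merged_on, merged_off = [onsets_s[0]], []
    for prev_off, next_on in zip(offsets_s[:-1], onsets_s[1:]):
        if next_on - prev_off >= min_silent_dur:
            merged_off.append(prev_off)
            merged_on.append(next_on)
    merged_off.append(offsets_s[-1])
    return np.array(merged_on), np.array(merged_off)
```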

Note that vocalpy.signal.audio.meansquared() first filters the audio, with vocalpy.signal.audio.bandpass_filtfilt(), using freq_cutoffs, and then computes a running average of the squared signal by convolving with a window of size smooth_win milliseconds.
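A sketch of that mean-squared computation is shown below. It assumes a standard Butterworth bandpass filter applied with scipy.signal.filtfilt; vocalpy's bandpass_filtfilt() may differ in filter design, and the function name here is hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def meansquared_sketch(data, samplerate, freq_cutoffs=(500, 10000), smooth_win=2):
    """Sketch: bandpass filter, square, then smooth with a running average."""
    nyq = samplerate / 2
    # zero-phase bandpass filter between the two cutoff frequencies
    b, a = butter(5, [freq_cutoffs[0] / nyq, freq_cutoffs[1] / nyq], btype='bandpass')
    filtered = filtfilt(b, a, data)
    squared = filtered ** 2
    # running average over a window of smooth_win milliseconds
    win_samples = int(samplerate * smooth_win / 1000)
    window = np.ones(win_samples) / win_samples
    return np.convolve(squared, window, mode='same')
```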

Parameters:
sound : vocalpy.Sound

An audio signal.

threshold : int

Value above which mean squared signal is considered part of a segment. Default is 5000.

min_dur : float

Minimum duration of a segment, in seconds. Default is 0.02, i.e. 20 ms.

min_silent_dur : float

Minimum duration of silent gap between segments, in seconds. Default is 0.002, i.e. 2 ms.

freq_cutoffs : Iterable

Cutoff frequencies for bandpass filter. List or tuple with two elements, default is (500, 10000).

smooth_win : int

Size of smoothing window in milliseconds. Default is 2.

scale : bool

If True, scale the sound.data. Default is True. This replicates the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers. Since vocalpy.Sound loads audio with a NumPy dtype of float64 by default, this function multiplies sound.data by 2**15 and then casts it to the int16 dtype. If you have loaded a sound with a dtype of int16, set this to False.

scale_val : int or float

Value to multiply the sound.data by, to scale the data. Default is 2**15. Only used if scale is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers.

scale_dtype : numpy.dtype

NumPy dtype that sound.data is cast to after scaling. Default is numpy.int16. Only used if scale is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers.
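Taken together, scale, scale_val, and scale_dtype amount to the transform below. This is a minimal sketch of the documented defaults, not the library's code; it assumes float64 audio in the range [-1.0, 1.0].

```python
import numpy as np

# assumed float64 audio in [-1.0, 1.0], as vocalpy.Sound loads by default
data = np.array([0.0, 0.5, -0.25], dtype=np.float64)
scale_val = 32768       # default scale_val, i.e. 2 ** 15
scale_dtype = np.int16  # default scale_dtype
scaled = (data * scale_val).astype(scale_dtype)
```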

Returns:
segments : vocalpy.Segments

Instance of vocalpy.Segments representing the segments found.

Examples

>>> import vocalpy as voc
>>> sounds = voc.examples('bfsongrepo', return_type='sound')
>>> segments = voc.segment.meansquared(sounds[0])
>>> print(segments)