vocalpy.segment.meansquared
- vocalpy.segment.meansquared(sound: Sound, threshold: int = 5000, min_dur: float = 0.02, min_silent_dur: float = 0.002, freq_cutoffs: Iterable = (500, 10000), smooth_win: int = 2, scale: bool = True, scale_val: int | float = 32768, scale_dtype: npt.DTypeLike = <class 'numpy.int16'>) -> Segments
Segment audio by thresholding the mean squared signal.
Converts audio to the mean squared of the signal (using vocalpy.signal.audio.meansquared()). Then finds all continuous periods in the mean squared signal above threshold. These periods are considered candidate segments. Candidates with a duration less than min_dur are removed; then, any two segments separated by a silent gap shorter than min_silent_dur are merged into a single segment. The segments remaining after this post-processing are returned as a vocalpy.Segments instance.
Note that vocalpy.signal.audio.meansquared() first filters the audio with vocalpy.signal.audio.bandpass_filtfilt(), using freq_cutoffs, and then computes a running average of the squared signal by convolving with a window of size smooth_win milliseconds.
- Parameters:
- sound: vocalpy.Sound
An audio signal.
- threshold: int
Value above which mean squared signal is considered part of a segment. Default is 5000.
- min_dur: float
Minimum duration of a segment, in seconds. Default is 0.02, i.e. 20 ms.
- min_silent_dur: float
Minimum duration of silent gap between segments, in seconds. Default is 0.002, i.e. 2 ms.
- freq_cutoffs: Iterable
Cutoff frequencies for bandpass filter. List or tuple with two elements, default is (500, 10000).
- smooth_win: int
Size of smoothing window in milliseconds. Default is 2.
- scale: bool
If True, scale sound.data. Default is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers. Since the default for vocalpy.Sound is to load sounds with a numpy dtype of float64, this function defaults to multiplying sound.data by 2**15 and then casting to the int16 dtype, which replicates the behavior of the evsonganaly function given data with dtype float64. If you have loaded a sound with a dtype of int16, set this to False.
- scale_val: int | float
Value to multiply sound.data by, to scale the data. Default is 2**15. Only used if scale is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers.
- scale_dtype: numpy.dtype
NumPy dtype to cast sound.data to, after scaling. Default is np.int16. Only used if scale is True. This is needed to replicate the behavior of evsonganaly, which assumes the audio data is loaded as 16-bit integers.
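The scaling controlled by scale, scale_val, and scale_dtype can be illustrated with plain NumPy. This is a minimal sketch of the multiply-then-cast step as described in the parameter documentation, not the library's own code; the helper name scale_audio is hypothetical.

```python
import numpy as np


def scale_audio(data, scale_val=2**15, scale_dtype=np.int16):
    """Hypothetical helper: multiply samples by scale_val, then cast to scale_dtype."""
    return (np.asarray(data) * scale_val).astype(scale_dtype)


# Float64 samples in the range [-1.0, 1.0), like audio loaded with a float64 dtype.
data = np.array([0.0, 0.5, -0.25, -1.0], dtype=np.float64)
scaled = scale_audio(data)

print(scaled.dtype)  # int16
print(scaled)        # [     0  16384  -8192 -32768]
```

With samples already stored as int16, this step would be skipped, which corresponds to passing scale=False.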
- Returns:
- segments: vocalpy.Segments
Instance of vocalpy.Segments representing the segments found.
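For intuition about what smooth_win controls, the running average of the squared signal can be sketched with NumPy alone. This sketch assumes a rectangular averaging window and omits the bandpass filtering step entirely; the function name running_mean_squared and the rounding of the window length are assumptions for illustration, not the behavior of vocalpy.signal.audio.meansquared().

```python
import numpy as np


def running_mean_squared(data, samplerate, smooth_win=2):
    """Hypothetical helper: square the signal, then average it over a
    rectangular window that is smooth_win milliseconds long."""
    squared = np.asarray(data, dtype=np.float64) ** 2
    # Convert the window length from milliseconds to samples (at least 1 sample).
    n_samples = max(1, int(round(samplerate * smooth_win / 1000)))
    window = np.ones(n_samples) / n_samples
    # mode="same" keeps the output the same length as the input.
    return np.convolve(squared, window, mode="same")


# A constant signal of amplitude 3 sampled at 1 kHz; away from the edges,
# the smoothed squared signal should equal 3**2.
data = np.full(10, 3.0)
smoothed = running_mean_squared(data, samplerate=1000, smooth_win=2)
print(smoothed.shape)
```

A longer window gives a smoother envelope, at the cost of blurring the onset and offset of each supra-threshold period.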
Examples
>>> import vocalpy as voc
>>> sounds = voc.examples('bfsongrepo', return_type='sound')
>>> segments = voc.segment.meansquared(sounds[0])
>>> print(segments)
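The thresholding and post-processing described for this function can also be sketched independently of vocalpy. The sketch below follows the order given in the description (remove short candidates, then merge across short gaps) and returns plain onset and offset arrays rather than a vocalpy.Segments instance; the function name find_segments and the synthetic input are assumptions for illustration only.

```python
import numpy as np


def find_segments(meansq, samplerate, threshold=5000, min_dur=0.02, min_silent_dur=0.002):
    """Hypothetical helper: return onset and offset times, in seconds,
    of the periods where meansq is above threshold."""
    above = np.asarray(meansq) > threshold
    # Pad with False so a period touching either edge still has an onset and an offset.
    padded = np.concatenate(([False], above, [False]))
    changes = np.diff(padded.astype(np.int8))
    onsets = np.flatnonzero(changes == 1) / samplerate
    offsets = np.flatnonzero(changes == -1) / samplerate

    # Remove candidate segments shorter than min_dur.
    keep = (offsets - onsets) >= min_dur
    onsets, offsets = onsets[keep], offsets[keep]

    # Merge neighboring segments whose silent gap is shorter than min_silent_dur.
    if onsets.size > 1:
        gaps = onsets[1:] - offsets[:-1]
        long_gap = gaps >= min_silent_dur
        onsets = np.concatenate((onsets[:1], onsets[1:][long_gap]))
        offsets = np.concatenate((offsets[:-1][long_gap], offsets[-1:]))
    return onsets, offsets


# Synthetic mean squared signal sampled at 1 kHz.
samplerate = 1000
meansq = np.zeros(1000)
meansq[100:150] = 10000  # 50 ms candidate
meansq[151:200] = 10000  # 49 ms candidate after a 1 ms gap: merged with the previous one
meansq[400:405] = 10000  # 5 ms candidate: shorter than min_dur, removed
meansq[600:700] = 10000  # 100 ms candidate

onsets, offsets = find_segments(meansq, samplerate)
print(onsets)   # [0.1 0.6]
print(offsets)  # [0.2 0.7]
```

Raising min_dur discards more brief candidates, while raising min_silent_dur merges segments across longer pauses.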