vocalpy.segment.AvaParams

class vocalpy.segment.AvaParams(nperseg: int = 1024, noverlap: int = 512, min_freq: float = 20000.0, max_freq: float = 125000.0, spect_min_val: float = 0.8, spect_max_val: float = 6.0, thresh_lowest: float = 0.3, thresh_min: float = 0.3, thresh_max: float = 0.35, min_dur: float = 0.015, max_dur: float = 1.0, min_isi_dur: float | None = None, use_softmax_amp: bool = False, temperature: float = 0.01, smoothing_timescale: float = 0.00025, scale: bool = True, scale_val: int | float = 32768, scale_dtype: numpy.typing.DTypeLike = numpy.int16)

Bases: Params

Data class that represents parameters for vocalpy.segment.ava().

Constants in this module are instances of this class that represent parameters used in papers.

Attributes:
nperseg : int

Number of samples per segment for Short-Time Fourier Transform. Default is 1024.

noverlap : int

Number of samples to overlap per segment for Short-Time Fourier Transform. Default is 512.

min_freq : float

Minimum frequency. Spectrogram is “cropped” below this frequency (instead of, e.g., bandpass filtering). Default is 20e3.

max_freq : float

Maximum frequency. Spectrogram is “cropped” above this frequency (instead of, e.g., bandpass filtering). Default is 125e3.

spect_min_val : float, optional

Expected minimum value of the spectrogram after transforming to the log of the magnitude. Used for a min-max scaling, \((s - s_{min}) / (s_{max} - s_{min})\), where spect_min_val is \(s_{min}\) (see the scaling sketch that follows this attribute list). Default is 0.8. If set to None, the minimum value of the spectrogram is used.

spect_max_val : float, optional

Expected maximum value of the spectrogram after transforming to the log of the magnitude. Used for a min-max scaling, \((s - s_{min}) / (s_{max} - s_{min})\), where spect_max_val is \(s_{max}\). Default is 6.0. If set to None, the maximum value of the spectrogram is used.

thresh_max : float

Threshold used to find local maxima. Default is 0.35.

thresh_min : float

Threshold used to find local minima, in relation to local maxima. Used to find onsets and offsets of segments. Default is 0.3.

thresh_lowest : float

Lowest threshold used to find onsets and offsets of segments. Default is 0.3.

min_dur : float

Minimum duration of a segment, in seconds. Default is 0.015.

max_dur : float

Maximum duration of a segment, in seconds. Default is 1.0.

min_isi_dur : float, optional

Minimum duration of inter-segment intervals, in seconds. If specified, any inter-segment intervals shorter than this value will be removed, and the adjacent segments merged. Default is None.

use_softmax_amp : bool

If True, compute summed spectral power from the spectrogram with a softmax operation on each column. Default is False.

temperature : float

Temperature for the softmax. Only used if use_softmax_amp is True. Default is 0.01.

smoothing_timescale : float

Timescale to use when smoothing the summed spectral power with a Gaussian filter. The size of the smoothing window, in time bins, is smoothing_timescale / dt, where dt is the duration of a time bin in the spectrogram. Default is 0.00025.

scale : bool

If True, scale sound.data, as illustrated in the sketch after this attribute list. Default is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers. Since the default for vocalpy.Sound is to load sounds with a numpy dtype of float64, this function defaults to multiplying sound.data by 2**15 and then casting to the int16 dtype. This replicates the behavior of the ava function, given data with dtype float64. If you have loaded a sound with a dtype of int16, set this to False.

scale_val : int or float

Value to multiply the sound.data by, to scale the data. Default is 2**15. Only used if scale is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers.

scale_dtype : numpy.dtype

The dtype that the scaled sound.data is cast to. Default is numpy.int16 (a signed integer type, compatible with C short).
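
The scaling and smoothing parameters above can be made concrete with a short sketch. The code below is an illustration in plain numpy / scipy of the kind of pre-processing these parameters describe, not the vocalpy.segment.ava implementation itself; the synthetic sound, variable names, and the clipping step are assumptions made only for this example.

import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import stft

# Synthetic stand-in for ``sound.data`` loaded as float64 in [-1.0, 1.0).
samplerate = 250_000
data = np.random.default_rng(0).uniform(-1.0, 1.0, samplerate // 10)

# ``scale`` / ``scale_val`` / ``scale_dtype``: mimic audio loaded as
# 16-bit integers by multiplying by 2**15 and casting to int16.
scale_val, scale_dtype = 32768, np.int16
data_scaled = (data * scale_val).astype(scale_dtype)

# Short-Time Fourier Transform with ``nperseg`` and ``noverlap``,
# then crop frequencies to the band [``min_freq``, ``max_freq``].
nperseg, noverlap = 1024, 512
min_freq, max_freq = 20e3, 125e3
f, t, spect = stft(data_scaled, samplerate, nperseg=nperseg, noverlap=noverlap)
spect = spect[(f >= min_freq) & (f <= max_freq)]

# Log of the magnitude, then the min-max scaling
# (s - s_min) / (s_max - s_min) with ``spect_min_val`` and ``spect_max_val``.
spect_min_val, spect_max_val = 0.8, 6.0
log_mag = np.log(np.abs(spect) + 1e-12)
spect_scaled = (log_mag - spect_min_val) / (spect_max_val - spect_min_val)
spect_scaled = np.clip(spect_scaled, 0.0, 1.0)  # keep values in [0, 1]

# Summed spectral power per time bin, smoothed with a Gaussian filter
# whose width in bins is taken to be ``smoothing_timescale`` / dt.
smoothing_timescale = 0.00025
dt = t[1] - t[0]  # duration of one spectrogram time bin, in seconds
amps = gaussian_filter1d(spect_scaled.sum(axis=0), smoothing_timescale / dt)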

Examples

>>> import vocalpy as voc
>>> jourjineetal2023paths = voc.example('jourjine-et-al-2023')
>>> wav_path = jourjineetal2023paths[0]
>>> sound = voc.Sound.read(wav_path)
>>> onsets, offsets = voc.segment.ava(sound, **voc.segment.ava.JOURJINEETAL2023)
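
The same pattern works with an AvaParams instance constructed directly; any keyword accepted by __init__ can be overridden. A minimal sketch, with arbitrary values chosen only for illustration:

>>> params = voc.segment.AvaParams(min_freq=20e3, max_freq=100e3, min_dur=0.02)
>>> onsets, offsets = voc.segment.ava(sound, **params)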

__init__(nperseg: int = 1024, noverlap: int = 512, min_freq: float = 20000.0, max_freq: float = 125000.0, spect_min_val: float = 0.8, spect_max_val: float = 6.0, thresh_lowest: float = 0.3, thresh_min: float = 0.3, thresh_max: float = 0.35, min_dur: float = 0.015, max_dur: float = 1.0, min_isi_dur: float | None = None, use_softmax_amp: bool = False, temperature: float = 0.01, smoothing_timescale: float = 0.00025, scale: bool = True, scale_val: int | float = 32768, scale_dtype: numpy.typing.DTypeLike = numpy.int16) → None

Methods

__init__([nperseg, noverlap, min_freq, ...])

keys()

Attributes

max_dur

max_freq

min_dur

min_freq

min_isi_dur

noverlap

nperseg

scale

scale_val

smoothing_timescale

spect_max_val

spect_min_val

temperature

thresh_lowest

thresh_max

thresh_min

use_softmax_amp

scale_dtype

alias of int16
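
A quick arithmetic check on the defaults for scale_val and scale_dtype: numpy.int16 is a signed 16-bit integer, so multiplying floating-point audio in [-1.0, 1.0) by the default scale_val of 2**15 = 32768 keeps samples within the representable range. For example:

>>> import numpy as np
>>> np.iinfo(np.int16)
iinfo(min=-32768, max=32767, dtype=int16)
>>> int(0.5 * 2**15)  # a float64 sample of 0.5, scaled into int16 range
16384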