vocalpy.segment.AvaParams
- class vocalpy.segment.AvaParams(nperseg: int = 1024, noverlap: int = 512, min_freq: float = 20000.0, max_freq: float = 125000.0, spect_min_val: float = 0.8, spect_max_val: float = 6.0, thresh_lowest: float = 0.3, thresh_min: float = 0.3, thresh_max: float = 0.35, min_dur: float = 0.015, max_dur: float = 1.0, min_isi_dur: float | None = None, use_softmax_amp: bool = False, temperature: float = 0.01, smoothing_timescale: float = 0.00025, scale: bool = True, scale_val: int | float = 32768, scale_dtype: ~numpy.dtype[~typing.Any] | None | type[~typing.Any] | ~numpy._typing._dtype_like._SupportsDType[~numpy.dtype[~typing.Any]] | str | tuple[~typing.Any, int] | tuple[~typing.Any, ~typing.SupportsIndex | ~collections.abc.Sequence[~typing.SupportsIndex]] | list[~typing.Any] | ~numpy._typing._dtype_like._DTypeDict | tuple[~typing.Any, ~typing.Any] = <class 'numpy.int16'>)
Bases: Params
Data class that represents parameters for vocalpy.segment.ava(). Constants in this module are instances of this class that represent parameters used in papers.
- Attributes:
- nperseg : int
Number of samples per segment for the Short-Time Fourier Transform. Default is 1024.
- noverlap : int
Number of samples to overlap between segments for the Short-Time Fourier Transform. Default is 512.
- min_freq : float
Minimum frequency. The spectrogram is "cropped" below this frequency (instead of, e.g., bandpass filtering). Default is 20e3.
- max_freq : float
Maximum frequency. The spectrogram is "cropped" above this frequency (instead of, e.g., bandpass filtering). Default is 125e3.
- spect_min_val : float, optional
Expected minimum value of the spectrogram after transforming to the log of the magnitude. Used for min-max scaling, \((s - s_{min}) / (s_{max} - s_{min})\), where spect_min_val is \(s_{min}\). Default is 0.8. If None, the minimum value of the spectrogram is used.
- spect_max_val : float, optional
Expected maximum value of the spectrogram after transforming to the log of the magnitude. Used for min-max scaling, \((s - s_{min}) / (s_{max} - s_{min})\), where spect_max_val is \(s_{max}\). Default is 6.0. If None, the maximum value of the spectrogram is used.
- thresh_max : float
Threshold used to find local maxima.
- thresh_min : float
Threshold used to find local minima, in relation to local maxima. Used to find onsets and offsets of segments.
- thresh_lowest : float
Lowest threshold used to find onsets and offsets of segments.
- min_dur : float
Minimum duration of a segment, in seconds.
- max_dur : float
Maximum duration of a segment, in seconds.
- min_isi_dur : float, optional
Minimum duration of inter-segment intervals, in seconds. If specified, any inter-segment intervals shorter than this value will be removed, and the adjacent segments merged. Default is None.
- use_softmax_amp : bool
If True, compute summed spectral power from the spectrogram with a softmax operation on each column. Default is False.
- temperature : float
Temperature for softmax. Only used if use_softmax_amp is True.
- smoothing_timescale : float
Timescale to use when smoothing summed spectral power with a Gaussian filter. The window size will be dt - smoothing_timescale / samplerate, where dt is the size of a time bin in the spectrogram.
- scale : bool
If True, scale the sound.data. Default is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers. Since the default for vocalpy.Sound is to load sounds with a numpy dtype of float64, this function defaults to multiplying the sound.data by 2**15 and then casting to the int16 dtype. This replicates the behavior of the ava function, given data with dtype float64. If you have loaded a sound with a dtype of int16, then set this to False.
- scale_val : int or float
Value to multiply the sound.data by, to scale the data. Default is 2**15. Only used if scale is True. This is needed to replicate the behavior of ava, which assumes the audio data is loaded as 16-bit integers.
- scale_dtype : numpy.dtype
Signed integer type, compatible with C short. Default is numpy.int16.
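The min-max scaling described for spect_min_val and spect_max_val can be sketched with NumPy as follows. This is an illustration only, not vocalpy's implementation: the helper name minmax_scale and the toy array are assumptions made for the example, and the defaults simply mirror the class signature.

```python
import numpy as np


def minmax_scale(log_spect, spect_min_val=0.8, spect_max_val=6.0):
    """Apply (s - s_min) / (s_max - s_min) element-wise.

    Hypothetical helper for illustration; it is not part of vocalpy.
    """
    log_spect = np.asarray(log_spect, dtype=float)
    return (log_spect - spect_min_val) / (spect_max_val - spect_min_val)


# Toy "log-magnitude spectrogram" (frequency bins x time bins).
log_spect = np.array([[0.8, 3.4], [6.0, 2.1]])
scaled = minmax_scale(log_spect)
```

Values equal to spect_min_val map to 0 and values equal to spect_max_val map to 1; anything outside that range falls outside [0, 1] in this sketch.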
Examples
>>> import vocalpy as voc
>>> jourjineetal2023paths = voc.example('jourjine-et-al-2023')
>>> wav_path = jourjineetal2023paths[0]
>>> sound = voc.Sound.read(wav_path)
>>> onsets, offsets = voc.segment.ava(sound, **voc.segment.ava.JOURJINEETAL2023)
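The `**` unpacking in the example relies on Python's mapping protocol: any object that provides keys() and __getitem__ can be unpacked into keyword arguments. The sketch below shows the idea with a stand-in data class. DemoParams and segment are hypothetical names used only for illustration; this is not vocalpy's Params implementation.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoParams:
    """Hypothetical stand-in for a parameters data class."""

    min_dur: float = 0.015
    max_dur: float = 1.0

    def keys(self):
        # Names of the dataclass fields; ** unpacking iterates over these.
        return [f.name for f in fields(self)]

    def __getitem__(self, key):
        # Look up a field value by name, as ** unpacking requires.
        return getattr(self, key)


def segment(min_dur, max_dur):
    # Hypothetical function that just echoes its keyword arguments.
    return min_dur, max_dur


# Override one field, then unpack the instance as keyword arguments.
result = segment(**DemoParams(max_dur=0.2))
```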
- __init__(nperseg: int = 1024, noverlap: int = 512, min_freq: float = 20000.0, max_freq: float = 125000.0, spect_min_val: float = 0.8, spect_max_val: float = 6.0, thresh_lowest: float = 0.3, thresh_min: float = 0.3, thresh_max: float = 0.35, min_dur: float = 0.015, max_dur: float = 1.0, min_isi_dur: float | None = None, use_softmax_amp: bool = False, temperature: float = 0.01, smoothing_timescale: float = 0.00025, scale: bool = True, scale_val: int | float = 32768, scale_dtype: ~numpy.dtype[~typing.Any] | None | type[~typing.Any] | ~numpy._typing._dtype_like._SupportsDType[~numpy.dtype[~typing.Any]] | str | tuple[~typing.Any, int] | tuple[~typing.Any, ~typing.SupportsIndex | ~collections.abc.Sequence[~typing.SupportsIndex]] | list[~typing.Any] | ~numpy._typing._dtype_like._DTypeDict | tuple[~typing.Any, ~typing.Any] = <class 'numpy.int16'>) -> None
Methods
__init__([nperseg, noverlap, min_freq, ...])
keys()
Attributes
max_dur
max_freq
min_dur
min_freq
min_isi_dur
noverlap
nperseg
scale
scale_val
smoothing_timescale
spect_max_val
spect_min_val
temperature
thresh_lowest
thresh_max
thresh_min
use_softmax_amp
- scale_dtype
alias of int16
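The scaling controlled by scale, scale_val, and scale_dtype amounts to a multiplication followed by a cast. A minimal NumPy sketch, assuming floating-point audio samples in the range [-1.0, 1.0); the sample values are made up, and this is not vocalpy's code:

```python
import numpy as np

scale_val = 2**15
scale_dtype = np.int16

# Made-up float64 samples standing in for sound.data.
data = np.array([0.0, 0.5, -0.25, -1.0], dtype=np.float64)

# Multiply by the scale value, then cast to 16-bit integers.
scaled = (data * scale_val).astype(scale_dtype)
```

Note that 1.0 * 2**15 is 32768, one more than the largest int16 value (32767), so a sample of exactly 1.0 would not fit after this scaling.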