vocalpy.metrics.segmentation.ir.precision_recall_fscore

vocalpy.metrics.segmentation.ir.precision_recall_fscore(hypothesis: ndarray[tuple[Any, ...], dtype[_ScalarT]], reference: ndarray[tuple[Any, ...], dtype[_ScalarT]], metric: str, tolerance: float | int | None = None, decimals: int | bool | None = None) -> tuple[float, int, IRMetricData]

Helper function that computes precision, recall, and the F-score.

All of these metrics require computing the number of true positives, and the F-score is a combination of precision and recall, so we rely on this helper function to compute them. To compute a single metric directly, without the metric argument that this function requires, call the corresponding function: precision(), recall(), or fscore(). See the docstrings of those functions for definitions of the metrics in terms of segmentation algorithms.

Precision, recall, and F-score are computed using hits found with vocalpy.metrics.segmentation._ir_helper.find_hits(). See docstring of that function for more detail on how hits are computed.

Both hypothesis and reference must be 1-dimensional arrays of non-negative, strictly increasing values. If you have two arrays onsets and offsets, you can concatenate those into a single valid array of boundary times using concat_starts_and_stops() that you can then pass to this function.
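The input requirements above can be checked with plain NumPy. The following is a minimal sketch, not the library's own validation: the helper name is_valid_boundaries is hypothetical, and np.sort(np.concatenate(...)) is used only as a stand-in for concat_starts_and_stops(), whose exact behavior is not described here.

```python
import numpy as np


def is_valid_boundaries(boundaries):
    """Hypothetical helper: check the input requirements stated above."""
    boundaries = np.asarray(boundaries)
    return bool(
        boundaries.ndim == 1                     # 1-dimensional
        and np.all(boundaries >= 0)              # non-negative
        and np.all(np.diff(boundaries) > 0)      # strictly increasing
    )


# Example onset and offset times, in seconds (made-up values).
onsets = np.array([0.10, 0.50, 1.00])
offsets = np.array([0.30, 0.80, 1.20])

# Stand-in for concat_starts_and_stops(): merge into one sorted array.
boundaries = np.sort(np.concatenate([onsets, offsets]))

print(boundaries)
print(is_valid_boundaries(boundaries))
```

An array with a repeated value, such as [0.1, 0.1, 0.2], fails the check because it is not strictly increasing.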

Parameters:
hypothesis : numpy.ndarray

Boundaries, e.g., onsets or offsets of segments, as computed by some method.

reference : numpy.ndarray

Ground truth boundaries that the hypothesized boundaries hypothesis are compared to.

metric : str

The name of the metric to compute. One of: {"precision", "recall", "fscore"}.

tolerance : float or int

Tolerance, in seconds. An element of hypothesis is considered a true positive if it lies within plus or minus the tolerance of any boundary in reference, i.e., if a hypothesized boundary \(t_h\) is within the interval \(t_0 - \Delta t < t_h < t_0 + \Delta t\), where \(t_0\) is a reference boundary and \(\Delta t\) is the tolerance. Default is None, in which case it is set to 0 (either float or int, depending on the dtype of hypothesis and reference). See Notes for more detail.

decimals : int

The number of decimal places to round both hypothesis and reference to, using numpy.round(). This mitigates inflated error rates due to floating point error. Rounding is only applied if both hypothesis and reference are floating point values. To avoid rounding, e.g., to compute strict precision and recall, pass in the value False. Default is None, in which case it is set to 3, which assumes that the values are in seconds and should be rounded to milliseconds.
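To illustrate why the decimals parameter exists, the sketch below shows two boundary times that should be equal but differ by floating point error, and how rounding with numpy.round() to 3 decimal places makes them compare equal. It only demonstrates the rounding step; it does not call this function, and the example values are made up.

```python
import numpy as np

# 0.1 + 0.2 is not exactly 0.3 in floating point arithmetic.
hypothesis = np.array([0.1 + 0.2, 0.75])
reference = np.array([0.3, 0.75])

# Exact comparison: the first pair is treated as a mismatch.
exact_matches = hypothesis == reference

# Rounding both arrays to milliseconds removes the spurious mismatch.
rounded_matches = np.round(hypothesis, 3) == np.round(reference, 3)

print(exact_matches)
print(rounded_matches)
```

With a tolerance of 0, a mismatch like the first one would otherwise be counted as an error even though the two boundaries are the same time for practical purposes.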

Returns:
metric_value : float

Value for metric.

n_tp : int

The number of true positives.

metric_data : IRMetricData

Instance of IRMetricData with indices of hits in both hypothesis and reference, and the absolute difference between times in hypothesis and reference for the hits.

Notes

The addition of a tolerance parameter is based on [1]. This is also sometimes known as a “collar” [2] or “forgiveness collar” [3]. The value for the tolerance can be determined by visual inspection of the distribution; see for example [4].
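The effect of the tolerance can be sketched with plain NumPy. This is a simplified illustration, not the implementation in find_hits(): the function name sketch_precision_recall_fscore is hypothetical, each reference boundary is matched to its single closest hypothesized boundary (an assumption about how ties and multiple candidates are resolved), and the comparison uses <= rather than the strict inequality in the tolerance parameter description so that a tolerance of 0 still matches identical values.

```python
import numpy as np


def sketch_precision_recall_fscore(hypothesis, reference, tolerance=0.0):
    """Simplified sketch: count hits within a tolerance, then score."""
    hypothesis = np.asarray(hypothesis)
    reference = np.asarray(reference)
    if hypothesis.size == 0 or reference.size == 0:
        return 0.0, 0.0, 0.0, 0

    # Pairwise absolute differences, shape (n_hypothesis, n_reference).
    diffs = np.abs(np.subtract.outer(hypothesis, reference))

    used = set()  # hypothesized boundaries already matched
    n_tp = 0
    for j in range(reference.size):
        i = int(np.argmin(diffs[:, j]))  # closest hypothesized boundary
        # Assumption: inclusive comparison, one match per boundary.
        if diffs[i, j] <= tolerance and i not in used:
            used.add(i)
            n_tp += 1

    precision = n_tp / hypothesis.size
    recall = n_tp / reference.size
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, fscore, n_tp


hypothesis = np.array([0.10, 0.52, 0.90, 1.40])
reference = np.array([0.11, 0.50, 1.00])
precision, recall, fscore, n_tp = sketch_precision_recall_fscore(
    hypothesis, reference, tolerance=0.03
)
print(precision, recall, fscore, n_tp)
```

In this example two of the four hypothesized boundaries fall within 0.03 s of a reference boundary, so precision is 2/4 and recall is 2/3. With the tolerance left at 0, none of them would count as hits.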

References

[1]

Kemp, T., Schmidt, M., Westphal, M., & Waibel, A. (2000, June). Strategies for automatic segmentation of audio data. In 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 00CH37100) (Vol. 3, pp. 1423-1426). IEEE.

[2]

Jordán, P. G., & Giménez, A. O. (2023). Advances in Binary and Multiclass Sound Segmentation with Deep Learning Techniques.

[3]

NIST. (2009). The 2009 (RT-09) Rich Transcription Meeting Recognition Evaluation Plan. <https://web.archive.org/web/20100606041157if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf>

[4]

Du, P., & Troyer, T. W. (2006). A segmentation algorithm for zebra finch song at the note level. Neurocomputing, 69(10-12), 1375-1379.