vocalpy.metrics.segmentation.ir.precision_recall_fscore
- vocalpy.metrics.segmentation.ir.precision_recall_fscore(hypothesis: ndarray[tuple[Any, ...], dtype[_ScalarT]], reference: ndarray[tuple[Any, ...], dtype[_ScalarT]], metric: str, tolerance: float | int | None = None, decimals: int | bool | None = None) -> tuple[float, int, IRMetricData]
Helper function that computes precision, recall, and the F-score.
Since all these metrics require computing the number of true positives, and F-score is a combination of precision and recall, we rely on this helper function to compute them. You can compute each directly without needing the metric argument that this function requires by calling the appropriate function: precision(), recall(), and fscore(). See docstrings of those functions for definitions of the metrics in terms of segmentation algorithms.
Precision, recall, and F-score are computed using hits found with vocalpy.metrics.segmentation._ir_helper.find_hits(). See docstring of that function for more detail on how hits are computed.
Both hypothesis and reference must be 1-dimensional arrays of non-negative, strictly increasing values. If you have two arrays onsets and offsets, you can concatenate those into a single valid array of boundary times using concat_starts_and_stops() that you can then pass to this function.
- Parameters:
- hypothesis: numpy.ndarray
Boundaries, e.g., onsets or offsets of segments, as computed by some method.
- reference: numpy.ndarray
Ground truth boundaries that the hypothesized boundaries in hypothesis are compared to.
- metric: str
The name of the metric to compute. One of: {"precision", "recall", "fscore"}.
- tolerance: float or int
Tolerance, in seconds. An element in hypothesis is considered a true positive if it falls within a time interval of plus or minus tolerance around any reference boundary \(t_0\) in reference, i.e., if a hypothesized boundary \(t_h\) is within the interval \(t_0 - \Delta t < t_h < t_0 + \Delta t\), where \(\Delta t\) is the tolerance. Default is None, in which case it is set to 0 (either float or int, depending on the dtype of hypothesis and reference). See notes for more detail.
- decimals: int
The number of decimal places to round both hypothesis and reference to, using numpy.round(). This mitigates inflated error rates due to floating point error. Rounding is only applied if both hypothesis and reference are floating point values. To avoid rounding, e.g. to compute strict precision and recall, pass in the value False. Default is 3, which assumes that the values are in seconds and should be rounded to milliseconds.
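The interaction between tolerance and decimals can be illustrated with a minimal NumPy sketch. This is not the vocalpy implementation: hits are found by find_hits(), whose matching rules are not reproduced here. The helper name within_tolerance is hypothetical, it does not enforce one-to-one matching between boundaries, and it assumes an inclusive comparison so that a tolerance of 0 matches exactly equal boundaries.

```python
import numpy as np


def within_tolerance(hypothesis, reference, tolerance=0, decimals=3):
    """Hypothetical helper: flag hypothesized boundaries near any reference boundary.

    Assumption: the comparison is inclusive (<=), so tolerance=0 means exact match.
    Unlike a real hit-finding routine, this does not enforce one-to-one matching.
    """
    hypothesis = np.asarray(hypothesis)
    reference = np.asarray(reference)
    both_float = (
        np.issubdtype(hypothesis.dtype, np.floating)
        and np.issubdtype(reference.dtype, np.floating)
    )
    # decimals=False disables rounding; rounding only applies to floating point input
    if decimals is not False and both_float:
        hypothesis = np.round(hypothesis, decimals=decimals)
        reference = np.round(reference, decimals=decimals)
    # pairwise absolute differences, shape (len(hypothesis), len(reference))
    diffs = np.abs(hypothesis[:, np.newaxis] - reference[np.newaxis, :])
    return (diffs <= tolerance).any(axis=1)


reference = np.array([0.3, 1.0, 2.5])
# 0.1 + 0.2 differs from 0.3 only by floating point error
hypothesis = np.array([0.1 + 0.2, 1.004, 2.6])

# no rounding, no tolerance: the floating point error prevents an exact match
strict = within_tolerance(hypothesis, reference, tolerance=0, decimals=False)
# rounding to milliseconds removes the floating point error for the first boundary
rounded = within_tolerance(hypothesis, reference, tolerance=0, decimals=3)
# a 10 ms tolerance additionally accepts the boundary that is 4 ms late
collar = within_tolerance(hypothesis, reference, tolerance=0.01, decimals=3)
```

In this sketch, rounding alone recovers the first boundary, and only the added tolerance recovers the second; the third is 100 ms away from the nearest reference boundary and is never counted.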
- Returns:
- metric_value: float
Value for metric.
- n_tp: int
The number of true positives.
- metric_data: IRMetricData
Instance of IRMetricData with indices of hits in both hypothesis and reference, and the absolute difference between times in hypothesis and reference for the hits.
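How a metric value relates to the number of true positives can be sketched with the standard information-retrieval definitions. This is an illustration under stated assumptions, not the vocalpy source: the function name prf_from_hits is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def prf_from_hits(n_tp, n_hypothesis, n_reference):
    """Hypothetical helper: precision, recall, and F-score from a count of hits.

    n_tp: number of true positives (hits)
    n_hypothesis: number of hypothesized boundaries
    n_reference: number of reference (ground truth) boundaries
    """
    # precision: fraction of hypothesized boundaries that are hits
    precision = n_tp / n_hypothesis if n_hypothesis > 0 else 0.0
    # recall: fraction of reference boundaries that were found
    recall = n_tp / n_reference if n_reference > 0 else 0.0
    # F-score: harmonic mean of precision and recall (assumed 0.0 if both are 0)
    if precision + recall == 0:
        fscore = 0.0
    else:
        fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


# 3 hits among 4 hypothesized boundaries and 5 reference boundaries
precision, recall, fscore = prf_from_hits(n_tp=3, n_hypothesis=4, n_reference=5)
```

Here precision is 3/4 and recall is 3/5, so the F-score is their harmonic mean, 2/3.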
Notes
The addition of a tolerance parameter is based on [1]. This is also sometimes known as a “collar” [2] or “forgiveness collar” [3]. The value for the tolerance can be determined by visual inspection of the distribution; see for example [4].
References
[1] Kemp, T., Schmidt, M., Westphal, M., & Waibel, A. (2000, June). Strategies for automatic segmentation of audio data. In 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 00CH37100) (Vol. 3, pp. 1423-1426). IEEE.
[2] Jordán, P. G., & Giménez, A. O. (2023). Advances in Binary and Multiclass Sound Segmentation with Deep Learning Techniques.
[3] NIST. (2009). The 2009 (RT-09) Rich Transcription Meeting Recognition Evaluation Plan. <https://web.archive.org/web/20100606041157if_/http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf>
[4] Du, P., & Troyer, T. W. (2006). A segmentation algorithm for zebra finch song at the note level. Neurocomputing, 69(10-12), 1375-1379.