Recall, Precision
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
An Empirical Analysis of Task-Induced Encoder Bias in Fr\'echet Audio Distance
Announce Type: replace-cross Abstract: Fr\'echet Audio Distance (FAD) is the de facto standard for evaluating text-to-audio generation, yet its scores depend on the underlying encoder's embedding space. An encoder's training task dictates which acoustic features are preserved or discarded, causing FAD to inherit systematic task-induced biases. We decompose evaluation into Recall, Precision, and Alignment (split into semantic and structural dimensions), using log-scale normalization for fair...
Agreement Metrics for LLM-as-Judge Evaluation: What to Report and Why
arXiv:2606.00093v1 Announce Type: cross Abstract: Validating an LLM judge against human annotations usually means reporting several agreement statistics: accuracy, precision, recall, $F_1$, Cohen's $\kappa$, and one or more rank correlations. A survey of 24 recent LLM-as-judge papers finds metric choice entangled with the judgment scale, tie handling, invalid outputs, and abstention handling, and those choices rarely stated. For binary criteria -- the common case in rubric-based evaluation,...
On Choosing the $\mu$ Parameter in Gaussian Differential Privacy
Announce Type: new Abstract: Recent work argues for using Gaussian differential privacy (GDP) to report the privacy guarantees in privacy-preserving machine learning. We provide principled mappings from pure-DP $\varepsilon$ to GDP $\mu$ by matching the worst-case success of a strong-adversary membership inference attack in terms of three metrics: multiplicative advantage at fixed FPR, precision at fixed recall, and the standard privacy profile.
Early Prediction of Liver Cirrhosis Up to Two Years in Advance: A Machine Learning Study Benchmarking Against the FIB-4 and APRI Scores
Announce Type: replace Abstract: Objective: Develop and evaluate machine learning (ML) models for predicting incident liver cirrhosis (LC) one and two years prior to diagnosis using routinely collected electronic health record (EHR) data and benchmark their performance against the FIB-4 and APRI clinical scores. Methods: We conducted a retrospective cohort study using de-identified EHR data from a large academic health system. XGBoost models were developed for 1- and 2-year prediction...
Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching
arXiv:2512.03553v3 Announce Type: replace Abstract: Content moderation remains a critical yet challenging task for large-scale user-generated video platforms, especially in livestreaming environments where moderation must be timely, multimodal, and robust to evolving forms of unwanted content. We present a hybrid moderation framework deployed at production scale that combines supervised classification for known violations with reference-based similarity matching for novel or subtle cases....
Listening to the Workforce: Measuring Construction Worker Safety Attitudes from Social Media Discourse Using LLMs
Announce Type: new Abstract: Worker safety attitudes are key determinants of whether protective practices are applied or bypassed on construction sites. Yet measuring them at scale has remained out of reach. Safety attitudes are multidimensional, vary across topics, and surface most candidly in workers' own conversations.
Anomaly Detection for Electro-Hydrostatic Actuators using LSTM Autoencoder
Announce Type: new Abstract: Electro-Hydrostatic Actuators (EHAs) are widely used in aerospace and industrial systems, where timely detection of sensor anomalies is essential to ensure safe and reliable operation. However, the large volume and high sampling frequency of EHA sensor data pose challenges for accurate and efficient anomaly detection. Conventional statistical and classical machine-learning methods such as Z-score, Interquartile Range (IQR), Median Absolute Deviation (MAD),...
From Custom Logic to APIs: Understanding and Recommending API Replacement Refactorings
Announce Type: new Abstract: Software refactoring is essential for maintaining code quality. However, API replacement refactoring, which replaces custom logic with API calls, remains underexplored. Existing refactoring tools provide limited support for detecting such opportunities because they rely on predefined templates and have difficulty capturing complex, multi-statement semantic equivalents.
Comparison of Automated White Matter Lesion Segmentation Approaches for Use in Large, Multi-Site Data Analyses in Parkinson's Disease
Background: Parkinson's disease (PD) is the second most common neurodegenerative disorder. PD currently lacks effective disease-modifying treatments, likely due to its diverse clinical features and underlying neuropathology. The vascular role in PD is emerging, with vascular mechanisms increasingly implicated, yet the literature remains conflicted, motivating large-data analyses with greater statistical power.
Vision-Language Work Zone Intelligence for Safety-Critical Speed Regulation of Mixed-Autonomy Vehicles in Dynamic Environments
arXiv:2606.08860v1 Announce Type: new Abstract: Temporary work-zone speed limits are communicated through visually inconsistent signage and are often missing from digital maps, creating safety risks for human drivers and automated vehicle systems. We present a real-time, onboard perception pipeline that detects active work zones, recognizes associated temporary speed limits, and outputs a law-aware work-zone state and speed value suitable for driver alerts or downstream automated control.