
Recall, Precision


Related Articles from SNS

An Empirical Analysis of Task-Induced Encoder Bias in Fréchet Audio Distance

Announce Type: replace-cross Abstract: Fréchet Audio Distance (FAD) is the de facto standard for evaluating text-to-audio generation, yet its scores depend on the underlying encoder's embedding space. An encoder's training task dictates which acoustic features are preserved or discarded, causing FAD to inherit systematic task-induced biases. We decompose evaluation into Recall, Precision, and Alignment (split into semantic and structural dimensions), using log-scale normalization for fair...

arXiv CS 1d ago

Agreement Metrics for LLM-as-Judge Evaluation: What to Report and Why

arXiv:2606.00093v1 Announce Type: cross Abstract: Validating an LLM judge against human annotations usually means reporting several agreement statistics: accuracy, precision, recall, $F_1$, Cohen's $\kappa$, and one or more rank correlations. A survey of 24 recent LLM-as-judge papers finds metric choice entangled with the judgment scale, tie handling, invalid outputs, and abstention handling, and those choices rarely stated. For binary criteria -- the common case in rubric-based evaluation,...

arXiv Physics 8d ago

On Choosing the $\mu$ Parameter in Gaussian Differential Privacy

Announce Type: new Abstract: Recent work argues for using Gaussian differential privacy (GDP) to report the privacy guarantees in privacy-preserving machine learning. We provide principled mappings from pure-DP $\varepsilon$ to GDP $\mu$ by matching the worst-case success of a strong-adversary membership inference attack in terms of three metrics: multiplicative advantage at fixed FPR, precision at fixed recall, and the standard privacy profile.

arXiv CS 1d ago

Early Prediction of Liver Cirrhosis Up to Two Years in Advance: A Machine Learning Study Benchmarking Against the FIB-4 and APRI Scores

Announce Type: replace Abstract: Objective: Develop and evaluate machine learning (ML) models for predicting incident liver cirrhosis (LC) one and two years prior to diagnosis using routinely collected electronic health record (EHR) data and benchmark their performance against the FIB-4 and APRI clinical scores. Methods: We conducted a retrospective cohort study using de-identified EHR data from a large academic health system. XGBoost models were developed for 1- and 2-year prediction...

arXiv CS 8d ago

Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching

arXiv:2512.03553v3 Announce Type: replace Abstract: Content moderation remains a critical yet challenging task for large-scale user-generated video platforms, especially in livestreaming environments where moderation must be timely, multimodal, and robust to evolving forms of unwanted content. We present a hybrid moderation framework deployed at production scale that combines supervised classification for known violations with reference-based similarity matching for novel or subtle cases....

arXiv CS 6d ago

Listening to the Workforce: Measuring Construction Worker Safety Attitudes from Social Media Discourse Using LLMs

Announce Type: new Abstract: Worker safety attitudes are key determinants of whether protective practices are applied or bypassed on construction sites. Yet measuring them at scale has remained out of reach. Safety attitudes are multidimensional, vary across topics, and surface most candidly in workers' own conversations.

arXiv CS 6d ago

Anomaly Detection for Electro-Hydrostatic Actuators using LSTM Autoencoder

Announce Type: new Abstract: Electro-Hydrostatic Actuators (EHAs) are widely used in aerospace and industrial systems, where timely detection of sensor anomalies is essential to ensure safe and reliable operation. However, the large volume and high sampling frequency of EHA sensor data pose challenges for accurate and efficient anomaly detection. Conventional statistical and classical machine-learning methods such as Z-score, Interquartile Range (IQR), Median Absolute Deviation (MAD),...

arXiv CS 5d ago

From Custom Logic to APIs: Understanding and Recommending API Replacement Refactorings

Announce Type: new Abstract: Software refactoring is essential for maintaining code quality. However, API replacement refactoring, which replaces custom logic with API calls, remains underexplored. Existing refactoring tools provide limited support for detecting such opportunities because they rely on predefined templates and have difficulty capturing complex, multi-statement semantic equivalents.

arXiv CS 2d ago

Comparison of Automated White Matter Lesion Segmentation Approaches for Use in Large, Multi-Site Data Analyses in Parkinson's Disease

Background: Parkinson's disease (PD) is the second most common neurodegenerative disorder. PD currently lacks effective disease-modifying treatments, likely due to its diverse clinical features and underlying neuropathology. The vascular role in PD is emerging, with vascular mechanisms increasingly implicated, yet the literature remains conflicted, motivating large-data analyses with greater statistical power.

bioRxiv 11d ago

Vision-Language Work Zone Intelligence for Safety-Critical Speed Regulation of Mixed-Autonomy Vehicles in Dynamic Environments

arXiv:2606.08860v1 Announce Type: new Abstract: Temporary work-zone speed limits are communicated through visually inconsistent signage and are often missing from digital maps, creating safety risks for human drivers and automated vehicle systems. We present a real-time, onboard perception pipeline that detects active work zones, recognizes associated temporary speed limits, and outputs a law-aware work-zone state and speed value suitable for driver alerts or downstream automated control.

arXiv CS 1d ago