Spokenly

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Scaling few-shot spoken word classification with generative meta-continual learning

Few-shot spoken word classification has largely been developed for applications where a small number of classes is considered, and so the potential of larger-scale few-shot spoken word classification remains untapped. This paper investigates the potential of a spoken word classifier to sequentially learn to distinguish between 1000 classes when it is given only five shots per class. We demonstrate that this scaling capability exists by training a model using...

arXiv CS 5d ago

Dalai Lama receives Grammy award for spoken-word album

Tibetan spiritual leader Tenzin Gyatso received his first Grammy award for his spoken-word album, “Meditations: The Reflections of His Holiness the Dalai Lama”. The award was presented by Indian classical music maestro Ustad Amjad Ali Khan. Published on 3 Jun 2026.

Al Jazeera 6d ago

FormalASR: End-to-End Spoken Chinese to Formal Text

Automatic speech recognition (ASR) systems are typically optimized for verbatim transcription, which preserves disfluencies, filler words, and informal spoken structures that are often unsuitable for downstream writing-oriented applications. A common workaround is a two-stage ASR+LLM pipeline for post-editing, but this design increases latency and memory cost and is difficult to deploy on-device. We present FormalASR, two compact end-to-end models (0.6B and...

arXiv CS 1d ago

Lost in Speech: Benchmarking, Evaluation, and Parsing of Spoken Bilingual Conversational Language Beyond Standard UD Assumptions

arXiv:2602.06307v2. Spoken bilingual conversations pose substantial challenges for syntactic parsing because they often include disfluencies and discourse-driven structures that complicate dependency parsing under standard Universal Dependencies (UD) assumptions and evaluation practices. To systematically study these challenges, in this work, we first introduce a linguistically grounded taxonomy of conversational bilingual phenomena, together with SpokeBench,...

arXiv CS 1d ago

How Peter Phillips supported Harry in darkest hour but pair have not 'spoken in years'

The son of Princess Anne, who is tying the knot with his fiancée Harriet Sperling on Saturday, has found himself stuck in the middle of his warring cousins, William and Harry, as the Royal Family faced some of its most difficult times. Peter Phillips, the son of Princess Anne, is preparing to tie the knot with his fiancée Harriet Sperling today, as the couple will marry in front of the Royal Family at All...

Daily Mirror 4d ago

LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems

Achieving full-duplex communication in spoken dialogue systems (SDS) requires real-time coordination between listening, speaking, and thinking. This paper proposes a semantic voice activity detection (VAD) module as a dialogue manager (DM) to efficiently manage turn-taking in full-duplex SDS. Implemented as a lightweight (0.5B) LLM fine-tuned on full-duplex conversation data, the semantic VAD predicts four control tokens to regulate turn-switching and...

arXiv CS 5d ago

IRAF: Interference-Resilient Adaptive Fusion for Noise-Robust End-to-End Full-Duplex Spoken Dialogue Systems

arXiv:2606.06559v1. Full-duplex spoken dialogue models allow voice agents to listen and speak concurrently, enabling natural interaction with real-time overlap. However, end-to-end dual-channel models that jointly encode user and agent streams may degrade in realistic acoustic environments: interfering speakers leaking into the user microphone can be encoded as part of the user query, corrupting the LLM's conditioning and causing unstable turn-taking and reduced...

arXiv CS 2d ago

Theta phase and theta-gamma coupling organise the spoken language network

Speech production requires rapid coordination of conceptual and lexical processes across distributed cortical networks, yet the neurophysiological mechanisms enabling this coordination remain poorly understood. Oscillatory coupling has emerged as a candidate mechanism for coordinating neural activity across spatial scales. Here, we used whole-head magnetoencephalography during overt picture naming to test how phase and phase-amplitude coupling organise neural dynamics preceding articulation.

bioRxiv 6d ago

The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

arXiv:2603.17837v5. During conversational interactions, humans subconsciously engage in concurrent thinking while listening to a speaker. Although this internal cognitive processing may not always manifest as explicit linguistic structures, it is instrumental in formulating high-quality responses. Inspired by this cognitive phenomenon, we propose a novel Full-duplex LAtent and Internal Reasoning method named FLAIR that conducts latent thinking...

arXiv CS 5d ago

Extracting accent features in spoken Brazilian Portuguese without sociolinguistic labels

arXiv:2605.30457v2. Regional accent classification in Brazilian Portuguese (pt-BR) suffers from the need for reliable labeling. While large self-supervised learning (SSL) speech models are powerful, their training pipelines dilute sociophonetic information, since accent labels are generally not reliable or are not used in training objectives. This work introduces a novel workflow for feature extraction using only acoustic labels.

arXiv CS 6d ago