RedditPersona: A Modular Framework for Community-Conditioned LLM Adaptation from Reddit

arXiv:2606.06027v1

Abstract: Community-conditioned language model adaptation requires choices about data collection, community definition, and evaluation that are currently made independently in each study, making it hard to compare assumptions or reuse artifacts. We present RedditPersona, a modular framework that standardizes these choices: it collects Reddit posts and comments, profiles active users, partitions them under five grouping strategies (subreddit-based, graph-structural, semantic, hybrid, and interaction-based), trains a parameter-efficient adapter per strategy via QLoRA, and evaluates them under a shared metric suite spanning fluency, fidelity, distributional alignment, and community identifiability. Applied to 112 subreddits in the urban well-being domain (301,429 user profiles, 16M+ comments), we find that adapters' behavioral identifiability tracks each strategy's intrinsic agreement with the subreddit baseline, and that a consistent trade-off between identifiability and distributional similarity to real text holds across all five strategies. The code and configuration files are available at: https://github.com/Ahghaffari/redditpersona.
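To make the idea of a grouping strategy's "agreement with the subreddit baseline" more concrete, here is a minimal Python sketch. It is not taken from the RedditPersona code base. It assumes that the subreddit-based strategy assigns each user to the subreddit where they comment most, and it uses cluster purity as a stand-in agreement score; the abstract does not say which assignment rule or agreement measure the authors use. The function names, user names, and subreddit names are all hypothetical.

```python
from collections import Counter, defaultdict


def subreddit_partition(comments):
    """Map each user to the subreddit where they commented most often.

    `comments` is an iterable of (user, subreddit) pairs. Ties are broken
    alphabetically by subreddit name so the result is deterministic.
    This assignment rule is an assumption, not the paper's stated method.
    """
    counts = defaultdict(Counter)
    for user, subreddit in comments:
        counts[user][subreddit] += 1
    return {
        user: min(c.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for user, c in counts.items()
    }


def purity(candidate, baseline):
    """Purity of a candidate partition with respect to a baseline partition.

    Each candidate group is credited with its most common baseline label;
    the score is the credited share of users present in both partitions.
    Purity is an illustrative choice of agreement measure.
    """
    shared = candidate.keys() & baseline.keys()
    if not shared:
        return 0.0
    groups = defaultdict(Counter)
    for user in shared:
        groups[candidate[user]][baseline[user]] += 1
    matched = sum(max(c.values()) for c in groups.values())
    return matched / len(shared)


# Hypothetical toy data: (user, subreddit) pairs.
toy_comments = [
    ("ann", "r/cycling"), ("ann", "r/cycling"), ("ann", "r/transit"),
    ("bob", "r/transit"),
    ("cat", "r/cycling"),
    ("dan", "r/transit"), ("dan", "r/transit"),
]
baseline = subreddit_partition(toy_comments)

# Two hypothetical alternative groupings of the same users.
mixed_grouping = {"ann": 0, "bob": 0, "cat": 1, "dan": 1}
aligned_grouping = {"ann": "a", "cat": "a", "bob": "b", "dan": "b"}

print(purity(mixed_grouping, baseline))
print(purity(aligned_grouping, baseline))
```

Under this reading, a grouping that cuts across subreddits scores lower than one that mirrors them. According to the abstract, this kind of agreement is what the adapters' behavioral identifiability tracks.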
Originally published by arXiv CS.