Empirical Evidence on Genre-Time Correlation in Box-Office Success Using Exploratory Data Analysis and Machine Learning

arXiv:2606.13689v1. Abstract: The movie industry is one of the fastest-growing global sectors, characterized by high production costs and significant financial risk. Because filmmaking is capital-intensive, accurately predicting box-office success is critical for stakeholders ranging from producers to investors. This study investigates the correlation between movie genre and release timing as predictive factors for commercial success. A combined approach involving exploratory data analysis (EDA) and supervised machine learning techniques is proposed to assess this relationship. The dataset, comprising the top 200 box-office hits and the top 100 flops, was curated from reliable sources, including IMDb, Box Office Mojo, The Numbers, and Wikipedia. EDA revealed that specific genres show statistically significant patterns of success or failure in particular months. For instance, animated and superhero movies achieved their peak success rates in June and July (28% and 29%, respectively), while thriller and romance movies showed higher hit rates in November. Conversely, in the flop dataset, genres such as action and comedy underperformed more frequently in March, April, and August. To validate these findings, multiple regression-based machine learning models were evaluated using both cross-validation and percentage-split methods. Algorithms such as LWT, Multilayer Perceptron, Random Tree, and Decision Stump demonstrated high predictive accuracy, reinforcing the hypothesis of genre-time dependency. The results consistently indicated a strong correlation between release month and genre performance, providing valuable insight for strategic planning in content production and release scheduling. This study highlights the growing need to apply data analytics in the media industry, as in other data-driven domains, for risk mitigation and optimized decision-making.
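The genre-by-month tabulation described in the abstract can be illustrated with a minimal sketch. It assumes that a "success rate" for a genre in a given month means the share of that genre's hits released in that month; the abstract does not define the term, so this reading is an assumption. The records, the `monthly_share` and `peak_month` helpers, and the field layout are hypothetical and are not taken from the study's dataset or code.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (genre, release_month, outcome).
# These are illustrative values only, NOT data from the study.
MOVIES = [
    ("animation", 6, "hit"),
    ("animation", 6, "hit"),
    ("animation", 7, "hit"),
    ("animation", 11, "hit"),
    ("thriller", 11, "hit"),
    ("thriller", 11, "hit"),
    ("thriller", 3, "flop"),
    ("comedy", 3, "flop"),
    ("comedy", 8, "flop"),
    ("comedy", 12, "hit"),
]


def monthly_share(records, outcome):
    """Per genre, map each month to the fraction of that genre's
    films with the given outcome that were released in that month."""
    counts = defaultdict(Counter)
    for genre, month, result in records:
        if result == outcome:
            counts[genre][month] += 1
    shares = {}
    for genre, by_month in counts.items():
        total = sum(by_month.values())
        shares[genre] = {m: n / total for m, n in by_month.items()}
    return shares


def peak_month(shares, genre):
    """Return the month holding the largest share for a genre."""
    by_month = shares[genre]
    return max(by_month, key=by_month.get)


hit_shares = monthly_share(MOVIES, "hit")
flop_shares = monthly_share(MOVIES, "flop")

if __name__ == "__main__":
    for genre in sorted(hit_shares):
        print(genre, "peak hit month:", peak_month(hit_shares, genre))
```

Computing the hit and flop tables separately mirrors the abstract's two datasets (top hits and top flops). A real analysis would also need a significance test on the resulting counts, which this sketch omits.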
Originally published by arXiv CS.