This study introduces a reinforcement learning (RL)-based control strategy for heating, ventilation, and air conditioning (HVAC) systems that employs a soft actor-critic agent with a customized reward mechanism. The reward uses time-varying weighting factors that depend on outdoor temperature to dynamically balance thermal comfort and energy efficiency. We evaluate the method on two test cases within the building optimization testing (BOPTEST) framework, an open-source virtual simulator equipped with standardized key performance indicators (KPIs) for performance assessment. The test cases are selected to represent distinct building typologies, climatic conditions, and HVAC system complexities, so that the method is evaluated across diverse settings. The first test case is a heating-focused scenario in a residential setting, where we directly compare our method against four control strategies: an optimized rule-based controller provided by BOPTEST, two RL-based strategies that use BOPTEST’s KPIs as reward references, and a model predictive control (MPC)-based approach specifically tailored to the test case. Our approach outperforms the rule-based and other RL-based strategies and achieves outcomes comparable to the MPC-based controller. The second test case, a cooling-dominated scenario in an office setting, further validates the versatility of the strategy under different conditions. The consistent performance across both scenarios underscores the strategy’s potential as a robust tool for smart building management, adaptable to both residential and office environments under different climatic conditions.
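To make the idea of an outdoor-temperature-dependent reward weighting concrete, the following is a minimal sketch. The abstract does not specify the functional form, so everything here is an assumption made for illustration: the logistic shape of the weight, the function names `comfort_weight` and `reward`, the comfort band, and all numeric parameters are hypothetical. The sketch also assumes a heating case, where comfort is weighted more heavily as it gets colder outside.

```python
import math


def comfort_weight(t_out_c, t_mid_c=10.0, scale_c=5.0, w_min=0.2, w_max=0.8):
    """Hypothetical comfort weight in [w_min, w_max].

    Assumption: a logistic curve that increases as the outdoor
    temperature drops (heating case). Not the authors' formula.
    """
    s = 1.0 / (1.0 + math.exp((t_out_c - t_mid_c) / scale_c))
    return w_min + (w_max - w_min) * s


def reward(t_zone_c, t_out_c, power_kw,
           t_low_c=21.0, t_high_c=24.0, power_max_kw=5.0):
    """Negative weighted sum of a discomfort term and an energy term.

    The comfort band and the power normalization are illustrative
    assumptions, not values taken from the study.
    """
    # Degrees outside the comfort band (zero when inside the band).
    discomfort = max(t_low_c - t_zone_c, 0.0) + max(t_zone_c - t_high_c, 0.0)
    # Power use normalized and clipped to [0, 1].
    energy = min(max(power_kw / power_max_kw, 0.0), 1.0)
    # Weight recomputed at every step from the current outdoor temperature.
    w = comfort_weight(t_out_c)
    return -(w * discomfort + (1.0 - w) * energy)


if __name__ == "__main__":
    for t_out in (-10.0, 10.0, 25.0):
        print(t_out, round(comfort_weight(t_out), 3),
              round(reward(19.0, t_out, 2.0), 3))
```

Because the weight is recomputed at every control step from the current outdoor temperature, the same zone-temperature violation is penalized more in cold weather than in mild weather, which is one simple way a trade-off between comfort and energy use could vary over time.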