Multi-Objective Machine-Learning Based Resource Management
for Heterogeneous HPC Servers and Datacenters
With the rapidly increasing demand for computing resources, datacenters must address the challenges of energy efficiency and performance. Computing power consumption is estimated at 1.3% of global power usage and is growing by 20% per year, and costs grow with it. To address this challenge, scientists need to develop resource generation and management policies based on virtual machine (VM) characteristics.
Toward that objective, we propose a two-phase greedy heuristic and an ML-based approach. We assess them in terms of energy, quality of service (QoS), network traffic, migrations, and scalability for various datacenter scenarios. We also introduce a novel hyper-heuristic algorithm that dynamically selects the best algorithm according to a user-defined metric. For optimality assessment, we formulate an integer linear programming (ILP)-based VM allocation method that minimizes energy consumption and data communication; it obtains optimal results but is impractical at runtime.
Our results show that, for large-scale scenarios, the ML approach improves server-to-server network traffic by up to 24% and reduces execution time by up to 480× compared to conventional approaches. In contrast, the heuristic approach outperforms the ML method in terms of energy and network traffic for small-scale scenarios. We also show that the heuristic and ML approaches incur up to 6% energy consumption overhead compared to the ILP-based optimal solution. Our hyper-heuristic integrates the strengths of both the heuristic and the ML methods by selecting the best one at runtime.
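The selection step of a hyper-heuristic can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this work: the two candidate policies (first_fit and round_robin), the metric (active_servers, a rough proxy for energy), and the function select_best are all hypothetical stand-ins, assumed here only to show how a user-defined metric can pick among competing allocators at runtime.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: placement[i] is the server index hosting VM i.
Placement = List[int]
Allocator = Callable[[List[int], int, int], Placement]
Metric = Callable[[Placement], float]


def first_fit(vm_loads: List[int], n_servers: int, capacity: int) -> Placement:
    """Stand-in policy: put each VM on the first server with enough spare capacity."""
    used = [0] * n_servers
    placement = []
    for load in vm_loads:
        for server in range(n_servers):
            if used[server] + load <= capacity:
                used[server] += load
                placement.append(server)
                break
        else:
            raise ValueError("no server can host this VM")
    return placement


def round_robin(vm_loads: List[int], n_servers: int, capacity: int) -> Placement:
    """Stand-in policy: spread VMs cyclically (capacity is ignored for brevity)."""
    return [i % n_servers for i in range(len(vm_loads))]


def active_servers(placement: Placement) -> float:
    """Example user-defined metric: number of servers in use (lower is better)."""
    return float(len(set(placement)))


def select_best(
    allocators: Dict[str, Allocator],
    metric: Metric,
    vm_loads: List[int],
    n_servers: int,
    capacity: int,
) -> Tuple[str, Placement]:
    """Run every candidate allocator and keep the one with the lowest metric value."""
    results = {
        name: allocate(vm_loads, n_servers, capacity)
        for name, allocate in allocators.items()
    }
    best = min(results, key=lambda name: metric(results[name]))
    return best, results[best]


if __name__ == "__main__":
    candidates = {"first_fit": first_fit, "round_robin": round_robin}
    # VM loads and capacity are in CPU cores; the values are illustrative only.
    print(select_best(candidates, active_servers, [4, 2, 2, 3], 4, 8))
```

Running every candidate on each decision is the simplest possible design and is affordable only when the candidates are cheap; a practical selector would more likely predict the best candidate from scenario features instead of evaluating all of them.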
We also introduce ECOGreen, a holistic strategy that jointly optimizes the datacenter regulation service problem and VM allocation while satisfying the hour-ahead power market constraints in the presence of electrical energy storage (EES) and renewable energy. After determining the best power and reserve bidding values and the number of active servers in a fast analytical way, we present an online adaptive policy that modulates datacenter power consumption by controlling the VMs' CPU resource limits and efficiently utilizing demand-side EES and renewable power, while guaranteeing QoS constraints.
Our results demonstrate that ECOGreen can provide, on average, 76% of the datacenter's power consumption as reserves to the market. Our study also shows that ECOGreen could reduce electricity costs by up to 71% compared to other state-of-the-art datacenter electricity cost minimization techniques.