Journal article
License

Closed access

Cost Optimization for Big Data Workloads Based on Dynamic Scheduling and Cluster-Size Tuning

Authors
Ślęzak, Dominik
Lameski, Petre
Apanowicz, Cas
Grzegorowski, Marek
Zdravevski, Eftim
Janusz, Andrzej
Publication date
2021
Abstract (EN)

Analytical data processing has become a cornerstone of today's business success, and it is facilitated by Big Data platforms that offer virtually limitless scalability. However, minimizing the total cost of ownership (TCO) for the infrastructure can be challenging. We propose a novel method to build resilient clusters on cloud resources that are fine-tuned to the particular data processing task. The presented architecture follows the infrastructure-as-code paradigm so that the cluster can be dynamically configured and managed. It first identifies the optimal cluster size to perform a job in the required time. Then, by analyzing spot instance price history and using ARIMA models, it optimizes the schedule of the job execution to leverage the discounted prices of the cloud spot market. In particular, we evaluated savings opportunities when using Amazon EC2 spot instances compared to on-demand resources. The performed experiments confirmed that the prediction module significantly improved the cost-effectiveness of the solution: up to 80% savings compared to the on-demand prices and, in the worst case, only 1% more cost than the absolute minimum. The production deployments of the architecture show that it is invaluable for minimizing the total cost of ownership of analytical data processing solutions.
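The scheduling idea described above can be illustrated with a minimal sketch: fit an autoregressive model to a spot-price history, forecast prices over a planning horizon, and schedule the job at the cheapest predicted hour. The sketch below uses a pure-Python AR(1) fit (a special case of ARIMA(1,0,0)) rather than a full ARIMA implementation; the price series, the on-demand rate, and all function names are illustrative assumptions, not taken from the paper.

```python
def fit_ar1(series):
    """Least-squares fit of the AR(1) model x[t] = c + phi * x[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var
    c = my - phi * mx
    return c, phi

def forecast(series, c, phi, horizon):
    """Iterate the fitted recurrence to predict the next `horizon` prices."""
    preds, last = [], series[-1]
    for _ in range(horizon):
        last = c + phi * last
        preds.append(last)
    return preds

# Illustrative hourly spot-price history (USD/hour), invented for this sketch.
history = [0.30, 0.32, 0.29, 0.31, 0.28, 0.30, 0.27, 0.29,
           0.31, 0.30, 0.26, 0.28, 0.30, 0.29, 0.27, 0.28]
c, phi = fit_ar1(history)
preds = forecast(history, c, phi, horizon=24)

# Schedule the job at the hour with the lowest predicted spot price.
best_hour = min(range(24), key=lambda h: preds[h])
on_demand = 0.50  # assumed on-demand price for the same instance type
saving = 1 - preds[best_hour] / on_demand
print(f"cheapest predicted hour: {best_hour}, saving vs on-demand: {saving:.0%}")
```

The paper's full method additionally tunes the cluster size to the job's deadline; this fragment only shows the price-forecasting step that drives the schedule.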

Keywords (EN)
Big Data
ETL
Cloud computing
Spot price prediction
ARIMA
Spark
PBN discipline
computer science
Journal
Big Data Research
Volume
25
Pages
1-13
ISSN
2214-5796
Open access license
Closed access