Journal article
License

ClosedAccess (closed access)
 

Cost Optimization for Big Data Workloads Based on Dynamic Scheduling and Cluster-Size Tuning

cris.lastimport.scopus: 2024-02-12T19:52:05Z
dc.abstract.en: Analytical data processing has become the cornerstone of today's business success, and it is facilitated by Big Data platforms that offer virtually limitless scalability. However, minimizing the total cost of ownership (TCO) for the infrastructure can be challenging. We propose a novel method to build resilient clusters on cloud resources that are fine-tuned to the particular data processing task. The presented architecture follows the infrastructure-as-code paradigm so that the cluster can be dynamically configured and managed. It first identifies the optimal cluster size to perform a job in the required time. Then, by analyzing spot instance price history and using ARIMA models, it optimizes the schedule of the job execution to leverage the discounted prices of the cloud spot market. In particular, we evaluated savings opportunities when using Amazon EC2 spot instances compared to on-demand resources. The experiments confirmed that the prediction module significantly improved the cost-effectiveness of the solution: up to 80% savings compared to on-demand prices and, in the worst case, 1% more cost than the absolute minimum. The production deployments of the architecture show that it is invaluable for minimizing the total cost of ownership of analytical data processing solutions.
dc.affiliation: Uniwersytet Warszawski
dc.contributor.author: Ślęzak, Dominik
dc.contributor.author: Lameski, Petre
dc.contributor.author: Apanowicz, Cas
dc.contributor.author: Grzegorowski, Marek
dc.contributor.author: Zdravevski, Eftim
dc.contributor.author: Janusz, Andrzej
dc.date.accessioned: 2024-01-24T20:53:29Z
dc.date.available: 2024-01-24T20:53:29Z
dc.date.issued: 2021
dc.description.finance: No-cost publication
dc.description.volume: 25
dc.identifier.doi: 10.1016/J.BDR.2021.100203
dc.identifier.issn: 2214-5796
dc.identifier.uri: https://repozytorium.uw.edu.pl//handle/item/103814
dc.identifier.weblink: https://api.elsevier.com/content/article/PII:S2214579621000204?httpAccept=text/xml
dc.language: eng
dc.pbn.affiliation: computer and information sciences
dc.relation.ispartof: Big Data Research
dc.relation.pages: 1-13
dc.rights: ClosedAccess
dc.sciencecloud: nosend
dc.subject.en: Big Data
dc.subject.en: ETL
dc.subject.en: Cloud computing
dc.subject.en: Spot price prediction
dc.subject.en: ARIMA
dc.subject.en: Spark
dc.title: Cost Optimization for Big Data Workloads Based on Dynamic Scheduling and Cluster-Size Tuning
dc.type: JournalArticle
dspace.entity.type: Publication