Journal article
License

Closed access

Approximate Decision Tree Induction over Approximately Engineered Data Features

Authors
Ślęzak, Dominik
Chądzyńska-Krasowska, Agnieszka
Publication date
2020
Abstract (EN)

We propose a simple SQL-based decision tree induction algorithm that makes its heuristic choices of how to split the data based on the results of automatically generated analytical queries. We run this algorithm using standard SQL and an approximate SQL engine that works on granulated data summaries. We compare the accuracy of trees obtained in these two modes on the real-world dataset provided to participants of the Suspicious Network Event Recognition competition organized at IEEE BigData 2019. We investigate whether trees induced using approximate SQL queries, whose execution is incomparably faster, yield poorer accuracy than trees induced in the standard scenario. Next, we investigate features (inputs to the decision tree induction algorithm) derived using SQL from a larger associated data table that was also provided in the competition. As before, we run both standard and approximate SQL; again, the latter mode needs to be checked with respect to the accuracy of trees learnt over data with approximately extracted features.
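
The idea of choosing a split from the results of automatically generated analytical queries can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an in-memory SQLite database as a stand-in for a standard SQL engine, weighted entropy as the split heuristic, and a hypothetical table `events` with hypothetical columns `duration`, `packets`, and `alert`. The approximate engine working on granulated summaries is not modeled here.

```python
import math
import sqlite3


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def split_query(table, feature, label):
    """Generate one analytical query: class counts on each side of a split.

    Identifiers are interpolated directly, which is acceptable for a sketch
    but unsafe for untrusted input. The threshold is bound as a parameter.
    """
    return (
        f"SELECT {feature} <= ? AS side, {label}, COUNT(*) "
        f"FROM {table} GROUP BY side, {label}"
    )


def best_split(conn, table, features, label):
    """Return (score, feature, threshold) with the lowest weighted entropy."""
    best = None
    for feature in features:
        values = [row[0] for row in conn.execute(
            f"SELECT DISTINCT {feature} FROM {table} ORDER BY {feature}")]
        # Skip the largest value: it would leave one side of the split empty.
        for threshold in values[:-1]:
            sides = {}
            query = split_query(table, feature, label)
            for side, _cls, n in conn.execute(query, (threshold,)):
                sides.setdefault(side, []).append(n)
            total = sum(sum(c) for c in sides.values())
            score = sum(sum(c) / total * entropy(c) for c in sides.values())
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical toy data: (duration, packets, alert)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (duration INTEGER, packets INTEGER, alert INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 10, 0), (2, 20, 0), (3, 10, 1), (4, 20, 1)],
)
result = best_split(conn, "events", ["duration", "packets"], "alert")
print(result[1], result[2])
```

On this toy table, splitting on `duration <= 2` separates the two classes completely, so it is selected. A full tree would apply the same step recursively, adding a `WHERE` clause for the current node to each generated query; running the same queries on an approximate engine would change only the counts returned, not the selection logic.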

Keywords (EN)
SQL-based decision tree induction
SQL-based feature engineering
Approximate SQL engines
Granulated data summarization
Big data analytics
Cybersecurity analytics
PBN discipline
computer science
Pages
376-384
Open access license
Closed access