Journal article
License

ClosedAccess (closed access)

On Many-Actions Policy Gradient

dc.abstract.en: We study the variance of stochastic policy gradients (SPGs) with many action samples per state. We derive a many-actions optimality condition, which determines when many-actions SPG yields lower variance than a single-action agent with a proportionally extended trajectory. We propose Model-Based Many-Actions (MBMA), an approach leveraging dynamics models for many-actions sampling in the context of SPG. MBMA addresses issues associated with existing implementations of many-actions SPG and yields lower bias and comparable variance to SPG estimated from states in model-simulated rollouts. We find that the bias and variance structure of MBMA matches that predicted by theory. As a result, MBMA achieves improved sample efficiency and higher returns on a range of continuous action environments as compared to model-free, many-actions, and model-based on-policy SPG baselines.
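
For illustration only (not part of the repository record and not the authors' code): a minimal sketch of the many-actions SPG estimator the abstract describes, in which the score-function gradient is averaged over k actions sampled per state. The diagonal-Gaussian `policy` and the action-value estimate `q_hat` are hypothetical stand-ins; the paper's MBMA instead leverages a learned dynamics model for this step.

```python
# Hedged sketch: many-actions stochastic policy gradient (SPG) surrogate.
# `policy` maps states -> (mean, std) of a diagonal Gaussian; `q_hat` is a
# hypothetical action-value estimate. Neither comes from the paper's code.
import torch
from torch.distributions import Normal


def many_actions_spg_loss(policy, q_hat, states, k=8):
    """Surrogate loss whose gradient is the k-action SPG estimator.

    For every state, k actions are sampled from the current policy and the
    score-function terms grad log pi(a|s) * Q_hat(s, a) are averaged over
    them; k = 1 recovers the usual single-action estimator.
    """
    mean, std = policy(states)
    dist = Normal(mean, std)
    actions = dist.sample((k,))                        # (k, batch, act_dim)
    log_probs = dist.log_prob(actions).sum(-1)         # (k, batch)
    states_rep = states.unsqueeze(0).expand(k, -1, -1)
    with torch.no_grad():                              # critic treated as fixed
        values = q_hat(states_rep, actions)            # (k, batch)
    # Minimizing the negative surrogate performs gradient ascent on
    # E_s E_{a ~ pi}[ log pi(a|s) * Q_hat(s, a) ].
    return -(log_probs * values).mean()
```

Setting k = 1 recovers the single-action SPG the abstract compares against; the paper's many-actions optimality condition characterizes when increasing k yields lower variance than spending the same samples on a proportionally longer trajectory.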
dc.affiliation: Uniwersytet Warszawski
dc.conference.country: United States
dc.conference.datefinish: 2023-07-29
dc.conference.datestart: 2023-07-23
dc.conference.place: Honolulu
dc.conference.series: International Conference on Machine Learning
dc.conference.seriesshortcut: ICML
dc.conference.shortcut: ICML 2023
dc.conference.weblink: https://icml.cc/Conferences/2023/Dates
dc.contributor.author: Cygan, Marek
dc.contributor.author: Nauman, Michal
dc.date.accessioned: 2024-01-25T15:45:23Z
dc.date.available: 2024-01-25T15:45:23Z
dc.date.issued: 2023
dc.description.finance: No-cost publication
dc.identifier.uri: https://repozytorium.uw.edu.pl//handle/item/114719
dc.identifier.weblink: https://proceedings.mlr.press/v202/nauman23a.html
dc.language: eng
dc.pbn.affiliation: computer and information sciences
dc.relation.pages: 202:25769-25789
dc.rights: ClosedAccess
dc.sciencecloud: nosend
dc.title: On Many-Actions Policy Gradient
dc.type: JournalArticle
dspace.entity.type: Publication