dc.contributor.author |
Hernandez-Leal, P
|
|
dc.contributor.author |
Taylor, ME
|
|
dc.contributor.author |
Rosman, Benjamin S
|
|
dc.contributor.author |
Sucar, LE
|
|
dc.contributor.author |
Munoz de Cote, E
|
|
dc.date.accessioned |
2017-05-17T07:18:30Z |
|
dc.date.available |
2017-05-17T07:18:30Z |
|
dc.date.issued |
2016-02 |
|
dc.identifier.citation |
Hernandez-Leal, P., Taylor, M.E., Rosman, B.S., Sucar, L.E. and Munoz de Cote, E. 2016. Identifying and tracking switching, non-stationary opponents: a Bayesian approach. Workshop on Multiagent Interaction without Prior Coordination (MIPC) at AAAI-16, 13 February 2016, Phoenix, Arizona, USA, pp. 560-566 |
en_US |
dc.identifier.uri |
https://www.aaai.org/ocs/index.php/WS/AAAIW16/paper/view/12584/12424
|
|
dc.identifier.uri |
http://mipc.inf.ed.ac.uk/2016/papers/mipc2016_hernandezleal_etal.pdf
|
|
dc.identifier.uri |
http://hdl.handle.net/10204/9091
|
|
dc.description |
Workshop on Multiagent Interaction without Prior Coordination (MIPC) at AAAI-16, 13 February 2016, Phoenix, Arizona, USA |
en_US |
dc.description.abstract |
In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. Bayesian policy reuse (BPR) has been empirically shown to be efficient at correctly detecting the best policy to use from a library in sequential decision tasks. In this paper we extend BPR to adversarial settings, in particular, to opponents that switch from one stationary strategy to another. Our proposed extension enables learning new models in an online fashion when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach is capable of efficiently detecting opponent strategies and reacting quickly to behavior switches, thereby yielding better performance than state-of-the-art approaches in terms of average rewards. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Association for the Advancement of Artificial Intelligence (AAAI) |
en_US |
dc.relation.ispartofseries |
Worklist;16648 |
|
dc.subject |
Policy reuse |
en_US |
dc.subject |
Non-stationary opponents |
en_US |
dc.subject |
Repeated games |
en_US |
dc.title |
Identifying and tracking switching, non-stationary opponents: a Bayesian approach |
en_US |
dc.type |
Conference Presentation |
en_US |
dc.identifier.apacitation |
Hernandez-Leal, P., Taylor, M. E., Rosman, B. S., Sucar, L. E., & Munoz de Cote, E. (2016). Identifying and tracking switching, non-stationary opponents: a Bayesian approach. Association for the Advancement of Artificial Intelligence (AAAI). http://hdl.handle.net/10204/9091 |
en_ZA |
dc.identifier.chicagocitation |
Hernandez-Leal, P, ME Taylor, Benjamin S Rosman, LE Sucar, and E Munoz de Cote. "Identifying and tracking switching, non-stationary opponents: a Bayesian approach." (2016): http://hdl.handle.net/10204/9091 |
en_ZA |
dc.identifier.vancouvercitation |
Hernandez-Leal P, Taylor ME, Rosman BS, Sucar LE, Munoz de Cote E. Identifying and tracking switching, non-stationary opponents: a Bayesian approach; Association for the Advancement of Artificial Intelligence (AAAI); 2016. http://hdl.handle.net/10204/9091 |
en_ZA |
dc.identifier.ris |
TY - Conference Presentation
AU - Hernandez-Leal, P
AU - Taylor, ME
AU - Rosman, Benjamin S
AU - Sucar, LE
AU - Munoz de Cote, E
AB - In many situations, agents are required to use a set of strategies (behaviors) and switch among them during the course of an interaction. This work focuses on the problem of recognizing the strategy used by an agent within a small number of interactions. We propose using a Bayesian framework to address this problem. Bayesian policy reuse (BPR) has been empirically shown to be efficient at correctly detecting the best policy to use from a library in sequential decision tasks. In this paper we extend BPR to adversarial settings, in particular, to opponents that switch from one stationary strategy to another. Our proposed extension enables learning new models in an online fashion when the learning agent detects that the current policies are not performing optimally. Experiments presented in repeated games show that our approach is capable of efficiently detecting opponent strategies and reacting quickly to behavior switches, thereby yielding better performance than state-of-the-art approaches in terms of average rewards.
DA - 2016-02
DB - ResearchSpace
DP - CSIR
KW - Policy reuse
KW - Non-stationary opponents
KW - Repeated games
LK - https://researchspace.csir.co.za
PY - 2016
T1 - Identifying and tracking switching, non-stationary opponents: a Bayesian approach
TI - Identifying and tracking switching, non-stationary opponents: a Bayesian approach
UR - http://hdl.handle.net/10204/9091
ER -
|
en_ZA |