Reinforcement learning with temperature distribution based on likelihood function

Norimasa Kobori, Kenji Suzuji, Pitoyo Hartono, Shuji Hashimoto

    Research output: Contribution to journal › Article

    2 Citations (Scopus)

    Abstract

    In existing reinforcement learning, it is difficult and time-consuming to find appropriate meta-parameters such as the learning rate, eligibility traces, and the temperature for exploration. In particular, in complicated, large-scale problems the reward is often delayed, which makes the problem hard to solve. In this paper, we propose a novel method that introduces a temperature distribution into reinforcement learning. In addition to acquiring a policy by profit sharing, a temperature is assigned to each state and is trained by a hill-climbing method using a likelihood function based on the success or failure of the task. The proposed method reduces the parameter setting required for a given problem. We demonstrate its performance on a grid-world problem and on the control of the Acrobot.
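    The abstract gives no pseudocode, so the following is only a minimal sketch of the idea it describes: Boltzmann (softmax) action selection with one temperature per state, profit sharing to credit the visited state-action pairs, and a hill-climbing temperature update driven by episode success or failure. All names, constants, and the multiplicative update rule below are illustrative assumptions standing in for the paper's exact likelihood-based formulation.

        import numpy as np

        rng = np.random.default_rng(0)

        N_STATES, N_ACTIONS = 25, 4
        W = np.zeros((N_STATES, N_ACTIONS))  # profit-sharing weights (policy)
        T = np.ones(N_STATES)                # one temperature per state

        def select_action(s):
            """Boltzmann exploration with a state-dependent temperature T[s]."""
            prefs = W[s] / max(T[s], 1e-6)
            prefs -= prefs.max()             # subtract max for numerical stability
            p = np.exp(prefs)
            p /= p.sum()
            return rng.choice(N_ACTIONS, p=p)

        def profit_sharing_update(episode, reward, decay=0.5):
            """Distribute a terminal reward backward along the visited (s, a) pairs."""
            credit = reward
            for s, a in reversed(episode):
                W[s, a] += credit
                credit *= decay              # geometrically decaying credit

        def update_temperatures(episode, success, step=0.1):
            """Hill-climb each visited state's temperature: cool it after a
            success (exploit more), heat it after a failure (explore more).
            A crude stand-in for the paper's likelihood-gradient update."""
            for s, _ in episode:
                T[s] *= (1.0 - step) if success else (1.0 + step)
                T[s] = float(np.clip(T[s], 1e-3, 10.0))

    A training loop would record the visited (state, action) pairs, call profit_sharing_update with the terminal reward, and then call update_temperatures with a success flag, so that each state's exploration level adapts on its own rather than following a single hand-tuned global temperature schedule.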

    Original language: English
    Pages (from-to): 297-305
    Number of pages: 9
    Journal: Transactions of the Japanese Society for Artificial Intelligence
    ISSN: 1346-0714
    Publisher: Japanese Society for Artificial Intelligence
    Volume: 20
    Issue number: 4
    DOI: 10.1527/tjsai.20.297
    Publication status: Published - 2005

    Keywords

    • Delayed reward
    • Maximum likelihood estimation
    • Meta-parameter control
    • Profit sharing
    • Reinforcement Learning
    • Temperature distribution

    ASJC Scopus subject areas

    • Artificial Intelligence

    Cite this

    Reinforcement learning with temperature distribution based on likelihood function. / Kobori, Norimasa; Suzuji, Kenji; Hartono, Pitoyo; Hashimoto, Shuji.

    In: Transactions of the Japanese Society for Artificial Intelligence, Vol. 20, No. 4, 2005, p. 297-305.
