Hybrid learning strategy to solve pendulum swing-up problem for real hardware

Shingo Nakamura, Shuji Hashimoto

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    3 Citations (Scopus)

    Abstract

    In this paper, we propose a machine learning strategy for obtaining the optimal controller for an actual machine using a hybrid platform made up of real hardware and a simulator. The simulator consists of neural networks that learn the actual behavior of the latest hardware directly and emulate it without physical modeling. The hardware controller, in turn, is trained on the simulator with a reinforcement learning method to realize optimal control for the target task, and is then applied to the real hardware. As long as these processes are iterated in parallel, the system can automatically generate the optimal controller without any manual work, even when the hardware configuration is changed or switched. In this manner, the real hardware and the simulator influence each other, which makes the system adaptive. Furthermore, we place a buffering component between the sampling of hardware data and its supply to the simulator; it keeps the latest hardware data and supplies unbiased data to the simulator. As an example application of the proposed method, we take up the pendulum swing-up problem. In the experiments, the optimization process is first carried out step by step for the initial hardware configuration, and the basic idea of the method is evaluated. Then, after changing the pendulum, we confirm that the system can autonomously generate a new optimal controller for the real hardware without any human operation.
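    The abstract does not say how the buffering component keeps the latest hardware data while still supplying unbiased data. One plausible reading, offered only as a hypothetical sketch and not as the authors' method, is a buffer that divides the pendulum's (angle, angular velocity) state space into grid cells, keeps only the newest transition observed in each cell, and samples cells uniformly, so the simulator's neural networks are not trained mostly on the hanging-down rest state that dominates raw swing-up logs. The class name `LatestPerCellBuffer`, the grid resolution, and the velocity limit `omega_max` are all assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

State = Tuple[float, float]              # (angle theta [rad], angular velocity omega [rad/s])
Transition = Tuple[State, float, State]  # (state, torque, next_state)


class LatestPerCellBuffer:
    """Hypothetical buffer: one slot per state-space cell, holding the newest sample.

    An assumed interpretation of "keeps the latest data ... supplies non-biased
    data"; the paper's actual buffering scheme may differ.
    """

    def __init__(self, n_theta: int = 20, n_omega: int = 20, omega_max: float = 10.0):
        self.n_theta = n_theta
        self.n_omega = n_omega
        self.omega_max = omega_max  # assumed velocity range for the grid
        self.cells: Dict[Tuple[int, int], Transition] = {}

    def _cell(self, s: State) -> Tuple[int, int]:
        theta, omega = s
        # Wrap the angle into [0, 2*pi) so equivalent angles share a cell.
        t = (theta % (2 * math.pi)) / (2 * math.pi)
        # Clip the velocity to [-omega_max, omega_max] and map it to [0, 1].
        clipped = min(max(omega, -self.omega_max), self.omega_max)
        w = (clipped + self.omega_max) / (2 * self.omega_max)
        # min() guards against a float landing exactly on the upper edge.
        i = min(int(t * self.n_theta), self.n_theta - 1)
        j = min(int(w * self.n_omega), self.n_omega - 1)
        return i, j

    def add(self, s: State, a: float, s_next: State) -> None:
        # Overwrite whatever the cell held: only the newest hardware data survives.
        self.cells[self._cell(s)] = (s, a, s_next)

    def sample(self, n: int, rng: random.Random) -> List[Transition]:
        # Draw occupied cells uniformly, so rarely visited regions weigh as much
        # as frequently visited ones when training the simulator.
        keys = list(self.cells)
        if not keys:
            return []
        return [self.cells[rng.choice(keys)] for _ in range(n)]

    def __len__(self) -> int:
        return len(self.cells)


# Usage: two samples near the bottom share a cell, so the second replaces the first.
rng = random.Random(0)
buf = LatestPerCellBuffer()
buf.add((0.0, 0.0), 0.5, (0.01, 0.1))
buf.add((0.001, 0.0), -0.5, (0.0, -0.1))
buf.add((math.pi, 0.0), 1.0, (3.1, -0.2))
print(len(buf))  # 2
batch = buf.sample(4, rng)  # training minibatch for the simulator networks
```

    Overwriting per cell is what would let such a buffer follow a changed pendulum: once the new hardware visits a region, its samples replace the old ones there, and the simulator retrains on current behavior. Cells the new hardware never revisits would keep stale data, which a practical version would need to handle, for example by expiring old entries.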

    Original language: English
    Title of host publication: 2007 IEEE International Conference on Robotics and Biomimetics, ROBIO
    Pages: 1972-1977
    Number of pages: 6
    DOIs: https://doi.org/10.1109/ROBIO.2007.4522469
    Publication status: Published - 2008
    Event: 2007 IEEE International Conference on Robotics and Biomimetics, ROBIO - Yalong Bay, Sanya
    Duration: 2007 Dec 15 to 2007 Dec 18

    Fingerprint

    Pendulums
    Hardware
    Simulators
    Controllers
    Reinforcement learning
    Learning systems
    Sampling
    Neural networks

    Keywords

    • Machine learning
    • Pendulum swing-up problem
    • Simulator construction

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Control and Systems Engineering
    • Biomaterials

    Cite this

    Nakamura, S., & Hashimoto, S. (2008). Hybrid learning strategy to solve pendulum swing-up problem for real hardware. In 2007 IEEE International Conference on Robotics and Biomimetics, ROBIO (pp. 1972-1977). [4522469] https://doi.org/10.1109/ROBIO.2007.4522469

    @inproceedings{3bd7e342acae4a678e1f4e6b0afeae7d,
    title = "Hybrid learning strategy to solve pendulum swing-up problem for real hardware",
    keywords = "Machine learning, Pendulum swing-up problem, Simulator construction",
    author = "Shingo Nakamura and Shuji Hashimoto",
    year = "2008",
    doi = "10.1109/ROBIO.2007.4522469",
    language = "English",
    isbn = "9781424417582",
    pages = "1972--1977",
    booktitle = "2007 IEEE International Conference on Robotics and Biomimetics, ROBIO",

    }

    TY - GEN

    T1 - Hybrid learning strategy to solve pendulum swing-up problem for real hardware

    AU - Nakamura, Shingo

    AU - Hashimoto, Shuji

    PY - 2008

    Y1 - 2008

    KW - Machine learning

    KW - Pendulum swing-up problem

    KW - Simulator construction

    UR - http://www.scopus.com/inward/record.url?scp=49249088407&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=49249088407&partnerID=8YFLogxK

    U2 - 10.1109/ROBIO.2007.4522469

    DO - 10.1109/ROBIO.2007.4522469

    M3 - Conference contribution

    AN - SCOPUS:49249088407

    SN - 9781424417582

    SP - 1972

    EP - 1977

    BT - 2007 IEEE International Conference on Robotics and Biomimetics, ROBIO

    ER -