Neural networks are a powerful tool for simulating nonlinear systems. However, obtaining a reliable neural network is usually time-consuming, because the network must be trained repeatedly on the available data. Recently, several attempts have been made to accelerate neural network training with parallel hardware. One of the challenges in hardware acceleration is implementing floating-point squashing functions, such as sigmoid(x) and tanh(x), which have a vast input domain. Previous implementations of squashing functions either suffer from low speed and poor accuracy or require large area and considerable manual work. In this paper, we present an automatic method for implementing squashing functions. Based on the proposed domain partition algorithm and coefficient compression method, the resulting squashing functions are smaller, faster, and more precise. An experiment on sigmoid(x) shows that, compared with the conventional method, the proposed method achieves lower memory usage, an error rate up to 20k times smaller, a 300-fold synthesis speedup, and a 50% reduction in LUT and flip-flop usage.
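To make the idea of partitioning the input domain concrete, the following is a minimal sketch of a table-based sigmoid approximation. It is not the paper's domain partition algorithm or its coefficient compression method, neither of which is described in this abstract. It assumes a uniform partition of [0, x_max], one (slope, intercept) pair per segment, saturation to 1 beyond x_max, and the symmetry sigmoid(-x) = 1 - sigmoid(x). The names `build_table` and `approx_sigmoid`, the bound `x_max=8.0`, and the segment count are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Reference sigmoid used to build and check the table."""
    return 1.0 / (1.0 + math.exp(-x))


def build_table(x_max=8.0, segments=64):
    """Return one (slope, intercept) pair per uniform segment of [0, x_max].

    Assumption: uniform segments. A real domain partition would likely
    choose segment boundaries adaptively.
    """
    width = x_max / segments
    table = []
    for i in range(segments):
        x0 = i * width
        x1 = x0 + width
        y0, y1 = sigmoid(x0), sigmoid(x1)
        slope = (y1 - y0) / width       # chord through the segment endpoints
        intercept = y0 - slope * x0
        table.append((slope, intercept))
    return table


def approx_sigmoid(x, table, x_max=8.0):
    """Evaluate the piecewise-linear approximation at x."""
    ax = abs(x)
    if ax >= x_max:
        y = 1.0                         # saturated region: no table lookup
    else:
        width = x_max / len(table)
        idx = min(int(ax / width), len(table) - 1)
        slope, intercept = table[idx]
        y = slope * ax + intercept
    # Only the non-negative half is stored; mirror for negative inputs.
    return y if x >= 0 else 1.0 - y
```

In this sketch the stored coefficients cover only the non-negative half of a bounded interval, which is one simple way to keep the table small despite the vast floating-point input domain. Increasing `segments` trades memory for accuracy.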
|Title of host publication||International Conference on Solid-State and Integrated Circuits Technology Proceedings, ICSICT|
|Publication status||Published - 2008|
|Event||2008 9th International Conference on Solid-State and Integrated-Circuit Technology, ICSICT 2008 - Beijing|
Duration: 2008-10-20 → 2008-10-23