This paper proposes a parallel LSI architecture for LDPC decoder which improves a message-passing schedule. The proposed LDPC decoder is characterized as follows: (i) The column operations follow the row operations in a pipelined architecture to ensure that the row and column operations are performed concurrently, (ii) The proposed parallel pipelined bit functional unit enables the decoder to perform every column operation using the messages which is updated by the row operations. These column operations can be performed without extending the single iterative decoding delay. Hardware implementation and simulation results show that the proposed decoder improves the decoding throughput and bit error performance with a small hardware overhead.