This paper presents an efficient VLSI architecture for intra prediction in an 8K×4K HEVC decoder. It supports all 35 intra prediction modes and prediction sizes ranging from 4×4 to 64×64. This work proposes Cyclic SRAM Banks based Parallel Reference Sample Fetching (CSB-PRSF), which guarantees enough reference samples for prediction while reducing the number of registers used to store them. To guarantee high throughput, 16 pixels are predicted by 4×4 Block Based Pipelining, and the dependency between neighboring blocks is eliminated by Hybrid Data Forwarding and Block Reordering. The architecture is synthesized in 90 nm technology, reaching a maximum working frequency of 469 MHz with an area of 72.1K gates. Running at 397 MHz, the architecture can support 4320p@120fps HEVC intra decoding with all modes and all sizes.
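
As a rough sanity check on the throughput figures above, the clock rate needed for a given resolution and frame rate can be estimated from the number of samples predicted per cycle. The sketch below is not taken from the paper. It assumes that 4320p means a 7680×4320 frame, that the video uses 4:2:0 chroma sampling (1.5 samples per luma pixel), and that the pipeline sustains 16 predicted pixels on every cycle with no stalls. The function name `required_clock_mhz` and its parameters are hypothetical.

```python
def required_clock_mhz(width, height, fps, pixels_per_cycle, chroma_factor=1.5):
    """Estimate the clock rate (MHz) needed to predict every sample of every frame.

    chroma_factor=1.5 models 4:2:0 sampling; this is an assumption,
    the abstract does not state the chroma format.
    """
    samples_per_second = width * height * fps * chroma_factor
    cycles_per_second = samples_per_second / pixels_per_cycle
    return cycles_per_second / 1e6


# Assumed: 7680x4320 frames and an ideal 16 pixels per cycle (no pipeline stalls).
needed = required_clock_mhz(7680, 4320, 120, pixels_per_cycle=16)
print(f"required: {needed:.1f} MHz, reported operating point: 397 MHz")
```

Under these assumptions the estimate comes to about 373 MHz, which would be consistent with the reported 397 MHz operating point and leaves roughly 6% of the cycles for overhead such as pipeline bubbles.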