Latency tolerance is essential in achieving high performance on parallel computers for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor communication on an execution pipeline with small and simple packets. It can create a packet in one cycle, and receive a packet from the network in the on-chip buffer without interruption. EM-X invokes threads on packet arrival, minimizing the overhead of thread switching. It can tolerate communication latency by using efficient multi-threading and optimizing packet flow of fine grain communication. EM-X also supports the synchronization of two operands, direct remote memory read/write operations and flexible packet scheduling with priority. This paper describes distinctive features of the EM-X architecture and reports the performance of small synthetic programs and larger more realistic programs.