It seems that Instruction Level Parallelism (ILP) approach, which has been used by various superscalar processors and VLIW processors for a long time, reaches its limitation of performance improvement. To obtain scalable performance improvement, cost effectiveness and high productivity even in the era of one billion transistors, the cooperative work between software and hardware is getting increasingly important. For this reason, the authors have developed OSCAR (Optimally Scheduled Advanced multiprocessoR) Chip Multiprocessor (OSCAR CMP) and OSCAR multigrain compiler simultaneously. To preserve the scalability in the future, OSCAR CMP has mechanisms for efficient use of parallelism and data locality, and for hiding data transfer overhead. These mechanisms can be fully controlled by the OSCAR multigrain compiler. In this paper, the authors focus on multigrain parallel processing on OSCAR CMP, which enables us to exploit loop iteration level parallelism and coarse grain task parallelism in addition to ILP from the entire of a program. Performance of multigrain parallel processing on OSCAR CMP architecture is evaluated using SPEC fp 2000/95 benchmark suite. When microSPARC like single issue core is used, OSCAR CMP gives us from 1.77 to 3.96 times speedup for four processors against single processor. In addition, OSCAR CMP is compared with Sun UltraSPARC II like processor to evaluate cost effectiveness. As a result, OSCAR CMP gives us 1.66 times better performance on the average under the condition that OSCAR CMP and UltraSPARC II are built from almost same number of transistors.