A recent development in multicore technology has enabled development of hundreds or thousands core processor. However, on such multicore processor, an efficient hardware cache coherence scheme will become very complex and expensive to develop. This paper proposes a parallelizing compiler directed software coherence scheme for shared memory multicore systems without hardware cache coherence control. The general idea of the proposed method is that an automatic parallelizing compiler analyzes the control dependency and data dependency among coarse grain task in the program. Then based on the obtained information, task parallelization, false sharing detection and data restructuration to prevent false sharing are performed. Next the compiler inserts cache control code to handle stale data problem. The proposed method is built on OSCAR automatic parallelizing compiler and evaluated on Renesas RP2 with 8 SH-4A cores processor. The hardware cache coherence scheme on the RP2 processor is only available for up to 4 cores and the hardware cache coherence can be completely turned off for non-coherence cache mode. Performance evaluation is performed using 10 benchmark program from SPEC2000, SPEC2006, NAS Parallel Benchmark (NPB) and Mediabench II. The proposed method performs as good as or better than hardware cache coherence scheme. For example, 4 cores with the hardware coherence mechanism gave us speed up of 2.52 times against 1 core for SPEC2000 'equake', 2.9 times for SPEC2006 'lbm', 3.34 times for NPB 'cg', and 3.17 times for MediaBench II MPEG2 Encoder. The proposed software cache coherence control gave us 2.63 times for 4 cores and 4.37 for 8 cores for 'equake', 3.28 times for 4 cores and 4.76 times for 8 cores for lbm, 3.71 times for 4 cores and 4.92 times for 8 cores for 'MPEG2 Encoder'.