Investigating runtime behavior is one of the most important steps in tuning the performance of a computer system. Profiling tools are widely used to detect hot spots in a program. In addition, tracing tools yield valuable information, especially for parallelized programs, such as thread scheduling, barrier synchronization, context switching, thread migration, and jitter caused by interrupts. With this information, users can optimize the runtime system and the hardware configuration as well as the program itself. However, existing tools report information per process or per function. Finer-grained information, at task or loop granularity, is needed to understand program behavior more precisely. This paper proposes a tracing tool, Annotatable Systrace, based on an extended Linux ftrace, for investigating the runtime execution behavior of a parallelized program. Annotatable Systrace can add arbitrary annotations to the trace of a target program. As an evaluation, the proposed tool collects traces from 183.equake, 179.art, and mpeg2enc on an Intel Xeon X7560 and on ARMv7. The evaluation shows that the tool makes it possible to observe load imbalance over the course of program execution. It can also generate a trace with the inserted annotations even on a 32-core machine. The overhead of a single annotation is 1.07 µs on the Intel Xeon and 4.44 µs on ARMv7.
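To make the idea of annotating a trace concrete, the following is a minimal sketch that writes begin/end markers around a loop iteration through the stock ftrace `trace_marker` file. It is not the Annotatable Systrace interface, which this abstract does not describe. The `Annotator` class, its method names, the marker text format, and the debugfs mount path are all assumptions made for illustration.

```python
import os
from contextlib import contextmanager

# Assumed location of the stock ftrace marker file; the mount point
# differs between systems and usually requires elevated privileges.
TRACE_MARKER = "/sys/kernel/debug/tracing/trace_marker"


class Annotator:
    """Hypothetical helper that writes free-form annotations into a trace."""

    def __init__(self, marker_path=TRACE_MARKER):
        try:
            # Open once and reuse the descriptor so each annotation
            # costs a single write() call.
            self._fd = os.open(marker_path, os.O_WRONLY)
        except OSError:
            self._fd = None  # tracing unavailable: annotations become no-ops

    def mark(self, text):
        """Emit one annotation line; return False if tracing is unavailable."""
        if self._fd is None:
            return False
        os.write(self._fd, (text + "\n").encode())
        return True

    @contextmanager
    def region(self, name):
        """Bracket a task or loop body with begin/end annotations."""
        self.mark("begin " + name)
        try:
            yield
        finally:
            self.mark("end " + name)

    def close(self):
        if self._fd is not None:
            os.close(self._fd)
            self._fd = None


if __name__ == "__main__":
    tracer = Annotator()
    for i in range(4):
        # One annotated region per loop iteration (loop granularity).
        with tracer.region("loop_iter %d" % i):
            sum(range(100000))
    tracer.close()
```

Bracketing each iteration in this way is what allows per-thread differences in region length, and therefore load imbalance, to be read off the resulting trace. Keeping the descriptor open matters because the cost of each annotation is paid on every marked region.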