3D ball tracking is a critical function in many applications, such as game and player behavior analysis, and real-time implementation has become increasingly important because it can be used for live broadcasts and TV content. To reach high accuracy, tracking algorithms are usually time-consuming because of their large number of calculations, which makes real-time demands challenging to meet. This paper proposes multiple command queues, tactical thread allocation, and stepped iterative addition to enable real-time tracking on a CPU-GPU platform. First, multiple command queues achieve parallelism between tasks in the algorithm. Second, tactical thread allocation helps map the algorithm onto the GPU and enhances synchronization between threads. Third, stepped iterative addition achieves partial parallelism in a sequential operation. This work is implemented on an Intel Core i7-6700 CPU and an AMD Radeon R9 FURY GPU. The tracking speed of our work increases 37.8 times, from the original 431 ms to 11.7 ms per frame, while the success rate of the algorithm remains over 99%. This result fully meets the requirement of 16.6 ms per frame for real-time tracking of 60 fps video.
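
The idea of achieving partial parallelism in a sequential addition can be illustrated with a small sketch. The abstract does not specify which operation stepped iterative addition targets, so the example below assumes a cumulative (prefix) sum and uses a standard log-step scan as a stand-in. The function name `stepped_prefix_sum` is hypothetical, and this is not necessarily the paper's exact scheme.

```python
def stepped_prefix_sum(values):
    """Inclusive prefix sum computed in stepped passes (illustrative only).

    Assumption: the sequential operation is a running sum. Each pass doubles
    the stride, so about log2(n) passes are needed instead of n - 1
    dependent additions.
    """
    result = list(values)
    stride = 1
    while stride < len(result):
        # Within one pass every update is independent of the others, so on a
        # GPU each index could be handled by its own thread. The passes
        # themselves still run one after another.
        previous = result[:]  # snapshot stands in for a barrier between passes
        for i in range(stride, len(result)):
            result[i] = previous[i] + previous[i - stride]
        stride *= 2
    return result


if __name__ == "__main__":
    print(stepped_prefix_sum([1, 2, 3, 4, 5]))
```

In this sketch only the loop over passes is sequential. The inner loop is the part that could be mapped to parallel threads, which is the sense in which the parallelism is partial.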