Title: Go language concurrency actual combat: optimization comparison of prime statistical performance (“traditional serial implementation” and “multi-processing concurrency implementation”)
In programming learning, judging prime numbers is a classic algorithm problem. When the amount of data is small, the traditional serial loop is enough to cope; but when the amount of data is expanded to hundreds of thousands or more, how to improve the computing efficiency has become the key.
Based on the Go language, this paper will compare the performance differences between “traditional serial implementation” and “multi-processing concurrent implementation” in the range of prime numbers in the range of 1-200,000, and analyze the principles behind them in depth.
1. Requirements and analysis ideas
Task target:
Count all primes between 1 and 200,000, and compare the time-consuming of different implementations.
1. Traditional method (serial):
Use a for loop, traverse from 1 to 200,000. Each number is called to the Isprime function to judge. All calculations of this method are executed sequentially in the main thread, and cannot take advantage of the multi-core CPU.
2. Concurrent method (parallel – task queue mode):
The core idea is “dynamic load balancing”. We no longer assign a fixed number interval to each coroutine, but establish a “task channel”.
Architecture Design:
– Task Producer: Start a coroutine, which is responsible for placing the numbers 1 to 200,000 into the IntChan (task channel).
– Task consumer (multiple goroutines): Launch 2 goroutines (or adjust according to the CPU core number). These coroutines don’t care which number they are dealing with, they are only responsible for “grabing” tasks from IntChan. Whoever is free, whoever takes a number from the channel to judge.
– Result collection (PrimeChan): If a coroutine determines that the current number is a prime number, write it to PrimeChan.
– Synchronized control (ExitChan): When all the compute coroutines have finished processing all tasks in the channel, the main thread statistics are notified to end.
The advantage of this model is that if a number is particularly time-consuming and will not block other coroutines, other coroutines will continue to receive new tasks from the channel, achieving real dynamic load balancing.
The code comparison
1. Traditional Serial Implementation (Goroutine-Apply02/Main.go)
This implementation is very intuitive:
package main
import (
"fmt"
"time"
)
func main() {
start := time.Now().Unix()
for num := 1; num <= 200000; num++ {
flag := true //假设是素数
//判断num是不是素数
for i := 2; i < num; i++ {
if num%i == 0 { //说明该num不是素数
flag = false
break
}
}
if flag {
//将这个数就放入到primeChan
//primeChan<- num
//将结果输出
//fmt.Printf("素数=%d\n", num)
}
}
end := time.Now().Unix()
fmt.Println("普通的方法耗时=", end-start)
}
2. Multi-coroutine concurrent implementation (goroutine-apply/main.go)
The concurrent implementation utilizes the channel for task distribution:
package main
import (
"fmt"
"time"
)
// 向 intChan 放入 1-200000 个数
func putNum(intChan chan int) {
for i := 1; i <= 200000; i++ {
intChan <- i
}
//关闭intChan
close(intChan)
}
// 从 intChan取出数据,并判断是否为素数,如果是,就放入到 primeChan
func primeNum(intChan chan int, primeChan chan int, exitChan chan bool) {
// 使用 for 循环
// var num int
var flag bool //
for {
//time.Sleep(time.Millisecond * 10)
num, ok := <-intChan //intChan 取不到..
if !ok {
break
}
flag = true //假设是素数
//判断num是不是素数
for i := 2; i < num; i++ {
if num%i == 0 { //说明该num不是素数
flag = false
break
}
}
if flag {
//将这个数就放入到primeChan
primeChan <- num
}
}
fmt.Println("有一个primeNum 协程因为取不到数据,退出")
//这里我们还不能关闭 primeChan
//向 exitChan 写入true
exitChan <- true
}
func main() {
intChan := make(chan int, 1000)
primeChan := make(chan int, 20000) //放入结果
//标识退出的管道
exitChan := make(chan bool, 2) // 2个
start := time.Now().Unix()
//开启一个协程,向 intChan 放入 1-200000 个数
go putNum(intChan)
//开启4个协程,从 intChan 取出数据,并判断是否为素数,如果是,就
//放入到 primeChan
for i := 0; i < 2; i++ {
go primeNum(intChan, primeChan, exitChan)
}
//这里我们主线程,进行处理
//直接
go func() {
for i := 0; i < 2; i++ {
<-exitChan
}
end := time.Now().Unix()
fmt.Println("使用协程耗时=", end-start)
//当我们从 exitChan 取出了4个结果,就可以放心的关闭 prprimeChan
close(primeChan)
}()
//遍历我们的 primeChan ,把结果取出
for {
_, ok := <-primeChan
if !ok {
break
}
//将结果输出
//fmt.Printf("素数=%d\n", res)
}
fmt.Println("main线程退出")
}
3. Performance test results
In my test environment (2 core CPU), the prime number of 1-200,000 is counted, the results are as follows: as shown in Figure 1

– Traditional serial implementation: takes about 25 seconds.
– Multi-channel concurrency implementation: takes about 12 seconds.
It can be seen that the efficiency of concurrent realization has increased by about 50%. In theory, if it is a 4-core CPU, it is expected to take about 6 seconds.
4. In-depth analysis: why is concurrency faster?
1. Dynamic task distribution:
As you have pointed out, PrimeNum-related coroutines are done at the same time. They compete for Num := Range IntChan to compete together. This means that if the CPU has two cores, then there are two coroutines in parallel computing at the same time. When a coroutine calculates a number, it will immediately go to the channel to get one, without waiting for a specific ‘interval’ to end.
2. CPU core utilization:
In serial mode, only one core is running at full speed. In the concurrent mode, the Go runtime scheduler will map these 2 goroutines to the available operating system threads, so that the two cores can participate in the calculation of prime number judgment at the same time, which significantly reduces the total waiting time.
3. Synchronization of channels:
IntChan played the role of ‘task pool’ here. It decouples the generation and consumption of tasks. Regardless of whether the producer is fast or slow, consumers can handle the tasks available at the fastest speed. ExitChan ensures that the main thread is only counted after all subtasks are actually completed, avoiding data omission.
5. Summary and suggestions
Through the case of this prime number, we can draw the following conclusions:
1. Concurrent splitting for CPU-intensive tasks: For a large number of independent computing tasks, the use of multi-core advantages can significantly shorten the total time-consuming.
2. ‘Task Queue’ is more flexible than ‘static sharding’: through the channel distribution task, it can automatically handle the problem of load uneven, and the code scalability is better.
3. Channels are the cornerstone of concurrency security: reasonable use of channel can avoid complex lock operations, making concurrent code more concise and easy to read.
In actual development, if you encounter similar large-scale data processing scenarios, you might as well try Go’s concurrent models, which can often bring you unexpected performance surprises.