Troubleshooting a sudden 504 on my personal blog

by 永夜 · 2026/02/25


1. Today my personal blog suddenly started responding with 504. This post records how I investigated and resolved it. See Figure 1.

Figure 1: My personal blog suddenly responding with 504

2. 504 Gateway Timeout means that Nginx (or another reverse proxy) timed out while waiting for a response from the backend (PHP-FPM). Check in the following order:

3. Run the top command. The problem is obvious: the server is severely overloaded. Key figures (see Figure 2): load average 19.77 (on this machine it should normally stay below 1 to 2), 21 processes running (too many), CPU usage 87.4%, and only 96 MB available out of 1968 MB of memory. This is a small machine with 2 GB of memory, and it has been crushed. The text capture below was taken at a slightly different moment, so its numbers differ a little.

Figure 2: top output showing the severely overloaded server


top - 15:21:01 up 993 days,  3:25,  1 user,  load average: 18.68, 19.55, 19.55
Tasks: 139 total,  20 running, 119 sleeping,   0 stopped,   0 zombie
%Cpu(s): 88.0 us,  9.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  1.3 hi,  1.7 si,  0.0 st
MiB Mem :   1968.6 total,     76.8 free,   1481.2 used,    410.6 buff/cache
MiB Swap:   2048.0 total,   1816.6 free,    231.4 used.    262.7 avail Mem

3. Exit top and run ps aux --sort=-%cpu | head -20 to see which processes are eating resources. The cause is found: more than 15 PHP-FPM child processes are running at the same time, each holding about 110 MB of resident memory. php-fpm alone has eaten roughly 1.6 GB, and the server only has 2 GB in total, so memory is blown.


[root@iZ23wv7v5ggZ ~]# ps aux --sort=-%cpu | head -20
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www      2097069  5.6  5.4 583184 109280 ?       S    14:41   2:29 php-fpm: pool www
www      2097086  5.6  5.3 583376 108720 ?       R    14:43   2:23 php-fpm: pool www
www      2097087  5.6  5.5 585100 111712 ?       R    14:43   2:22 php-fpm: pool www
www      2097088  5.6  5.4 583136 109108 ?       S    14:43   2:22 php-fpm: pool www
www      2097093  5.6  5.4 583132 109292 ?       R    14:44   2:18 php-fpm: pool www
www      2097094  5.6  5.4 583132 109344 ?       R    14:44   2:18 php-fpm: pool www
www      2097109  5.6  5.4 583372 109752 ?       R    14:45   2:16 php-fpm: pool www
www      2097125  5.6  5.4 583116 109200 ?       R    14:47   2:09 php-fpm: pool www
www      2097162  5.6  5.4 583324 109732 ?       R    14:49   2:01 php-fpm: pool www
www      2097187  5.6  5.4 583244 109848 ?       R    14:53   1:48 php-fpm: pool www
www      2097281  5.6  5.4 583124 109308 ?       R    15:03   1:17 php-fpm: pool www
www      2097299  5.6  5.4 583052 109328 ?       R    15:05   1:09 php-fpm: pool www
www      2097341  5.6  5.4 583116 109408 ?       S    15:10   0:52 php-fpm: pool www
www      2097671  5.6  5.5 583052 112304 ?       R    15:18   0:24 php-fpm: pool www
www      2097746  5.5  5.5 582836 111876 ?       S    15:24   0:02 php-fpm: pool www
www      2097747  5.4  5.5 582852 111988 ?       R    15:25   0:01 php-fpm: pool www
root     1007745  3.2  1.0 154220 20344 ?        Ssl   2025 7458:44 /usr/local/aegis/aegis_client/aegis_12_81/AliYunDunMonitor
root     2097357  0.5  7.7 516416 156836 ?       RNs  15:11   0:04 /usr/libexec/platform-python /usr/bin/dnf makecache --timer
root     1007719  0.3  0.3  99308  6048 ?        Ssl   2025 756:25 /usr/local/aegis/aegis_client/aegis_12_81/AliYunDun
[root@iZ23wv7v5ggZ ~]#
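The per-process estimate above can be checked by summing the RSS column of the listing. The following is a minimal Python sketch, not part of the original troubleshooting session; the function name `php_fpm_rss_kib` and the three-line `SAMPLE` excerpt are only illustrative. Keep in mind that RSS counts shared pages once per process, so the sum overstates the real footprint.

```python
# Sum the resident memory (RSS) of php-fpm workers from `ps aux` text.
# SAMPLE is a three-line excerpt of the listing above; in practice you
# would feed in the full output.
SAMPLE = """\
www      2097069  5.6  5.4 583184 109280 ?       S    14:41   2:29 php-fpm: pool www
www      2097086  5.6  5.3 583376 108720 ?       R    14:43   2:23 php-fpm: pool www
root     1007745  3.2  1.0 154220 20344 ?        Ssl   2025 7458:44 /usr/local/aegis/aegis_client/aegis_12_81/AliYunDunMonitor
"""


def php_fpm_rss_kib(ps_text):
    """Return (process_count, total_rss_kib) for rows whose command starts with php-fpm."""
    count, total = 0, 0
    for line in ps_text.splitlines():
        fields = line.split(None, 10)  # the 11th field is the full command
        if len(fields) == 11 and fields[10].startswith("php-fpm"):
            count += 1
            total += int(fields[5])  # RSS column, reported in KiB
    return count, total


count, total_kib = php_fpm_rss_kib(SAMPLE)
print(f"{count} php-fpm processes, {total_kib / 1024:.0f} MiB RSS")
```

Summing the 16 worker rows of the listing the same way gives about 1,719 MiB, which is more than the 1,481 MiB that top reports as used. That gap is consistent with shared pages being counted once per process.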

4. Recover first, then look for the root cause. Start by counting the php-fpm processes: 17 in total, which matches one master process plus the 16 workers listed above.


[root@iZ23wv7v5ggZ ~]# ps aux | grep php-fpm | grep -v grep | wc -l
17

5. Edit the php-fpm configuration file /usr/local/php/etc/php-fpm.conf. The server only has 2 GB of memory, so limit php-fpm to at most 5 child processes. See Figure 3.

Figure 3: Editing /usr/local/php/etc/php-fpm.conf to limit php-fpm to 5 child processes

Before modification:


pm = dynamic
pm.max_children = 16
pm.start_servers = 10
pm.min_spare_servers = 8
pm.max_spare_servers = 16
pm.max_requests = 2048

After modification:


pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
pm.max_requests = 500
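One rough way to sanity-check a pm.max_children value is to divide the memory left over for PHP by the size of one worker. The sketch below uses the 1968 MiB total and the roughly 110 MiB per worker seen earlier; the 1024 MiB reserve for everything else on the machine is an assumed figure, and `suggest_max_children` is a hypothetical helper, not a php-fpm tool.

```python
def suggest_max_children(total_mib, reserved_mib, per_child_mib):
    """Largest worker count whose combined memory fits in what is left after the reserve."""
    usable = total_mib - reserved_mib
    return max(1, usable // per_child_mib)


# 1968 MiB total and ~110 MiB per worker come from the top and ps output.
# The 1024 MiB reserve for the OS, Nginx and page cache is an assumption.
print(suggest_max_children(1968, 1024, 110))  # 8
print(suggest_max_children(1968, 0, 110))     # 17
```

Under these assumptions the memory ceiling is 8 workers, so 5 is on the conservative side. With no reserve at all the same formula gives 17, which shows how close the original setting of 16 was to the limit.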

6. Save the file, then restart php-fpm with either of the following commands. See Figure 4.


[root@iZ23wv7v5ggZ ~]# service php-fpm restart
[root@iZ23wv7v5ggZ ~]# sudo systemctl restart php-fpm

7. Run top again. Memory has recovered, from 96 MB free to 1214 MB, so the PHP-FPM change has taken effect. The load is also falling (19.77 → 7.97). But CPU usage is still 85.4% and 8 processes are still running, which is not normal.


top - 15:55:11 up 993 days,  3:59,  1 user,  load average: 7.97, 11.01, 15.32
Tasks: 124 total,   8 running, 116 sleeping,   0 stopped,   0 zombie
%Cpu(s): 85.4 us,  9.9 sy,  0.0 ni,  1.0 id,  0.0 wa,  1.0 hi,  2.6 si,  0.0 st
MiB Mem :   1968.6 total,   1214.1 free,    302.7 used,    451.8 buff/cache
MiB Swap:   2048.0 total,   1878.8 free,    169.2 used.   1469.1 avail Mem

8. See what is eating the CPU by running ps aux --sort=-%cpu | head -10. All 5 php-fpm workers are busy, which indicates that a steady stream of requests is still hitting my server.


[root@iZ23wv7v5ggZ ~]# ps aux --sort=-%cpu | head -10
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www      2098124 17.0  3.1 452724 63280 ?        R    15:57   0:06 php-fpm: pool www
www      2098123 16.8  2.9 452404 59716 ?        R    15:57   0:06 php-fpm: pool www
www      2098122 16.7  3.0 452868 61800 ?        R    15:57   0:06 php-fpm: pool www
www      2098125 16.7  2.9 452404 59712 ?        R    15:57   0:06 php-fpm: pool www
www      2098126 16.6  3.0 452732 60772 ?        R    15:57   0:06 php-fpm: pool www
root     1007745  3.2  1.0 154220 20160 ?        Ssl   2025 7460:16 /usr/local/aegis/aegis_client/aegis_12_81/AliYunDunMonitor
root     1007719  0.3  0.2  99308  5772 ?        Ssl   2025 756:31 /usr/local/aegis/aegis_client/aegis_12_81/AliYunDun
www      2040873  0.2  1.4 186640 29872 ?        R    Feb17  24:56 nginx: worker process
root         754  0.1  0.2 555732  4480 ?        Ssl   2023 2189:58 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
[root@iZ23wv7v5ggZ ~]#

9. Check the access log of my blog's virtual host. No single IP sends much traffic, but the IP segments are highly concentrated (222.167.251.x, 143.20.219.x, 66.92.14.x), which looks like a distributed crawler. Still, each IP made only 2 requests in this sample, so at this point they seemed unlikely to be what was blowing up the server.


[root@iZ23wv7v5ggZ ~]# tail -200 /data/wwwlogs/www.shuijingwanwq.com_nginx.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -20
2 66.92.14.197
2 222.167.251.94
2 222.167.251.83
2 222.167.251.69
2 222.167.251.248
2 222.167.251.161
2 222.167.251.155
2 222.167.251.100
2 143.20.219.38
2 143.20.219.227
2 143.20.219.208
2 143.20.219.199
2 136.0.94.36
2 13.53.45.144
1 66.92.14.92
1 66.92.14.90
1 66.92.14.85
1 66.92.14.8
1 66.92.14.77
1 66.92.14.75
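Counting per IP hides how concentrated the segments are. Grouping the same addresses by their first three octets makes the pattern easier to see. This is a small Python sketch with a hypothetical helper name (`count_by_slash24`), run here on a handful of addresses copied from the listing above rather than on the full log.

```python
from collections import Counter


def count_by_slash24(ips):
    """Count IPv4 addresses by their first three octets (their /24 segment)."""
    return Counter(ip.rsplit(".", 1)[0] + ".0/24" for ip in ips)


# A few client addresses copied from the listing above.
sample = ["66.92.14.197", "222.167.251.94", "222.167.251.83",
          "143.20.219.38", "66.92.14.92", "222.167.251.69"]

for segment, hits in count_by_slash24(sample).most_common():
    print(hits, segment)
```

In practice the input would be the first field of every line in the access log, the same field that the awk '{print $1}' step extracts.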

10. The load may be caused by traffic to other virtual hosts. Check which log files were written to most recently. The main traffic is concentrated on my blog (21 MB log) and access_nginx.log (7 MB).


# See which log files were written to most recently
ls -lt /data/wwwlogs/*.log | head -10

-rw-r--r-- 1 www root 21765969 Feb 25 16:01 /data/wwwlogs/www.shuijingwanwq.com_nginx.log
-rw-r--r-- 1 www root  7330312 Feb 25 16:01 /data/wwwlogs/access_nginx.log
-rw-r--r-- 1 www root     2189 Feb 25 15:47 /data/wwwlogs/error_nginx.log
-rw-r--r-- 1 www root    48532 Feb 25 15:40 /data/wwwlogs/learn-php-app-0605-prod.wangqiang.store_nginx.log
-rw-r--r-- 1 www root   213953 Feb 25 11:03 /data/wwwlogs/tym-jammerall.shuijingwanwq.com_nginx.log
-rw-r--r-- 1 www root        0 Jun 16  2023 /data/wwwlogs/learn-php-app-0605-prod.shuijingwanwq.com_nginx.log
-rw-r--r-- 1 www root        0 Jun  8  2023 /data/wwwlogs/fanxiapp-wangqiang-larabbs.shuijingwanwq.com_nginx.log

11. Look at the traffic sources in access_nginx.log. Found it! The IP 183.129.189.60 accounts for 177 of the last 200 log lines. Block this IP first:


[root@iZ23wv7v5ggZ ~]# tail -200 /data/wwwlogs/access_nginx.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -20
177 183.129.189.60
10 121.196.223.20
2 77.83.39.167
2 45.153.34.187
2 38.46.221.123
2 204.76.203.69
1 47.99.50.249
1 43.156.202.34
1 43.130.31.17
1 43.130.16.212
1 205.210.31.11

12. Add a rule to the security group in the Alibaba Cloud console: log in to the Alibaba Cloud console → ECS → Security Groups → find the security group attached to my instance → Add Rule → choose "Inbound" as the direction and "Deny" as the policy, enter 183.129.189.60 as the source address, and enter -1/-1 (all ports) as the port range. See Figure 5.

Figure 5: Adding an inbound deny rule for 183.129.189.60 to the security group in the Alibaba Cloud console

13. Check who is still sending a lot of requests: 183.129.189.60 is still hammering the server! The security group rule may not have taken effect, so block it immediately with iptables. See Figure 6.

Figure 6: 183.129.189.60 still dominating the access log, so the IP is blocked with iptables


[root@iZ23wv7v5ggZ ~]# tail -500 /data/wwwlogs/access_nginx.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
473 183.129.189.60
10 121.196.223.20
3 81.29.142.6
2 77.83.39.167
2 45.153.34.187
2 38.46.221.123
2 204.76.203.69
1 47.99.50.249
1 43.156.202.34
1 43.130.31.17
[root@iZ23wv7v5ggZ ~]#

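14. Run the following command. A result of 0 would mean the ban had worked, but it returned 73, so the ban did not appear to be working. (Note that tail -100 also counts lines written before the ban was added.)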


[root@iZ23wv7v5ggZ ~]# sleep 30 && tail -100 /data/wwwlogs/access_nginx.log | grep "183.129.189.60" | wc -l
73

15. Since iptables did not seem to work, block the IP at the Nginx level, directly in the virtual host configuration that writes access_nginx.log. First find which configuration file that is, then edit nginx.conf, save it, and restart Nginx. See Figure 7.

Figure 7: Locating the configuration that writes access_nginx.log, editing nginx.conf, and restarting Nginx


[root@iZ23wv7v5ggZ ~]# grep -rl "access_nginx.log" /usr/local/nginx/conf/
/usr/local/nginx/conf/nginx.conf
[root@iZ23wv7v5ggZ ~]# vi /usr/local/nginx/conf/nginx.conf
[root@iZ23wv7v5ggZ ~]# service nginx restart
Redirecting to /bin/systemctl restart nginx.service
[root@iZ23wv7v5ggZ ~]#

16. Monitor in real time: wait 30 seconds and see whether any new requests from that IP appear. No output means the ban has taken effect, and that is what happened. However, the load is still high and the blog is still responding with 504.


[root@iZ23wv7v5ggZ ~]# timeout 30 tail -f /data/wwwlogs/access_nginx.log | grep "183.129.189.60"

17. The load has settled at about 7.6 and is no longer falling, so something else is still eating resources. Take a look at the current situation:


# Current number of PHP-FPM processes
[root@iZ23wv7v5ggZ ~]# ps aux | grep "php-fpm" | grep -v grep | wc -l
6
# Check whether that IP is still sending new requests
[root@iZ23wv7v5ggZ ~]# tail -20 /data/wwwlogs/access_nginx.log | awk '{print $4, $1}'
[25/Feb/2026:15:26:57 121.196.223.20
[25/Feb/2026:15:26:57 121.196.223.20
[25/Feb/2026:15:26:57 121.196.223.20
[25/Feb/2026:15:26:57 121.196.223.20
[25/Feb/2026:15:27:07 121.196.223.20
[25/Feb/2026:15:27:07 121.196.223.20
[25/Feb/2026:15:27:17 121.196.223.20
[25/Feb/2026:15:27:17 121.196.223.20
[25/Feb/2026:15:28:33 47.99.50.249
[25/Feb/2026:15:34:01 45.153.34.187
[25/Feb/2026:15:49:35 43.156.202.34
[25/Feb/2026:15:56:04 38.46.221.123
[25/Feb/2026:16:01:12 38.46.221.123
[25/Feb/2026:16:12:08 81.29.142.6
[25/Feb/2026:16:12:12 81.29.142.6
[25/Feb/2026:16:12:16 81.29.142.6
[25/Feb/2026:16:20:52 185.242.226.113
[25/Feb/2026:16:45:12 34.158.168.101
[25/Feb/2026:16:45:13 34.158.168.101
[25/Feb/2026:16:45:14 34.158.168.101
# Most recent requests to the blog
[root@iZ23wv7v5ggZ ~]# tail -20 /data/wwwlogs/www.shuijingwanwq.com_nginx.log | awk '{print $4, $1, $7}' | tail -10
[25/Feb/2026:16:56:34 143.20.219.229 /robots.txt
[25/Feb/2026:16:56:34 143.20.219.183 /tag/develop
[25/Feb/2026:16:56:34 222.167.251.252 /robots.txt
[25/Feb/2026:16:56:34 66.92.14.190 /robots.txt
[25/Feb/2026:16:56:34 222.167.251.15 /robots.txt
[25/Feb/2026:16:56:34 66.92.14.195 /robots.txt
[25/Feb/2026:16:56:34 136.0.94.155 /robots.txt
[25/Feb/2026:16:56:34 66.92.14.151 /robots.txt
[25/Feb/2026:16:56:35 143.20.219.77 /robots.txt
[25/Feb/2026:16:56:35 136.0.94.158 /robots.txt
# How many cores does my server have
[root@iZ23wv7v5ggZ ~]# nproc
1
[root@iZ23wv7v5ggZ ~]#
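A load average only means something relative to the number of cores, which is why the nproc result matters here. As a rough illustration, the Python sketch below divides the 1-minute load from a top header line by the core count. The helper name `load_per_core` is hypothetical, and the header string is the one captured after the php-fpm restart.

```python
import re


def load_per_core(top_header, cores):
    """Return the 1-minute load average from a top header line, divided by the core count."""
    match = re.search(r"load average:\s*([\d.]+)", top_header)
    if match is None:
        raise ValueError("no load average found in header")
    return float(match.group(1)) / cores


header = "top - 15:55:11 up 993 days,  3:59,  1 user,  load average: 7.97, 11.01, 15.32"
print(load_per_core(header, cores=1))
```

A value well above 1 per core means runnable processes are queuing for the CPU. With a single core, even the reduced load of 7 to 8 is still far beyond what the machine can serve.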

18. Root cause found: a single-core CPU plus a large number of distributed crawlers. My blog was being requested by a dozen different IPs within the same second, and these IP segments (143.20.219.x, 222.167.251.x, 66.92.14.x, 136.0.94.x) are distributed crawlers. A single-core machine simply cannot keep up. Block these crawler IP segments in bulk in Nginx:


deny 183.129.189.60;
deny 143.20.219.0/24;
deny 222.167.251.0/24;
deny 66.92.14.0/24;
deny 136.0.94.0/24;
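Before reloading Nginx it can be useful to confirm which client addresses a set of deny rules would actually cover. The sketch below does that offline with Python's standard ipaddress module. It only mirrors the CIDR matching of the five rules above and does not read or validate nginx.conf; `is_denied` is a hypothetical helper name.

```python
import ipaddress

# The same addresses and segments as the deny rules above.
DENY_RULES = ["183.129.189.60", "143.20.219.0/24", "222.167.251.0/24",
              "66.92.14.0/24", "136.0.94.0/24"]
NETWORKS = [ipaddress.ip_network(rule) for rule in DENY_RULES]


def is_denied(client_ip):
    """Return True if client_ip falls inside any of the denied networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in NETWORKS)


for ip in ["143.20.219.229", "183.129.189.60", "121.196.223.20"]:
    print(ip, "denied" if is_denied(ip) else "allowed")
```

A bare address such as 183.129.189.60 is treated as a /32 network, so single IPs and whole segments can share one list.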

19. Finally, restart the ECS instance and watch over the following days whether 504 appears again. It did not, so the problem is solved. Here is a summary of what I did:

PHP-FPM pm.max_children lowered from 16 to 5 (to suit a single-core, 2 GB machine)
Nginx blocks 5 malicious sources (1 scanner IP and 4 crawler IP segments)
Restarted ECS to clear the backlog of stuck processes

20. The incident is summarized as follows:

Direct cause
The IP 183.129.189.60 ran a malicious security scan against the server, brute-force probing for sensitive files (settings.ini, secret.sql, database.json, etc.) under the /core/static/ path. At the same time, four distributed crawler IP segments (143.20.219.x, 222.167.251.x, 66.92.14.x, 136.0.94.x) were requesting my blog in large numbers, concurrently and within the same second. All of these requests were handled by PHP-FPM.
Root cause
My server has a single-core CPU and 2 GB of memory, but php-fpm was configured with pm.max_children = 16 and pm.start_servers = 10. Each PHP-FPM child process takes about 110 MB of memory, so 16 processes = 1.76 GB, which consumes almost all of the 2 GB. This configuration had just about coped with normal traffic over the past 993 days, but it was always at the critical point, with no headroom for traffic bursts.
How the avalanche unfolded
Malicious scan + crawler traffic floods in → all 16 php-fpm processes are busy → memory is exhausted and the system starts using swap → swap lives on disk, so I/O slows down → each request takes longer to process → processes cannot be released in time → new requests queue up → Nginx times out waiting for php-fpm → 504 Gateway Timeout is returned. The load average soared from 0.5 to 19.77, forming a vicious circle.
Solution
PHP-FPM pm.max_children has been lowered from 16 to 5 to suit the single-core, 2 GB hardware. Nginx now blocks the malicious scanner IP and the four crawler IP segments, so blocked requests get a 403 straight away and no longer occupy a php-fpm process. Restarting the ECS instance cleared the backlog of stuck processes.
Lessons learned
The number of PHP-FPM processes must match the server's hardware resources; a single-core, 2 GB machine should run at most about 5. Running for 993 days without a restart or a configuration review, in a permanently critical resource state, was the biggest hidden danger. The server should have basic protection, such as Nginx rate limiting (limit_req_zone) and automatic banning of malicious IPs, rather than running unprotected.
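The Nginx documentation describes limit_req as using the leaky bucket method. To illustrate the idea behind that kind of rate limiting, here is a simplified Python model. It is a sketch of the general technique with hypothetical names (`LeakyBucket`, `allow`), not Nginx's actual implementation, and the rate and burst values are arbitrary.

```python
class LeakyBucket:
    """Simplified leaky bucket: the level drains at a fixed rate and each request adds 1."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst + 1  # one in-rate request plus the allowed burst
        self.level = 0.0
        self.last = None

    def allow(self, now):
        """Return True if a request arriving at time `now` (in seconds) is accepted."""
        if self.last is not None:
            drained = (now - self.last) * self.rate
            self.level = max(0.0, self.level - drained)
        self.last = now
        if self.level + 1 > self.capacity:
            return False  # bucket is full, so this request would be rejected
        self.level += 1
        return True


bucket = LeakyBucket(rate_per_sec=1, burst=2)
print([bucket.allow(0.0) for _ in range(4)])  # a burst of four requests in the same instant
print(bucket.allow(1.0))                      # one more request a second later
```

In this model a client that sends many requests in the same second, as the crawlers did, has most of them rejected before they ever reach php-fpm, while a client that stays within the rate is unaffected.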
