Investigation and optimization record of a WordPress site 504 error

by 永夜 · 2026/06/02

Background

My site, www.shuijingwanwq.com, is a WordPress-based content site with about 20,000 pages, and its average daily traffic is gradually increasing. The server is an Alibaba Cloud ECS instance (1 core, 2GB), and the web environment is OneInStack (Nginx + PHP-FPM + Redis + Alibaba Cloud RDS MySQL). The site had been running smoothly until one night it suddenly started returning 504 Gateway Time-out on a large scale, and the admin backend became so slow that it was almost unusable. I had in fact run into the same problem some time earlier and did a preliminary round of optimization, recorded in: "My personal blog suddenly returned 504 today: the troubleshooting process".

Problem

Front-end pages intermittently fail to open, and refreshing occasionally returns a 504 error;
The backend page /wp-admin/admin.php loads very slowly and often times out;
The Alibaba Cloud console shows ECS CPU usage pinned at 100%;
The RDS error log contains a large number of Aborted connection and Got an error reading communication packets entries.

Investigation

1. Server basic status check

Log in to the server and run top. The load average is as high as 7.09 (on a 1-core machine), %us exceeds 80%, several php-fpm processes each use between 13% and 20% of the CPU, redis-server uses about 13%, and %id is 0. About 150MB of memory remains free, and swap is barely used. iostat shows a very low disk %util, which rules out a disk I/O bottleneck. See Figure 1.
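A load average only means something relative to the number of cores. The sketch below is not part of the original troubleshooting session; it simply normalizes the load by core count. The function name load_per_core and the "above 1.0 per core" rule of thumb are assumptions for illustration.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only).
    one_minute, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    ratio = load_per_core(one_minute, cores)
    # Rule of thumb (an assumption, not a hard limit): a sustained value
    # above 1.0 per core means tasks are queuing for CPU time.
    print(f"load per core: {ratio:.2f}")
```

With the figures from this incident, load_per_core(7.09, 1) is 7.09, so roughly seven tasks were running or waiting on a single core.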

2. PHP-FPM configuration analysis

Examining /usr/local/php/etc/php-fpm.conf revealed several serious problems:

The slow log is defined twice and ends up overridden to 0: request_slowlog_timeout appears twice, first as 3s and later as 0, so the slow log never takes effect.

;;;;;;;;;;;;;;;;;;;;;
; FPM Configuration ;
;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;
; Global Options ;
;;;;;;;;;;;;;;;;;;

[global]
pid = run/php-fpm.pid
error_log = log/php-fpm.log
log_level = warning

emergency_restart_threshold = 30
emergency_restart_interval = 60s
process_control_timeout = 5s
daemonize = yes

;;;;;;;;;;;;;;;;;;;;
; Pool Definitions ;
;;;;;;;;;;;;;;;;;;;;

[www]
listen = /dev/shm/php-cgi.sock
listen.backlog = -1
listen.allowed_clients = 127.0.0.1
listen.owner = www
listen.group = www
listen.mode = 0666
user = www
group = www

pm = ondemand
pm.max_children = 5
request_slowlog_timeout = 3s
slowlog = /usr/local/php/var/log/slow.log
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
pm.max_requests = 500
pm.process_idle_timeout = 10s
request_terminate_timeout = 120
request_slowlog_timeout = 0

pm.status_path = /php-fpm_status
slowlog = var/log/slow.log
rlimit_files = 51200
rlimit_core = 0

catch_workers_output = yes
;env[HOSTNAME] = iZ23wv7v5ggZ
env[PATH] = /usr/local/bin:/usr/bin:/bin
env[TMP] = /tmp
env[TMPDIR] = /tmp
env[TEMP] = /tmp
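A duplicated directive like this is easy to miss by eye, so it can help to check for it mechanically. The following is a minimal sketch, assuming the file uses plain key = value lines under [section] headers as in the listing above; the function name find_duplicate_directives and the SAMPLE excerpt are illustrative, not part of the original setup.

```python
from collections import defaultdict


def find_duplicate_directives(conf_text: str) -> dict:
    """Map "[section] key" to its list of values for keys set more than once."""
    seen = defaultdict(list)
    section = ""
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and ';' comments.
        if not line or line.startswith(";"):
            continue
        # Track the current [section] header.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        seen[f"[{section}] {key}"].append(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


# Excerpt of the [www] pool from the listing above.
SAMPLE = """\
[www]
pm = ondemand
request_slowlog_timeout = 3s
slowlog = /usr/local/php/var/log/slow.log
request_terminate_timeout = 120
request_slowlog_timeout = 0
slowlog = var/log/slow.log
"""

if __name__ == "__main__":
    for key, values in find_duplicate_directives(SAMPLE).items():
        print(f"{key}: {values}")
```

On this excerpt the sketch reports request_slowlog_timeout (3s, then 0) and also slowlog, which is likewise set twice with two different paths.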

Timeout settings are not coordinated: PHP-FPM has request_terminate_timeout = 120 (seconds), while Nginx does not set fastcgi_read_timeout (default 60s). When a PHP request runs for more than 60 seconds, Nginx returns 504 first while PHP-FPM keeps executing, which wastes resources.

3. Nginx configuration is missing

In the site configuration file /usr/local/nginx/conf/vhost/www.shuijingwanwq.com.conf, the PHP handling block has no fastcgi_read_timeout, so the default 60-second timeout applies.
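The mismatch between the two timeouts can be expressed as a small check. This is a sketch under stated assumptions: it only understands values written in seconds (for example 150 or 150s, not 2m), the two VHOST strings are hypothetical fragments rather than the real vhost file, and the function names are made up for illustration.

```python
import re

NGINX_DEFAULT_FASTCGI_READ_TIMEOUT = 60  # seconds, the default when unset


def fastcgi_read_timeout(nginx_conf: str) -> int:
    """Return fastcgi_read_timeout in seconds, or the default when it is absent.

    Simplification: only plain-second values are recognized.
    """
    match = re.search(
        r"^\s*fastcgi_read_timeout\s+(\d+)s?\s*;", nginx_conf, re.MULTILINE
    )
    return int(match.group(1)) if match else NGINX_DEFAULT_FASTCGI_READ_TIMEOUT


def nginx_gives_up_first(nginx_timeout: int, fpm_terminate_timeout: int) -> bool:
    """True when Nginx would return 504 while PHP-FPM is still allowed to run."""
    return nginx_timeout < fpm_terminate_timeout


# Hypothetical fragments of a PHP location block, before and after a fix.
VHOST_BEFORE = """
    fastcgi_pass unix:/dev/shm/php-cgi.sock;
    include fastcgi.conf;
"""
VHOST_AFTER = VHOST_BEFORE + "    fastcgi_read_timeout 150s;\n"

if __name__ == "__main__":
    fpm_timeout = 120  # request_terminate_timeout from php-fpm.conf
    for name, conf in (("before", VHOST_BEFORE), ("after", VHOST_AFTER)):
        timeout = fastcgi_read_timeout(conf)
        print(name, timeout, nginx_gives_up_first(timeout, fpm_timeout))
```

With no directive present the check falls back to 60 seconds, which is below PHP-FPM's 120 seconds, so Nginx gives up first.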

4. RDS error log analysis

The following entries appear frequently in the RDS console error log:

Aborted connection ... (Got an error reading communication packets)
Got packets out of order

This indicates that connections between PHP and MySQL are being interrupted abnormally, usually because a PHP request runs for so long that it exceeds MySQL's wait_timeout, or because the PHP process is terminated early. See Figure 3.

5. Slow log is empty

Neither the PHP slow log (/usr/local/php/var/log/slow.log) nor the RDS slow query log (slow log details in the console) contains any data. The reason is that the PHP slow log is effectively disabled, and RDS's long_query_time defaults to 10 seconds, so the current queries, which may take 2 to 5 seconds, are not recorded.

Solution

1. Fix PHP-FPM configuration

Use ondemand mode, which suits a 1-core, low-memory machine because idle worker processes are released. The final configuration is as follows (see Figure 4):

;;;;;;;;;;;;;;;;;;;;;
; FPM Configuration ;
;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;
; Global Options ;
;;;;;;;;;;;;;;;;;;

[global]
pid = run/php-fpm.pid
error_log = log/php-fpm.log
log_level = warning

emergency_restart_threshold = 30
emergency_restart_interval = 60s
process_control_timeout = 5s
daemonize = yes

;;;;;;;;;;;;;;;;;;;;
; Pool Definitions ;
;;;;;;;;;;;;;;;;;;;;

[www]
listen = /dev/shm/php-cgi.sock
listen.backlog = -1
listen.allowed_clients = 127.0.0.1
listen.owner = www
listen.group = www
listen.mode = 0666
user = www
group = www

pm = ondemand
pm.max_children = 5
pm.process_idle_timeout = 10s
pm.max_requests = 500

request_terminate_timeout = 120s
request_slowlog_timeout = 3s
slowlog = /usr/local/php/var/log/slow.log

catch_workers_output = yes
env[PATH] = /usr/local/bin:/usr/bin:/bin
env[TMP] = /tmp
env[TMPDIR] = /tmp
env[TEMP] = /tmp

Removed pm.start_servers and pm.min_spare_servers / pm.max_spare_servers, which conflict with ondemand mode (they are only used in dynamic mode).
Made sure the slow log settings are defined only once and take effect.

2. Increase the Nginx timeout

Add the following to the location block that handles PHP:

fastcgi_read_timeout 150s;

This makes the Nginx timeout slightly longer than PHP-FPM's 120 seconds, so Nginx does not cut the request off first. See Figure 5.

3. Adjust RDS Slow Query Threshold

Log in to the RDS console → Parameter Settings → change long_query_time from 100 to 1 second to capture more slow SQL. See Figure 7.

4. Restart the service

systemctl restart php-fpm
systemctl restart nginx
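Restarting with a broken configuration would take the site down again, so it is worth testing both configurations first. The sketch below wraps php-fpm -t and nginx -t around the two restart commands; the sbin paths are assumptions based on the /usr/local/php and /usr/local/nginx prefixes used in this post, and validate_then_restart is a hypothetical helper name.

```python
import subprocess

# Binary paths are assumptions derived from the install prefixes above;
# adjust them if your installation differs.
CHECKS = [
    ["/usr/local/php/sbin/php-fpm", "-t"],  # test the PHP-FPM configuration
    ["/usr/local/nginx/sbin/nginx", "-t"],  # test the Nginx configuration
]
RESTARTS = [
    ["systemctl", "restart", "php-fpm"],
    ["systemctl", "restart", "nginx"],
]


def validate_then_restart(run=subprocess.run) -> bool:
    """Restart both services only if every configuration test succeeds.

    `run` is injectable so the logic can be exercised without touching
    real services.
    """
    for command in CHECKS:
        if run(command).returncode != 0:
            print("configuration test failed:", " ".join(command))
            return False
    for command in RESTARTS:
        # check=True raises CalledProcessError if a restart fails.
        run(command, check=True)
    return True


# Usage on the server, as root:
#     validate_then_restart()
```

The function stops at the first failing test, so neither service is restarted while either configuration is invalid.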

Result

After these adjustments, site access returned to normal, CPU usage dropped back to a normal level, and the backend became noticeably more responsive. Connection interruptions in the RDS error log decreased significantly. Although the RDS slow log still shows no particularly slow SQL, the overall bottleneck has largely been removed.

Follow-up optimization suggestions

Although the site has recovered, a 1-core 2GB configuration remains a tight fit given that the site has reached 20,000 articles and traffic continues to grow. The following optimization measures can be taken in the future:

✅ 1. Hardware upgrade (fundamental)

Upgrade the ECS configuration: at least 2 cores and 4GB; 4 cores and 8GB is recommended. After upgrading, PHP-FPM can be switched to dynamic mode with pm.max_children raised to 10~15, which greatly improves concurrent processing capacity.
Upgrade the RDS specification: if slow queries increase or the connection limit is frequently reached, consider adding RDS memory or enabling read-only instances.
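A common rule of thumb for sizing pm.max_children is to divide the memory left over for PHP by the memory one worker uses. The sketch below illustrates that arithmetic only; the reserved and per-worker figures are placeholder assumptions, not measurements from this server, and should be replaced with values observed on the real machine.

```python
def estimate_max_children(total_mb: int, reserved_mb: int, per_child_mb: int) -> int:
    """Estimate how many PHP-FPM workers fit in memory without swapping.

    total_mb:     physical memory of the machine
    reserved_mb:  memory kept for the OS, Nginx, Redis and other services
    per_child_mb: typical resident memory of one php-fpm worker
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable = max(total_mb - reserved_mb, 0)
    # Always allow at least one worker so the pool can serve requests.
    return max(usable // per_child_mb, 1)


if __name__ == "__main__":
    # Placeholder figures for a 4GB machine: 1536MB reserved, 200MB per worker.
    print(estimate_max_children(4096, 1536, 200))  # prints 12
```

Memory is only one constraint: on a CPU-bound site like this one, the number of cores limits how many workers can usefully run at once.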

✅ 2. Speed up static resources with CDN

Alibaba Cloud CDN + OSS combination:

Upload static files such as images, CSS, and JS to OSS and enable CDN acceleration.
Configure a WordPress plugin such as Alioss to sync the media library automatically.
Effect: reduces the load on ECS bandwidth and speeds up access from all regions of the country.

✅ 3. In-depth optimization of page cache

W3 Total Cache is already installed; it is recommended to check that:

Page Cache uses Disk: Enhanced or Redis.
Object Cache uses the Redis driver.
Database Cache also uses Redis to avoid disk I/O.
Opcode cache (such as Zend OPcache) is turned on.

✅ 4. Database special optimization

Periodically run OPTIMIZE TABLE on large tables such as wp_posts and wp_postmeta.
Add indexes on commonly queried fields, such as post_date and meta_key.
Use the Index WP MySQL For Speed plugin to automatically analyze and add missing indexes.

✅ 5. Monitoring and Alarms

Install Alibaba Cloud monitoring and set threshold alarms for CPU, memory, and the number of RDS connections.
Write a script scheduled to run daily that checks the PHP slow log and the RDS slow log and sends an email if new slow requests are found.
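The PHP half of such a daily check could start from something like the sketch below, which counts slow-log entries per script. It assumes the usual PHP-FPM slow log layout (a timestamped header line with the pool and pid, followed by a script_filename line and a stack trace); the sample entries, paths, and the function name count_slow_scripts are invented for illustration, and the email step is left out.

```python
import re
from collections import Counter

# Assumed header format of a PHP-FPM slow log entry:
#   [01-Jan-2026 21:14:05]  [pool www] pid 2345
#   script_filename = /path/to/script.php
ENTRY_HEADER = re.compile(r"^\[[^\]]+\]\s+\[pool [^\]]+\] pid \d+")
SCRIPT_LINE = re.compile(r"^script_filename = (?P<script>.+)$")


def count_slow_scripts(log_text: str) -> Counter:
    """Count slow-log entries per script_filename."""
    counts = Counter()
    in_entry = False
    for line in log_text.splitlines():
        if ENTRY_HEADER.match(line):
            in_entry = True
            continue
        match = SCRIPT_LINE.match(line)
        if in_entry and match:
            counts[match.group("script")] += 1
            in_entry = False
    return counts


# Made-up sample data, not taken from the real server.
SAMPLE_LOG = """\
[01-Jan-2026 21:14:05]  [pool www] pid 2345
script_filename = /data/wwwroot/example/wp-admin/admin.php
[0x00007f1c2a213a90] query() /data/wwwroot/example/wp-includes/class-wpdb.php:100

[01-Jan-2026 21:14:09]  [pool www] pid 2346
script_filename = /data/wwwroot/example/index.php
[0x00007f1c2a213b10] query() /data/wwwroot/example/wp-includes/class-wpdb.php:100

[01-Jan-2026 21:15:40]  [pool www] pid 2345
script_filename = /data/wwwroot/example/wp-admin/admin.php
[0x00007f1c2a213a90] query() /data/wwwroot/example/wp-includes/class-wpdb.php:100
"""

if __name__ == "__main__":
    for script, hits in count_slow_scripts(SAMPLE_LOG).most_common():
        print(f"{hits:3d}  {script}")
```

A real script would read /usr/local/php/var/log/slow.log, remember how far it got on the previous run, and send mail only when the counts are non-empty.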

✅ 6. Regular cleaning and maintenance

Clean out expired drafts, posts in the trash, and useless postmeta data.
Disable or delete rarely used plugins to reduce backend overhead.
Configure WP-Cron to be triggered by the system crontab instead of checking the task queue on every visit.

Summary

The root cause of this 504 failure was that the server configuration did not match the scale of the business, compounded by incorrect PHP-FPM parameter settings, which led to CPU exhaustion and connection interruptions. The problem was resolved by correcting the configuration, increasing the timeout, and enabling the slow log. In the long run, upgrading the hardware and introducing a CDN is the only way forward.

I hope this record is a useful reference for anyone who runs into similar problems. If you also have experience with WordPress high-load optimization, you are welcome to share it!
