Troubleshooting and Fixing Inaccurate Counts in a WordPress Tag Batch-Translation Script
Problem background
While using the Polylang plugin to synchronize Chinese and English tags, the batch-translation script was found to report inaccurate counts intermittently. The script reported 8324 tags processed, but the admin backend showed 8328 Chinese tags, and the results differed from one run to the next.


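Problem
Total source-language tags: 8328 (accurate)
Newly processed tags: 0 (inaccurate; should theoretically be 4)
Skipped (translation already exists): 9000 (inaccurate; should be 8324)
Processing rate: 108.07%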
Investigation process
First investigation: pagination
Initially, the script paginated with the offset parameter:
$terms = get_terms([
    'taxonomy' => 'post_tag',
    'lang'     => $source_lang,
    'number'   => $per_page,
    'offset'   => $offset,
]);
Problem analysis: when paging with offset, creating new target-language tags during processing changes the number of records in the database, so the offsets computed for later pages no longer line up with the rows they were meant to reach.
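The shifting-offset effect can be reproduced without WordPress. The following is a minimal sketch in plain PHP, not the script's actual code: `fetch_page()` and `paginate_with_inserts()` are hypothetical helpers, and the sketch assumes each new row sorts ahead of the rows that have not been visited yet.

```php
<?php
// Generic simulation, not WordPress code: page through a list by offset
// while the loop itself inserts new rows into that list.

/** Return one "page" of the current rows, the way number + offset would. */
function fetch_page(array $rows, int $offset, int $per_page): array
{
    return array_slice($rows, $offset, $per_page);
}

/**
 * Walk the rows page by page. When $insert_during_loop is true, each page
 * adds one new row at the front, which shifts every later row by one slot.
 */
function paginate_with_inserts(array $rows, int $per_page, bool $insert_during_loop): array
{
    $seen = [];
    $offset = 0;
    while (($page = fetch_page($rows, $offset, $per_page)) !== []) {
        foreach ($page as $row) {
            $seen[] = $row;
        }
        if ($insert_during_loop) {
            // Assumption: the new row lands before the unvisited rows.
            array_unshift($rows, 'new-' . count($rows));
        }
        $offset += $per_page;
    }
    return $seen;
}

$rows = ['a', 'b', 'c', 'd', 'e', 'f'];
$stable = paginate_with_inserts($rows, 2, false);
$shifted = paginate_with_inserts($rows, 2, true);

echo implode(',', $stable), "\n";
echo implode(',', $shifted), "\n";
```

With six rows and a page size of two, the stable run visits each row once, while the run with inserts visits eleven rows for the same six distinct values. Repeated visits of this kind would inflate a counter in the same direction as the symptom described above.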
Attempted fix: replace offset with the paged parameter:
$terms = get_terms([
    'taxonomy' => 'post_tag',
    'lang'     => $source_lang,
    'number'   => $per_page,
    'paged'    => $page,
]);
Second investigation: the database table does not exist
After this change, a new error appeared:
WordPress database error Table 'wp_polylang_terms' doesn't exist
Problem analysis: the script queried a Polylang database table directly, and that table does not exist. The API functions provided by Polylang should be used instead.
Attempted fix: following the merge-tags.php file already in the project, use get_terms() with the lang parameter to fetch the tag list.
Third investigation: the count is still inaccurate
Even with the paged parameter, the count was still inaccurate:
Skipped (translation already exists): 9000
Problem analysis: when paging with get_terms(), English tags created earlier in the run can be returned by later queries, which inflates the skipped count.
Root cause: the English tags created inside the loop are picked up by subsequent get_terms() calls. The query is restricted only by language, and a newly created English tag may not yet be reflected consistently in the cache or index when the next query runs.
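A second possibility is worth ruling out, because it reproduces the reported figure exactly. The documented arguments of WP_Term_Query include number and offset, but paged is not among them, so the paged argument may simply be ignored. The sketch below is a plain-PHP simulation under that assumption: `fake_term_query()` is a hypothetical stand-in rather than WordPress code, and the loop shape (one query per page, ceil(total / per_page) pages, every returned tag already translated) is also assumed, since the original loop is not shown.

```php
<?php
// Simulation only: a stand-in for a term query that honors 'number' and
// 'offset' but silently ignores an unknown 'paged' argument (assumption).
function fake_term_query(array $all_ids, array $args): array
{
    $number = $args['number'] ?? count($all_ids);
    $offset = $args['offset'] ?? 0; // 'paged' is never read
    return array_slice($all_ids, $offset, $number);
}

/** Count how many tags a page-by-page loop would report as skipped. */
function count_skipped_with_paged(int $total, int $per_page): int
{
    $all_ids = range(1, $total);
    $pages = (int) ceil($total / $per_page);
    $skipped = 0;
    for ($page = 1; $page <= $pages; $page++) {
        // Every iteration receives the same first page of IDs.
        $batch = fake_term_query($all_ids, ['number' => $per_page, 'paged' => $page]);
        $skipped += count($batch);
    }
    return $skipped;
}

$skipped = count_skipped_with_paged(8328, 1000);
printf("skipped: %d, rate: %.2f%%\n", $skipped, $skipped / 8328 * 100);
```

Under these assumptions the loop runs nine times over the same 1000 tags, giving 9000 skipped and a rate of 108.07%, with no new tags processed. That matches the output above, but it remains a hypothesis to check against the real script. Either way, iterating over a list fetched once up front avoids the problem.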
Final solution
Core idea: fetch the IDs of all source-language tags once, up front, then iterate over that static list so that later queries cannot be affected by newly created data.
// Fetch the IDs of all source-language tags in a single query
$source_term_ids = get_terms([
    'taxonomy'   => 'post_tag',
    'lang'       => $source_lang,
    'hide_empty' => false,
    'fields'     => 'ids',
]);
// Process in batches with array_chunk
foreach (array_chunk($source_term_ids, $per_page) as $batch) {
    foreach ($batch as $term_id) {
        // Process each tag...
    }
}
Key improvements:
- Static list: fetch the complete list of source-language tag IDs up front so that later queries cannot be polluted by new data
- Array chunking: use array_chunk() for memory-friendly batch processing
- Accurate counting: count against the static list to keep the statistics accurate
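To see why counting against a snapshot stays consistent, here is a small plain-PHP sketch of the same pattern. It does not call WordPress or Polylang: `$translations` is a hypothetical in-memory map standing in for what pll_get_term() would report, and `translate_from_snapshot()` is an illustrative name.

```php
<?php
// Plain-PHP sketch of the snapshot pattern. $translations maps a source
// tag ID to its translated tag ID (hypothetical in-memory stand-in).
function translate_from_snapshot(array $source_ids, array &$translations, int $per_page): array
{
    $processed = 0;
    $skipped = 0;
    foreach (array_chunk($source_ids, $per_page) as $batch) {
        foreach ($batch as $id) {
            if (isset($translations[$id])) {
                $skipped++;
                continue;
            }
            // Creating a translation changes the store, not the snapshot.
            $translations[$id] = 100000 + $id;
            $processed++;
        }
    }
    return [
        'total'     => count($source_ids),
        'processed' => $processed,
        'skipped'   => $skipped,
    ];
}

// 8328 source tags, of which the first 8324 already have a translation.
$source_ids = range(1, 8328);
$translations = array_combine(range(1, 8324), range(100001, 108324));

$first_run = translate_from_snapshot($source_ids, $translations, 1000);
$second_run = translate_from_snapshot($source_ids, $translations, 1000);

print_r($first_run);
print_r($second_run);
```

Because every ID in the snapshot is visited exactly once, processed plus skipped always equals the total. In this sketch the first run processes 4 tags and skips 8324, and a second run skips all 8328.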
Complete optimized code
<?php
if (php_sapi_name() !== 'cli') {
    die("❌ Please run this script from the command line\n");
}
require __DIR__ . '/wp-config.php';
global $wpdb, $polylang;
$per_page = 1000;
$source_lang = 'zh';
$target_lang = 'en';
$processed = 0;
$skipped = 0;
// Fetch the IDs of all source-language tags in a single query
$source_term_ids = get_terms([
    'taxonomy'   => 'post_tag',
    'lang'       => $source_lang,
    'hide_empty' => false,
    'fields'     => 'ids',
]);
// get_terms() returns a WP_Error on failure, so stop before counting
if (is_wp_error($source_term_ids)) {
    die("❌ " . $source_term_ids->get_error_message() . "\n");
}
$total_terms = count($source_term_ids);
// Process in batches
foreach (array_chunk($source_term_ids, $per_page) as $batch) {
    foreach ($batch as $term_id) {
        $term = get_term($term_id, 'post_tag');
        // Check whether a translation already exists
        $target_id = pll_get_term($term_id, $target_lang);
        if ($target_id) {
            $skipped++;
            continue;
        }
        // Create the translated tag...
        $processed++;
    }
}
// Verification phase: count source tags that still lack a translation
$untranslated_count = 0;
foreach ($source_term_ids as $term_id) {
    if (!pll_get_term($term_id, $target_lang)) {
        $untranslated_count++;
    }
}
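The listing computes its counters but does not show how they are reported. The following is a hedged sketch of a summary step, not the author's code: `format_summary()` and `counts_are_consistent()` are hypothetical helpers, and the processing rate is assumed to be (processed + skipped) / total, a formula inferred from the reported percentages rather than stated in the article.

```php
<?php
/** Build the four-line summary from the run's counters. */
function format_summary(int $total, int $processed, int $skipped): string
{
    // Assumed formula: share of source tags that were processed or skipped.
    $rate = $total > 0 ? ($processed + $skipped) / $total * 100 : 0.0;
    return sprintf(
        "Total source-language tags: %d\n"
        . "Newly processed tags: %d\n"
        . "Skipped (translation already exists): %d\n"
        . "Processing rate: %.2f%%\n",
        $total,
        $processed,
        $skipped,
        $rate
    );
}

/** A run is consistent when every source tag was counted exactly once. */
function counts_are_consistent(int $total, int $processed, int $skipped): bool
{
    return $processed + $skipped === $total;
}

$broken = format_summary(8328, 0, 9000);
$fixed = format_summary(8328, 4, 8324);

echo $broken, "\n", $fixed;
var_dump(counts_are_consistent(8328, 0, 9000));
var_dump(counts_are_consistent(8328, 4, 8324));
```

A consistency check like this makes a rate above 100% fail loudly instead of appearing only as an odd number in the output.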
Results after the fix
Output after applying the fix:
Total source-language tags: 8328 ✅
Newly processed tags: 4 ✅
Skipped (translation already exists): 8324 ✅
Processing rate: 100.00% ✅

Lessons learned
- Avoid the dynamic-pagination trap: if the data is modified during processing, do not page through it with live queries
- Use a static dataset: fetch the complete ID list up front so that the run stays stable
- Rely on the official API: prefer the functions the plugin provides over direct access to database tables
- Add a verification step: verify after processing completes to confirm that all data was handled correctly
This article records the investigation of a tag batch-translation script, from discovering the problem to the final fix, in the hope that it helps developers who run into similar issues.