Detecting Specific Tags in HTML with PHP

1. When an article was published to WeChat, the WeChat API responded with a failure. Error code: 45166. Error message: invalid content hint: [pzRHda0286d228] rid: 61041ffe-5cefd269-6c9611ff.
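To act on such a failure in code, the error code and message first have to be pulled out of the response. The sketch below assumes the response body is JSON with `errcode` and `errmsg` fields; that shape, the sample body, and the helper name `parseWechatError` are assumptions made for illustration, not something shown in the original post.

```php
<?php
// Assumed response body: JSON carrying "errcode" and "errmsg" (shape is an assumption).
$responseBody = '{"errcode":45166,"errmsg":"invalid content hint: [pzRHda0286d228] rid: 61041ffe-5cefd269-6c9611ff"}';

/**
 * Hypothetical helper: returns [code, message] when the body reports a
 * non-zero errcode, or null when no error code can be read from it.
 */
function parseWechatError(string $body): ?array
{
    $data = json_decode($body, true);
    if (!is_array($data) || !isset($data['errcode']) || (int) $data['errcode'] === 0) {
        return null;
    }
    return [(int) $data['errcode'], (string) ($data['errmsg'] ?? '')];
}

$error = parseWechatError($responseBody);
if ($error !== null) {
    [$code, $message] = $error;
    echo "WeChat rejected the article: {$code} {$message}\n";
}
```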

2. Analysis showed that the failure was caused by an mpvoice tag in the article content. The content was as follows:

内容contentcontentcontentcontentcontentcontentcontentcontentcontentcontentcontentcontent<mpvoice class=\"js_editor_audio audio_iframe js_uneditable\" src=\"/cgi-bin/readtemplate?t=tmpl/audio_tmpl&amp;name=%E4%B8%80%E4%B8%AA%E4%BA%BA%EF%BC%8C%E4%B9%9F%E6%8C%BA%E5%A5%BD&amp;play_length=05:22\" isaac2=\"1\" low_size=\"612.81\" source_size=\"612.8\" high_size=\"2524.76\" name=\"一个人,也挺好\" play_length=\"322000\" voice_encode_fileid=\"MzI1NjA0MDg2M18yNjUwNzYzNTcz\" data-topic_id=\"1693506805106065417\" data-topic_name=\"夜听精选\" data-pluginname=\"insertaudio\" style=\"margin-top: 50px; margin-right: 0px; margin-bottom: 0px; max-width: 100%;\"></mpvoice>

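3. The decision was to check, before calling the WeChat API, whether the HTML contains an mpvoice tag. If it does, the rest of the flow is skipped, which saves the user from waiting on a request that is bound to fail.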

4. The final implementation is as follows:

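// Match a complete element: opening tag, any content up to the first closing tag, then the closing tag.
// \b stops "<audio" from matching longer tag names, [^>]* also accepts a tag without attributes, and /i ignores case.
$mpvoicePattern = '/<mpvoice\b([^>]*)>(?:(?!<\/mpvoice>)[\s\S])*<\/mpvoice>/i';
$audioPattern = '/<audio\b([^>]*)>(?:(?!<\/audio>)[\s\S])*<\/audio>/i';
preg_match($mpvoicePattern, $item['content'], $mpvoiceMatches);
preg_match($audioPattern, $item['content'], $audioMatches);
// Report a validation error and stop before the WeChat API is called.
// The break statements leave the enclosing loop, which is not shown here.
if (!empty($mpvoiceMatches)) {
    $this->addError($attribute, Yii::t('error', Yii::t('error', Yii::t('error', '202247'), ['tag' => 'mpvoice'])));
    break;
}
if (!empty($audioMatches)) {
    $this->addError($attribute, Yii::t('error', Yii::t('error', Yii::t('error', '202247'), ['tag' => 'audio'])));
    break;
}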
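When more than two tags need to be rejected, the same check could be wrapped in a small function that builds the pattern per tag name. This is only a sketch under stated assumptions: the function name `findBlockedTag` and the sample HTML are hypothetical, and the pattern is one possible variant that also accepts tags without attributes and ignores case.

```php
<?php
/**
 * Hypothetical helper: returns the first tag name from $tags that occurs in
 * $html as a complete element (opening tag plus closing tag), or null.
 */
function findBlockedTag(string $html, array $tags): ?string
{
    foreach ($tags as $tag) {
        $name = preg_quote($tag, '/');
        // Opening tag, then any content that does not contain the closing tag,
        // then the closing tag itself.
        $pattern = '/<' . $name . '\b[^>]*>(?:(?!<\/' . $name . '>)[\s\S])*<\/' . $name . '>/i';
        if (preg_match($pattern, $html) === 1) {
            return $tag;
        }
    }
    return null;
}

$html = '<p>intro</p><mpvoice name="demo" play_length="322000"></mpvoice>';
var_dump(findBlockedTag($html, ['mpvoice', 'audio']));                // string(7) "mpvoice"
var_dump(findBlockedTag('<p>plain text</p>', ['mpvoice', 'audio']));  // NULL
```

Returning the tag name rather than a boolean lets the caller put the offending tag into the error message, as the validation code does with its 'tag' parameter.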

5. For the content below, printing $mpvoiceMatches and $audioMatches gives the following results:

内容contentcontentcontentcontentcontentcontentcontentcontentcontentcontentcontentcontent<mpvoice class=\"js_editor_audio audio_iframe js_uneditable\" src=\"/cgi-bin/readtemplate?t=tmpl/audio_tmpl&amp;name=%E4%B8%80%E4%B8%AA%E4%BA%BA%EF%BC%8C%E4%B9%9F%E6%8C%BA%E5%A5%BD&amp;play_length=05:22\" isaac2=\"1\" low_size=\"612.81\" source_size=\"612.8\" high_size=\"2524.76\" name=\"一个人,也挺好\" play_length=\"322000\" voice_encode_fileid=\"MzI1NjA0MDg2M18yNjUwNzYzNTcz\" data-topic_id=\"1693506805106065417\" data-topic_name=\"夜听精选\" data-pluginname=\"insertaudio\" style=\"margin-top: 50px; margin-right: 0px; margin-bottom: 0px; max-width: 100%;\"></mpvoice><img class=\"rich_pages __bg_gif\" data-galleryid=\"\" data-ratio=\"0.304552590266876\" data-src=\"https://image.dev.chinamcloud.cn/cms/sjrmt/upload/HttpImage/mrtpsc/2021/08/06/39d0ca9927ed480da0068e1b6cee87cd.gif\" data-type=\"image\" data-w=\"637\" _width=\"637px\" src=\"https://image.dev.chinamcloud.cn/cms/sjrmt/upload/HttpImage/mrtpsc/2021/08/06/39d0ca9927ed480da0068e1b6cee87cd.gif\" data-order=\"0\" alt=\"\" data-fail=\"0\" style=\"width: 100%; visibility: visible ;\" title=\"\"><figure><figcaption>Listen to the T-Rex:</figcaption><audio controls src=\"/media/cc0-audio/t-rex-roar.mp3\">Your browser does not support the<code>audio</code> element.</audio></figure>
Array
(
    [0] => <mpvoice class="js_editor_audio audio_iframe js_uneditable" src="/cgi-bin/readtemplate?t=tmpl/audio_tmpl&amp;name=%E4%B8%80%E4%B8%AA%E4%BA%BA%EF%BC%8C%E4%B9%9F%E6%8C%BA%E5%A5%BD&amp;play_length=05:22" isaac2="1" low_size="612.81" source_size="612.8" high_size="2524.76" name="一个人,也挺好" play_length="322000" voice_encode_fileid="MzI1NjA0MDg2M18yNjUwNzYzNTcz" data-topic_id="1693506805106065417" data-topic_name="夜听精选" data-pluginname="insertaudio" style="margin-top: 50px; margin-right: 0px; margin-bottom: 0px; max-width: 100%;"></mpvoice>
    [1] =>  class="js_editor_audio audio_iframe js_uneditable" src="/cgi-bin/readtemplate?t=tmpl/audio_tmpl&amp;name=%E4%B8%80%E4%B8%AA%E4%BA%BA%EF%BC%8C%E4%B9%9F%E6%8C%BA%E5%A5%BD&amp;play_length=05:22" isaac2="1" low_size="612.81" source_size="612.8" high_size="2524.76" name="一个人,也挺好" play_length="322000" voice_encode_fileid="MzI1NjA0MDg2M18yNjUwNzYzNTcz" data-topic_id="1693506805106065417" data-topic_name="夜听精选" data-pluginname="insertaudio" style="margin-top: 50px; margin-right: 0px; margin-bottom: 0px; max-width: 100%;"
)
Array
(
    [0] => <audio controls src="/media/cc0-audio/t-rex-roar.mp3">Your browser does not support the<code>audio</code> element.</audio>
    [1] =>  controls src="/media/cc0-audio/t-rex-roar.mp3"
)
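The two entries in each array come from how preg_match fills its third argument: index 0 holds the whole matched element and index 1 holds the text captured by the first parenthesized group, here the attribute string. A minimal reproduction, using only the audio element from the content above and a pattern of the same form as in step 4:

```php
<?php
// The audio element taken from the sample content, without the surrounding markup.
$html = '<figure><audio controls src="/media/cc0-audio/t-rex-roar.mp3">'
      . 'Your browser does not support the<code>audio</code> element.</audio></figure>';

$audioPattern = '/<audio\b([^>]*)>(?:(?!<\/audio>)[\s\S])*<\/audio>/i';
$matched = preg_match($audioPattern, $html, $audioMatches);

// $audioMatches[0]: the full <audio>...</audio> element.
// $audioMatches[1]: the attribute text captured by ([^>]*), including its leading space.
print_r($audioMatches);
```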

6. When the content merely contains text such as <mpvoice, /mpvoice>, or audio, rather than complete, closed HTML tags, the regular expressions do not match. See Figure 1.

Figure 1

 
内容contentcontentcontentcontentcontentcontentcontentcontentcontentcontentcontentcontent<mpvoice class=\"js_editor_audio audio_iframe js_uneditable\" src=\"/cgi-bin/readtemplate?t=tmpl/audio_tmpl&amp;name=%E4%B8%80%E4%B8%AA%E4%BA%BA%EF%BC%8C%E4%B9%9F%E6%8C%BA%E5%A5%BD&amp;play_length=05:22\" isaac2=\"1\" low_size=\"612.81\" source_size=\"612.8\" high_size=\"2524.76\" name=\"一个人,也挺好\" play_length=\"322000\" voice_encode_fileid=\"MzI1NjA0MDg2M18yNjUwNzYzNTcz\" data-topic_id=\"1693506805106065417\" data-topic_name=\"夜听精选\" data-pluginname=\"insertaudio\" style=\"margin-top: 50px; margin-right: 0px; margin-bottom: 0px; max-width: 100%;\">/mpvoice><img class=\"rich_pages __bg_gif\" data-galleryid=\"\" data-ratio=\"0.304552590266876\" data-src=\"https://image.dev.chinamcloud.cn/cms/sjrmt/upload/HttpImage/mrtpsc/2021/08/06/39d0ca9927ed480da0068e1b6cee87cd.gif\" data-type=\"image\" data-w=\"637\" _width=\"637px\" src=\"https://image.dev.chinamcloud.cn/cms/sjrmt/upload/HttpImage/mrtpsc/2021/08/06/39d0ca9927ed480da0068e1b6cee87cd.gif\" data-order=\"0\" alt=\"\" data-fail=\"0\" style=\"width: 100%; visibility: visible ;\" title=\"\"><figure><figcaption>Listen to the T-Rex:</figcaption>audio controls src=\"/media/cc0-audio/t-rex-roar.mp3\">Your browser does not support the<code>audio</code> element.</audio></figure>
Array
(
)
Array
(
)
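The behaviour described in step 6 can be checked with a few short inputs. The sample strings below are made up for illustration, and the pattern has the same form as the one in step 4:

```php
<?php
$mpvoicePattern = '/<mpvoice\b([^>]*)>(?:(?!<\/mpvoice>)[\s\S])*<\/mpvoice>/i';

// Hypothetical inputs: one complete element and two that only resemble one.
$samples = [
    'complete element'   => '<mpvoice name="demo"></mpvoice>',
    'broken closing tag' => '<mpvoice name="demo">/mpvoice>',
    'plain text mention' => 'The word mpvoice appears here as text only.',
];

$results = [];
foreach ($samples as $label => $html) {
    // preg_match returns 1 on a match and 0 when the pattern does not match.
    $results[$label] = preg_match($mpvoicePattern, $html);
    echo $label, ': ', $results[$label] === 1 ? 'match' : 'no match', "\n";
}
```

Only the first sample matches, because the pattern requires both the opening tag and a literal </mpvoice> after it.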

 

 

永夜