Basic principles of Chinese word segmentation
(1) String-matching segmentation.
This approach is further divided into forward maximum matching, reverse maximum matching, and shortest-path segmentation.
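The three variants can be sketched in a few lines of Python. This is a minimal illustration rather than a production segmenter: the toy dictionary `VOCAB`, the sample sentence, and the function names are all assumptions made for the demonstration. Characters not found in the dictionary are emitted as single-character words, and "shortest path" is interpreted here as the segmentation with the fewest words.

```python
# Toy dictionary and sentence: illustrative assumptions, not a real lexicon.
VOCAB = {"研究", "研究生", "生命", "命", "的", "起源"}
MAX_LEN = max(len(word) for word in VOCAB)


def forward_max_match(text, vocab=VOCAB, max_len=MAX_LEN):
    """Scan left to right, always taking the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is accepted even if it is not in the dictionary.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, vocab=VOCAB, max_len=MAX_LEN):
    """Scan right to left, always taking the longest dictionary word."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    return words[::-1]  # words were collected back to front


def fewest_words(text, vocab=VOCAB, max_len=MAX_LEN):
    """Dynamic programming: best[i] is a fewest-word split of text[:i]."""
    best = [[]] + [None] * len(text)
    for i in range(1, len(text) + 1):
        for k in range(max(0, i - max_len), i):
            piece = text[k:i]
            if piece in vocab or i - k == 1:
                candidate = best[k] + [piece]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[-1]


sentence = "研究生命的起源"  # "to study the origin of life"
print(forward_max_match(sentence))  # ['研究生', '命', '的', '起源']
print(reverse_max_match(sentence))  # ['研究', '生命', '的', '起源']
print(len(fewest_words(sentence)))  # 4
```

The two matching directions disagree on the same sentence, which is why the choice of method (and of dictionary) affects the result.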
For example:
“不知道你在说什么” ("I don't know what you're saying"): forward maximum matching segments it as “不知道, 你, 在, 说什么”. Reverse maximum matching yields “不, 知道, 你在, 说, 什么”. Shortest-path segmentation yields “不知道, 你在, 说什么”.
(2) Semantic segmentation.
This is essentially segmentation by machine understanding. The principle is simple: first perform syntactic and semantic analysis, then use the resulting syntactic and semantic information to resolve ambiguity and decide where the word boundaries fall.
(3) Statistical segmentation.
This method is simple: using corpus statistics, it looks at how often two adjacent characters occur together. The more frequently a pair co-occurs, the more likely it is to form a word, and this is what determines where the text is segmented.
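The idea can be sketched as follows, assuming a tiny hand-made corpus and a raw count threshold. `CORPUS`, `min_count`, and both function names are hypothetical; practical systems generally use measures such as mutual information or a trained sequence model instead of raw counts.

```python
from collections import Counter

# Tiny illustrative corpus (an assumption; real statistics need far more text).
CORPUS = ["我知道", "你知道", "不知道", "你在说什么", "说什么", "什么"]


def pair_counts(corpus):
    """Count how often each pair of adjacent characters occurs."""
    counts = Counter()
    for line in corpus:
        counts.update(zip(line, line[1:]))
    return counts


def segment_by_frequency(text, counts, min_count=2):
    """Keep two adjacent characters together when their pair is frequent enough."""
    if not text:
        return []
    words = [text[0]]
    for left, right in zip(text, text[1:]):
        if counts[(left, right)] >= min_count:
            words[-1] += right   # frequent pair: extend the current word
        else:
            words.append(right)  # rare pair: start a new word
    return words


counts = pair_counts(CORPUS)
print(segment_by_frequency("不知道你在说什么", counts))
# ['不', '知道', '你', '在', '说什么']
```

Raising `min_count` makes the segmenter cut more aggressively: with `min_count=3` the same sentence ends in '说', '什么' instead of '说什么'.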
SEO techniques for Chinese word segmentation
Chinese word segmentation splits a query into its component keywords. When a user searches for a keyword phrase, the search engine first returns results that match the whole phrase, and then returns results that match the segmented keywords.
In other words, optimizing for Chinese word segmentation is mostly a matter of recombining the several keywords produced by segmentation into one new keyword phrase that contains them all. The reasons for doing so are: ① it avoids keyword stuffing, ② it covers several keywords at once, and ③ a single keyword phrase carries more information.
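The retrieval order described above (whole phrase first, segmented keywords second) can be illustrated with a deliberately simplified sketch. Real search engines rank on many more signals; the function `rank_pages`, the sample query, and the page titles below are hypothetical.

```python
def rank_pages(query, query_terms, titles):
    """Whole-phrase matches first, then titles matching segmented terms."""
    exact = [title for title in titles if query in title]
    partial = [
        title for title in titles
        if query not in title and any(term in title for term in query_terms)
    ]
    # More matched terms rank higher; the sort is stable, so ties keep input order.
    partial.sort(key=lambda title: -sum(term in title for term in query_terms))
    return exact + partial


TITLES = ["上海租房信息", "北京租房攻略", "北京旅游指南", "北京周边租房与买房"]
print(rank_pages("北京租房", ["北京", "租房"], TITLES))
# ['北京租房攻略', '北京周边租房与买房', '上海租房信息', '北京旅游指南']
```

In this toy ranking, a title that contains both segmented terms (even non-contiguously) outranks titles that contain only one, which is the effect the recombination strategy aims for.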
Considerations for Chinese word segmentation SEO
(1) The combined information must be highly relevant to the same topic.
Sometimes, in an effort to squeeze the maximum amount of information out of one keyword, incorrect combinations get made. Such optimization may be useless and can even be harmful.
The amount of information reaches the desired level, but the targeting becomes too scattered, which works against concentrating keyword weight.
(2) Page keywords that do not match the segmented terms.
The keywords in the title may segment very well, but if the page itself contains no matching terms, some of those segmented terms will have little effect.
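A quick way to spot this problem is to check which segmented title terms never appear in the page body. The sketch below assumes the title has already been segmented; `missing_terms` and the sample data are hypothetical.

```python
def missing_terms(title_terms, body):
    """Return the segmented title terms that do not occur in the page body."""
    return [term for term in title_terms if term not in body]


title_terms = ["中文分词", "SEO", "优化"]
body = "本文介绍中文分词的基本原理和常见算法。"
print(missing_terms(title_terms, body))  # ['SEO', '优化']
```

Any term in the returned list is promised by the title but unsupported by the page content.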
(3) Target precise keywords in content and avoid segmentation-based optimization.
In general, I recommend avoiding Chinese word segmentation optimization for long-tail keywords. Apart from the homepage, category list pages, and specific content-aggregation topic pages, segmentation optimization is generally not advisable.
The reason is that segmentation optimization is difficult. For ordinary editorial pages or long-tail keyword pages, we should concentrate on a single keyword. If a page covers too much information, it dilutes the weight of the keyword we want to optimize.