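I had some spare time and took a look at the PanGu (盘古) Chinese word segmenter. Here is the basic usage, taken from my own example.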
PanGu.Match.MatchOptions options = PanGu.Setting.PanGuSettings.Config.MatchOptions.Clone();
PanGu.Match.MatchParameter parameters = PanGu.Setting.PanGuSettings.Config.Parameters.Clone();
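These two lines set up the match options and parameters. See the PanGu documentation for the individual properties; here everything is left at the configured defaults.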
// Requires: using PanGu; using System.Collections.Generic; using System.Text; using System.Text.RegularExpressions;
PanGu.Segment.Init();
PanGu.Match.MatchOptions options = PanGu.Setting.PanGuSettings.Config.MatchOptions.Clone();
PanGu.Match.MatchParameter parameters = PanGu.Setting.PanGuSettings.Config.Parameters.Clone();
Segment segment = new Segment();
ICollection<WordInfo> words = segment.DoSegment(articContent.Text, options, parameters);
StringBuilder wordsString = new StringBuilder();
List<string> list = new List<string>();
foreach (WordInfo wordInfo in words)
{
    if (wordInfo == null)
    {
        continue;
    }
    if (wordInfo.Word.Length < 2) // skip words shorter than 2 characters
    {
        continue;
    }
    if (wordInfo.Rank < 2) // skip words whose rank (weight) is below 2
    {
        continue;
    }
    // Available fields: wordInfo.Word, wordInfo.Position, wordInfo.Rank
    if (!list.Contains(wordInfo.Word)) // skip duplicates
    {
        // Keep only words that contain Chinese characters.
        // Anchor the pattern as "^[\u4E00-\u9FFF]+$" to require that the whole word is Chinese.
        if (Regex.IsMatch(wordInfo.Word, "[\u4E00-\u9FFF]+"))
        {
            list.Add(wordInfo.Word);
            wordsString.Append(wordInfo.Word + ",");
        }
    }
}
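The filtering rules in the loop above (minimum length, minimum rank, de-duplication, Chinese-only) can also be tried out without PanGu installed. The following is only a rough sketch under that assumption: `Token` and `KeywordFilter` are hypothetical names made up for this example, with `Token` standing in for PanGu's `WordInfo` and carrying just the two fields the filter reads. It also swaps `List.Contains` for a `HashSet<string>`, which is my substitution and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for PanGu's WordInfo (only Word and Rank are modeled).
public sealed class Token
{
    public string Word { get; }
    public int Rank { get; }
    public Token(string word, int rank) { Word = word; Rank = rank; }
}

// Hypothetical helper that applies the same rules as the loop above.
public static class KeywordFilter
{
    // Matches any word that contains at least one CJK Unified Ideograph.
    private static readonly Regex ContainsHan = new Regex("[\u4E00-\u9FFF]");

    public static List<string> Filter(IEnumerable<Token> tokens)
    {
        var seen = new HashSet<string>();   // tracks words already kept
        var result = new List<string>();    // preserves first-seen order
        foreach (Token t in tokens)
        {
            if (t == null || t.Word == null) continue;   // extra null guard on Word
            if (t.Word.Length < 2) continue;             // too short
            if (t.Rank < 2) continue;                    // rank too low
            if (!ContainsHan.IsMatch(t.Word)) continue;  // no Chinese characters
            if (seen.Add(t.Word)) result.Add(t.Word);    // Add returns false for duplicates
        }
        return result;
    }
}
```

With the result in a list, `string.Join(",", KeywordFilter.Filter(tokens))` builds the comma-separated string without the trailing comma that the `StringBuilder` version leaves behind.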