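In my spare time I took a look at PanGu Segment (盘古分词), a Chinese word segmentation library for .NET. Here is the basic usage, based on an example of my own.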
PanGu.Match.MatchOptions options = PanGu.Setting.PanGuSettings.Config.MatchOptions.Clone();
PanGu.Match.MatchParameter parameters = PanGu.Setting.PanGuSettings.Config.Parameters.Clone();
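These two lines set up the options: they copy (via Clone()) the match options and match parameters from PanGu's global settings. For what each property does, see the PanGu documentation; here everything is left at its default.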
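// Namespaces needed at the top of the file
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using PanGu;

// Load PanGu's dictionaries; this is typically done once, e.g. at application startup
PanGu.Segment.Init();
PanGu.Match.MatchOptions options = PanGu.Setting.PanGuSettings.Config.MatchOptions.Clone();
PanGu.Match.MatchParameter parameters = PanGu.Setting.PanGuSettings.Config.Parameters.Clone();
Segment segment = new Segment();
// articContent.Text holds the article text to be segmented
ICollection<WordInfo> words = segment.DoSegment(articContent.Text, options, parameters);
StringBuilder wordsString = new StringBuilder();
List<string> list = new List<string>();
foreach (WordInfo wordInfo in words)
{
    if (wordInfo == null)
    {
        continue;
    }
    if (wordInfo.Word.Length < 2) // skip results shorter than 2 characters
    {
        continue;
    }
    if (wordInfo.Rank < 2) // skip results whose rank (weight) is below 2
    {
        continue;
    }
    // available fields: wordInfo.Word, wordInfo.Position, wordInfo.Rank
    if (!list.Contains(wordInfo.Word)) // skip duplicates
    {
        // keep only words made entirely of Chinese characters;
        // the ^...$ anchors reject mixed tokens such as "API文档"
        if (Regex.IsMatch(wordInfo.Word, "^[\u4E00-\u9FFF]+$"))
        {
            list.Add(wordInfo.Word);
            wordsString.Append(wordInfo.Word + ","); // note: leaves a trailing comma
        }
    }
}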
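The filtering rules in the loop above can be tried without PanGu installed. Below is a minimal sketch, assuming a hypothetical `Token` record that stands in for PanGu's `WordInfo` (it carries only `Word` and `Rank`) and C# 9 or later for the record syntax; `KeywordFilter` and `Filter` are likewise illustrative names, not part of PanGu. It applies the same rules (at least 2 characters, rank at least 2, no duplicates, Chinese characters only), but swaps in a `HashSet<string>` for the duplicate check and uses the anchored pattern `^[\u4E00-\u9FFF]+$`, so a token that merely contains one Chinese character is rejected.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for PanGu's WordInfo: only the two fields the filter reads.
public record Token(string Word, int Rank);

public static class KeywordFilter
{
    // Anchored: the whole token must consist of CJK Unified Ideographs (U+4E00..U+9FFF).
    private static readonly Regex CjkOnly = new Regex("^[\u4E00-\u9FFF]+$");

    // Keeps tokens with at least minLength characters and rank >= minRank that are
    // made only of Chinese characters, dropping repeats while keeping first-seen order.
    public static List<string> Filter(IEnumerable<Token> tokens, int minLength = 2, int minRank = 2)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (var t in tokens)
        {
            if (t == null || t.Word.Length < minLength || t.Rank < minRank) continue;
            if (!CjkOnly.IsMatch(t.Word)) continue;
            // HashSet.Add returns false when the word was already seen
            if (seen.Add(t.Word)) result.Add(t.Word);
        }
        return result;
    }

    public static void Main()
    {
        var tokens = new List<Token>
        {
            new Token("盘古", 3),
            new Token("分词", 2),
            new Token("的", 5),      // too short
            new Token("示例", 1),    // rank too low
            new Token("PanGu", 3),   // no Chinese characters
            new Token("盘古", 3),    // duplicate
            new Token("API文档", 3), // mixed; rejected by the anchored pattern
        };
        string keywords = string.Join(",", Filter(tokens));
        Console.WriteLine(keywords); // 盘古,分词
    }
}
```

Joining the kept words with `string.Join` also avoids the trailing comma that the `StringBuilder` version leaves at the end.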
Keke's Personal Blog (珂珂的个人博客) - a code monkey's personal website