珂珂's Personal Blog - A Programmer's Personal Website

How to Use PanGu Segment (盘古分词)

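I had some spare time and took a look at PanGu Segment, the Chinese word segmentation library. Here is the basic usage, shown with an example of my own.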

    PanGu.Match.MatchOptions options = PanGu.Setting.PanGuSettings.Config.MatchOptions.Clone();
    PanGu.Match.MatchParameter parameters = PanGu.Setting.PanGuSettings.Config.Parameters.Clone();
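These two lines clone the matching options and matching parameters from the loaded configuration. See the PanGu documentation for the individual properties; here everything is left at its default.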

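        // Requires: using System.Collections.Generic; using System.Text;
        //           using System.Text.RegularExpressions; using PanGu;
        PanGu.Segment.Init(); // load the configuration and dictionaries before segmenting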

        PanGu.Match.MatchOptions options = PanGu.Setting.PanGuSettings.Config.MatchOptions.Clone();
        PanGu.Match.MatchParameter parameters = PanGu.Setting.PanGuSettings.Config.Parameters.Clone();

        Segment segment = new Segment();
        ICollection<WordInfo> words = segment.DoSegment(articContent.Text, options, parameters);
        StringBuilder wordsString = new StringBuilder();
        List<string> list = new List<string>();
        foreach (WordInfo wordInfo in words)
        {
            if (wordInfo == null)
            {
                continue;
            }
            if (wordInfo.Word.Length < 2) // skip results shorter than 2 characters
            {
                continue;
            }
            if (wordInfo.Rank < 2)  // skip results whose rank (weight) is below 2
            {
                continue;
            }
            //wordInfo.Word, wordInfo.Position, wordInfo.Rank
            if (!list.Contains(wordInfo.Word))   // skip duplicates
            {
                if (Regex.IsMatch(wordInfo.Word, "^[\u4E00-\u9FFF]+$")) // keep only words made up entirely of Chinese characters
                {
                    list.Add(wordInfo.Word);
                    wordsString.Append(wordInfo.Word + ",");
                }
            }
            
        }
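
The filtering steps in the loop above (minimum length, minimum rank, Chinese characters only, no duplicates) can also be written with a `HashSet<string>`, which avoids the linear `list.Contains` scan, and with `string.Join`, which avoids the trailing comma that `Append(word + ",")` leaves behind. This is only a rough sketch under assumptions: `Token` is a hypothetical stand-in for PanGu's `WordInfo` so that the snippet runs without the library, and `KeywordFilter.Extract`, its parameter names, and the sample ranks are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for PanGu's WordInfo (only the two members used here).
public sealed class Token
{
    public string Word { get; }
    public int Rank { get; }
    public Token(string word, int rank) { Word = word; Rank = rank; }
}

public static class KeywordFilter
{
    // Matches strings that consist entirely of CJK Unified Ideographs.
    private static readonly Regex ChineseOnly = new Regex("^[\u4E00-\u9FFF]+$");

    public static List<string> Extract(IEnumerable<Token> tokens, int minLength = 2, int minRank = 2)
    {
        var seen = new HashSet<string>();   // tracks words already emitted
        var result = new List<string>();    // keeps first-seen order
        foreach (Token t in tokens)
        {
            if (t == null || t.Word == null) continue;
            if (t.Word.Length < minLength) continue;    // too short
            if (t.Rank < minRank) continue;             // rank too low
            if (!ChineseOnly.IsMatch(t.Word)) continue; // not purely Chinese
            if (seen.Add(t.Word)) result.Add(t.Word);   // Add returns false for duplicates
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var tokens = new List<Token>
        {
            new Token("盘古", 5),
            new Token("分词", 5),
            new Token("的", 1),     // dropped: one character
            new Token("盘古", 5),   // dropped: duplicate
            new Token("PanGu", 5),  // dropped: not Chinese
            new Token("用法", 1),   // dropped: rank below 2
        };
        List<string> keywords = KeywordFilter.Extract(tokens);
        // keywords now holds "盘古" and "分词", in that order.
        Console.WriteLine(string.Join(",", keywords));
    }
}
```

With the real library, the same checks would run over the `ICollection<WordInfo>` returned by `segment.DoSegment(...)`, reading `wordInfo.Word` and `wordInfo.Rank` instead of the stand-in type.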

 

