
Information Technology, 2019, Issue 5

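Research and Development of a Knowledge Graph System
YUAN Ruoying
(University of Electronic Science and Technology of China, Chengdu, Sichuan 610054)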

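Abstract: This project implements a knowledge graph system about "animals" that describes animal entities and concepts and the relations between them. An SPO (Subject-Predicate-Object) triple describes the association between two entities; informally, it has the form "entity - relation - entity". For example, the relation between the cat and Felidae is "family", written as "cat - family - Felidae". Treating entities as nodes and relations as edges, the triples together form a large knowledge graph about animals. Building this graph requires animal entities and the relations between them, which are collected from the web with a web crawler. Because the crawled data is unstructured text that cannot be used directly, knowledge extraction is needed: this paper uses a method based on syntactic dependency relations to extract entities and the relations between them from the text and to express each relation as a triple. Finally, the generated SPO triples are stored in the Neo4j graph database to form the "animal" knowledge graph.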
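To make the triple-to-graph idea concrete, here is a minimal Python sketch, assuming an in-memory adjacency map rather than the Neo4j store the paper actually uses. Only "cat - family - Felidae" comes from the abstract; the `Triple` type, the `build_graph` helper, and the tiger and Carnivora facts are hypothetical illustrations.

```python
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One SPO fact: subject entity, predicate (relation), object entity."""
    subject: str
    predicate: str
    obj: str


# node -> list of outgoing (relation, neighbour) edges
Graph = Dict[str, List[Tuple[str, str]]]


def build_graph(triples: List[Triple]) -> Graph:
    """Adjacency map: each entity is a node, each triple a labelled directed edge."""
    graph: Graph = {}
    for t in triples:
        graph.setdefault(t.subject, []).append((t.predicate, t.obj))
        graph.setdefault(t.obj, [])  # object-only entities are still nodes
    return graph


triples = [
    Triple("cat", "family", "Felidae"),       # example from the abstract
    Triple("tiger", "family", "Felidae"),     # hypothetical
    Triple("Felidae", "order", "Carnivora"),  # hypothetical
]
graph = build_graph(triples)

# Which entities are linked to Felidae by a "family" edge?
members = [s for s, edges in graph.items() for p, o in edges if (p, o) == ("family", "Felidae")]
print(members)  # ['cat', 'tiger']
```

The reverse lookup in `members` scans every edge, which is fine for a toy example; in a graph database the same question can start at the `Felidae` node and follow its incoming edges.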


Keywords: knowledge graph; web crawler; knowledge extraction; SPO triple; Neo4j graph database



CLC number: TP391.1        Document code: A        Article ID: 2096-4706(2019)05-0013-05


Research and Development of a Knowledge Graph System

YUAN Ruoying

(University of Electronic Science and Technology of China,Chengdu 610054,China)

Abstract: This project implements a knowledge graph system about "animals" that describes the various entities and concepts of "animals" and the relationships between them. We use SPO (Subject-Predicate-Object) triples to describe the relationship between two entities; put simply, a triple has the form "entity - relationship - entity". For example, the relationship between the cat and Felidae is "family", which we express as "cat - family - Felidae". If we regard entities as nodes and entity relationships as edges, we can construct a large knowledge graph about animals. Constructing the "animal" knowledge graph requires animal entities and the relationships between them. These data are obtained from the internet with web crawler technology, but the data obtained this way is plain text and cannot be used directly, so knowledge extraction technology is needed. This paper uses a method based on syntactic dependency relations to extract entities and the relationships between them from the text, and then expresses these relationships in the form of triples. Finally, the generated SPO triples are stored in the Neo4j graph database to form an "animal" knowledge graph.
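The abstract does not spell out its extraction rules, so the following is only a sketch of one common dependency-based pattern, under stated assumptions: the sentence has already been segmented and parsed (the parse below is written by hand), and relation labels follow the LTP-style scheme in which SBV marks a subject, VOB a verb object, ATT an attribute modifier and HED the root. A word with both an SBV and a VOB dependent yields one triple. `Token`, `phrase` and `extract_spo` are hypothetical names, not the paper's code.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    word: str
    head: int      # 1-based index of the governing word; 0 marks the root
    relation: str  # LTP-style label: SBV, VOB, ATT, HED, ...


def phrase(tokens: List[Token], idx: int) -> str:
    """Word at 1-based position idx, prefixed by its direct ATT modifiers."""
    # Chinese attributes precede their head, so only earlier tokens are checked.
    mods = [t.word for t in tokens[:idx - 1] if t.head == idx and t.relation == "ATT"]
    return "".join(mods) + tokens[idx - 1].word


def extract_spo(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Emit (subject, verb, object) for every word that has SBV and VOB dependents."""
    triples = []
    for i, verb in enumerate(tokens, start=1):
        subjects = [j for j, t in enumerate(tokens, start=1) if t.head == i and t.relation == "SBV"]
        objects = [j for j, t in enumerate(tokens, start=1) if t.head == i and t.relation == "VOB"]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s), verb.word, phrase(tokens, o)))
    return triples


# "老虎 属于 猫科 动物" (tiger / belongs to / cat-family / animal), parsed by hand
sentence = [
    Token("老虎", 2, "SBV"),
    Token("属于", 0, "HED"),
    Token("猫科", 4, "ATT"),
    Token("动物", 2, "VOB"),
]
print(extract_spo(sentence))  # [('老虎', '属于', '猫科动物')]
```

Here the predicate is the verb 属于 ("belongs to"), not the "family" label of the abstract's example; mapping verbs onto a fixed relation vocabulary would be a separate step that this sketch does not attempt.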

Keywords: knowledge graph; web crawler; knowledge extraction; SPO triple; Neo4j graph database
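The last step in both abstracts is loading the triples into Neo4j. A minimal sketch, assuming every entity becomes a node labelled `Entity` and every predicate is stored as the `name` property of a relationship of type `RELATION` (both names are hypothetical, not taken from the paper). The hypothetical `triples_to_batch` function only builds a parameterized Cypher query, so it runs without a database.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: (:Entity {name})-[:RELATION {name}]->(:Entity {name})
BATCH_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (s:Entity {name: row.s}) "
    "MERGE (o:Entity {name: row.o}) "
    "MERGE (s)-[:RELATION {name: row.p}]->(o)"
)


def triples_to_batch(triples: List[Tuple[str, str, str]]) -> Tuple[str, Dict[str, list]]:
    """Build one parameterized Cypher query that upserts all (s, p, o) triples."""
    # Values travel as parameters, so crawled text is never spliced into the query string.
    rows = [{"s": s, "p": p, "o": o} for s, p, o in triples]
    return BATCH_QUERY, {"rows": rows}


query, params = triples_to_batch([
    ("cat", "family", "Felidae"),    # the abstract's example
    ("tiger", "family", "Felidae"),  # hypothetical extra fact
])
print(query)
print(params)
```

With the official `neo4j` Python driver, the pair could be executed as `session.run(query, params)`. `MERGE` keeps a repeated entity such as Felidae as a single node, and keeping the predicate in a property rather than in the relationship type lets it be passed as a parameter, because Cypher has traditionally not accepted parameters for relationship types.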




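About the author: YUAN Ruoying (born June 1998), male, Han nationality, native of Heze, Shandong; undergraduate; research interests: big data and machine learning.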