Knowledge extraction
Knowledge Extraction is the creation of
The RDB2RDF W3C group is currently standardizing a language for extraction of RDF from relational databases. Another popular example of knowledge extraction is the transformation of Wikipedia into structured data and its mapping to existing
Overview
After the standardization of knowledge representation languages such as
The following criteria can be used to categorize approaches in this topic (some of them only account for extraction from relational databases):
Examples
Entity Linking
- DBpedia Spotlight, OpenCalais, the Zemanta API, and Extractiv analyze free text via Named Entity Recognition, disambiguate the candidates via Name Resolution, and link the entities found to the DBpedia knowledge repository (DBpedia Spotlight web demo).
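The recognize, disambiguate, and link steps can be illustrated with a toy sketch. This is not how any of the tools named above is implemented: the gazetteer, its context keywords, and the function `link_entities` are hypothetical, and disambiguation is reduced to counting keyword overlap with the surrounding text.

```python
import re

# Hypothetical gazetteer: surface form -> candidate IRIs, each with a set of
# context keywords (the keywords are made up for this illustration).
GAZETTEER = {
    "paris": [
        ("http://dbpedia.org/resource/Paris", {"france", "city", "capital"}),
        ("http://dbpedia.org/resource/Paris_Hilton", {"hotel", "celebrity", "heiress"}),
    ],
    "berlin": [
        ("http://dbpedia.org/resource/Berlin", {"germany", "city", "capital"}),
    ],
}


def link_entities(text):
    """Return (surface form, IRI) pairs for gazetteer entries found in text."""
    tokens = re.findall(r"\w+", text.lower())
    context = set(tokens)
    links = []
    for token in tokens:
        candidates = GAZETTEER.get(token)
        if not candidates:
            continue  # token is not a known entity mention
        # Disambiguation: pick the candidate whose keywords overlap most
        # with the words of the input text (ties go to the first candidate).
        best_iri, _ = max(candidates, key=lambda c: len(c[1] & context))
        links.append((token, best_iri))
    return links


links = link_entities("Paris is the capital of France")
```

Real systems replace the dictionary lookup with a trained recognizer and use much richer context models for disambiguation.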
Relational Databases to RDF
- , D2R Server and Virtuoso RDF Views are tools that transform relational databases to RDF. They allow existing vocabularies and ontologies to be reused during the conversion. When transforming a typical relational table named users, one column (e.g. name) or an aggregation of columns (e.g. first_name and last_name) has to provide the URI of the created entity. Normally the primary key is used. Every other column can be extracted as a relation with this entity. Properties with formally defined semantics are then used (and reused) to interpret the information. For example, a column in a user table called marriedTo can be defined as a symmetric relation, and a column homepage can be converted to the property foaf:homepage from the FOAF vocabulary, thus qualifying it as an inverse functional property. Each entry of the user table can then be made an instance of the class foaf:Person (Ontology Population). Additionally, domain knowledge (in the form of an ontology) could be created from the status_id column, either by manually created rules (if status_id is 2, the entry belongs to the class Teacher) or by (semi-)automated methods (Ontology Learning). Here is an example transformation:
:Peter :marriedTo :Marry .
:marriedTo a owl:SymmetricProperty .
:Peter foaf:homepage <http://example.org/Peters_page> .
:Peter a foaf:Person .
:Peter a :Student .
:Claus a :Teacher .
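As a rough sketch of how such a transformation could be scripted, the following code converts rows of a users table into subject-predicate-object tuples. The example.org namespace, the row values, and the rule that status_id 1 means Student are assumptions made for illustration; only the rule for status_id 2 and the FOAF terms come from the text above.

```python
EX = "http://example.org/"           # hypothetical namespace for local terms
FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF vocabulary namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Manually created rule: status_id -> class.
# 2 -> Teacher is from the text; 1 -> Student is an assumption.
STATUS_CLASSES = {1: EX + "Student", 2: EX + "Teacher"}


def user_row_to_triples(row):
    """Turn one row of the users table into (subject, predicate, object) triples."""
    subject = EX + row["name"]  # the name column provides the entity URI
    triples = [(subject, RDF_TYPE, FOAF + "Person")]  # ontology population
    if row.get("marriedTo"):
        triples.append((subject, EX + "marriedTo", EX + row["marriedTo"]))
    if row.get("homepage"):
        # Reuse an existing vocabulary term instead of minting a new one.
        triples.append((subject, FOAF + "homepage", row["homepage"]))
    status_class = STATUS_CLASSES.get(row.get("status_id"))
    if status_class:
        triples.append((subject, RDF_TYPE, status_class))
    return triples


rows = [
    {"name": "Peter", "marriedTo": "Marry",
     "homepage": "http://example.org/Peters_page", "status_id": 1},
    {"name": "Claus", "marriedTo": None, "homepage": None, "status_id": 2},
]
triples = [t for row in rows for t in user_row_to_triples(row)]
```

The schema-level statement that marriedTo is an owl:SymmetricProperty is not derived from any row, so it would be asserted once, separately from the per-row conversion.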
Extraction from structured sources to RDF
1:1 Mapping from RDB Tables/Views to RDF Entities/Attributes/Values
When building an RDB representation of a problem domain, the starting point is frequently an entity-relationship diagram (ERD). Typically, each entity is represented as a database table, each attribute of the entity becomes a column in that table, and relationships between entities are indicated by foreign keys. Each table thus defines a particular class of entity, and each column one of its attributes. Each row in the table describes an entity instance, uniquely identified by a primary key. The table rows collectively describe an entity set. In an equivalent RDF representation of the same entity set:
So, to render an equivalent view based on RDF semantics, the basic mapping algorithm would be as follows:
- create an RDFS class for each table
- convert all primary keys and foreign keys into IRIs
- assign a predicate IRI to each column
- assign an rdf:type predicate for each row, linking it to an RDFS class IRI corresponding to the table
- for each column that is part of neither a primary nor a foreign key, construct a triple containing the primary key IRI as the subject, the column IRI as the predicate and the column's value as the object.
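The steps above can be sketched in a few lines of code. This is a minimal illustration, not the standardized mapping: the base IRI, the IRI patterns, and the function names are hypothetical, only single-column keys are handled, and the triple that links a foreign key column to the referenced row's IRI is an assumption that goes beyond the listed steps.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_iri(table, key_value):
    """Build an IRI for a row from its table name and key value."""
    return f"{BASE}{table}/{quote(str(key_value))}"


def direct_mapping(table, primary_key, foreign_keys, rows):
    """Map rows of one table to triples.

    foreign_keys maps a column name to the name of the table it references.
    """
    class_iri = BASE + table  # one class per table
    triples = []
    for row in rows:
        subject = row_iri(table, row[primary_key])
        triples.append((subject, RDF_TYPE, class_iri))  # type the row
        for column, value in row.items():
            if column == primary_key or value is None:
                continue
            predicate = f"{BASE}{table}#{column}"  # one predicate per column
            if column in foreign_keys:
                # Assumption: foreign keys become links to the referenced row.
                obj = row_iri(foreign_keys[column], value)
            else:
                obj = value  # plain column: the value is the object
            triples.append((subject, predicate, obj))
    return triples


rows = [{"id": 1, "name": "Peter", "dept_id": 7}]
triples = direct_mapping("users", "id", {"dept_id": "departments"}, rows)
```

Keeping the schema description (primary key and foreign keys) as explicit arguments makes the sketch independent of any particular database driver.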
An early mention of this basic or direct mapping can be found in
Complex mappings of relational databases to RDF
The 1:1 mapping mentioned above exposes the legacy data as RDF in a straightforward way. Additional refinements can be employed to make the RDF output more useful for the given use cases. Normally, information is lost during the transformation of an entity-relationship diagram (ERD) to relational tables (Details can be found in
XML
As XML is structured as a tree, any data can be easily represented in RDF, which is structured as a graph.
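One way to see why a tree fits into a graph is to walk the XML tree and emit one node per element, with edges for its name, attributes, text, and children. The sketch below does this with Python's standard library; the example.org predicates and the blank-node labels are hypothetical choices, not part of any standard XML-to-RDF mapping.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace for the predicates


def xml_to_triples(xml_text):
    """Convert an XML document into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    counter = 0

    def visit(element):
        nonlocal counter
        counter += 1
        node = f"_:n{counter}"  # one blank node per XML element
        triples.append((node, BASE + "name", element.tag))
        for attr, value in element.attrib.items():
            triples.append((node, BASE + "attr/" + attr, value))
        text = (element.text or "").strip()
        if text:
            triples.append((node, BASE + "text", text))
        for child in element:
            # A parent-child edge in the tree becomes an edge in the graph.
            triples.append((node, BASE + "child", visit(child)))
        return node

    visit(root)
    return triples


triples = xml_to_triples('<user id="1"><name>Peter</name></user>')
```

This generic encoding preserves the tree structure but not the meaning of the data; mapping element names to vocabulary terms would require domain knowledge in addition to the traversal.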
Survey of Methods / Tools
Knowledge discovery
Knowledge discovery describes the process of automatically searching large volumes of
The most well-known branch of
Another promising application of knowledge discovery is in the area of
Ontology Learning
Retrieved from : http://en.wikipedia.org/wiki/Knowledge_extraction