
Question Answering on a Knowledge Graph

Haystack allows loading and querying knowledge graphs. In particular, Haystack can:

  • Load an existing knowledge graph given as a .ttl file
  • Execute SPARQL queries on a knowledge graph
  • Execute text queries on the knowledge graph by translating them to SPARQL queries with the help of a pre-trained seq2seq model

Haystack's knowledge graph functionality is still at a very early stage, so don't expect the example tutorial to work on your custom dataset out of the box. We recommend using the InMemoryKnowledgeGraph and Text2SparqlRetriever. If you would like to use GraphDB, please refer to the tutorial.

InMemoryKnowledgeGraph

InMemoryKnowledgeGraph is a triple store similar to Haystack's document stores. It does not require any extra installation or setup beyond installing Haystack.

from haystack.document_stores import InMemoryKnowledgeGraph  # import path assumes Haystack 1.x

kg = InMemoryKnowledgeGraph(index="tutorial_10_index")

Indices can be deleted and created with InMemoryKnowledgeGraph.delete_index() and InMemoryKnowledgeGraph.create_index().

InMemoryKnowledgeGraph can load an existing knowledge graph from a .ttl file with the method InMemoryKnowledgeGraph.import_from_ttl_file(index, path), where path points to a .ttl file that starts like this:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix hp: <https://deepset.ai/harry_potter/> .
hp:Gryffindor hp:source_url "https://harrypotter.fandom.com/wiki/Gryffindor"^^xsd:string .
hp:Gryffindor rdf:type hp:House_ .
hp:Gryffindor hp:name hp:Gryffindor .
hp:Gryffindor hp:founder hp:Godric_gryffindor .
...

InMemoryKnowledgeGraph.get_all_triples() returns all loaded triples in the form of subject, predicate, and object. It is helpful to check whether the loading of a .ttl file was successful.
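As a quick sanity check after loading, the returned triples can be counted per predicate. The sketch below is only an illustration: it assumes each triple is a dictionary with the keys "s", "p", and "o", each holding a "type" and a "value" (the shape of SPARQL JSON bindings). That shape is an assumption, not something this page guarantees, and the sample list is a hand-written stand-in for the result of get_all_triples(). The helper name summarize_triples is hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Assumed shape of one triple: {"s": {...}, "p": {...}, "o": {...}},
# where each inner dict has a "type" and a "value" entry.
Triple = Dict[str, Dict[str, str]]


def summarize_triples(triples: List[Triple]) -> Counter:
    """Count how many triples use each predicate (hypothetical helper)."""
    return Counter(triple["p"]["value"] for triple in triples)


HP = "https://deepset.ai/harry_potter/"

# Hand-written stand-in for the output of kg.get_all_triples().
sample_triples = [
    {
        "s": {"type": "uri", "value": HP + "Gryffindor"},
        "p": {"type": "uri", "value": HP + "founder"},
        "o": {"type": "uri", "value": HP + "Godric_gryffindor"},
    },
    {
        "s": {"type": "uri", "value": HP + "Gryffindor"},
        "p": {"type": "uri", "value": HP + "name"},
        "o": {"type": "uri", "value": HP + "Gryffindor"},
    },
]

predicate_counts = summarize_triples(sample_triples)
for predicate, count in predicate_counts.items():
    print(predicate, count)
```

An empty result, or a predicate you expected that is missing from the counts, suggests that the .ttl file was not loaded as intended.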

InMemoryKnowledgeGraph.query(sparql_query) executes SPARQL queries on the knowledge graph. In most cases, you would not call this method directly but use it through a retriever.

Text2SparqlRetriever

Text2SparqlRetriever can execute SPARQL queries translated from text as well as any custom SPARQL queries. Currently, it is the only implementation of the BaseGraphRetriever class. Internally, Text2SparqlRetriever uses a pre-trained BART model to translate text questions to queries in SPARQL format.

Text2SparqlRetriever.retrieve(query) can be called with a text query, which is then automatically translated to a SPARQL query.

Text2SparqlRetriever._query_kg(sparql_query) can be called with a SPARQL query.

Trying Question Answering on Knowledge Graphs with Custom Data

If you want to use your custom data, you first need your custom knowledge graph in the form of a .ttl file. You can then load your custom graph and execute SPARQL queries with Text2SparqlRetriever._query_kg(sparql_query).
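If your data is not yet available as a .ttl file, a small file can be produced by hand. The following is a minimal sketch, not part of Haystack: the helper to_turtle is hypothetical, and it assumes that every subject, predicate, and object is already a valid Turtle term (a prefixed name or a quoted literal). For anything beyond toy data, a dedicated RDF library is the safer choice.

```python
from typing import Dict, Iterable, Tuple


def to_turtle(prefixes: Dict[str, str], triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize prefix declarations and triples as Turtle text (hypothetical helper).

    Terms are written as given, so they must already be valid Turtle,
    for example "hp:Gryffindor" or a quoted literal with a datatype.
    """
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines += [f"{subj} {pred} {obj} ." for subj, pred, obj in triples]
    return "\n".join(lines) + "\n"


prefixes = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "hp": "https://deepset.ai/harry_potter/",
}
triples = [
    ("hp:Gryffindor", "hp:founder", "hp:Godric_gryffindor"),
    (
        "hp:Gryffindor",
        "hp:source_url",
        '"https://harrypotter.fandom.com/wiki/Gryffindor"^^xsd:string',
    ),
]

ttl_text = to_turtle(prefixes, triples)
print(ttl_text)
```

The resulting text can be written to a file and passed to InMemoryKnowledgeGraph.import_from_ttl_file(index, path).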

If you're using abbreviations of namespaces in a GraphDBKnowledgeGraph, you will need to pass them in the following format:

prefixes = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hp: <https://deepset.ai/harry_potter/>
"""
kg.prefixes = prefixes

If you suspect you are having issues because of abbreviations of namespaces not mapped correctly, you can always try to execute a SPARQL query with the full namespace:

Text2SparqlRetriever._query_kg(sparql_query="select distinct ?obj where { <https://deepset.ai/harry_potter/Hermione_granger> <https://deepset.ai/harry_potter/patronus> ?obj . }")

instead of using the abbreviated form:

Text2SparqlRetriever._query_kg(sparql_query="select distinct ?obj where { hp:Hermione_granger hp:patronus ?obj . }")
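To rule out a prefix mapping problem, the abbreviated form can be expanded into the full form mechanically before it is sent to the knowledge graph. The following is a minimal sketch using only the standard library. The helper names parse_prefixes and expand_prefixes are hypothetical and not part of Haystack, and the regular expressions handle only simple queries like the ones above (for instance, they do not skip string literals that happen to contain text such as "hp:something").

```python
import re
from typing import Dict

# Matches declarations such as: PREFIX hp: <https://deepset.ai/harry_potter/>
PREFIX_DECLARATION = re.compile(r"PREFIX\s+(\w+):\s*<([^>]+)>", re.IGNORECASE)


def parse_prefixes(prefixes: str) -> Dict[str, str]:
    """Map each declared prefix name to its namespace IRI (hypothetical helper)."""
    return {name: iri for name, iri in PREFIX_DECLARATION.findall(prefixes)}


def expand_prefixes(query: str, prefixes: str) -> str:
    """Replace prefixed names such as hp:patronus with full <IRI> terms."""
    mapping = parse_prefixes(prefixes)
    if not mapping:
        return query
    names = "|".join(re.escape(name) for name in mapping)
    # The lookbehind avoids matching inside a longer word or an existing <IRI>.
    prefixed_name = re.compile(rf"(?<![\w<])({names}):(\w+)")
    return prefixed_name.sub(
        lambda match: f"<{mapping[match.group(1)]}{match.group(2)}>", query
    )


prefixes = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX hp: <https://deepset.ai/harry_potter/>
"""
short_query = "select distinct ?obj where { hp:Hermione_granger hp:patronus ?obj . }"

expanded_query = expand_prefixes(short_query, prefixes)
print(expanded_query)
```

If the expanded query returns results but the abbreviated one does not, the prefixes set on the knowledge graph are the likely cause.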

Translating text queries to SPARQL queries for your custom data, so that you can use Text2SparqlRetriever.retrieve(query), requires significantly more effort. We provide an example pre-trained model in our tutorial. One limitation is that this model can only translate questions about resources it has seen during training. Otherwise, it cannot map the name of a resource to the identifier used in the knowledge graph. For example, it can translate "Harry" to "hp:Harry_potter" only because we trained it to do so.

Unfortunately, our pre-trained model for translating text queries does not work with your custom data. Instead, you need to train your own model, following the seq2seq example for summarization with BART in transformers. Haystack currently does not support the training of text2sparql models. We don't have concrete plans to extend this functionality, but we are more than open to contributions. Don't hesitate to reach out!