Extractive QA with txtai

David Mezzetti

This article is part of a tutorial series on txtai, an AI-powered search engine.

Parts 1 through 4 gave a general overview of txtai, the technology behind it, and examples of how to use it for similarity search. This article builds on that foundation to create extractive question-answering systems.

Install dependencies

Install txtai and all dependencies.

pip install txtai

Create Embeddings and Extractor instances

The Embeddings instance is the main entrypoint for txtai. An Embeddings instance defines the method used to tokenize and convert a segment of text into an embeddings vector.
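To make the idea of an embeddings vector concrete, here is a toy sketch that uses plain word counts in place of a transformer model. It is not how txtai computes vectors; `toy_vector` and `cosine` are hypothetical helpers written only for illustration, and a real model captures meaning far beyond exact word overlap.

```python
import math
from collections import Counter

def toy_vector(text):
    """Toy stand-in for an embeddings vector: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

texts = ["Flyers win 4-1", "Phillies 5 Braves 0 final"]
query = toy_vector("Flyers - Lightning")

# Score each text against the query and keep the closest match
scores = [cosine(query, toy_vector(text)) for text in texts]
best = texts[scores.index(max(scores))]
print(best)
```

The same pattern (vectorize, score, pick the closest text) is what a similarity search does, only with learned vectors instead of word counts.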

The Extractor instance is the entrypoint for extractive question-answering.

Both the Embeddings and Extractor instances take a path to a transformer model. Any model on the Hugging Face model hub can be used in place of the models below.

from txtai.embeddings import Embeddings
from txtai.extractor import Extractor

# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"method": "transformers", "path": "sentence-transformers/bert-base-nli-mean-tokens"})

# Create extractor instance
extractor = Extractor(embeddings, "distilbert-base-cased-distilled-squad")

data = ["Giants hit 3 HRs to down Dodgers",
        "Giants 5 Dodgers 4 final",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Blue Jays 2 Red Sox 1 final",
        "Red Sox lost to the Blue Jays, 2-1",
        "Blue Jays at Red Sox is over. Score: 2-1",
        "Phillies win over the Braves, 5-0",
        "Phillies 5 Braves 0 final",
        "Final: Braves lose to the Phillies in the series opener, 5-0",
        "Final score: Flyers 4 Lightning 1",
        "Flyers 4 Lightning 1 final",
        "Flyers win 4-1"]

questions = ["What team won the game?", "What was score?"]

execute = lambda query: extractor([(question, query, question, False) for question in questions], data)

for query in ["Red Sox - Blue Jays", "Phillies - Braves", "Dodgers - Giants", "Flyers - Lightning"]:
    print("----", query, "----")
    for answer in execute(query):
        print(answer)
    print()

# Ad-hoc questions
question = "What hockey team won?"

print("----", question, "----")
print(extractor([(question, question, question, False)], data))

---- Red Sox - Blue Jays ----
('What team won the game?', 'Blue Jays')
('What was score?', '2-1')

---- Phillies - Braves ----
('What team won the game?', 'Phillies')
('What was score?', '5-0')

---- Dodgers - Giants ----
('What team won the game?', 'Giants')
('What was score?', '5-4')

---- Flyers - Lightning ----
('What team won the game?', 'Flyers')
('What was score?', '4-1')

---- What hockey team won? ----
[('What hockey team won?', 'Flyers')]
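
Each entry passed to the extractor above is a four-element tuple. As a minimal sketch, assuming the fields are (name, query, question, snippet) as in the txtai version this article targets, a small helper makes the intent of the `execute` lambda easier to read. The name `build_queue` is hypothetical and not part of txtai. Because results come back as (name, answer) pairs, they also convert directly into a dictionary.

```python
def build_queue(query, questions, snippet=False):
    """Build extractor input rows: one (name, query, question, snippet) tuple per question."""
    # The question text doubles as the row name, mirroring the article's lambda
    return [(question, query, question, snippet) for question in questions]

questions = ["What team won the game?", "What was score?"]
queue = build_queue("Red Sox - Blue Jays", questions)

# Results copied from the article's output, not produced by running a model here
results = [("What team won the game?", "Blue Jays"), ("What was score?", "2-1")]

# (name, answer) pairs map cleanly onto a dict keyed by question
answers = dict(results)
print(answers["What was score?"])
```

With txtai installed, `extractor(build_queue(query, questions), data)` would be the equivalent of calling `execute(query)`.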