DocumentToSpeech

Use this node in document retrieval pipelines to convert text Documents into SpeechDocuments. The document's content is read out into an audio file.

This node is experimental because of the dataclasses it uses (SpeechDocument). Bear in mind that that they might change in the future.


Position in a Pipeline	The last node in a document search pipeline, after a Retriever in a single-Retriever pipeline; or at the end of an indexing pipeline, before the Document Store
Input	Document
Output	SpeechDocument
Classes	DocumentToSpeech

Usage

To initialize DocumentToSpeech, run:

from haystack.nodes import DocumentToSpeech

model_name = 'espnet/kan-bayashi_ljspeech_vits'
answer_dir = './generated_audio_answers'

audio_document = DocumentToSpeech(model_name_or_path=model_name, generated_audio_dir=answer_dir)

To use DocumentToSpeech in a pipeline, run:

from haystack.nodes import DocumentToSpeech

retriever = BM25Retriever(document_store=document_store)
document2speech = DocumentToSpeech(
    model_name_or_path="espnet/kan-bayashi_ljspeech_vits",
    generated_audio_dir=Path(__file__).parent / "audio_documents",
    )

audio_pipeline = Pipeline()
audio_pipeline.add_node(retriever, name="Retriever", inputs=["Query"])
audio_pipeline.add_node(document2speech, name="DocumentToSpeech", inputs=["Retriever"])

Here's an example of an indexing pipeline with DocumentToSpeech:

file_paths = [p for p in Path(documents_path).glob("**/*")]

indexing_pipeline = Pipeline()

classifier = FileTypeClassifier()
indexing_pipeline.add_node(classifier, name="classifier", inputs=["File"])

text_converter = TextConverter(remove_numeric_tables=True)
indexing_pipeline.add_node(text_converter, name="text_converter", inputs=["classifier.output_1"])

preprocessor = PreProcessor(
        clean_whitespace=True,
        clean_empty_lines=True,
        split_length=100,
        split_overlap=50,
        split_respect_sentence_boundary=True,
)
indexing_pipeline.add_node(preprocessor, name="preprocessor", inputs=["text_converter"])

doc2speech = DocumentToSpeech(model_name_or_path="espnet/kan-bayashi_ljspeech_vits", generated_audio_dir=Path("./audio_documents"))
indexing_pipeline.add_node(doc2speech, name="doc2speech", inputs=["preprocessor"])

document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
indexing_pipeline.add_node(document_store, name="document_store", inputs=["doc2speech"])

indexing_pipeline.run(file_paths=file_paths, meta=files_metadata)

Stars

5172

Edit on GitHub

Start a Discussion!

Usage