This post demonstrates how to use the LangChain library to load and save the transcript of a YouTube video. The Python script retrieves the video's transcript, prints it, and writes the content to a text file for further use.
Let's go through the code line by line:
from langchain.document_loaders import youtube
- This line imports the youtube module from the langchain.document_loaders package. This module provides the YoutubeLoader class, which is used below to load documents from YouTube videos.
import io
- This line imports the io module from Python's standard library, which provides tools for working with streams and I/O operations.
loader = youtube.YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=3OvmwM61vJw")
- This line creates an instance of YoutubeLoader by calling the from_youtube_url class method. The method takes a YouTube URL as an argument and returns a loader object for the video at that URL.
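To load anything, the loader first has to work out which video the URL points to. The sketch below is not LangChain's implementation. It is a hypothetical helper, video_id_from_url, that handles only the watch?v=... URL form used in this post, to illustrate what information the URL carries.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> Optional[str]:
    """Hypothetical helper: return the `v` query parameter of a watch URL, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


print(video_id_from_url("https://www.youtube.com/watch?v=3OvmwM61vJw"))  # 3OvmwM61vJw
```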
docs = loader.load()
- This line calls the load method on the loader object. This method retrieves the video's transcript and stores the result in the docs variable. docs is a list of document objects.
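To make the shape of docs concrete without installing LangChain, here is a minimal stand-in. FakeDocument is a hypothetical class, not the real LangChain one, and the text and metadata values are invented placeholders. It only mirrors the two attributes that matter here: page_content for the text and metadata for details about the source.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain document object; not the real class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Placeholder data: the real loader would fill these in from the video.
docs = [FakeDocument("hello and welcome to the video", {"source": "3OvmwM61vJw"})]

for doc in docs:
    print(type(doc).__name__, len(doc.page_content), doc.metadata)
```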
print(docs)
- This line prints the docs variable to the console. This helps in debugging or understanding what data has been loaded from the YouTube video.
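Printing the whole list can flood the console when a transcript is long. As an optional alternative, the sketch below prints one short line per document. The preview function and its width parameter are hypothetical names introduced for illustration, and the sample data is a stand-in for real loader output.

```python
from types import SimpleNamespace


def preview(docs, width=80):
    """Return one summary line per document: index, length, and a text snippet."""
    lines = []
    for i, doc in enumerate(docs):
        text = doc.page_content
        snippet = text[:width] + ("..." if len(text) > width else "")
        lines.append(f"[{i}] {len(text)} chars: {snippet}")
    return lines


# Stand-in object with a page_content attribute, in place of real loader output.
sample = [SimpleNamespace(page_content="word " * 100)]
print("\n".join(preview(sample)))
```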
with io.open("transcript.txt", "w", encoding="utf-8") as f:
- This line opens a file named transcript.txt in write mode with UTF-8 encoding. The with statement ensures that the file is closed automatically after the indented block of code finishes, even if an error occurs inside it. The file object is assigned to the variable f.
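The automatic closing can be checked directly. In Python 3, io.open is an alias for the built-in open, so this sketch uses open. It writes to a temporary directory (an assumption made to avoid touching a real transcript.txt) and simulates a failure inside the block to show that the file is still closed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "transcript.txt")

try:
    with open(path, "w", encoding="utf-8") as f:
        f.write("partial transcript")
        raise RuntimeError("simulated failure while writing")
except RuntimeError:
    pass

# The with statement closed the file even though the block raised.
print(f.closed)  # True
```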
    for doc in docs:
- This line starts a for loop that iterates over each document object in the docs list.
        f.write(doc.page_content)
- Within the loop, this line writes the page_content attribute of each document object to the file f. This attribute holds the document's text, which here is the transcript of the YouTube video.
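Note that write does not add a line break, so if docs holds more than one document, their texts run together in the file. The variant below appends a newline after each document. This is a design choice for illustration, not part of the original script, and the two documents are stand-ins.

```python
import os
import tempfile
from types import SimpleNamespace

# Stand-in documents; real ones would come from loader.load().
docs = [
    SimpleNamespace(page_content="first part"),
    SimpleNamespace(page_content="second part"),
]
path = os.path.join(tempfile.mkdtemp(), "transcript.txt")

with open(path, "w", encoding="utf-8") as f:
    for doc in docs:
        f.write(doc.page_content)
        f.write("\n")  # keep each document's text on its own line

with open(path, encoding="utf-8") as f:
    saved = f.read()
print(saved)
```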
f.close()
- This line closes the file f. However, since the file was opened using the with statement, it is closed automatically, so this line can be omitted. Including it is redundant but does not cause any issues.
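The reason the extra call is harmless is that closing an already closed file object does nothing. A quick check, using a temporary path as an assumption to keep the example self-contained:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "transcript.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("done")

already_closed = f.closed  # True: the with statement closed the file
f.close()                  # second close is a no-op and raises no error
print(already_closed, f.closed)  # True True
```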
Summary
This code loads the transcript of a YouTube video, prints the loaded documents to the console, and writes the content of these documents to a file named transcript.txt.
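For reference, the walkthrough can be assembled into one script along these lines. This is a sketch under a few assumptions: load_transcript_docs and save_transcript are hypothetical function names, the redundant close call is dropped, and the import is tried from langchain_community first because newer LangChain releases ship the loader there, falling back to the import path used in this post. The real loading step needs LangChain installed and a network connection, so it is shown only as commented usage. The last lines exercise the file-writing half offline with a stand-in object.

```python
import os
import tempfile
from types import SimpleNamespace


def load_transcript_docs(url):
    """Load transcript documents for a YouTube URL (needs LangChain and network access)."""
    try:
        # Newer LangChain releases provide the loader in langchain_community.
        from langchain_community.document_loaders import YoutubeLoader
    except ImportError:
        # Import path used in this post.
        from langchain.document_loaders import YoutubeLoader
    return YoutubeLoader.from_youtube_url(url).load()


def save_transcript(docs, path="transcript.txt"):
    """Write the page_content of each document to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(doc.page_content)
    return path


# Usage with a real video:
# docs = load_transcript_docs("https://www.youtube.com/watch?v=3OvmwM61vJw")
# print(docs)
# save_transcript(docs)

# Offline check with a stand-in document and a temporary file.
demo_path = save_transcript(
    [SimpleNamespace(page_content="sample transcript text")],
    os.path.join(tempfile.mkdtemp(), "transcript.txt"),
)
```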