
khomin


How to get Open Graph meta tags in QML

One day I needed a solution that could parse Open Graph meta tags from a line of user input and produce a title and an icon.

Of course, there were plenty of libraries built on jsoup, but that was not what I needed: I wanted to use Qt and C++.
I thought that as soon as I entered my query, "c++ parser meta tags", I would see all the solutions I was looking for.
But in reality, everything was a little more complicated.
What I did:
1) Preparation step: parse the input and decide whether it contains a valid URL or is just text
This step seems expensive (not as expensive as downloading everything the input points to,
but I still think it is extra work)

// Returns true if the input line contains something that looks like a URL.
static bool checkIsContainsHyperlink(const QString &line) {
    // Compiled once and reused on every call.
    static const QRegularExpression regex(web_pattern);
    return regex.match(line).hasMatch();
}
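The web_pattern constant is not shown in this post, so here is a minimal sketch of the same check written with std::regex instead of QRegularExpression, which lets it compile without Qt. The pattern and the containsHyperlink name are assumptions for illustration only, not the ones used in the project.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for web_pattern: an http(s) scheme followed by
// at least one non-whitespace character. Real-world URL patterns are
// usually stricter than this.
static const std::regex kUrlRegex(R"(https?://\S+)", std::regex::icase);

// Returns true when the line contains something that looks like a URL.
bool containsHyperlink(const std::string &line) {
    return std::regex_search(line, kUrlRegex);
}
```

A call such as containsHyperlink("look at https://dev.to please") returns true, while plain text without a scheme returns false. A pattern this loose accepts many strings that are not reachable addresses, so the download step still has to handle failures.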

2) Download the page, with the ability to handle redirects
Many sites do not serve the tags directly at the address you request and often redirect to another page, for reasons I don't know

connect(&m_WebCtrl, SIGNAL (finished(QNetworkReply*)), this, SLOT (fileDownloaded(QNetworkReply*)));
QNetworkRequest request(url);
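// RedirectPolicyAttribute expects a RedirectPolicy value rather than a bool (Qt 5.9+)
request.setAttribute(QNetworkRequest::RedirectPolicyAttribute, QNetworkRequest::NoLessSafeRedirectPolicy);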
m_WebCtrl.get(request);

3) Save the downloaded page. This may seem strange, and you are probably wondering why the page is saved at all
The problem is that some sites ban an IP address that makes a lot of requests
For me it was enough to change 3-5 characters in the URL line, and I got banned for a few minutes
Caching downloaded pages solved this problem

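// Save the downloaded image to disk so it can be reused without another request.
// `this` is captured explicitly and passed as the context object, so the
// lambda is disconnected automatically if this object is destroyed first.
connect(m_downloader_image, &FileDownloader::downloaded, this, [this, imagePathName]() {
    const QByteArray array = m_downloader_image->downloadedData();
    if (!array.isEmpty()) {
        QFile imageFile(imagePathName);
        if (imageFile.open(QIODevice::WriteOnly)) {
            imageFile.write(array);
            m_result.og_image_local_path = imagePathName;
        }
    }
    emit signalParserDone(m_result);
});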
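The caching idea itself can be sketched without Qt: before making a request, look for a file derived from the URL, and only hit the network when it is missing. This is a rough sketch using std::filesystem (C++17); the function names and the file naming scheme are hypothetical and not taken from the project.

```cpp
#include <cctype>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

// Turn a URL into a file name: letters and digits are kept, every other
// byte becomes '_' plus two hex digits, so distinct URLs never collide.
std::string cacheFileName(const std::string &url) {
    static const char hex[] = "0123456789abcdef";
    std::string name;
    for (unsigned char c : url) {
        if (std::isalnum(c)) {
            name += static_cast<char>(c);
        } else {
            name += '_';
            name += hex[c >> 4];
            name += hex[c & 0x0f];
        }
    }
    return name + ".html";
}

// Return the cached page for url, or std::nullopt if it was never stored.
std::optional<std::string> loadCached(const std::filesystem::path &cacheDir,
                                      const std::string &url) {
    std::ifstream in(cacheDir / cacheFileName(url), std::ios::binary);
    if (!in) return std::nullopt;
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Store a downloaded page so that later lookups can skip the network.
bool storeCached(const std::filesystem::path &cacheDir,
                 const std::string &url, const std::string &html) {
    std::error_code ec;
    std::filesystem::create_directories(cacheDir, ec);
    if (ec) return false;
    std::ofstream out(cacheDir / cacheFileName(url),
                      std::ios::binary | std::ios::trunc);
    out << html;
    return static_cast<bool>(out);
}
```

The caller would try loadCached first and fall back to a download followed by storeCached. Very long URLs can exceed the file name limit of the file system, and nothing here expires old entries, so this only shows the shape of the idea.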

4) Parsing
So we have the web page in a local folder; it's time to parse it and get what we need
Unfortunately, gumbo-parser turned out to be very unfriendly for me
So, as a first step, I decided to use regular expressions, hoping to replace them with something else in the future

QRegularExpression site_name_regex(og_site_name);
QRegularExpression title_regex(og_title);
QRegularExpression description_regex(og_description);
QRegularExpression url_regex(og_url);
QRegularExpression image_regex(og_image);
QRegularExpressionMatch match;

match = site_name_regex.match(html);
if (match.hasMatch()) {
    res.og_site_name = match.captured(1);
}
match = title_regex.match(html);
if (match.hasMatch()) {
    res.og_title = match.captured(1);
}
match = description_regex.match(html);
if (match.hasMatch()) {
    res.og_description = match.captured(1);
}
match = url_regex.match(html);
if (match.hasMatch()) {
    res.og_url = match.captured(1);
}
match = image_regex.match(html);
if (match.hasMatch()) {
    res.og_image = match.captured(1);
}
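The patterns behind og_title, og_image and the others are not shown here, so the following is only a sketch of what a regex-based extraction can look like. It uses std::regex so that it compiles without Qt, and a single assumed pattern that collects every og: tag in one pass. The parseOpenGraph name and the pattern are hypothetical.

```cpp
#include <map>
#include <regex>
#include <string>

// Collect <meta property="og:KEY" content="VALUE"> pairs found in html.
// Assumption: attributes are double-quoted and property comes directly
// before content. Tags written differently are silently skipped.
std::map<std::string, std::string> parseOpenGraph(const std::string &html) {
    static const std::regex metaRegex(
        R"rx(<meta\s+property="og:([^"]+)"\s+content="([^"]*)")rx",
        std::regex::icase);

    std::map<std::string, std::string> tags;
    for (std::sregex_iterator it(html.begin(), html.end(), metaRegex), end;
         it != end; ++it) {
        // emplace keeps the first value if a key appears more than once.
        tags.emplace((*it)[1].str(), (*it)[2].str());
    }
    return tags;
}
```

With this sketch, a page containing `<meta property="og:title" content="Hello" />` yields an entry with key "title" and value "Hello". Real pages also use single quotes, put content before property, or insert other attributes in between, which is a good reason to move to a proper HTML parser later.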

Finally, we can enter a URL and enjoy the preview image and title


Here is the complete example (github)
