DEV Community 👩‍💻👨‍💻

MartinJ

Searching pdf files - Coding a Google custom search engine (gcse) component in React

Last reviewed : Sept 2022

Introduction

Most large organisations will hold huge archives of pdf documents. Searching for information in these represents a major challenge.

In fact, I've no idea how you might set about this under your own steam. Fortunately, Google's custom search engine (gcse) facility makes the task a five-minute job.

The gcse is a little-known but quite amazingly useful piece of Google magic - one that should be at the top of the toolbox for anyone who has responsibility for document archive management. Basically it allows you to target the entire might of the Google search engine at a specified folder within your site's url. And had you realised that Google searches handle pdf files as easily as rendered websites?

The first step in setting up the procedure is to use the Programmable Search Engine Homepage to specify your search target - see Getting started with Programmable Search Engine.

For this, all you need is a Google account. The instructions referenced above will then enable you to register a search engine, named with a tag that you choose yourself and keyed on a unique Search Engine Id supplied by Google.

Coding a GCSE reference in a webapp

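To use your gcse in a conventional JavaScript webapp, all you need is the following tiny packet of html, in which YOUR_SEARCH_ENGINE_ID is a placeholder for the Search Engine Id supplied by Google:

<!-- YOUR_SEARCH_ENGINE_ID is a placeholder: substitute your own Search Engine Id -->
<script async src="https://cse.google.com/cse.js?cx=YOUR_SEARCH_ENGINE_ID"></script>
<div class="gcse-search"></div>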

The effect of this will be to display a text-search input field and a search icon button.

Submitting a search specification will then typically present the results in a popup window (depending on your choice of gcse layout).
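
If the Search Engine Id is held in a variable rather than typed into the markup, the script address can be assembled in JavaScript instead. What follows is only a minimal sketch: gcseScriptUrl is a hypothetical helper name of my own, not part of Google's API, and the only thing taken from the snippet above is the cse.js address with its cx parameter.

```javascript
// Hypothetical helper (not part of Google's API): builds the cse.js
// address for a given Search Engine Id.
function gcseScriptUrl(searchEngineId) {
    if (typeof searchEngineId !== "string" || searchEngineId.trim() === "") {
        throw new Error("A Search Engine Id is required");
    }
    const url = new URL("https://cse.google.com/cse.js");
    // searchParams.set percent-encodes the id where needed (for example ":")
    url.searchParams.set("cx", searchEngineId.trim());
    return url.toString();
}
```

The returned string can then be assigned to the src property of a script element created with document.createElement("script").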

This arrangement has worked flawlessly for me in the past - the only time I ever encountered a problem was when <td> and <th> styles in my webapp's stylesheets collided with Google's use of these elements in its script. This was easily fixed by qualifying my styles with a classname.

But I was initially stumped when I wanted to use a cse in a React webapp. The Google script just generates a complicated heap of html, but in React this needs to end up inside the contents of a component's "return" statement. I cannot reference the script itself directly and, for reasons that are beyond my understanding, attempts to emulate its content and the mechanism it uses to communicate with <div class="gcse-search"></div> proved unsuccessful.

Information on the web on this is patchy but eventually I hit on a sandbox registered by khrismuc at React cascading select.

This uses a React useEffect hook to reach into the DOM and invoke Google's cse script. All I had to do in my particular case (where I was searching a folder of newsletter pdfs for my application) was to create myself a component as follows:

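import React, { useEffect } from "react";

function NewsletterGcseSearch() {
    useEffect(() => {
        // Create a <script> element for Google's cse.js and add it to the page.
        // The empty dependency array means this runs after mount, not on every render.
        const script = document.createElement("script");
        script.src = "https://cse.google.com/cse.js?cx=00111 .... 6146:di0ylihvlxu";
        document.head.append(script);
    }, []);

    // Google's script looks for this element and renders the search box into it.
    return (
        <div className="gcse-search"></div>
    );
}

export { NewsletterGcseSearch };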

Here, obviously, I've obscured my Search Engine Id, but I'm sure you'll get the idea. All I then had to do was to import the component into my webapp and render it as <NewsletterGcseSearch/>
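
One thing to watch with this pattern is that the effect adds a new script element every time it runs, which can happen more than once if the component is remounted (React 18's Strict Mode, for instance, runs effects twice in development). A minimal sketch of a guard follows, assuming a hypothetical helper called ensureGcseScript that is not part of React or of Google's script. It takes the document as a parameter purely so that it can be tried outside a browser.

```javascript
// Hypothetical helper: appends the cse.js script only if a script with
// exactly the same src is not already on the page.
// `doc` is normally the browser's `document`.
function ensureGcseScript(doc, src) {
    const alreadyLoaded = Array.from(doc.scripts).some((s) => s.src === src);
    if (alreadyLoaded) {
        return false; // nothing was added
    }
    const script = doc.createElement("script");
    script.src = src;
    script.async = true;
    doc.head.append(script);
    return true; // a new script element was added
}
```

Inside the useEffect callback this would be called as ensureGcseScript(document, src), with src set to the full cse.js address for your own Search Engine Id. The comparison is a plain string match, so it assumes the same absolute address is passed on every call.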

Thank you Google (and khrismuc) - much obliged.
