David Kröll

A search engine - Part 5: Introducing reflection

We'd like the search engine to store not only strings but values of any struct type. There are several ways to get there. One is to require a Go interface, either an existing one or a new one we define ourselves.

I'm thinking about something like C#'s ToString() method.
The closest equivalent in Go is the fmt.Stringer interface.

Because Go interfaces are satisfied implicitly, a custom type only has to define a String() method to satisfy this interface.

type Stringer interface {
    String() string
}

Look up the type here: https://pkg.go.dev/fmt#Stringer
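
As a minimal sketch of the Stringer approach (the Song type and the describe function are made up for this illustration and are not part of the search engine), a type satisfies the interface simply by having the method:

```go
package main

import "fmt"

// Song is a hypothetical type used only for this illustration.
type Song struct {
	Author string
	Title  string
}

// String makes Song satisfy fmt.Stringer implicitly;
// no explicit "implements" declaration is needed.
func (s Song) String() string {
	return s.Author + " - " + s.Title
}

// describe accepts anything that satisfies fmt.Stringer.
func describe(s fmt.Stringer) string {
	return s.String()
}

func main() {
	s := Song{Author: "Queen", Title: "Bohemian Rhapsody"}
	fmt.Println(describe(s)) // Queen - Bohemian Rhapsody
}
```

With this approach the search engine would index whatever String() returns, so each type decides for itself which text is searchable.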

The other approach is to use the special type interface{}, the empty interface.
Every type satisfies this interface. We can then inspect the underlying value at runtime, look for strings in it, and operate on those. This requires reflection. I'll go with this approach, since it's the first time I'll use reflection in Go.

We already have a package for this in the standard library. It's called reflect.

First, we'll have to change the element type of the cache slice in GlSearch to the empty interface. In the next step we'll adjust the Add() method.

type GlSearch struct {
    config Config
    cache  []interface{}
    index  map[string][]int
    mu     *sync.Mutex
}

// New creates a new GlSearch instance
func New(c Config) *GlSearch {

    glsearch := GlSearch{
        config: c,
        cache:  []interface{}{},
        index:  map[string][]int{},
        mu:     &sync.Mutex{},
    }

    return &glsearch
}

Of course, every usage of the cache field has to be updated to match the new type; those changes are not shown here. The Add() method now accepts a value of any type. Since string also satisfies interface{}, we handle that case first.

// Add extracts all strings on the interface type and adds them separately
func (g *GlSearch) Add(i interface{}) {
    g.mu.Lock()
    defer g.mu.Unlock()

    v := reflect.ValueOf(i)

    // v.Kind() returns reflect.Invalid for a nil interface,
    // whereas v.Type() would panic on it
    switch v.Kind() {
    case reflect.String:
        // check if the provided interface is already a string
        // if so, use type assertion
        g.addString(i.(string), i)

    case reflect.Struct:
        // loop through all fields
        for j := 0; j < v.NumField(); j++ {
            f := v.Field(j)
            // check if the struct field is a string
            if f.Kind() == reflect.String {
                // add the string
                g.addString(f.String(), i)
            }
        }
    }
}

The addString() method called in the code above is new.
It does not lock the mutex itself: it is unexported (you may also call it private), and in the code shown here it is only called from Add(), which already holds the lock. Locking again would deadlock, because sync.Mutex is not reentrant. The first parameter is the string used for indexing; the second is the actual data that gets appended to the cache.

func (g *GlSearch) addString(s string, data interface{}) {
    // the zero-based position of our next entry in the cache
    i := len(g.cache)

    // add the underlying interface, not only the data used for indexing
    g.cache = append(g.cache, data)

    // transform our string to build up a useful index
    // split into words using the configured separators
    words := split(s, g.config.Seperators)

    // remove words that occur very often (stopwords, e.g.: the, now, to, ...)
    // to improve index quality and minimize memory needs
    filterWords(&words, g.config.KeepFunc, g.config.TransformFunc)

    // update the index for every word
    for _, v := range words {
        // append also covers a missing key: g.index[v] is then a nil slice
        g.index[v] = append(g.index[v], i)
    }
}

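The cache is no longer limited to the string type; a value of any type can be passed to Add(). The only constraint is that a struct needs at least one string field, otherwise nothing is indexed or cached. Note also that addString() runs once per string field, so a struct with several string fields is appended to the cache once for each of them. Picking up the example song from earlier in this series, we can now define a custom struct that holds its data.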

I've chosen a Lyrics type here to demonstrate the functionality (definition not shown here, there are no more fields than shown in the initialization).

func main() {
    search := glsearch.New(glsearch.DefaultConfig)

    l := Lyrics{
        ReleaseYear: 1975,
        Author:      "Queen",
        Intro:       "Is this the real life? Is this just fantasy?",
        Verse1:      "Mama, just killed a man",
        Outro:       "Nothing really matters, anyone can see",
    }

    search.Add(l)

    sr := search.Find("mama")

    fmt.Printf("We searched for %s and found a song released in %d\n",
        // again, using type assertions, since any interface type can now be added
        sr.Filter, sr.Contents[0].(Lyrics).ReleaseYear)

    fmt.Printf("The whole search result would be: %+v\n", sr)

    // Output: We searched for mama and found a song released in 1975
    // The whole search result would be:
    // {Filter:mama Matches:1
    // Contents:[{ReleaseYear:1975
    //          Author:Queen Intro:Is this the real life? Is this just fantasy?
    //          Verse1:Mama, just killed a man
    //          Outro:Nothing really matters, anyone can see}]}
}

As you can see, we are just searching for mama and the whole struct gets returned. We can now access the result struct however we want to.
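
Because the cache can now hold different types, a plain type assertion such as .(Lyrics) panics when an entry holds something else. A type switch or the comma-ok form avoids that. The sketch below assumes a trimmed Lyrics struct and a hypothetical describeEntry helper; neither is part of the package.

```go
package main

import "fmt"

// Lyrics is a trimmed, assumed version of the struct used above.
type Lyrics struct {
	ReleaseYear int
	Author      string
}

// describeEntry is a hypothetical helper that inspects a cached entry
// without panicking on unexpected types.
func describeEntry(entry interface{}) string {
	switch e := entry.(type) {
	case Lyrics:
		return fmt.Sprintf("song by %s (%d)", e.Author, e.ReleaseYear)
	case string:
		return "plain string: " + e
	default:
		return fmt.Sprintf("unsupported type %T", e)
	}
}

func main() {
	contents := []interface{}{
		Lyrics{ReleaseYear: 1975, Author: "Queen"},
		"Is this the real life?",
		42,
	}
	for _, c := range contents {
		fmt.Println(describeEntry(c))
	}

	// The comma-ok form is the single-type equivalent of the switch.
	if l, ok := contents[0].(Lyrics); ok {
		fmt.Println(l.ReleaseYear)
	}
}
```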

I hope you enjoyed my series about inverted indexes in Go. For me it was great fun to learn a new data structure and to use reflection in Go for the first time. I have learned a lot and I hope you did too. Please do not hesitate to contact me if you spot any mistakes or possible improvements.