We'd like the search engine to handle not only strings but any arbitrary struct type. There are different approaches available. We could use a Go interface type, either one that already exists or a new one that we define ourselves.
I'm thinking about something like C#'s ToString() method.
In a Go context, the closest equivalent is the fmt.Stringer interface.
Because Go interfaces are satisfied implicitly, a custom type only has to implement the String() method to satisfy this interface.
type Stringer interface {
	String() string
}
Look up the type here: https://pkg.go.dev/fmt#Stringer
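To make the implicit satisfaction concrete, here is a minimal sketch. The Song type and the describe() function are hypothetical names invented for this illustration; they are not part of the search engine.

```go
package main

import "fmt"

// Song is a hypothetical type used only for this illustration.
type Song struct {
	Title  string
	Artist string
}

// String makes Song satisfy fmt.Stringer implicitly;
// no explicit "implements" declaration is needed.
func (s Song) String() string {
	return s.Artist + " - " + s.Title
}

// describe accepts any value that satisfies fmt.Stringer.
func describe(s fmt.Stringer) string {
	return "indexing: " + s.String()
}

func main() {
	song := Song{Title: "Bohemian Rhapsody", Artist: "Queen"}
	fmt.Println(describe(song))
	// fmt also calls String() on its own for the %v and %s verbs.
	fmt.Printf("%v\n", song)
}
```

With this approach the search engine would index whatever String() returns, so every type decides for itself which text is searchable.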
The other approach is to use the special type interface{}, the empty interface.
Every type satisfies this interface. We can then search the underlying value for strings and operate on these, which requires reflection. I'll go with this approach, since it gives me the chance to use reflection in Go for the first time.
The standard library already has a package for this. It's called reflect.
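As a quick orientation, the following sketch shows how reflect reports the kind of the value stored in an empty interface. The point type and the kindOf() helper are made up for this illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// point is a hypothetical struct used only for this illustration.
type point struct {
	X, Y int
}

// kindOf reports the reflect.Kind of the dynamic value stored in v.
func kindOf(v interface{}) reflect.Kind {
	return reflect.ValueOf(v).Kind()
}

func main() {
	fmt.Println(kindOf("hello"))     // string
	fmt.Println(kindOf(42))          // int
	fmt.Println(kindOf(point{1, 2})) // struct
	fmt.Println(kindOf(&point{}))    // ptr
	fmt.Println(kindOf(nil))         // invalid
}
```

The kind is the underlying category of a type, so a named type based on string still reports string. This is what lets us decide at runtime how to treat a value.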
First, we'll have to change the element type of the cache slice in our GlSearch struct to the empty interface. In the next step we'll adjust the Add() method.
type GlSearch struct {
	config Config
	cache  []interface{}
	index  map[string][]int
	mu     *sync.Mutex
}

// New creates a new GlSearch instance
func New(c Config) *GlSearch {
	glsearch := GlSearch{
		config: c,
		cache:  []interface{}{},
		index:  map[string][]int{},
		mu:     &sync.Mutex{},
	}
	return &glsearch
}
Of course, we have to update all usages of the cache field to match the new type, but this is not shown here. The Add() method now accepts a value of any type. Since the string type also satisfies interface{}, we check for it first.
// Add extracts all strings from the given value and adds them separately.
// Values that are neither a string nor a struct are ignored.
func (g *GlSearch) Add(i interface{}) {
	g.mu.Lock()
	defer g.mu.Unlock()
	v := reflect.ValueOf(i)
	// v.Kind() also works for a nil interface, where v.Type() would panic
	switch v.Kind() {
	case reflect.String:
		// the provided value is already a string;
		// v.String() also covers named types based on string,
		// for which the type assertion i.(string) would panic
		g.addString(v.String(), i)
	case reflect.Struct:
		// loop through all fields
		for j := 0; j < v.NumField(); j++ {
			f := v.Field(j)
			// check if the struct field is a string
			if f.Kind() == reflect.String {
				// add the string
				g.addString(f.String(), i)
			}
		}
	}
}
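The Add() method above handles strings and structs that are passed by value. A pointer to a struct has the kind reflect.Ptr and is therefore skipped. One possible way to cover that case is sketched below. The stringFields() helper and the note type are hypothetical and not part of GlSearch.

```go
package main

import (
	"fmt"
	"reflect"
)

// stringFields is a hypothetical helper, not part of GlSearch.
// It returns the string itself, or the values of all top-level string
// fields of a struct, following pointers first.
func stringFields(data interface{}) []string {
	v := reflect.ValueOf(data)
	// Follow pointers until a non-pointer value is reached.
	for v.Kind() == reflect.Ptr {
		if v.IsNil() {
			return nil
		}
		v = v.Elem()
	}
	switch v.Kind() {
	case reflect.String:
		return []string{v.String()}
	case reflect.Struct:
		var out []string
		for j := 0; j < v.NumField(); j++ {
			if f := v.Field(j); f.Kind() == reflect.String {
				out = append(out, f.String())
			}
		}
		return out
	}
	// nil, numbers, slices, maps and so on carry no directly indexable string.
	return nil
}

// note is a hypothetical struct used only for this illustration.
type note struct {
	ID    int
	Title string
	Body  string
}

func main() {
	n := note{ID: 1, Title: "hello", Body: "world"}
	fmt.Println(stringFields(n))
	fmt.Println(stringFields(&n))
	fmt.Println(stringFields("plain"))
	fmt.Println(stringFields(nil))
}
```

Separating the extraction from the indexing also makes the reflection part easy to test on its own.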
The addString() method called in the code above is new.
It does not acquire the lock itself: it is unexported (you may also call it private) and is only called from Add(), which already holds the mutex. The first parameter is used for indexing, and the second one is the actual data that gets appended to the cache.
func (g *GlSearch) addString(s string, data interface{}) {
	// the zero-based position of our next entry in the cache
	i := len(g.cache)
	// add the underlying value, not only the string used for indexing
	g.cache = append(g.cache, data)
	// transform our string to build up a useful index
	// split into words using the configured separators
	words := split(s, g.config.Seperators)
	// remove words that occur very often (stopwords, e.g.: the, now, to, ...)
	// to improve index quality and minimize memory needs
	filterWords(&words, g.config.KeepFunc, g.config.TransformFunc)
	// update the index for every word
	for _, v := range words {
		// append works on a missing (nil) entry too,
		// so the entry is created if it does not exist already
		g.index[v] = append(g.index[v], i)
	}
}
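Note that addString() appends data to the cache on every call, so a struct with several string fields is stored once per field. If that matters, one possible variation is to store the value once and index all of its strings against that single position. The sketch below uses a stripped-down, hypothetical miniIndex type, with strings.Fields and strings.ToLower standing in for the configurable split and transform steps.

```go
package main

import (
	"fmt"
	"strings"
)

// miniIndex is a hypothetical, stripped-down stand-in for GlSearch.
type miniIndex struct {
	cache []interface{}
	index map[string][]int
}

// add stores data once and indexes every word of every given text
// against that single cache position.
func (m *miniIndex) add(data interface{}, texts ...string) {
	pos := len(m.cache)
	m.cache = append(m.cache, data)
	for _, t := range texts {
		for _, w := range strings.Fields(strings.ToLower(t)) {
			ids := m.index[w]
			// Skip the position if this word was already recorded for it.
			if n := len(ids); n > 0 && ids[n-1] == pos {
				continue
			}
			m.index[w] = append(ids, pos)
		}
	}
}

func main() {
	m := &miniIndex{index: map[string][]int{}}
	m.add("song-1", "nothing really matters", "nothing else matters")
	fmt.Println(len(m.cache), m.index["nothing"], m.index["matters"])
	// prints: 1 [0] [0]
}
```

The trade-off is that the caller has to collect all strings of a value before indexing, instead of adding them one by one.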
The search engine is no longer limited to the string type. Any value can be passed to Add(); the only constraint is that a struct needs at least one string field to be indexed. Picking up the example song from earlier in this series, we can now define a custom struct with data in it.
I've chosen a Lyrics type here to demonstrate the functionality (the definition is not shown here; it has no fields other than those shown in the initialization).
func main() {
	search := glsearch.New(glsearch.DefaultConfig)
	l := Lyrics{
		ReleaseYear: 1975,
		Author:      "Queen",
		Intro:       "Is this the real life? Is this just fantasy?",
		Verse1:      "Mama, just killed a man",
		Outro:       "Nothing really matters, anyone can see",
	}
	search.Add(l)
	sr := search.Find("mama")
	fmt.Printf("We searched for %s and found a song released in %d\n",
		// again, using type assertions, since any interface type can now be added
		sr.Filter, sr.Contents[0].(Lyrics).ReleaseYear)
	fmt.Printf("The whole search result would be: %+v\n", sr)
	// Output: We searched for mama and found a song released in 1975
	// The whole search result would be:
	// {Filter:mama Matches:1
	// Contents:[{ReleaseYear:1975
	// Author:Queen Intro:Is this the real life? Is this just fantasy?
	// Verse1:Mama, just killed a man
	// Outro:Nothing really matters, anyone can see}]}
}
As you can see, we are just searching for mama and the whole struct gets returned. We can now access the result struct however we want to.
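Since the results come back as interface{} values, a plain type assertion panics if the cache holds a different type than expected. A type switch or the comma-ok form avoids that. The following is a minimal sketch: the field types of Lyrics are assumed from the initialization in the example, and the contents slice stands in for sr.Contents.

```go
package main

import "fmt"

// Lyrics mirrors the fields used in the example; the field types are assumed.
type Lyrics struct {
	ReleaseYear int
	Author      string
	Intro       string
	Verse1      string
	Outro       string
}

// describe inspects one search result without risking a panic.
func describe(content interface{}) string {
	switch c := content.(type) {
	case Lyrics:
		return fmt.Sprintf("song by %s (%d)", c.Author, c.ReleaseYear)
	case string:
		return "plain string: " + c
	default:
		return fmt.Sprintf("unsupported type %T", c)
	}
}

func main() {
	// contents stands in for sr.Contents from the example.
	contents := []interface{}{
		Lyrics{ReleaseYear: 1975, Author: "Queen"},
		"Is this the real life?",
		42,
	}
	for _, c := range contents {
		fmt.Println(describe(c))
	}

	// The comma-ok form reports a failed assertion instead of panicking.
	if l, ok := contents[0].(Lyrics); ok {
		fmt.Println(l.ReleaseYear)
	}
}
```

This is mostly useful once values of different types end up in the same cache.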
I hope you enjoyed my series about inverted indexes in Go. For me it was really fun to learn a new data structure and also to make use of reflection in Go for the first time. In conclusion, I can say that I have learned a lot, and I hope you did too. Please do not hesitate to contact me if you spot any mistakes or possible improvements.