Skip to content

Add alternative to text search in SearchClans#58

Open
jpholanda wants to merge 6 commits intotopfreegames:masterfrom
edersondonde:master
Open

Add alternative to text search in SearchClans#58
jpholanda wants to merge 6 commits intotopfreegames:masterfrom
edersondonde:master

Conversation

@jpholanda
Copy link
Copy Markdown

@jpholanda jpholanda commented Mar 27, 2020

The current clan search implementation uses the text search provided by Mongo. However, it turns out that if the clan name doesn't have any alphabetic characters, the text search will not find it, as it was made to find words. As such, the only way to find clans with such names is via regex search, which is the proposed alternative. You can choose which search to use via the new query parameter useRegexSearch, where useRegexSearch=true means use regex search and useRegexSearch=false means use text search (if non-existent, defaults to false).

Comment thread models/clan.go Outdated
filter = bson.M{"$text": bson.M{"$search": term}}
} else {
escapedTerm := fmt.Sprintf(`\Q%s\E`, term)
filter = bson.M{"name": bson.M{"$regex": escapedTerm, "$options": "i"}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query is very inefficient, be careful. It was like this before I changed it to use text index!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Author

@jpholanda jpholanda Mar 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that, however the way we currently do it is by essentially by-passing khan to execute that search. Text search doesn't work as intended for our purposes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be acceptable to change to case-sensitive then?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For case sensitive regular expression queries, if an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan.

As I understand, the search is still O(N) but with a cheaper constant, because you load index data instead of the documents themselves, which is supposed to be faster... I think only load tests would tell if it's really okay. But, as I remember, most queries would take more than 2 seconds with the regex of this patch...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wasn't excluding case-sensitivity. Sorry if it wasn't clear.
\Q and \E are used to escape the term, in the sense of interpreting literally all the characters in between. https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC2

Got it. Still can't tell if it will have an impact or not... But the MongoDB docs do not have any warnings on that, so I guess it's okay. Anyway, only load tests will really tell if a solution is good or not.

Copy link
Copy Markdown
Author

@jpholanda jpholanda Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble figuring out exactly what to do for those load tests. Am I supposed to write a new config file? How do I know what values would be reasonable? And after perfoming those load tests, how will I know if the result is acceptable or not?

Copy link
Copy Markdown
Contributor

@matheuscscp matheuscscp Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Setup a new instance of Khan API to load test it. New instances of Redis, Postgres and MongoDB will also be necessary for this API to work for your purposes, because clan insertions on Postgres will be dispatched to MongoDB through Redis.
  2. Make sure this new API instance will send its latency metrics to somewhere (Datadog, Graphana...). You're interested in the latency metric of the Search Clans endpoint.
  3. Drill this API somehow (from your local machine or from a cluster) using the load tests command-line tool I created. For this step you just need to copy/change the tool's configs. What you want here is to create a lot of clans first, and then drill the Search Clans endpoint. For that you can first set the probability of createClan to 1 and everything else to 0, then later set the probability of searchClans to 1 and everything else to 0.
  4. After drilling the Search Clans endpoint, analyze its latency metric along the experiment. You want to know if this metric did not reach absurd values, like 2 seconds. Also, check the databases' metrics (CPU, etc.) to see if they were okay.

Someone in your team can certainly help you with the details in each of these steps. Also, refer to the docs of the load tests command-line tool, I'm pretty sure I explained all its configs there. If something is not clear, feel free to ask me!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leohahn you probably want to read this thread

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @matheuscscp. We'll first see if the load test is really required since it is already running on one of our games, its 95p latency seems to be around 200-300ms.

@jpholanda
Copy link
Copy Markdown
Author

I think the postgres container in travis-ci is broken, probably because of the same issue I fixed in the docker-compose for testing.

@matheuscscp
Copy link
Copy Markdown
Contributor

matheuscscp commented Mar 27, 2020

Don't forget to make sure the index on name exists in all games. Since we don't have a MongoDB migrations tool, it would be a good idea to add the creation of these indexes here: https://github.com/topfreegames/khan/blob/master/cmd/migrate_mongo.go#L128

Comment thread api/clan.go Outdated
start := time.Now()
gameID := c.Param("gameID")
term := c.QueryParam("term")
useTextSearchStr := c.QueryParam("useTextSearch")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally default boolean variables are false. I think having useRegexSearch defaulting to false is more intuitive.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not only more intuitive, but also probably more safe, given we don't really know if regex search is acceptable before having results of load tests.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread api/clan_test.go Outdated
"encoding/json"
"fmt"
"net/http"
netUrl "net/url"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
netUrl "net/url"
neturl "net/url"

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread lib/interface.go Outdated
UpdateClan(context.Context, *ClanPayload) (*Result, error)
UpdatePlayer(context.Context, string, string, interface{}) (*Result, error)
SearchClans(context.Context, string) (*SearchClansResult, error)
SearchClans(context.Context, string, ...bool) (*SearchClansResult, error)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to avoid using multiple bool arguments, otherwise you may have the following function call:

SearchClans(ctx, "", false, true, true, true, false)

It is not readable at all.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned below, it is meant to be a single optional argument, but I agree. I will follow your suggestion to use enums instead.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread lib/lib.go Outdated
func (k *Khan) SearchClans(ctx context.Context, clanName string, useTextSearchOpt ... bool) (*SearchClansResult, error) {
useTextSearch := true
if len(useTextSearchOpt) > 0 {
useTextSearch = useTextSearchOpt[0]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have an array of booleans if you only use the first one?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to have an optional boolean argument, so as to be backwards compatible. Would it be better to disregard this?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, if you want backwards compatibility, maybe add a new method?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread models/clan.go
filter = bson.M{"$text": bson.M{"$search": term}}
} else {
escapedTerm := fmt.Sprintf(`^\Q%s\E`, term)
filter = bson.M{"name": bson.M{"$regex": escapedTerm}}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golang Regex engine is guaranteed to run at linear time based on the length of the input.
https://golang.org/pkg/regexp/

The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input. (This is a property not guaranteed by most open source implementations of regular expressions.)

Do you know if Mongo guarantees the same?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implication if it does not follow is that the user can pass some exponential time regexes: https://stackoverflow.com/questions/8887724/why-can-regular-expressions-have-an-exponential-running-time

Copy link
Copy Markdown
Author

@jpholanda jpholanda Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mongo uses PCRE, which does not have such guarantees. However, the regex /^\Q...\E/ is a prefix expression with no operators, and according to this https://docs.mongodb.com/manual/reference/operator/query/regex/#index-use,

/^a/ can stop scanning after matching the prefix

so it should be pretty efficient.
Will try to confirm with the load test.

Comment thread models/clan_test.go Outdated
err := testing.CreateClanNameTextIndexInMongo(GetTestMongo, player.GameID)
Expect(err).NotTo(HaveOccurred())
clans, err := SearchClan(testDb, testMongo, player.GameID, "SEARCH", 10)
clans, err := SearchClan(testDb, testMongo, player.GameID, "SEARCH", 10, true)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid raw boolean values on function calls. Adding an enumeration is easier to understand.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jpholanda jpholanda force-pushed the master branch 2 times, most recently from 748dd2e to bde9cd5 Compare March 31, 2020 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants