Add alternative to text search in SearchClans#58
jpholanda wants to merge 6 commits into topfreegames:master
| filter = bson.M{"$text": bson.M{"$search": term}} | ||
| } else { | ||
| escapedTerm := fmt.Sprintf(`\Q%s\E`, term) | ||
| filter = bson.M{"name": bson.M{"$regex": escapedTerm, "$options": "i"}} |
This query is very inefficient; be careful. It was like this before I changed it to use the text index!
I realize that. However, we currently work around it by essentially bypassing Khan to run that search. Text search doesn't work as intended for our purposes.
Would it be acceptable to change to case-sensitive then?
For case sensitive regular expression queries, if an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan.
As I understand, the search is still O(N) but with a cheaper constant, because you load index data instead of the documents themselves, which is supposed to be faster... I think only load tests would tell if it's really okay. But, as I remember, most queries would take more than 2 seconds with the regex of this patch...
Yes, I wasn't excluding case-sensitivity. Sorry if it wasn't clear.
\Q and \E are used to escape the term, in the sense of interpreting literally all the characters in between. https://www.pcre.org/current/doc/html/pcre2syntax.html#SEC2
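To make the escaping concrete, here is a minimal sketch in Go of building the anchored literal pattern. One detail worth checking is that a term which itself contains `\E` would end the quoted span early, so the hypothetical helper `quoteLiteral` splits around it. The helper names are illustrative and not part of the patch. The check below runs against Go's `regexp` package, which also supports `\Q...\E`; that MongoDB's PCRE treats the resulting pattern the same way is an assumption that would need to be verified separately.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// quoteLiteral (hypothetical helper) wraps term in \Q...\E so that every
// character is matched literally. A `\E` inside term would close the quoted
// span early, so each occurrence is rewritten as: close the quote, match a
// literal backslash followed by 'E', then reopen the quote.
func quoteLiteral(term string) string {
	return `\Q` + strings.ReplaceAll(term, `\E`, `\E\\E\Q`) + `\E`
}

// matchesPrefix reports whether name starts with term, using the same
// anchored pattern shape as the patch (`^\Q...\E`). It is evaluated with
// Go's regexp engine here, only to check the escaping logic.
func matchesPrefix(term, name string) bool {
	re, err := regexp.Compile(`^` + quoteLiteral(term))
	if err != nil {
		return false
	}
	return re.MatchString(name)
}

func main() {
	hostile := `.*\E.*` // input that tries to break out of the quoted span
	fmt.Println(quoteLiteral(hostile))
	fmt.Println(matchesPrefix(hostile, "unrelated clan name"))
	fmt.Println(matchesPrefix(hostile, `.*\E.* clan`))
}
```

An alternative worth considering is `regexp.QuoteMeta`, which backslash-escapes each metacharacter instead of relying on `\Q...\E`.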
Got it. Still can't tell if it will have an impact or not... But the MongoDB docs do not have any warnings on that, so I guess it's okay. Anyway, only load tests will really tell if a solution is good or not.
I'm having trouble figuring out exactly what to do for those load tests. Am I supposed to write a new config file? How do I know what values would be reasonable? And after performing those load tests, how will I know if the result is acceptable or not?
- Set up a new instance of the Khan API to load test. New instances of Redis, Postgres and MongoDB will also be necessary for this API to work for your purposes, because clan insertions on Postgres will be dispatched to MongoDB through Redis.
- Make sure this new API instance sends its latency metrics somewhere (Datadog, Grafana...). You're interested in the latency metric of the Search Clans endpoint.
- Drill this API somehow (from your local machine or from a cluster) using the load tests command-line tool I created. For this step you just need to copy/change the tool's configs. What you want here is to create a lot of clans first, and then drill the Search Clans endpoint. For that you can first set the probability of `createClan` to 1 and everything else to 0, then later set the probability of `searchClans` to 1 and everything else to 0.
- After drilling the Search Clans endpoint, analyze its latency metric over the course of the experiment. You want to confirm that this metric did not reach absurd values, like 2 seconds. Also, check the databases' metrics (CPU, etc.) to see if they were okay.
Someone in your team can certainly help you with the details in each of these steps. Also, refer to the docs of the load tests command-line tool, I'm pretty sure I explained all its configs there. If something is not clear, feel free to ask me!
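For the last step, the acceptance check can be sketched as a percentile computation over the recorded latencies. This is only an illustration in Go: the function names, the nearest-rank method, and the idea of exporting latencies as milliseconds are assumptions, and a metrics backend such as Datadog would normally compute the percentile for you.

```go
package main

import (
	"fmt"
	"sort"
)

// percentileMs returns the p-th percentile (1..100) of latencies given in
// milliseconds, using the nearest-rank method. It returns 0 for no samples.
func percentileMs(samples []float64, p int) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // do not mutate the input
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // integer ceil(p*n/100)
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// withinBudget reports whether the p-th percentile stays at or below budgetMs.
func withinBudget(samples []float64, p int, budgetMs float64) bool {
	return percentileMs(samples, p) <= budgetMs
}

func main() {
	// Made-up sample latencies in milliseconds, for illustration only.
	latencies := []float64{120, 180, 210, 250, 300, 2400}
	fmt.Println(percentileMs(latencies, 95))
	fmt.Println(withinBudget(latencies, 95, 2000)) // 2000 ms is the "absurd" threshold mentioned above
}
```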
Thanks @matheuscscp. We'll first see if the load test is really required, since it is already running on one of our games, where its 95p latency seems to be around 200-300ms.
I think the Postgres container in Travis CI is broken, probably because of the same issue I fixed in the docker-compose for testing.
Don't forget to make sure the index on |
| start := time.Now() | ||
| gameID := c.Param("gameID") | ||
| term := c.QueryParam("term") | ||
| useTextSearchStr := c.QueryParam("useTextSearch") |
Boolean variables generally default to false. I think having useRegexSearch default to false is more intuitive.
It's not only more intuitive, but also probably safer, given we don't really know if regex search is acceptable before having the results of load tests.
| "encoding/json" | ||
| "fmt" | ||
| "net/http" | ||
| netUrl "net/url" |
| netUrl "net/url" | |
| neturl "net/url" |
| UpdateClan(context.Context, *ClanPayload) (*Result, error) | ||
| UpdatePlayer(context.Context, string, string, interface{}) (*Result, error) | ||
| SearchClans(context.Context, string) (*SearchClansResult, error) | ||
| SearchClans(context.Context, string, ...bool) (*SearchClansResult, error) |
I think it is better to avoid using multiple bool arguments, otherwise you may have the following function call:
SearchClans(ctx, "", false, true, true, true, false)
It is not readable at all.
As I mentioned below, it is meant to be a single optional argument, but I agree. I will follow your suggestion to use enums instead.
| func (k *Khan) SearchClans(ctx context.Context, clanName string, useTextSearchOpt ... bool) (*SearchClansResult, error) { | ||
| useTextSearch := true | ||
| if len(useTextSearchOpt) > 0 { | ||
| useTextSearch = useTextSearchOpt[0] |
Why do you have an array of booleans if you only use the first one?
The idea was to have an optional boolean argument, so as to be backwards compatible. Would it be better to disregard this?
hmm, if you want backwards compatibility, maybe add a new method?
| filter = bson.M{"$text": bson.M{"$search": term}} | ||
| } else { | ||
| escapedTerm := fmt.Sprintf(`^\Q%s\E`, term) | ||
| filter = bson.M{"name": bson.M{"$regex": escapedTerm}} |
Go's regexp engine is guaranteed to run in time linear in the length of the input.
https://golang.org/pkg/regexp/
The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input. (This is a property not guaranteed by most open source implementations of regular expressions.)
Do you know if Mongo guarantees the same?
If it does not, the implication is that a user can pass exponential-time regexes: https://stackoverflow.com/questions/8887724/why-can-regular-expressions-have-an-exponential-running-time
Mongo uses PCRE, which does not have such guarantees. However, the regex /^\Q...\E/ is a prefix expression with no operators, and according to this https://docs.mongodb.com/manual/reference/operator/query/regex/#index-use,
/^a/ can stop scanning after matching the prefix
so it should be pretty efficient.
Will try to confirm with the load test.
| err := testing.CreateClanNameTextIndexInMongo(GetTestMongo, player.GameID) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| clans, err := SearchClan(testDb, testMongo, player.GameID, "SEARCH", 10) | ||
| clans, err := SearchClan(testDb, testMongo, player.GameID, "SEARCH", 10, true) |
Try to avoid raw boolean values on function calls. Adding an enumeration is easier to understand.
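As a rough illustration of the enumeration suggested here, the sketch below replaces the boolean with a small typed constant. `SearchMethod`, `TextSearch`, `RegexSearch` and `buildFilter` are hypothetical names, not the ones used in the patch, and `map[string]interface{}` stands in for `bson.M` so the snippet has no external dependencies.

```go
package main

import "fmt"

// SearchMethod (hypothetical) selects how a clan name is matched.
type SearchMethod int

const (
	// TextSearch is the zero value, so callers that do not choose a method
	// keep the existing text-index behaviour.
	TextSearch SearchMethod = iota
	// RegexSearch matches the term literally as a case-sensitive prefix.
	RegexSearch
)

// String makes call sites and logs readable.
func (m SearchMethod) String() string {
	if m == RegexSearch {
		return "regex"
	}
	return "text"
}

// buildFilter mirrors the two filters from the diff. The maps stand in for
// bson.M, which has the same underlying type.
func buildFilter(term string, method SearchMethod) map[string]interface{} {
	if method == RegexSearch {
		pattern := fmt.Sprintf(`^\Q%s\E`, term)
		return map[string]interface{}{"name": map[string]interface{}{"$regex": pattern}}
	}
	return map[string]interface{}{"$text": map[string]interface{}{"$search": term}}
}

func main() {
	fmt.Println(RegexSearch, buildFilter("SEARCH", RegexSearch))
	fmt.Println(TextSearch, buildFilter("SEARCH", TextSearch))
}
```

A call such as `buildFilter("SEARCH", RegexSearch)` states its intent at the call site, which a trailing `true` does not.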
748dd2e to bde9cd5
…upport Add support to pagination to clan search queries
Make Limit an optional parameter to the lib's clan search call
The current clan search implementation uses the text search provided by Mongo. However, it turns out that if the clan name doesn't have any alphabetic characters, the text search will not find it, as it was made to find words. As such, the only way to find clans with such names is via regex search, which is the proposed alternative. You can choose which search to use via the new query parameter `useRegexSearch`, where `useRegexSearch=true` means use regex search and `useRegexSearch=false` means use text search (if non-existent, defaults to false).
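The defaulting rule for the query parameter could look roughly like the following Go sketch. `parseUseRegexSearch` is a hypothetical helper, and treating unparseable values as false (rather than returning an error to the client) is an assumption, not something the description specifies.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseUseRegexSearch (hypothetical helper) interprets the raw value of the
// useRegexSearch query parameter. A missing or unparseable value falls back
// to false, which selects text search.
func parseUseRegexSearch(raw string) bool {
	useRegex, err := strconv.ParseBool(raw)
	if err != nil {
		return false // assumption: invalid input is treated like an absent parameter
	}
	return useRegex
}

func main() {
	for _, raw := range []string{"", "true", "false", "yes"} {
		fmt.Printf("%q -> %v\n", raw, parseUseRegexSearch(raw))
	}
}
```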