I was using the dynamic text example from the play-cljc-examples ui-gallery for my game and experienced a performance dip when rendering around 300 characters. It wasn't a deal breaker yet, but I got curious and found an implementation that scales better: it handles my target of around 2000 dynamic characters per frame and still renders okay-ish up to around 80000 characters on my machine.
The code is in my fork here:
https://github.com/keychera/play-cljc-examples/blob/master/ui-gallery/src/ui_gallery/chars.cljc
There are several implementations that I tested. chars/assoc-lines6 is the fastest, and chars/assoc-lines0 is the original code.
Initially, I was planning to contribute chars/assoc-lines1 to play-cljc-examples, since it already improves performance. But I kept looking for further gains and found that skipping instance/assoc from the main library and building the attributes yourself leads to faster code. Since I am not sure whether the fastest implementation I found is generic enough to be contributed, my goal with this issue is just to share what I managed to do.
Also, I am by no means a performance expert so if there are better ways to do or measure certain things, please tell me.
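To illustrate what "building attributes yourself" can mean in general, here is a minimal sketch that writes per-character values straight into one preallocated float array instead of updating a data structure once per character. This is not the code from chars/assoc-lines6 and it does not use any play-cljc API; the function name `build-positions`, the two-floats-per-character layout, and the fixed character width are all assumptions made for the example.

```clojure
(ns attr-sketch)

;; Hypothetical layout: 2 floats (x, y) per character.
;; This is NOT play-cljc's actual attribute layout.
(def ^:const floats-per-char 2)

(defn build-positions
  "Packs an x/y position for every character of `text` into a single
   float array, advancing x by `char-width` for each character."
  ^floats [^String text char-width y]
  (let [n   (.length text)
        arr (float-array (* n floats-per-char))]
    (dotimes [i n]
      (let [base (* i floats-per-char)]
        ;; write x, then y, for character i
        (aset arr base (float (* i char-width)))
        (aset arr (inc base) (float y))))
    arr))

(comment
  ;; three characters, 10 units wide, on the line y = 5
  (vec (build-positions "abc" 10.0 5.0)))
```

The point of the sketch is only that a single pass over the text fills one array, so the cost per character is a couple of array writes rather than an immutable update.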
To stress test the code, I added an ad-hoc FPS statistic and multiplied the strings to increase the character count, as shown below:
msrdc_KR0u8Qx43v.mp4
With a small load, my machine runs the game at 165 fps (my screen refresh rate). With the original code, performance drops to around 140 fps when rendering 300 characters and to 20 fps at around 1500 characters. chars/assoc-lines6, on the other hand, renders 5000 characters at around 150 fps and drops to 30 fps at around 80000 characters, as shown in the video (sorry if the comparison numbers are rather arbitrary; I mostly changed the counts manually).
The numbers are from the JVM build running on JDK 25 on WSL2.
Some machine specs:
13th Gen Intel(R) Core(TM) i7-13700H (2.40 GHz), 16.0 GB RAM,
NVIDIA GeForce RTX 4060
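For anyone who wants to reproduce this kind of measurement, the ad-hoc FPS statistic and the string multiplication could look roughly like the sketch below. The helper names `multiply-text` and `average-fps` are hypothetical and are not taken from the fork; the sketch assumes frame durations are collected in milliseconds somewhere in the game loop.

```clojure
(ns fps-sketch
  (:require [clojure.string :as str]))

;; Hypothetical helper: repeat a base string n times to inflate
;; the number of characters rendered per frame.
(defn multiply-text [s n]
  (str/join (repeat n s)))

;; Hypothetical helper: average FPS over a collection of frame
;; durations given in milliseconds.
(defn average-fps [frame-times-ms]
  (let [total (reduce + 0.0 frame-times-ms)]
    (if (pos? total)
      (/ (* 1000.0 (count frame-times-ms)) total)
      0.0)))

(comment
  (count (multiply-text "hello " 50)) ;; => 300
  (average-fps [10.0 10.0 10.0]))     ;; => 100.0
```

Averaging over a window of frames rather than reading a single frame time makes the number less jumpy, which helps when comparing implementations by eye.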
The code is rather messy since I was still learning how to write performant Clojure code. I tried all the tricks in the book in the other variations of assoc-lines (transducers, reducing nested loops, memoize, type hinting, transients, arrays). I also used clj-async-profiler to look for the bottleneck, and I don't know if there is any other place left to improve. The following screenshot is the flamegraph of one call of chars/assoc-lines6 and play-cljc/render after rendering for 3 seconds.
For now, this is already more than enough for my game. I hope this is helpful, and if my code is acceptable, I can try to clean it up and contribute what I can.