Been having a lot of fun reading an SC2Replay collection through nom parsers, serializing into Arrow files so that pola.rs can read them and perform data analysis with jupyter lab, plotly or interact with SQL operations, etc.
Looking for feedback and ideas on what to progress on.
For example: "over time, are my timings getting better?" I would also love ideas on which libraries to use for forecasting.
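One rough way to approach the "are my timings getting better?" question is to fit a straight line through a timing against the date each game was played and look at the sign of the slope. The sketch below uses only the Python standard library. The `games` data and the "seconds until third base" metric are made up for illustration, and `least_squares_slope` is a hypothetical helper, not part of any replay tooling mentioned here.

```python
from datetime import date


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical data: (date played, seconds until the third base was started).
games = [
    (date(2023, 1, 5), 265.0),
    (date(2023, 2, 9), 258.0),
    (date(2023, 3, 14), 251.0),
    (date(2023, 4, 20), 247.0),
]

first_day = games[0][0]
days = [(played - first_day).days for played, _ in games]
timings = [seconds for _, seconds in games]

slope = least_squares_slope(days, timings)
# A negative slope means the timing is getting earlier over time.
print(f"{slope:+.3f} seconds per day")
```

A single slope hides a lot (matchup, build, opponent), so it is probably only meaningful within one category of games at a time.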
This is very cool, I'd love to chat more about it.<p>For context, I have an open source video player designed for esports coaches. The main feature is that you can load in multiple video streams at once and synchronise them together. It's mainly for FPS games like Valorant / Apex Legends (<a href="https://www.vodon.gg/" rel="nofollow noreferrer">https://www.vodon.gg/</a> if you want to explore it).<p>I'm starting to get access to some streams of data from the games via coaches that use the tool. My very naïve approach was simply to load game events into the video timeline (so you could easily skip forwards to deaths, kills etc) but I hadn't thought about loading this into data analysis tools.<p>The game events themselves for Valorant seem like they'd be enough to almost construct an online replay from them as well, which could complement the recorded gameplay (i.e. construct a dynamic map of where everybody is that could be brought up on screen).<p>It's a very cool space, if you'd like to chat more my email is in my profile.
I have been conducting research on esports as a whole, with the main target being StarCraft, since around 2019. I was able to publish a nice dataset of pro replays, along with the JSON files produced by one of the Golang parsers written by Icza.<p><a href="https://www.nature.com/articles/s41597-023-02510-7" rel="nofollow noreferrer">https://www.nature.com/articles/s41597-023-02510-7</a><p>Feel free to look into the tools on my GitHub (<a href="https://github.com/Kaszanas">https://github.com/Kaszanas</a>). Since this is mostly the topic of my PhD I guess I will be updating the dataset in the near future. You may want to try and test your parser against it.<p>Further research for you would probably include running logistic regression on aggregated data from each of the replays, to get a model that can discern between winners and losers and to see which parameters are key in your data.<p>Example: <a href="https://www.researchgate.net/publication/363613604_Determinants_of_victory_in_Esports_-_StarCraft_II" rel="nofollow noreferrer">https://www.researchgate.net/publication/363613604_Determina...</a><p>And, going even further, embedding the games as time-series data via various methods.
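To make the logistic regression suggestion concrete, here is a minimal sketch in plain Python (batch gradient descent, no libraries). It is not the method from the linked paper. The two features (a standardized collection rate and a standardized supply-block time), the toy rows and the function names are all assumptions chosen for illustration. In practice you would probably reach for an existing statistics or ML library instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, row):
    """Estimated probability that the player behind `row` won."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            err = predict_proba(weights, bias, row) - label
            for j, value in enumerate(row):
                grad_w[j] += err * value
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical per-replay aggregates, already standardized:
# [collection_rate_z, supply_block_time_z]; label 1 = win, 0 = loss.
rows = [
    [1.2, -0.8], [0.9, -0.3], [0.4, -1.1], [0.7, 0.1],
    [-1.0, 0.9], [-0.6, 0.4], [-1.3, 1.2], [-0.2, 0.8],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(rows, labels)
# The sign and size of each weight hint at which aggregate matters most.
print("weights:", weights, "bias:", bias)
```

Because the features are standardized, the weight magnitudes are roughly comparable, which is what lets you read them as "which parameters are key". With real replay data you would also want a held-out set before trusting any of it.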
Thanks for sharing, very inspiring. I love Rust for parsing video game replays / save files. I've authored a Rocket League replay parser (boxcars) and an entire suite of web visualizations (via WebAssembly) for EU4 called pdx.tools <a href="https://pdx.tools" rel="nofollow noreferrer">https://pdx.tools</a><p>It's not easy to work with proprietary formats, but they've both become pretty popular, so I would 100% recommend sinking more time into this project as long as it scratches your itch. Gamers are always looking for more stats and deeper insights.
I've also been impressed with the amount of data in SC2 replay files given the size, although I'm more interested in Broodwar (SC1) as a game. If I'm not mistaken Broodwar replay files contain a lot less information. They mainly contain player actions, i.e. you don't even know when a marine died -- the game has to literally be replayed to get that information.
This is insanely cool! Very impressed you managed to implement a full parser in Rust.<p>I implemented a basic one in Rust a while back: <a href="https://github.com/ZephyrBlu/rust-parser">https://github.com/ZephyrBlu/rust-parser</a><p>And a full one in Python with a few bells and whistles ages ago: <a href="https://github.com/ZephyrBlu/zephyrus-sc2-parser">https://github.com/ZephyrBlu/zephyrus-sc2-parser</a><p>Don't maintain either of them though :(, and the Rust one is super rough.<p>SC2 is a very interesting area for data analysis, but at the same time I found it very challenging. There is so much nuance and inconsistency across games that it can be really hard to accurately do things like categorize builds or measure build timings.<p>The area I ended up focusing on was builds, and I feel like I did some interesting stuff there: <a href="https://sc2.gg/reports/top-openings-2022/" rel="nofollow noreferrer">https://sc2.gg/reports/top-openings-2022/</a>.<p>I found personal statistics less interesting than aggregate statistics. Even pro games are very volatile, ladder games even more so. It's extremely hard to get reliable signal out of them if you're trying to track things across games. Even simple things like Collection Rate are poor indicators without significant categorization work (matchup, build, opponent build, etc).
This is a super cool project! It would be especially interesting to use on pro-level games, maybe you could even sell services to pros who want to get better. Like being able to say "your baneling run-bys are most cost-effective with 7 banelings around the ten minute mark" etc.<p>With a larger dataset you could come up with some really interesting statistics on individual players' performance, use of certain units, the success of various strategies and build orders, etc.<p>It would be real fun to try to predict the outcome of a game based on the first 3 minutes or something.
On the data analytics side, I always wondered if complex game state patterns could be deconstructed using techniques from the JKU Visual Data Science Lab [0][1]. Here they demonstrate chess board win states from competitive data, and I wonder what the state relationships would look like for different (video) game matches and if new strategies could be derived from it.<p>[0]: <a href="https://youtu.be/yBCe8SqGwK8" rel="nofollow noreferrer">https://youtu.be/yBCe8SqGwK8</a><p>[1]: <a href="https://jku-vds-lab.at/publications/2022_embedding_structure/" rel="nofollow noreferrer">https://jku-vds-lab.at/publications/2022_embedding_structure...</a>
It'd be great to use a non-Blizzard property for this analysis since that company is outstandingly despicable. Plenty of other RTSes exist and I know that AoE2 has an extremely strong data analytics scene.
I may be the only one not familiar, but nom refers to <a href="https://github.com/rust-bakery/nom">https://github.com/rust-bakery/nom</a> which looks like a pretty handy way to parse binary data in Rust.