League of Legends data scraping the hard and tedious way for fun

158 points by maknee, 3 months ago

16 comments

jeremiahar, 3 months ago
I worked on something like this back in 2016; I'm not sure how much things have changed since then. I used dynamic binary instrumentation to deal with the field encryption. Basically, manually map the executable into executable memory on Linux (as if it were a shared library). Begin execution at the packet switch, but before executing a block of code, disassemble it until a conditional branch, and modify it according to some heuristics to remove the at-rest encryption. The original block of code wasn't executed in place, since the modified code might not fit into the original block size, so new blocks were mmap'd for this. malloc/free were hooked and replaced with wrappers over glibc's malloc/free, but with bookkeeping so that the memory could be freed after execution of the packet switch. atexit was just replaced with a no-op.

That all just dealt with the encryption, but there were also randomized packet IDs and field orders. Those problems were dealt with using manually written heuristics for the packet IDs that were actually interesting. Packet handlers with references to text strings (even hashed ones), etc. were a gold mine here because they made static detection of packet IDs simple. If there was no text string, many of the offsets could be auto-detected just by parsing a replay and running small snippets to determine which offsets actually "made sense" for the field that was being searched for. For example, if there was a gold gain packet, the amount of gold gained shouldn't be out of an expected range, or else the offset is likely not corresponding to that field.

Once all of the high-volume code blocks had been instrumented, replays could be parsed in 2-3 seconds (along with generating the desired data aggregations). This is all from memory, so it's possible there could be a minor mistake or two.
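A minimal sketch of the range-check idea described above, not the commenter's actual code. It assumes the packets have already been decrypted, that the field is a little-endian float32, and that several samples of the same packet type are available; `plausible_offsets`, the padding bytes, and the sample values are all hypothetical.

```python
import struct

def plausible_offsets(payloads, lo, hi, fmt="<f"):
    """Return byte offsets where EVERY sample decodes to a value in [lo, hi].

    Assumption: the field is a little-endian float32 (fmt="<f"); change fmt
    for other encodings. NaN never satisfies the range check, so it is
    rejected automatically.
    """
    size = struct.calcsize(fmt)
    shortest = min(len(p) for p in payloads)
    candidates = []
    for off in range(shortest - size + 1):
        values = [struct.unpack_from(fmt, p, off)[0] for p in payloads]
        if all(lo <= v <= hi for v in values):
            candidates.append(off)
    return candidates

# Hypothetical "gold gain" payloads: 5 filler bytes, the gold amount, 3 filler bytes.
samples = [
    b"\xff" * 5 + struct.pack("<f", gold) + b"\xff" * 3
    for gold in (25.0, 300.0, 21.0)
]

# Offsets whose decoded value stays within a sane gold range in all samples.
print(plausible_offsets(samples, 1.0, 1000.0))
```

The more samples are fed in, the fewer accidental matches survive, which is why parsing a whole replay narrows the candidates quickly.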
pton_xd, 3 months ago
I've always heard that "security through obscurity" is discouraged because, well, there's no stopping someone from digging in and figuring it out. However, in this case it seems somewhat successful in that the author was not able to decrypt the packets directly.

The article says that "while it might seem feasible to reimplement these functions in Python without running the client, several factors make this approach impractical" and then lists some reasons like the lookup tables changing, chunk layouts getting shuffled, etc.

Is that all it takes to thwart decrypting the packets? Even though, presumably, you have access to all those lookup tables and chunk layouts somewhere in the client? Is it just too much effort to piece together how it works? I'd be curious to hear more specifics on how exactly Riot was able to make reverse engineering this so impractical.

Great article!
finalfire, 3 months ago
This is really cool, and it is exactly what I was looking for. To give some context, I worked on some data science-inspired studies [1] about LoL, and the future research direction is to provide a formal model of the games and analyze them through it. While I had a little success getting aggregated data from websites such as uol.gg, the granularity is not fine enough to do very interesting analysis.

[1] https://doi.org/10.1016/j.ipm.2023.103516
landr0id, 3 months ago
The World of Warships community has gone through similar steps, but the encryption is much more straightforward. Some of the packets are pickled Python, some are just binary blobs, so there are some undocumented packets, but for the most part people have done a decent job of figuring it out and building tooling around it, such as the minimap renderer: https://github.com/WoWs-Builder-Team/minimap_renderer

There's an odd, unspoken, and somewhat understood agreement between the developer (Wargaming) and the community though: the community actively reverse engineers the game to document the packets, and WG kind of looks the other way (except when they recently threatened me with a perma ban :). They even use the tooling the community creates for official tournaments.

In this article the author mentions Riot partnering with external companies to provide richer data sets and analytics. Do they use these tools/data sets for tournaments as well? Is it known at all how these partnerships are structured?
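As a rough illustration of telling "pickled Python" payloads apart from opaque binary blobs, here is a hedged sketch using only the standard library. It is not taken from any World of Warships tool; `looks_like_pickle` is a hypothetical helper, and it assumes the payload has already been decrypted and extracted from the replay.

```python
import pickle
import pickletools

def looks_like_pickle(payload: bytes) -> bool:
    """Heuristic: True if the payload parses as a complete pickle opcode stream.

    pickletools.genops only decodes opcodes and never executes them, so this
    is safe on untrusted data, unlike pickle.loads. Very short blobs can
    match by accident (a lone b"." is a valid STOP opcode), so treat a True
    result as a hint, not proof.
    """
    try:
        for opcode, _arg, _pos in pickletools.genops(payload):
            if opcode.name == "STOP":
                return True
    except Exception:
        # Unknown opcode, truncated argument, or stream ended before STOP.
        return False
    return False

# Hypothetical payloads: one pickled dict and one arbitrary binary blob.
pickled = pickle.dumps({"hp": 100}, protocol=2)
blob = b"\x00\x01\x02\x03"
print(looks_like_pickle(pickled), looks_like_pickle(blob))
```

Payloads that pass the check can then be inspected with `pickletools.dis` before deciding whether it is safe to unpickle them at all.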
moonshadow565, 3 months ago
> League of Legends runs on a custom game engine developed in 2009.

Developed by Sergey Titov (same engine that powers Big Rigs).
leloctai, 3 months ago
I'm not very well versed in RE, but I know that competitive games like this spend a lot of effort on preventing you from attaching debuggers, hooking, and decompiling.

Bypassing this is not mentioned at all in the article. Is this because these protections are trivial to bypass for experienced people, or because the author wants to hide the method from the dev?
exar0815, 3 months ago
I did something similar with a friend for some time for another game.

As it went, our data was used to prove things to the developer that they would have loved to keep hush-hush, which led to a cat-and-mouse game with the data and their open and... not so open APIs. In the end, we stopped playing the game and stopped our efforts at it. Fun times.
infogulch, 3 months ago
Getting data by directly processing the packets instead of using the (buggy, slow) replay system is a great idea. There's a lot of interesting data in the middle of LoL game state that is missing in summary overviews that only consider the final state of the game.
SpaceManNabs, 3 months ago
One of the cool things about Dota is that OpenDota and Stratz provide a lot of data because Steam is relatively open.

It is how I wrote a blog post on generating builds for heroes before Dota Plus even had the feature!
nomilk, 3 months ago
Where/how are images like this made? They're cool. Technical and communicative, but with a relaxed and casual look and feel.

https://maknee.github.io/assets/images/posts/2024-11-02/league_overview.svg
ajsmitty, 3 months ago
I remember doing this 10+ years ago now for a site called probuilds. I left LoL shortly after this. Cool to see that the packets haven't changed much (based on my memory).

Shortly after I released this for TSM, Riot came out with the API.
m0w0kuma, 3 months ago
I've been working on something similar [1], but I took a different approach: I statically extract all decryption stubs using an IDA script I wrote, then emulate them using Unicorn. I'm also interested in your implementation details. Do you have your code on GitHub or somewhere else?

[1] https://github.com/m0w0kuma/ROFL
picafrost, 3 months ago
A tip:

    @media (prefers-color-scheme: dark) {
      img[src*="svg"], img[src*="png"] {
        filter: invert(1) hue-rotate(180deg);
      }
    }
Kuinox, 3 months ago
The diagrams are not visible in dark mode.
armanckeser, 3 months ago
Really cool project! I am not sure if this is only me, but your dark theme is hiding the illustrations, FYI.
babuloseo, 3 months ago
GTFO, Hacker News, we only play Dota 2 here.