Big Data at Khan Academy

49 points by dylanvee, over 11 years ago

5 comments

t1m, over 11 years ago
It's interesting work, but it's not really 'big data'.

"Every day, we collect around 8 million data points on exercise and video interactions, and a few million more around community discussion, computer science programs, and registrations. Not to mention the raw web request logs, and some client-side events we send to MixPanel."

OK - 8 million records per day. Let's double that for the argument's sake.

Even if they were fairly fat records (1 KB), that's only 16 GB / day. That makes it around 2 months / TB.

I can easily put together a machine with 20 TB of storage and run a traditional free relational DB (or even a single free node of Greenplum) and store more than 3 years of this data.

Then bang against it with SQL. Transactions are free.
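The arithmetic in this comment can be checked with a quick sketch. It assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and uses the comment's own inputs; the variable names are illustrative only.

```python
# Back-of-the-envelope storage estimate using the comment's assumptions.
RECORDS_PER_DAY = 16_000_000   # 8 million, doubled for the argument's sake
BYTES_PER_RECORD = 1_000       # "fairly fat" 1 KB records (decimal units assumed)
DISK_BYTES = 20 * 10**12       # a single machine with 20 TB of storage

bytes_per_day = RECORDS_PER_DAY * BYTES_PER_RECORD
gb_per_day = bytes_per_day / 10**9               # daily volume in GB
days_per_tb = 10**12 / bytes_per_day             # days needed to fill 1 TB
years_on_disk = DISK_BYTES / bytes_per_day / 365 # years until 20 TB is full

print(f"{gb_per_day:.0f} GB/day, {days_per_tb:.1f} days/TB, "
      f"{years_on_disk:.1f} years on 20 TB")
```

Under these assumptions the figures line up with the comment: roughly two months per terabyte and a little over three years on a 20 TB machine, before indexes, replication, or compression are taken into account.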
nl, over 11 years ago
Interesting.

There's a whole emerging field called "learning analytics", which at the moment appears to be more a theoretically good idea than anything with practical outcomes (sadly, much in education is like this: something will emerge in the technology field, and then 6 months later there will be an XXX-in-education movement), although Khan Academy is in a good position to get that data and use it.

But for those of you who have kids who do Kumon Math (or similar), it's pretty easy to see how analytics could speed up the Kumon process (of selecting questions that exercise very specific skills).

For those interested, there is an upcoming "Big Data in Education" Coursera course [1] that I'm planning on doing. It will be my first Coursera experience, so I'm not quite sure what to expect. I'm in the fortunate position of having access to a fairly significant amount of educational usage data, so I'm hoping it will be useful.

[1] https://www.coursera.org/course/bigdata-edu
alexatkeplar, over 11 years ago
Isn't this a flawed approach? It seems like Khan Academy is trying to reconstruct a record of behaviours across their business by stitching together:

1. Parsing web logs for web page views and API accesses

2. Exporting "some client-side events" from MixPanel

3. Mining their transactional databases for state changes

On #1 - web caching and client-side events have long invalidated web-log-based analytics approaches. How is Khan different?

On #3 - this is reverse engineering your user behaviours by mining state changes in your transactional systems. This is typically a ton of work, it breaks when you change your data models, and your operational systems aren't designed to reveal user behaviours anyway.

Has Khan explored alternative approaches? Typically: defining with the analyst team a set of events you want to monitor, making sure all of your systems (client-side, mobile, server-side, whatever) emit immutable streams of these events, and then collecting, storing, enriching, and analyzing them at your leisure.
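The event-stream alternative described in this comment might look roughly like the sketch below. Everything in it is hypothetical: the `Event` fields, the `EventLog` class, and the event names are made up for illustration and are not taken from Khan Academy or any particular analytics product. An in-memory list stands in for a real collector.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Event:
    """One behavioural event in a schema agreed with the analysts (hypothetical)."""
    name: str                      # e.g. "exercise_answered"
    user_id: str
    source: str                    # "web", "mobile", "server", ...
    properties: Dict[str, Any] = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)


class EventLog:
    """Append-only log of JSON lines; a stand-in for a real event collector."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def emit(self, event: Event) -> None:
        # Events are only ever appended, never updated or deleted.
        self._lines.append(json.dumps(asdict(event), sort_keys=True))

    def read(self) -> List[Dict[str, Any]]:
        return [json.loads(line) for line in self._lines]


# Every system emits the same event shape, whatever its platform.
log = EventLog()
log.emit(Event("exercise_answered", "u1", "web", {"correct": True}))
log.emit(Event("video_played", "u1", "mobile", {"video": "intro"}))

# Analysis happens later, against the stored stream.
counts: Dict[str, int] = {}
for record in log.read():
    counts[record["name"]] = counts.get(record["name"], 0) + 1
print(counts)
```

The point of the design is that analysis reads a stream whose shape was agreed up front, so changes to the operational data model do not break it.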
noelwelsh, over 11 years ago
This was a nice read, but I'm much more interested to know what they do with the data. From hanging around "big data" people, the emphasis still seems to be on storage and simple SQL-esque querying. For most people this is a solved problem, and it's time to go beyond storage and see what value we can get from data. I believe in most cases this requires a different skill set *and* a different mindset. Most people think in binary terms, but statistical models deal with shades of grey -- nothing is ever certain -- and even simple models like linear regression are difficult for the untrained to understand.
scorpion032, over 11 years ago
Do you know anybody else that uses Google App Engine?