Big Data at Khan Academy

49 points by dylanvee over 11 years ago

5 comments

t1m over 11 years ago
It's interesting work, but it's not really 'big data'.

"Every day, we collect around 8 million data points on exercise and video interactions, and a few million more around community discussion, computer science programs, and registrations. Not to mention the raw web request logs, and some client-side events we send to MixPanel."

OK - 8 million records per day. Let's double that for the argument's sake.

Even if they were fairly fat records (1 KB), that's only 16 GB / day. That makes it around 2 months / TB.

I can easily put together a machine with 20 TB of storage and run a traditional free relational DB (or even a single free node of Greenplum) and store more than 3 years of this data.

Then bang against it with SQL. Transactions are free.
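As a rough check on the arithmetic in the comment above, here is a minimal sketch in Python. The record count, record size, and disk size come from the comment; the use of decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and all function names are assumptions made for illustration. It ignores indexes, replication, and compression.

```python
# Back-of-envelope storage estimate using the figures quoted in the comment.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 TB = 10**12 bytes).

RECORDS_PER_DAY = 2 * 8_000_000   # 8 million per day, doubled for the argument's sake
BYTES_PER_RECORD = 1_000          # a "fairly fat" 1 KB record
TERABYTE = 10**12
DISK_BYTES = 20 * TERABYTE        # the hypothetical 20 TB machine


def daily_bytes(records_per_day=RECORDS_PER_DAY, bytes_per_record=BYTES_PER_RECORD):
    """Raw bytes written per day, before any indexes or compression."""
    return records_per_day * bytes_per_record


def days_per_terabyte(per_day=None):
    """How many days of data fit in one terabyte."""
    per_day = daily_bytes() if per_day is None else per_day
    return TERABYTE / per_day


def years_of_capacity(disk_bytes=DISK_BYTES, per_day=None):
    """How many years of data fit on the disk, using 365-day years."""
    per_day = daily_bytes() if per_day is None else per_day
    return disk_bytes / per_day / 365


if __name__ == "__main__":
    print(f"{daily_bytes() / 10**9:.1f} GB per day")
    print(f"{days_per_terabyte():.1f} days per TB")
    print(f"{years_of_capacity():.2f} years on a 20 TB machine")
```

Under these assumptions the numbers work out to 16 GB per day, 62.5 days per terabyte, and a little over 3.4 years on 20 TB, which matches the "around 2 months / TB" and "more than 3 years" estimates.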
nl over 11 years ago
Interesting.

There's a whole emerging field called "learning analytics", which at the moment appears to be more a theoretically good idea than anything with practical outcomes (sadly, much in education is like this: something will emerge in the technology field, and then 6 months later there will be an XXX-in-education movement), although Khan Academy is in a good position to get that data and use it.

But for those of you who have kids who do Kumon Math (or similar), it's pretty easy to see how analytics could speed up the Kumon process (of selecting questions that exercise very specific skills).

For those interested, there is an upcoming "Big Data in Education" Coursera course[1] that I'm planning on doing. It will be my first Coursera experience, so I'm not quite sure what to expect. I'm in the fortunate position of having access to a fairly significant amount of educational usage data, so I'm hoping it will be useful.

[1] https://www.coursera.org/course/bigdata-edu
alexatkeplar over 11 years ago
Isn't this a flawed approach? It seems like Khan Academy is trying to reconstruct a record of behaviours across their business by stitching together:

1. Parsing web logs for web page views and API accesses

2. Exporting "some client-side events" from MixPanel

3. Mining their transactional databases for state changes

On #1 - web caching and client-side events have long invalidated web-log-based analytics approaches. How is Khan different?

On #3 - this is reverse engineering your user behaviours by mining state changes in your transactional systems. This is typically a ton of work, it breaks when you change your data models, and your operational systems aren't designed to reveal user behaviours anyway.

Has Khan explored alternative approaches? Typically: defining with the analyst team a set of events you want to monitor, making sure all of your systems (client-side, mobile, server-side, whatever) emit immutable streams of these events, and then collecting, storing, enriching, and analyzing at your leisure.
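The event-stream alternative described in the comment above can be sketched roughly as follows. This is only an illustration, not anything Khan Academy is known to use: the `Event` fields, the `EventLog` class, and the event names are all hypothetical, and an in-memory list stands in for whatever durable collector a real system would have.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Mapping


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class Event:
    name: str       # one of the event types agreed with the analyst team
    user_id: str
    source: str     # e.g. "web", "mobile", "server" (hypothetical labels)
    properties: Mapping[str, object] = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class EventLog:
    """Append-only log: events are serialized on emit and never updated."""

    def __init__(self):
        self._lines = []

    def emit(self, event):
        # Every system writes the same JSON shape, regardless of source.
        self._lines.append(json.dumps(asdict(event), sort_keys=True))

    def replay(self):
        # Analysis reads the stored stream; it never touches operational tables.
        return [json.loads(line) for line in self._lines]


if __name__ == "__main__":
    log = EventLog()
    log.emit(Event("exercise_answered", "user-1", "web", {"correct": True}))
    log.emit(Event("video_played", "user-1", "mobile"))
    log.emit(Event("exercise_answered", "user-2", "server", {"correct": False}))
    print(Counter(e["name"] for e in log.replay()))
```

The point of the design is that behaviour is recorded directly as it happens, so the analysis does not have to be reverse engineered from state changes and does not break when the operational data model changes.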
noelwelsh over 11 years ago
This was a nice read but I'm much more interested to know what they do with the data. From hanging around "big data" people the emphasis still seems to be on storage and simple SQL-esque querying. For most people this is a solved problem, and it's time to go beyond storage and see what value we can get from data. I believe in most cases this requires a different skill set *and* different mindset. Most people think in binary terms, but statistical models deal with shades of grey -- nothing is ever certain -- and even simple models like linear regression are difficult for the untrained to understand.
scorpion032 over 11 years ago
Do you know anybody else that uses Google App Engine?