Berkeley offers its data science course online for free

801 points by seycombi about 7 years ago

19 comments

I find it curious that there are so many courses on data-science-related subjects, which superficially seem to cover the same material, and relatively few courses covering more traditional CS topics such as computer systems, networks, and operating systems. I suppose it has to do with the market, but it also feels like colleges are skating to where the puck is, rather than where it will be (or perhaps, where it could be).

bartart about 7 years ago

People at Berkeley view this class as kind of a joke. The average grade is insanely high, and the topics are covered in much less depth than in the normal intro CS or stats classes. <a href="https://www.berkeleytime.com/grades/?course1=7765-all-all" rel="nofollow">https://www.berkeleytime.com/grades/?course1=7765-all-all</a>

gnulinux about 7 years ago

I'm a UC Berkeley alum. When I was there, this was a course taken by humanities majors to learn some programming so that their résumés would look cooler. The majority of STEM majors take CS 61A (SICP) or E7 (Programming in MATLAB). Just noting this as context: this is not the class intended for CS majors; this one is: <a href="https://cs61a.org/" rel="nofollow">https://cs61a.org/</a>

master_yoda_1 about 7 years ago

I think this is a bad trend. These universities make basic courses free to gain popularity and then ask for big money for their real courses. This is bad in two ways:
1) The people taking these courses do not learn much for the effort and time they spend. It also gives them the illusion that they know enough because they took a course from a big university.
2) Industry is already so confused in hiring that it hires by name. So even if you take these courses and study in depth on your own, you can't get hired. Someone more qualified can't get hired just because they can't pay 100k for a degree in machine learning from one of these big universities.
This is really a bad trend, and we should spend time on real courses. Everyone knows that TV series are a waste of time; these courses are like TV series. Stop watching them.

anonymous5133 about 7 years ago

It always boggles my mind that these "free" online courses still stick to the old method of "registering" for the class and then following a regimented schedule.
Seriously, just upload the lecture videos and put the homework and textbook online. Add a message board and you're golden.

benhamner about 7 years ago

For those interested in a practical, hands-on course, we just released one at Kaggle <a href="https://www.kaggle.com/learn/overview" rel="nofollow">https://www.kaggle.com/learn/overview</a>

jph about 7 years ago

Berkeley and the UC schools are making major strides in online education, including edX participation and on-campus projects. If you're interested in Berkeley and data science, there's an online master's program too. (Disclosure: Berkeley is in my client roster). <a href="https://requestinfo.datascience.berkeley.edu" rel="nofollow">https://requestinfo.datascience.berkeley.edu</a>

seycombi about 7 years ago

Direct link: <a href="https://www.edx.org/professional-certificate/berkeleyx-foundations-of-data-science#courses" rel="nofollow">https://www.edx.org/professional-certificate/berkeleyx-found...</a>
(There are two ways you can follow the course: the Certificate Program is paid, but the AUDIT program is free.)

dpflan about 7 years ago

Has anyone followed the curriculum suggested here?
> <a href="http://datasciencemasters.org/" rel="nofollow">http://datasciencemasters.org/</a>

graycat about 7 years ago

Okay, here's a view of what appears to be part of the course:
We have a course (right, a school application of stuff taught in school!) with two teachers, that is, two sections of the course, each section with its own teacher and its own students. At the end of the two courses, that is, the two sections, we want to compare the teachers. So we give the same test to all of the students from both courses.
Suppose one section had 20 students and the other one, 25 -- the point here is that we don't ask that the two numbers be equal; fine if they are equal, but we're not asking that they be.
So, there were 45 students. So, get a good random number generator and pick 20 students from the 45 and average their scores; also average the scores of the other 25; then take the difference of the two averages.
That was once. It was resampling. Now, do that 1000 times -- remember, we have a computer to do this for us. So, now we have 1000 differences. If you want, then, "live a little" and do that 2000 times. Or, for A students, do all the combinations of 45 students taken 20 at a time.
Ah, heck, let's stick closer to being practical and stay with the 1000.
Now, presto, bingo, drum roll please, may I have the envelope with the actual difference in the actual averages of the actual scores in the two classes.
If that actual difference is out in a tail of the empirical distribution of the 1000 differences from the resamplings, then we have a choice to make:
(1) The two teachers did equally well, but just by chance, in the luck of the draw of the students, one of the teachers seemed to do much better than the other one.
(2) The actual difference is so far out in the tail that we don't believe the two teachers were equally good, so we reject the hypothesis that there was no difference, called the null hypothesis, and conclude that the teacher with the higher actual average was actually a better teacher.
Sure, it might have happened that the real reason was that one section of the course started at 7 AM and was over before the sun came up, and the other section was at 11 AM when nearly everyone was awake. We like to f'get about such details! Or, sure, we might get criticized for a poorly controlled experiment.
This is also called a statistical hypothesis test or a two-sample test. It is a distribution-free test because we are making no assumptions about probability distributions of the student scores, etc. Since we are not assuming a probability distribution, we are not assuming a probability distribution with parameters and, thus, have a non-parametric test. Uh, an example of a probability distribution with parameters is the Gaussian, where the parameters are mean and standard deviation.
Such tests go way back in statistics for the social sciences, e.g., educational statistics.
In more recent years, leaders in resampling include B. Efron and P. Diaconis, recently both at Stanford.
Why teach such stuff?
Well, some parts of computer science are tweaking old multivariate statistics, especially regression analysis, calling the results machine learning and/or artificial intelligence, putting out a lot of hype, and getting a lot of attention, publicity, students, and maybe consulting gigs. Also the newsies get another source of shocking headlines to get eyeballs for the ad revenue -- write about AI and the old "take over the world" ploy!
So, maybe now some profs of applied statistics (what for a while was called mathematical sciences, etc.) or other profs of applied math want to get in on the party. Maybe.
What can be done with resampling tests? I don't know that there is any significant market for them. Long ago I generalized such things to a curious multidimensional case and published the results in Information Sciences. The work was a big improvement on what we were doing in AI at IBM's Watson lab for zero-day monitoring of high-end server farms and networks. Still, I doubt that my paper has ever been applied.
One of the best areas for applied statistics is the testing of medical drugs. Maybe at times resampling plans have been useful there.
I have a conjecture that resampling plans are closely tied to the now classic result in mathematical statistics that order statistics are always sufficient statistics. Sufficient statistics is cute stuff, from the Radon-Nikodym theorem in measure theory and, in particular, from a 1940s paper of Halmos and Savage, then both at the University of Chicago. Some of the interest is that sample mean and sample variance are sufficient for Gaussian distributed data, and that means that, given such data, you can always do just as well in statistics with only the sample mean and sample variance and otherwise just throw away the data. IIRC E. Dynkin, student of Kolmogorov and Gel'fand, long at Cornell, has a paper showing that this result for the Gaussian is in a sense unstable: if the distribution is only approximately Gaussian, then the sufficiency claim does not hold.
Other applications of resampling, and of such applied math generally, might be in US national security, e.g., maybe monitoring activities in North Korea and looking for significant changes ....
Maybe there would be applications in A/B testing in ad targeting, but I wouldn't hold my breath looking for a job offer to do such from a big ad firm.
For all I know, some Wall Street hedge fund or some Chicago commodities fund uses such statistics to look for significant changes in the markets or anomalies that might be exploited. I doubt it, but maybe! Once I showed my work in anomaly detection to some people at Morgan Stanley, back before the 2008 crash of The Big Short, and there was some interest in monitoring their many Sun workstations but no interest for trading!
Net, IMHO for such applied math: if you can find a serious application, that is, a serious problem where such applied math gives a powerful, valuable solution, the first good or much better solution, with a good barrier to entry, and cheap, fast, and easy to bring online and monetize, then be a company founder and go for it. But I wouldn't look for venture funding for such a project before you had significant, rapidly growing revenue and no longer needed equity funding!
Otherwise look for job offers (1) in US national security, (2) medical research, (3) wherever else. But don't hold your breath while waiting.
Now you may just have gotten enough from about 1/3rd of the Berkeley course!
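The resampling procedure described above (pool the 45 scores, repeatedly draw 20 at random to play one "section", and see where the actual difference of averages falls in the resulting distribution) is what is usually called a two-sample permutation test. Below is a minimal sketch in Python using only the standard library. The function name `permutation_test`, the two-sided p-value rule with a +1 correction, and the example scores are assumptions for illustration, not anything taken from the Berkeley course.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_resamples=1000, seed=0):
    """Two-sample permutation (resampling) test on the difference of means.

    Returns (observed difference, list of resampled differences, two-sided p-value).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    observed = mean(scores_a) - mean(scores_b)

    # Pool all students; under the null hypothesis the section labels don't matter.
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)

    diffs = []
    for _ in range(n_resamples):
        # Randomly relabel: the first n_a shuffled students play "section A",
        # the rest play "section B".
        rng.shuffle(pooled)
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))

    # How often is a resampled difference at least as far out in either tail
    # as the actual one? (+1 in numerator and denominator counts the observed
    # labeling itself, so the p-value is never exactly 0.)
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed))
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, diffs, p_value


if __name__ == "__main__":
    # Hypothetical scores: 20 students in one section, 25 in the other.
    gen = random.Random(42)
    section_a = [gen.gauss(75, 10) for _ in range(20)]
    section_b = [gen.gauss(70, 10) for _ in range(25)]
    obs, diffs, p = permutation_test(section_a, section_b)
    print(f"observed difference = {obs:.2f}, two-sided p = {p:.3f}")
```

Doing "all the combinations of 45 students taken 20 at a time" instead would mean `math.comb(45, 20)`, about 3.2 trillion relabelings, which is why random resampling with a fixed count like 1000 is the practical choice.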
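The claim that sample mean and sample variance are sufficient for Gaussian data can be checked with the standard Fisher-Neyman factorization argument, a more elementary route than the Halmos-Savage measure-theoretic one mentioned above. A sketch, assuming $x_1, \dots, x_n$ are i.i.d. $N(\mu, \sigma^2)$, with $\bar{x}$ the sample mean and $s^2$ the sample variance computed with divisor $n - 1$:

```latex
\sum_{i=1}^{n} (x_i - \mu)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2
  = (n-1)s^2 + n(\bar{x} - \mu)^2

L(\mu, \sigma^2; x_1, \dots, x_n)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)
  = (2\pi\sigma^2)^{-n/2}
      \exp\!\left(-\frac{(n-1)s^2 + n(\bar{x} - \mu)^2}{2\sigma^2}\right)
```

The cross term $2(\bar{x} - \mu)\sum_i (x_i - \bar{x})$ vanishes because deviations from the mean sum to zero. The data then enter the likelihood only through $(\bar{x}, s^2)$ (with $n$ known), so that pair is sufficient for $(\mu, \sigma^2)$. The argument leans entirely on the exact Gaussian form of the density, which fits the instability point attributed to Dynkin: for a density that is only approximately Gaussian, this factorization need not hold.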

Treegarden about 7 years ago

Why is there no syllabus, as in a list of contents? I want to know what is really behind this buzzword stuff.

frabbit about 7 years ago

There are at least two big turn-offs to this course at first blush: 1) they insist on using Anaconda (effectively another package manager, complicating the already layered interaction of system pip, virtualenv, virtualenvwrapper, etc.); 2) they use Microsoft Visual Studio Code (so, inevitably, a good deal of time in this course will be spent learning how to navigate a bloated IDE).

meri_dian about 7 years ago

What exactly is Data Science? It seems like such an overused term, and the value of the subject really gets diluted for me when I see charts in Tableau being offered as examples of "data science".
What's the difference between, say, a Master's program in Computer Science where one studies machine learning and a Master's program in Data Science? Am I wrong for thinking the Data Science program is weaker?

carlosgg about 7 years ago

Berkeley also used to have this Data Science with Spark series on edX, but they taught it just the one time, and now even the archived versions of the courses are closed. <a href="https://www.edx.org/xseries/data-science-engineering-apacher-sparktm" rel="nofollow">https://www.edx.org/xseries/data-science-engineering-apacher...</a>

tenkabuto about 7 years ago

For those interested, you might want to check out <a href="http://data8.org" rel="nofollow">http://data8.org</a>. I'm not sure how it compares to the OP course, though.

csjr about 7 years ago

Does anyone know how it compares to bootcamps like DataCamp[0], for example?
[0] <a href="https://www.datacamp.com" rel="nofollow">https://www.datacamp.com</a>

erokar about 7 years ago

Many if not all of the courses on edX have a free audit option, like this one. It gives you no certificate, and often you cannot access or submit exercises.

simpleAdam about 7 years ago

This would seem to be a playlist: <a href="https://www.youtube.com/watch?v=xcgrnZay9Yc&list=PLFeJ2hV8Fyt7mjvwrDQ2QNYEYdtKSNA0y" rel="nofollow">https://www.youtube.com/watch?v=xcgrnZay9Yc&list=PLFeJ2hV8Fy...</a>

daveheq about 7 years ago

Who has time for this?
