Announcing Apache Spark 1.4

154 points, posted by rxin almost 10 years ago

8 comments

eranation, almost 10 years ago
Anyone who wants to pick up Spark basics: Berkeley (Spark was developed at Berkeley's AMPLab), in collaboration with Databricks (the commercial company started by Spark's creators), just started a free MOOC on edX: https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x

(If you wonder what Spark is, in a very unofficial nutshell: it is a computation / big data / analytics / machine learning / graph processing engine on top of Hadoop that usually performs much better and has arguably a much easier API in Python, Scala, Java and now R.)

It has more than 5000 students so far, and the professor seems to answer every single question on Piazza (a popular student / teacher message board).

So far it looks really good. (It started a week ago, so you can still catch up: the 2nd lab is due only Friday 6/12 EOD, but you have a 3-day "grace" period... and there is not too much to catch up on.)

I use Spark for work (Scala API) and still learned one or two new things.

It uses the PySpark API, so no need to learn Scala. All homework labs are done in an IPython notebook. Very high quality so far IMHO.

It is followed by a more advanced Spark course (Scalable Machine Learning), also by Berkeley & Databricks: https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x

(Not affiliated with edX, Berkeley or Databricks, just thought it's a good place for a PSA to those interested.)

The originating academic work on Spark by Matei Zaharia (creator of Spark) got him a PhD dissertation award in 2014 from the ACM (http://www.acm.org/press-room/news-releases/2015/dissertation-award-14/).

Spark also set a new record in large-scale sorting (beating Hadoop by far): https://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html

* EDIT: typo in "Berkeley", thanks gboss for noticing :)
fleeno, almost 10 years ago
As someone who doesn't know what Apache Spark is, this article reads like it could have been randomly generated.
chiachun, almost 10 years ago
The release notes: https://spark.apache.org/releases/spark-release-1-4-0.html

Another major change is that it supports Python 3 now: https://issues.apache.org/jira/browse/SPARK-4897
minimaxir, almost 10 years ago
I'm excited about SparkR, even though R is shunned in the field of big data. Between that and dplyr (which inspired the SparkR syntax) for data manipulation and sanitization, it should be much easier to write sane, reproducible code and visualizations for big data analysis. (The Python/Scala tutorials for Spark gave me a headache.)

SparkR appears to have strong integration into RStudio, which is big news: http://blog.rstudio.org/2015/05/28/sparkr-preview-by-vincent-warmerdam/
DannoHung, almost 10 years ago
Does anyone know if there's a guide to integrating Spark between a real-time write-only database and a historical database?

I've looked into using Spark Streaming, but I can't work out how you could seamlessly transition data from a streaming batch to the historical db in a reasonably tight time period.

I'd be willing to pay for training if it came to it, but I don't think I'm using the right search terms.
krat0sprakhar, almost 10 years ago
I somehow always keep getting confused between Spark and Storm! Can someone explain the difference between the two (use cases etc.) as if explaining to a five-year-old? Thanks!
lazzlazzlazz, almost 10 years ago
Is support for User Defined Aggregation Functions (regarding DataFrames) slated for 1.5?
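
For readers unfamiliar with the term, a user-defined aggregation function is roughly a custom reduction over grouped rows. The sketch below is plain Python and does not use Spark or its API. The names `MeanBuffer`, `update`, `merge`, `evaluate` and `aggregate` are hypothetical and only illustrate the general shape such a function tends to have in a distributed engine: a per-partition buffer, a way to combine partial buffers, and a final evaluation step.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional


@dataclass(frozen=True)
class MeanBuffer:
    """Intermediate state for a mean: running total and row count."""
    total: float = 0.0
    count: int = 0


def update(buf: MeanBuffer, value: float) -> MeanBuffer:
    # Fold one input value into a partition-local buffer.
    return MeanBuffer(buf.total + value, buf.count + 1)


def merge(a: MeanBuffer, b: MeanBuffer) -> MeanBuffer:
    # Combine two partial buffers computed on different partitions.
    return MeanBuffer(a.total + b.total, a.count + b.count)


def evaluate(buf: MeanBuffer) -> Optional[float]:
    # Turn the final buffer into the aggregate result (None if no rows).
    return buf.total / buf.count if buf.count else None


def aggregate(partitions: List[List[float]]) -> Optional[float]:
    # Each inner list stands in for one partition of a distributed dataset.
    partials = [reduce(update, part, MeanBuffer()) for part in partitions]
    return evaluate(reduce(merge, partials, MeanBuffer()))


if __name__ == "__main__":
    print(aggregate([[1.0, 2.0], [3.0], []]))
```

Keeping the buffer as (total, count) rather than a running mean is what makes `merge` simple: partial results from separate partitions can be combined in any order.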
Tepix, almost 10 years ago
Too bad the website is so hard to read.

Time for that site to join contrastrebellion.com