Why We’re Building Flux, a New Data Scripting and Query Language

91 points by pauldix, almost 7 years ago

29 comments

tylerl, almost 7 years ago

> I don't want to live in a world where the language I speak evolved slowly over the past 1600 years.

Also, this? This is the illustrative example you chose for your amazing new query language?<pre><code> square = (table=<-) => { table |> map(fn: (r) => r._value = r._value * r._value) } </code></pre> So you're saying you're combining the expressiveness of SQL with the readability of, what, Perl?

georgewfraser, almost 7 years ago

SQL definitely has weaknesses, but I wish people wouldn't use "straw man" examples to crap all over it. The Flux example from his blog post would look something like this:<pre><code> select lag(value, 0) over recent * 1 + lag(value, 1) over recent * 0.5 + lag(value, 2) over recent * 0.25 + ... as exp_moving_avg from telegraf where time > datetime_sub(current_datetime(), interval 1 hour) and measurement = 'foo' window recent as (order by time rows 10 preceding) </code></pre> Here, the main difference is that Flux has a built-in exponential moving average function, whereas in SQL we have to actually write out the formula.
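
For readers who want to see the formula that the SQL above writes out, here is a minimal Python sketch of the same weighted sum of lagged values. The function names `weighted_lag_sum` and `recursive_sum` and the `decay` and `window` parameters are illustrative assumptions, not part of Flux or of any SQL dialect. The sketch also assumes that missing lags at the start of the series are skipped, whereas SQL's `lag` would return NULL there.

```python
from typing import List, Sequence


def weighted_lag_sum(values: Sequence[float], decay: float = 0.5,
                     window: int = 10) -> List[float]:
    """For each position i, sum values[i - k] * decay**k for k = 0..window.

    Lags that would reach before the start of the series are skipped
    (an assumption of this sketch; SQL's lag() would yield NULL instead).
    """
    out: List[float] = []
    for i in range(len(values)):
        total = 0.0
        for k in range(window + 1):
            if i - k < 0:
                break
            total += values[i - k] * decay ** k
        out.append(total)
    return out


def recursive_sum(values: Sequence[float], decay: float = 0.5) -> List[float]:
    """Same weights with an unbounded window, via s[i] = x[i] + decay * s[i-1]."""
    out: List[float] = []
    previous = 0.0
    for x in values:
        previous = x + decay * previous
        out.append(previous)
    return out


series = [1.0, 2.0, 4.0]
print(weighted_lag_sum(series))
print(recursive_sum(series))
```

The recursive form is what a built-in moving-average function can hide behind a single call; the explicit form is what has to be spelled out term by term when the language only offers `lag`.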

rs86, almost 7 years ago

SQL has very solid grounding in research, a lot of it, in relational algebra. If you try to make a query language that is a DSL for anything without a really different data model underneath, you will accomplish nothing great.

pauldix, almost 7 years ago

Author here. I just noticed that this got picked up, so I'm late to the party. I suppose I'll take the bait and aim to clarify one thing that I think it's funny people are getting hung up on. My line: "I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s"

Read in context, the meaning of that sentence isn't that things invented decades (or centuries or millennia) ago are all bad. I even state that SQL is a great and powerful tool. If you took from the post that I think SQL is shit and needs to be replaced, you weren't paying attention.

The point of that line (and really the point of us creating Flux) is that we think there can be a more elegant and understandable language (read: API) for working with time series data, but that we won't get there by trying to improve SQL. You don't build an automobile by creating better wheels for your horse and buggy.

Also, SQL isn't a language like English. SQL is an API and APIs change all the time. Yes, code is communication, but its form evolves much more quickly than spoken and written language between humans.

fake-name, almost 7 years ago

Holy shit. The language name alone is a really, really stupid idea.

Protip: If you're inventing a new esoteric programming language (and until you have other people implementing non-trivial projects in/using your language, it is an esoteric programming language), google the fucking name first.

If googling your intended <thing's> name results in more than 1000 hits, CHANGE YOUR <thing's> FUCKING NAME.

If you don't, trying to find any resources about the <thing> on the internet will be a huge pain in the ass. Name your project something unique.

Googling "flux" results in "About 197,000,000 results". If you just make it a little more specific as "fluxql", you get ~142 results.

People looking for language resources will actually find the shit they're looking for, and the name actually tells you something about what it does, which is nice.

meritt, almost 7 years ago

> Writing a SQL equivalent example of that query is, at this point, beyond my SQL capabilities.

Let me get this straight: someone who doesn't know SQL is going to solve all of the deficiencies of SQL by inventing something new?

InfluxDB is awesome and the syntax looks great, but please spare me the "problems" you're solving when you can't write a simple moving-average query.

jwdunne, almost 7 years ago

I'd be interested in seeing a language that can take full advantage of the architecture in Out of the Tarpit.

It's always touted as a must-read paper, but I haven't seen many inroads towards something truly Functional-Relational.

I don't think a new query language is the solution to the problem. SQL isn't that bad. Sure, for sanity, every language has a reimplementation of SQL syntax using whatever abstractions are available. But even if you don't go SQL, you've got Datalog.

A language being 40 to 50 years old doesn't make it a problem. I'm speaking English, and that's been through 2000 years of iteration. I don't think that makes it a good candidate for a ground-up rethink when so much thought by great thinkers has already gone into it.

fnordsensei, almost 7 years ago

I've been writing Datalog for the last week or so. It took me a bit to adapt to the syntax (also, [1] helped), but now I find myself enjoying the combination of terseness and expressiveness.

It's probably a matter of familiarity, but looking at Flux, it seems both noisier than Datalog and less readable than SQL.

Given that readability is a goal for Flux, I guess it's a matter of subjectivity: readable for whom? What background do you need to have for Flux to look readable?

1: <a href="http://www.learndatalogtoday.org/" rel="nofollow">http://www.learndatalogtoday.org/</a>

nishantvyas, almost 7 years ago

"I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s. I refuse to let that be my reality."

---- re-play the same sentence for c/SQL/ENGLISH ---

I don’t want to live in a world where the best language humans could think of for "communicating" was invented in the (c. 550–1066 CE). I refuse to let that be my reality.

So, I'm starting a new language... sdflakfj lsjkfaldfj sdfkjaslf dflkasjdfk sldfkjaslf laskfdas....

------

P.S.: Any technology is built over time in sedimentary layers... every layer has played a key role in where we are today. I'd not discount any...

justinmk, almost 7 years ago

The post doesn't mention Datalog, but Dedalus/Bloom[1] makes a good case for why Datalog is a good starting point for a data query language.

1. <a href="https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-173.html" rel="nofollow">https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-...</a>

2. <a href="https://www.youtube.com/watch?v=R2Aa4PivG0g&t=2295s" rel="nofollow">https://www.youtube.com/watch?v=R2Aa4PivG0g&t=2295s</a>

fiiv, almost 7 years ago

All the words in the English language (not to mention place names, words in other languages/transliterations), possible acronyms, etc., and they choose one that already has a Facebook-promoted architectural pattern using it.

Not sure if arrogance or ignorance, perhaps just apathy?

andyfleming, almost 7 years ago

It seems more complicated than necessary. Why not a syntax like...<pre><code> from("telegraf") .range(-1h) .where(_measurement, "foo") .exponentialMovingAverage(-10s)</code></pre>

AlphaWeaver, almost 7 years ago

> I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s

Maybe we just got it right relatively early!

Ycros, almost 7 years ago

I've gone down a similar line of thinking when writing a lot of SQL over timeseries or otherwise ordered datasets recently. Going in a more functional and composable direction while keeping it limited so that hopefully the query execution engine can still make good optimizations seems like the right idea.

Making it a separate and open language outside of Influx is also a great approach. I'd love to see other databases try adopting this. I'll definitely be keeping an eye on this project.

mrdoops, almost 7 years ago

Looks like Ecto. <a href="https://hexdocs.pm/ecto/Ecto.Query.html" rel="nofollow">https://hexdocs.pm/ecto/Ecto.Query.html</a>

dragonwriter, almost 7 years ago

There seems to be a lot of not understanding SQL in the SQL criticism, and the design of the new language seems to have a lot of excess noise.

1. If arity-1 functions are as dominant in use as the examples suggest, mandatory named args are excessive noise.

2. From the examples, the |> operator also looks like needless noise; code would be cleaner and more readable if this operator were whitespace without any other character (like function application in Haskell, but newlines should also be acceptable) and there were a different punctuation for when that isn't intended.

3. This seems really noisy:<pre><code> square = (table=<-) => { table |> map(fn: (r) => r._value = r._value * r._value) } </code></pre> Map is a transform applied to an input, so is square, so why can't it be:<pre><code> square = map(fn: (r) => r._value = r._value * r._value) </code></pre> Or, better:<pre><code> square = map(_._value *= _._value)</code></pre>
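
The third point, that `square` is just `map` with its function already supplied, can be sketched in Python. This is an analogy and not Flux: the `Row` and `Transform` aliases and the `map_rows` helper are hypothetical names introduced for illustration, and a table is assumed to be a plain list of dictionaries.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, float]
Transform = Callable[[Iterable[Row]], List[Row]]


def map_rows(fn: Callable[[Row], Row]) -> Transform:
    """Return a transform that applies fn to a copy of every row in a table."""
    def transform(table: Iterable[Row]) -> List[Row]:
        return [fn(dict(row)) for row in table]
    return transform


# square needs no explicit table parameter: it is map_rows with the
# row function filled in. Squaring multiplies the value by itself.
square: Transform = map_rows(lambda r: {**r, "_value": r["_value"] * r["_value"]})

print(square([{"_value": 3.0}, {"_value": -2.0}]))
```

Because `map_rows` returns a function from table to table, anything built from it composes with other transforms without having to name the incoming table.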

pwinnski, almost 7 years ago

"We started with poorly-implemented SQL, which frustrated people, so we decided to make something without any of the advantages of SQL instead."

bww, almost 7 years ago

Based on all the reflexively negative comments, I’ll assume that most of the participants in this thread write SQL for a living. Notwithstanding certain ridiculous assertions on the part of the author attempting to correlate the value of a technology with the year in which it was created, this seems like a pretty compelling idea. We use InfluxDB at my company to manage time series data, and its specialization for that use case has been a big benefit. I don’t see any reason to be dismissive by default of a language designed for interacting with data having these specific characteristics in a manner explicitly suited to it.

coldtea, almost 7 years ago

> I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s

Yeah, because languages go stale... especially those based on mathematical abstractions like SQL.

lixtra, almost 7 years ago

It was a bad idea to focus on Flux vs SQL. It would be better compared to Pig or pandas.

I like the approach. I find it annoying that in SQL I need more than select privileges to write my own function or view (aliases and WITH are all that you can use to structure a big query). Also, macros that work on all tables that have certain columns are at best difficult to write in SQL, so there is room for improvement.

OTOH, in Flux you write something that looks a lot like the output of a planner, so if things change in your DB you might have to modify your scripts instead of adding an index.

xaduha, almost 7 years ago

> I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s

You don't; the most popular one is. The best one is XQuery.

ChrisRackauckas, almost 7 years ago

They took the name of the popular Julia neural network package? That makes it confusing.

wslh, almost 7 years ago

Why We’re Building Flux, a New Data Scripting and Query Language? Who knows why?

Before pedantically saying "I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s", show us the breakthrough that makes us think your article deserves to be read in full.

tincholio, almost 7 years ago

From TFA: ". This is kind of like the worst part of Lisp (nested function calls), but even less readable. Not only was the Flux example more terse, it was more readable and understandable."

It's kinda funny, because his forward pipe operator (suspiciously similar to Elixir's) is the same as a threading macro, which you have in Lisp (or can trivially write if you don't).
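
To make the equivalence concrete, here is a small Python sketch of a threading helper next to the nested-call form it replaces. The `pipe` function is a hypothetical helper written for this illustration; it plays the role that `|>` plays in Flux and Elixir and that a threading macro such as Clojure's `->` plays in a Lisp.

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *steps: Callable[[Any], Any]) -> Any:
    """Feed value through each step from left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Nested calls read inside-out: filter first, then map, then sum.
nested = sum(map(lambda x: x * x, filter(lambda x: x % 2 == 0, range(10))))

# The threaded form lists the same steps in the order they run.
piped = pipe(
    range(10),
    lambda xs: filter(lambda x: x % 2 == 0, xs),
    lambda xs: map(lambda x: x * x, xs),
    sum,
)

print(nested, piped)
```

Both forms compute the same value; the only difference is the order in which the steps appear on the page.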

yellowapple, almost 7 years ago

"I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s"

I don't want to live in a world where the best language humans could think of for communicating with other humans was invented in the ${CENTURY_WHEN_ENGLISH_WAS_INVENTED}70's.

wizardofmysore, almost 7 years ago

Seems more like the XKCD comic on standards.

I really loved the tenets given, i.e. Useable, Readable, Composable, Testable, Contributable, Shareable.

But after that, the article really failed to connect how the new language is going to do the above in a way no other language has done.

Also, I agree with the rest on the point against SQL: it doesn't matter if SQL was invented in the 70s, so were many of the programming languages and paradigms we use today.

PS. I don't make a living writing SQL.

blablablerg, almost 7 years ago

These things are very easy to do and clear to write using R's tidyverse.

rixed, almost 7 years ago

It looks indeed a lot like Graphite, and since you explicitly mention in your talk that your objective is to reimplement all the functions that are present in Graphite, why not instead present your work as a port of the Graphite language, with some extension to work on other data sources and sinks (and dots replaced by the fat pipe)?

This is interesting to me as I'm currently working on something close: a lightweight stream processor to allow system engineers to manipulate some large streams of data while in flight to a database. And I've been wondering (and still am) about the trade-offs between simple and expressive. Very early, I decided not to be TS specific at all (since we were prevented from using an off-the-shelf product for the reason that our data does not look enough like a TS: it has neither a single time field nor a single value field). Eventually, after a few detours, we ended up favoring a SQL-like language for the reason that it's field agnostic.

Regarding the language itself, the main differences I can see are that you query over a time range while we process infinite streams, with the consequence that we must explicitly tell each operation when it has to output values (windowing); the other is that you have an implicit key and one TS by "group" with the same key, which makes piping many operations easier (but JOINing harder), while we have to be more specific about how to group.

So for instance, where you have:<pre><code> from(db:"foo") |> window(every:20s) |> sum() </code></pre> we would have the more SQL-alike:<pre><code> select sum value from foo group by time // 20 </code></pre> ("//" being the integer division).

Or, if you needed the start and stop additional columns added by window():<pre><code> select sum value, (time // 20)*20 AS start, start+20 AS stop group by start </code></pre> But then, because fluxlang processes a range of time while we stream "forever", we would also have to tell when to output a tuple, for instance after 20s has passed:<pre><code> select sum value, (time // 20)*20 AS start, start+20 AS stop group by start commit after in.time > group.stop </code></pre> which gets verbose quickly.

But to us this constraint imposed by streaming (as opposed to querying a DB for the data to process) is essential, since our main use case is alerting from a single box, so querying every minute the last 10 minutes of data for thousands of defined alerts would just not work.

Another interesting difference is the type system. One thing I both like and hate in SQL is the NULL. It's convenient for missing data but it's also the SQL equivalent of the null pointer. So we have a type system that looks closely at it: we support this special case of algebraic data type where a "type?" is a NULLable "type", and NULLs must be dealt with before they reach a function that does not accept NULLs. For instance, there is no way to compile a filter whose condition can be NULL, and one would have to COALESCE it first. What are your thoughts about missing data? Do you manage to avoid the issue entirely, including after a JOIN operation?

The other difference I noticed is how nice your query editor is. For now our query editor is $EDITOR, but my plan is to build a data source plugin for Grafana. What do you think of this approach?
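
The windowing trade-off described above can be sketched in Python. This is a rough illustration and not the commenter's stream processor or Flux's `window()`: the `tumbling_sums` function and its `width` parameter are hypothetical, events are assumed to arrive as time-ordered `(time, value)` pairs, and a window is committed as soon as an event lands in a later window, which approximates the `commit after in.time > group.stop` clause.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def tumbling_sums(events: Iterable[Tuple[int, float]],
                  width: int = 20) -> Iterator[Dict[str, float]]:
    """Yield {start, stop, sum} per window from time-ordered events.

    A window is emitted only when a later event proves it is complete,
    so the input may be an endless stream. The window that is still
    open when the input stops is never emitted.
    """
    start: Optional[int] = None
    total = 0.0
    for time, value in events:
        bucket = (time // width) * width  # same as (time // 20)*20 AS start
        if start is not None and bucket != start:
            yield {"start": start, "stop": start + width, "sum": total}
            total = 0.0
        start = bucket
        total += value


events = [(0, 1.0), (5, 2.0), (21, 4.0), (45, 8.0)]
for window in tumbling_sums(events):
    print(window)
```

A range query can sum every window once all the data is known; the streaming version has to decide when a window is finished, which is the extra clause the comment calls verbose.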

gaius, almost 7 years ago

The ultimate fantasy of every programmer is to a) invent a new language and b) force other people to use it. It’s OK, we all get it, it’s fine. But let’s be honest about our motivations...

A previous employer had RQL, "relational query language". It was between 10,000 and a million times slower than SQL depending on what you were doing (under the covers it was just generating really bad SQL). But the engineer who invented it was sufficiently well connected to get it declared the corporate standard, so...
