
How to create a 1M record table with a single query

180 points · by lipanski · about 4 years ago

11 comments

cribwi · about 4 years ago
If you're running PostgreSQL, you can use the built-in generate_series [1] function like:

    SELECT id, random() FROM generate_series(0, 1000000) g (id);

There seems to be an equivalent in SQLite: https://sqlite.org/series.html

[1] https://www.postgresql.org/docs/current/functions-srf.html
etaioinshrdlu · about 4 years ago
If your DB doesn't support this technique, you might be able to use this rather disgusting alternative, which builds an exponentially growing tower of rows by using nested queries with JOINs: https://stackoverflow.com/a/61169467
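A variant of the same cross-join trick can be sketched in SQLite via Python's stdlib sqlite3 (table and column names below are made up for illustration): a 10-row digit table cross-joined with itself six times yields exactly 10^6 rows, and a little base-10 arithmetic makes every id unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Seed table with the ten digits 0-9 (names are illustrative).
cur.execute("CREATE TABLE digits (n INTEGER)")
cur.executemany("INSERT INTO digits VALUES (?)", [(i,) for i in range(10)])

# Cross-joining the 10-row table with itself six times produces
# 10^6 rows; the place-value arithmetic makes each id unique.
cur.execute("""
    CREATE TABLE big AS
    SELECT a.n + 10*b.n + 100*c.n + 1000*d.n + 10000*e.n + 100000*f.n AS id
    FROM digits a, digits b, digits c, digits d, digits e, digits f
""")

print(cur.execute("SELECT COUNT(*), MIN(id), MAX(id) FROM big").fetchone())
# (1000000, 0, 999999)
```

The linked Stack Overflow answer grows the tower by repeated self-joins of a two-row seed; the base-10 version above is the same idea with fewer nesting levels.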
openquery · about 4 years ago
This looks convenient (and performant). But how does it scale as queries join across tables?

If you need to create test data with complex business logic, referential integrity, and constraints, we've been working on a declarative data generator that is built exactly for this: https://github.com/openquery-io/synth.
zikani_03 · about 4 years ago
If you need another repeatable way to create random data that you can export as SQL (or CSV/Excel files), you may find a tool we built and use at work useful: https://github.com/creditdatamw/zefaker

It needs a little Groovy, but it's very convenient for generating random (or non-random) data.
sigmonsays · about 4 years ago
I generally end up writing a data generator using the language and APIs of the application, since I often want control over various aspects of the data and build the generator accordingly. Quite often I just generate queries and then run them, which works quickly.

While this looks like a good way to generate simple data, practical applications are more involved.
alexf95 · about 4 years ago
Can someone give me a specific use case for this? "Check how a query behaves on a large table" is very vague to me.

E.g. I have a table structure but not a lot of rows in it, so I use this to get a lot of rows in, to check how fast queries get processed?
muad_kyrlach · about 4 years ago
Everyone, please rush out and do this in your production databases! :)
slt2021 · about 4 years ago
Usually there is a system table with a ton of metadata that will for sure contain a few thousand rows, so generating a million rows for SQL Server is simply:

    SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id
    FROM sys.objects a, sys.objects b

or you can use INFORMATION_SCHEMA, which is more portable across different RDBMS engines.
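The same system-table idea can be sketched portably in SQLite via Python's stdlib sqlite3 (assuming SQLite ≥ 3.25 for window-function support); the `meta` table below is a made-up stand-in for `sys.objects` or an `INFORMATION_SCHEMA` view, and `LIMIT` plays the role of T-SQL's `TOP`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for a metadata table: 1,000 arbitrary rows.
cur.execute("CREATE TABLE meta (x INTEGER)")
cur.executemany("INSERT INTO meta VALUES (?)", [(i,) for i in range(1000)])

# Cross join the table with itself (1000 * 1000 rows) and number the
# rows; LIMIT caps the result at exactly one million.
cur.execute("""
    SELECT ROW_NUMBER() OVER () AS id
    FROM meta a, meta b
    LIMIT 1000000
""")
ids = [row[0] for row in cur.fetchall()]
print(len(ids))  # 1000000
```

The caveat from the comment applies in miniature: the trick only works if the seed table is large enough that its self-join covers the row count you want.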
urbandw311er · about 4 years ago
> This is not a problem if your DBMS supports SQL recursion

A table of which DBMSs support this would be useful.
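For reference, recursive CTEs are widely supported: `WITH RECURSIVE` works in SQLite 3.8.3+, PostgreSQL 8.4+, MySQL 8.0+, and MariaDB 10.2+, while SQL Server spells it plain `WITH` and by default stops after 100 recursion levels (`OPTION (MAXRECURSION 0)` lifts the cap). A runnable SQLite sketch via Python's stdlib sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A recursive CTE that counts from 1 to 1,000,000; SQLite evaluates it
# as a queue, so a million iterations is no problem.
row = conn.execute("""
    WITH RECURSIVE seq(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq WHERE n < 1000000
    )
    SELECT COUNT(*), MIN(n), MAX(n) FROM seq
""").fetchone()

print(row)  # (1000000, 1, 1000000)
```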
tandav · about 4 years ago
    spark.sparkContext.range(1_000_000, numSlices=200).take(20)
coremoff · about 4 years ago
In Oracle, this can be done using hierarchical queries:

    select dbms_random.value from dual connect by level < 1000001;

edit: 1,000,001 because level starts at 1