If you're running PostgreSQL, you can use the built-in generate_series [1] function like:<p><pre><code> SELECT id, random()
FROM generate_series(0, 1000000) g (id);
</code></pre>
There seems to be an equivalent in SQLite (quick sketch below): <a href="https://sqlite.org/series.html" rel="nofollow">https://sqlite.org/series.html</a><p>[1] <a href="https://www.postgresql.org/docs/current/functions-srf.html" rel="nofollow">https://www.postgresql.org/docs/current/functions-srf.html</a>
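A quick sketch of the SQLite version, assuming a build where the series extension is available so generate_series works as a table-valued function (note that SQLite's random() returns a signed 64-bit integer rather than a 0-1 float):<p><pre><code> -- requires the SQLite series extension (generate_series table-valued function)
 SELECT value AS id, random() AS r
 FROM generate_series(0, 1000000);
 </code></pre>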
If your DB doesn't support this technique, you might be able to use this rather disgusting alternative, which builds an exponentially growing tower of rows using nested queries with cross JOINs (a sketch follows below): <a href="https://stackoverflow.com/a/61169467" rel="nofollow">https://stackoverflow.com/a/61169467</a>
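For reference, a minimal sketch of that doubling trick, assuming CTE and window-function support; row-limiting syntax (LIMIT vs TOP) and window-function details vary by engine:<p><pre><code> WITH t0 AS (SELECT 1 AS n UNION ALL SELECT 1),  -- 2 rows
      t1 AS (SELECT 1 AS n FROM t0 a, t0 b),     -- 4 rows
      t2 AS (SELECT 1 AS n FROM t1 a, t1 b),     -- 16 rows
      t3 AS (SELECT 1 AS n FROM t2 a, t2 b),     -- 256 rows
      t4 AS (SELECT 1 AS n FROM t3 a, t3 b)      -- 65,536 rows
 SELECT ROW_NUMBER() OVER () AS id
 FROM t4 a, t2 b                                 -- 65,536 * 16 = 1,048,576 rows
 LIMIT 1000000;
 </code></pre>Each level cross-joins the previous one with itself, so a handful of nested CTEs is enough to pass a million rows.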
This looks convenient (and performant). But how does it scale as queries join across tables?<p>If you need to create test data with complex business logic, referential integrity and constraints, we've been working on a declarative data generator that is built exactly for this: <a href="https://github.com/openquery-io/synth" rel="nofollow">https://github.com/openquery-io/synth</a>.
If you need another repeatable way to create random data that you can export as SQL (or CSV/Excel files), you may find a tool we built and use at work useful: <a href="https://github.com/creditdatamw/zefaker" rel="nofollow">https://github.com/creditdatamw/zefaker</a><p>It needs a little Groovy but is very convenient for generating random (or non-random) data.
I generally end up writing a data generator using the language and APIs of the application. I often want control over various aspects of the data and build the generator accordingly. Quite often I just generate queries and then run them, which works quickly.<p>While this looks like a good way to generate simple data, practical applications are more involved.
Can someone give me a specific use case for this? "Check how a query behaves on a large table" is very vague to me.<p>E.g. I have a table structure but not a lot of rows in it, so I use this to fill it with a lot of rows and check how fast queries get processed?
Usually there is a system table with a ton of metadata that will for sure contain a few thousand rows, so generating a million rows for SQL Server is simply:<p><pre><code> select top 1000000 ROW_NUMBER() over (order by (select null)) as id
 from sys.objects a, sys.objects b
 </code></pre>or you can use INFORMATION_SCHEMA, which is more portable across different RDBMS engines (rough sketch below)
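A minimal sketch of that more portable variant, assuming INFORMATION_SCHEMA.COLUMNS holds enough rows that cross-joining it with itself passes a million (row-limiting syntax still differs per engine, e.g. TOP here vs LIMIT elsewhere):<p><pre><code> select top 1000000 ROW_NUMBER() over (order by (select null)) as id
 from INFORMATION_SCHEMA.COLUMNS a
 cross join INFORMATION_SCHEMA.COLUMNS b
 </code></pre>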
In Oracle, this can be done using hierarchical queries:<p><pre><code> select dbms_random.value from dual connect by level < 1000001;
</code></pre>
edit: the bound is 1,000,001 because LEVEL starts at 1