If you're running PostgreSQL, you can use the built-in generate_series [1] function like:<p><pre><code> SELECT id, random()
FROM generate_series(0, 1000000) g (id);
</code></pre>
There seems to be an equivalent in SQLite (quick sketch below): <a href="https://sqlite.org/series.html" rel="nofollow">https://sqlite.org/series.html</a><p>[1] <a href="https://www.postgresql.org/docs/current/functions-srf.html" rel="nofollow">https://www.postgresql.org/docs/current/functions-srf.html</a>
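A quick sketch of the SQLite version, assuming a build where the series extension is available so generate_series works as a table-valued function (note that SQLite's random() returns a signed 64-bit integer rather than a 0-1 float):<p><pre><code> -- requires the SQLite series extension (generate_series table-valued function)
 SELECT value AS id, random() AS r
 FROM generate_series(0, 1000000);
 </code></pre>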
If your DB doesn't support this technique, you might be able to use this rather disgusting alternative, which builds an exponentially growing tower of rows using nested queries with cross JOINs (a sketch follows below): <a href="https://stackoverflow.com/a/61169467" rel="nofollow">https://stackoverflow.com/a/61169467</a>
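For reference, a minimal sketch of that doubling trick, assuming CTE and window-function support; row-limiting syntax (LIMIT vs TOP) and window-function details vary by engine:<p><pre><code> WITH t0 AS (SELECT 1 AS n UNION ALL SELECT 1),  -- 2 rows
      t1 AS (SELECT 1 AS n FROM t0 a, t0 b),     -- 4 rows
      t2 AS (SELECT 1 AS n FROM t1 a, t1 b),     -- 16 rows
      t3 AS (SELECT 1 AS n FROM t2 a, t2 b),     -- 256 rows
      t4 AS (SELECT 1 AS n FROM t3 a, t3 b)      -- 65,536 rows
 SELECT ROW_NUMBER() OVER () AS id
 FROM t4 a, t2 b                                 -- 65,536 * 16 = 1,048,576 rows
 LIMIT 1000000;
 </code></pre>Each level cross-joins the previous one with itself, so a handful of nested CTEs is enough to pass a million rows.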
This looks convenient (and performant). But how does it scale as queries join across tables?<p>If you need to create test data with complex business logic, referential integrity and constraints, we've been working on a declarative data generator that is built exactly for this: <a href="https://github.com/openquery-io/synth" rel="nofollow">https://github.com/openquery-io/synth</a>.
If you need another repeatable way to create random data that you can export as SQL (or CSV/Excel files), you may find a tool we built and use at work useful: <a href="https://github.com/creditdatamw/zefaker" rel="nofollow">https://github.com/creditdatamw/zefaker</a><p>It needs a little Groovy but is very convenient for generating random (or non-random) data.
I generally end up writing a data generator using the language and APIs of the application. I often want control over various aspects of the data and build the generator accordingly. Quite often I just generate queries and then run them, which works quickly.<p>While this looks like a good way to generate simple data, practical applications are more involved.
Can someone give me a specific use case for this? "Check how a query behaves on a large table" is very vague to me.<p>E.g. I have a table structure but not a lot of rows in it, so I use this to fill it with a lot of rows and check how fast queries get processed?
Usually there is a system table with a ton of metadata that will for sure contain a few thousand rows, so generating a million rows for SQL Server is simply:<p><pre><code> select top 1000000 ROW_NUMBER() over (order by (select null)) as id
 from sys.objects a, sys.objects b
 </code></pre>or you can use INFORMATION_SCHEMA, which is more portable across different RDBMS engines (rough sketch below)
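A minimal sketch of that more portable variant, assuming INFORMATION_SCHEMA.COLUMNS holds enough rows that cross-joining it with itself passes a million (row-limiting syntax still differs per engine, e.g. TOP here vs LIMIT elsewhere):<p><pre><code> select top 1000000 ROW_NUMBER() over (order by (select null)) as id
 from INFORMATION_SCHEMA.COLUMNS a
 cross join INFORMATION_SCHEMA.COLUMNS b
 </code></pre>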
In Oracle, this can be done using hierarchical queries:<p><pre><code> select dbms_random.value from dual connect by level < 1000001;
</code></pre>
edit: the bound is 1,000,001 because LEVEL starts at 1