...
- The TPC-DS generator is a private function and must be imported before it can be used.
- The TPC-DS generator is based on dsdgen, the TPC-DS data generation tool.
- The data is generated in parallel, utilizing all the available partitions on all the available nodes.
- The generated data types follow the data types in the TPC-DS schema. Each value is converted to its proper type during data generation (e.g., integers, doubles, strings), except that DATE and TIME types are treated as String. The mapping between TPC-DS types and AsterixDB types is shown below.
Data Types Mapping
TPC-DS Type | AsterixDB Type |
---|---|
Identifier | String |
Integer | Integer |
Decimal | Double |
Char | String |
Varchar | String |
Date | String |
Time | String |
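As a rough illustration only, the mapping above can be written as a lookup table. The sketch below is plain Python, not AsterixDB code; the `convert` helper and the Python stand-ins for the AsterixDB types are hypothetical and exist only to show how a raw text field would be coerced under this mapping.

```python
# Mapping taken from the table above: TPC-DS type -> AsterixDB type.
TPCDS_TO_ASTERIXDB = {
    "Identifier": "String",
    "Integer": "Integer",
    "Decimal": "Double",
    "Char": "String",
    "Varchar": "String",
    "Date": "String",
    "Time": "String",
}

# Hypothetical Python stand-ins for the AsterixDB types (illustration only).
_PYTHON_CASTS = {"String": str, "Integer": int, "Double": float}


def convert(tpcds_type, raw):
    """Coerce a raw text field to the Python stand-in for its mapped type."""
    target = TPCDS_TO_ASTERIXDB[tpcds_type]
    return _PYTHON_CASTS[target](raw)
```

For example, `convert("Decimal", "12.50")` yields a float, while `convert("Date", "1998-01-01")` leaves the value as a string, mirroring the rule that DATE and TIME are kept as String.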
Function Signatures
Two versions of the
...
function exist, namely, one-parameter and two-parameter versions, as explained below:
...
- one-parameter version:
...
tpcds_datagen(scalefactor)
This version of the function takes a single parameter, scalefactor, and generates the data for all the tables at the specified scale factor.
...
- two-parameter version:
tpcds_datagen(tablename, scalefactor)
This version of the function takes two parameters, tablename and scalefactor (in this order), and generates the data only for the table named by tablename, at the specified scale factor.
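To make the argument order of the two signatures concrete, here is a small Python sketch that builds the text of the function call. The helper name `tpcds_datagen_call` is hypothetical, the double-quoted table name is an assumption about how the string literal is written, and how the call is embedded in a full query (including the import of private functions) is not covered here.

```python
def tpcds_datagen_call(scalefactor, tablename=None):
    """Build the call text for either signature (hypothetical helper).

    With no tablename: tpcds_datagen(scalefactor)
    With a tablename:  tpcds_datagen(tablename, scalefactor)
    """
    if tablename is None:
        # One-parameter version: data for all tables.
        return f"tpcds_datagen({scalefactor})"
    # Two-parameter version: tablename comes first, then scalefactor.
    return f'tpcds_datagen("{tablename}", {scalefactor})'
```

Calling `tpcds_datagen_call(1)` produces the one-parameter form, and `tpcds_datagen_call(1, "store_sales")` produces the two-parameter form with the table name first.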
Scale Factor to Data Size Ratio
The table below shows the relation between the scale factor and the generated data size. Each unit of scale factor corresponds to approximately 1 GB of generated data.
Scale Factor | Generated Size |
---|---|
1 | 1GB |
2 | 2GB |
10 | 10GB |
1000 | 1TB |
100000 | 100TB |
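The ratio in the table is simple arithmetic: roughly 1 GB per unit of scale factor, with 1000 GB shown as 1 TB. A minimal Python sketch of that estimate follows; the helper name `approx_generated_size` is hypothetical, and the result is an approximation, not a measured size.

```python
def approx_generated_size(scalefactor):
    """Estimate the generated data size, assuming ~1 GB per scale factor unit."""
    gigabytes = scalefactor  # approximately 1 GB per unit of scale factor
    if gigabytes >= 1000:
        # Express large sizes in TB, using 1 TB = 1000 GB as in the table.
        return f"{gigabytes / 1000:g}TB"
    return f"{gigabytes:g}GB"
```

This reproduces the rows above, for example `approx_generated_size(10)` gives `"10GB"` and `approx_generated_size(100000)` gives `"100TB"`.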
TPC-DS Specification References
...