...

  • The TPC-DS generator is a private function and must be imported before it can be used.
  • The TPC-DS generator is based on dsdgen, the TPC-DS data generation tool.
  • The data is generated in parallel, utilizing all the available partitions on all the available nodes.
  • The generated data types follow the data types in the TPC-DS schema. Each value is converted to its corresponding type during data generation (e.g., integer, double, string, etc.). DATE and TIME types are treated as strings. The mapping between TPC-DS types and AsterixDB types is shown below.

Data Types Mapping

TPC-DS Type    AsterixDB Type
Identifier     String
Integer        Integer
Decimal        Double
Char           String
Varchar        String
Date           String
Time           String
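
The mapping in the table above can also be written as a small lookup, which is handy when checking generated records on the client side. This is a minimal sketch in Python; the names TPCDS_TO_ASTERIXDB and asterixdb_type are hypothetical helpers for illustration and are not part of AsterixDB.

```python
# Mapping copied from the "Data Types Mapping" table: TPC-DS type -> AsterixDB type.
# The dictionary and function names are illustrative only, not an AsterixDB API.
TPCDS_TO_ASTERIXDB = {
    "identifier": "String",
    "integer": "Integer",
    "decimal": "Double",
    "char": "String",
    "varchar": "String",
    "date": "String",
    "time": "String",
}


def asterixdb_type(tpcds_type: str) -> str:
    """Return the AsterixDB type for a TPC-DS type name (case-insensitive)."""
    key = tpcds_type.strip().lower()
    if key not in TPCDS_TO_ASTERIXDB:
        raise ValueError(f"unknown TPC-DS type: {tpcds_type!r}")
    return TPCDS_TO_ASTERIXDB[key]
```

For example, asterixdb_type("DATE") yields "String", reflecting that DATE and TIME values are generated as strings.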


Function Signatures

Two versions of the

...

function exist, namely, one-parameter and two-parameter versions, as explained below:

...

  • one-parameter version: tpcds_datagen(scalefactor)

This version of the function takes a single parameter, scalefactor, and generates the data for all the tables at the specified scale factor.

...


  • two-parameter version: tpcds_datagen(tablename, scalefactor)

This version of the function takes two parameters, tablename and scalefactor (in this order), and generates the data only for the specified table, at the specified scale factor.
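
To make the argument order of the two versions concrete, here is a small Python sketch that renders the call text for either form. The helper name tpcds_datagen_call is hypothetical, and quoting the table name as a double-quoted string literal is an assumption; how the function is imported and how the call is embedded in a query are not shown here.

```python
from typing import Optional


def tpcds_datagen_call(scale_factor: int, table_name: Optional[str] = None) -> str:
    """Render the text of a tpcds_datagen call (illustrative helper only).

    With no table name, the one-parameter form is produced; otherwise the
    two-parameter form, with tablename first and scalefactor second.
    """
    if table_name is None:
        return f"tpcds_datagen({scale_factor})"
    # Assumption: the table name is passed as a double-quoted string literal.
    return f'tpcds_datagen("{table_name}", {scale_factor})'
```

Calling tpcds_datagen_call(1) gives the one-parameter form, while tpcds_datagen_call(10, "customer") gives the two-parameter form for the TPC-DS customer table.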


Scale Factor to Data Size Ratio

The table below shows the relation between the scale factor and the generated data size. Each unit of scale factor corresponds to approximately 1 GB of generated data.

Scale Factor    Generated Size
1               1GB
2               2GB
10              10GB
1000            1TB
100000          100TB
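
The ratio above can be expressed as a short calculation. The following Python sketch assumes roughly 1 GB per unit of scale factor and 1 TB = 1000 GB, as the table implies; the function name approx_generated_size is hypothetical, and actual generated sizes are approximate.

```python
def approx_generated_size(scale_factor: int) -> str:
    """Estimate the generated data size for a scale factor (about 1 GB per unit)."""
    if scale_factor <= 0:
        raise ValueError("scale factor must be a positive integer")
    gigabytes = scale_factor  # approximately 1 GB of data per unit of scale factor
    if gigabytes >= 1000:
        # Report terabytes using 1 TB = 1000 GB, matching the table above.
        return f"{gigabytes / 1000:g}TB"
    return f"{gigabytes}GB"
```

For the scale factors listed in the table, this reproduces the same sizes, for example 1000 maps to 1TB and 100000 maps to 100TB.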


TPC-DS Specification References

...