Apache Kylin : Analytical Data Warehouse for Big Data

Create flat table and Global Dictionary
      Improvements
  • Distributed encoding
  • Uses Roaring64NavigableMap, which supports cardinality higher than Integer.MAX_VALUE
     Build process
  • Group the FlatTable RDD, then take the distinct values
  • Repartition the RDD, using DictionaryBuilderHelper.calculateBucketSize()
  • MapPartitions on the RDD, using DictHelper.genDict()
  • Save the encoded dict file to FS, using NGlobalDictHDFSStore.writeBucketDict()
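
The first three build steps can be illustrated with a small, Spark-free sketch. It is not Kylin's code: the names `bucket_of` and `build_bucket_dicts` are hypothetical, a CRC32 hash stands in for whatever partitioner the repartition step really uses, the bucket count is passed in rather than computed, and starting relative IDs at 1 is an arbitrary choice made for the example. Saving the result to FS is left out.

```python
import zlib


def bucket_of(value: str, bucket_count: int) -> int:
    """Map a value to a bucket with a stable hash (stand-in for repartitioning)."""
    return zlib.crc32(value.encode("utf-8")) % bucket_count


def build_bucket_dicts(column_values, bucket_count):
    """Return {bucket_id: {value: relative_id}} for one column.

    Hypothetical helper: mirrors distinct -> repartition -> per-partition
    encoding, but runs sequentially in plain Python.
    """
    # Step 1: keep only the distinct values of the column.
    distinct_values = set(column_values)

    # Step 2: spread the distinct values over the buckets (one per partition).
    partitions = {bucket_id: [] for bucket_id in range(bucket_count)}
    for value in distinct_values:
        partitions[bucket_of(value, bucket_count)].append(value)

    # Step 3: encode each bucket independently; IDs are only unique
    # inside their own bucket at this point (assumed to start at 1).
    bucket_dicts = {}
    for bucket_id, values in partitions.items():
        bucket_dicts[bucket_id] = {
            value: relative_id
            for relative_id, value in enumerate(sorted(values), start=1)
        }
    return bucket_dicts
```

Because every bucket is encoded without looking at the others, the per-bucket work can run as independent tasks, which is what makes the encoding distributable.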
    Bucket concept
  • A bucket stores dictionaries. The number of buckets equals the number of RDD partitions (the task parallelism). A bucket has two important member variables: relativeDictMap and absoluteDictMap.
  • Within one segment building job, dictionaries are encoded in parallel and stored in the RelativeDictionary. After the segment building job is done, the dictionaries are re-encoded with the bucket offsets. The resulting global dictionary is saved to FS and tagged as one version (if no global dictionary has been built before, the version is 0).
  • When the next segment job starts, it gets the latest version of the dictionary, loads it into the buckets, and adds the new distinct values to the buckets.
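
The re-encoding with bucket offsets and the version-to-version reuse can be sketched as follows. This is an illustration under assumptions, not Kylin's implementation: `build_next_version` is a hypothetical name, the dictionary is modeled as a plain `{bucket_id: {value: global_id}}` mapping, new IDs are assumed to continue after the highest ID of the previous version, and versions are modeled as indexes in a Python list.

```python
def build_next_version(previous, new_values_by_bucket):
    """Return the next dictionary version as {bucket_id: {value: global_id}}.

    previous: the latest version, or {} if nothing has been built yet.
    new_values_by_bucket: {bucket_id: iterable of values seen in this segment}.
    """
    # Load the latest version into the buckets; existing IDs never change.
    result = {bucket_id: dict(entries) for bucket_id, entries in previous.items()}

    # Assumption: new global IDs continue after the highest existing one.
    offset = max(
        (gid for entries in previous.values() for gid in entries.values()),
        default=0,
    )

    for bucket_id in sorted(new_values_by_bucket):
        known = result.setdefault(bucket_id, {})

        # Relative encoding: IDs local to this bucket, for unseen values only.
        relative = {}
        for value in sorted(set(new_values_by_bucket[bucket_id])):
            if value not in known:
                relative[value] = len(relative) + 1

        # Absolute encoding: shift the local IDs by this bucket's offset.
        for value, relative_id in relative.items():
            known[value] = offset + relative_id
        offset += len(relative)

    return result


# Usage: versions[0] is the first build, versions[1] the next segment's build.
versions = [build_next_version({}, {0: ["a", "b"], 1: ["c"]})]
versions.append(build_next_version(versions[-1], {0: ["b", "d"], 1: ["e", "c"]}))
```

In this sketch a value that already has a global ID keeps it in every later version, so only values that are new in a segment receive IDs.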
