
Proposers

Approvers

Status

Current state: Under Discussion


Discussion thread: here

JIRA: here (HUDI-504)

Released: <Hudi Version>

Abstract

...

  • We only keep one version of the docs, on the asf-site branch, for the latest release. Since each release brings new features and improvements, some involving configuration and parameter changes compared to previous versions, a single version of the docs can confuse users of older Hudi releases.
  • There are no API docs generated from the code.
  • The landing page of hudi.apache.org shows detailed information about Hudi. It would be better to show Hudi's highlights at a high level, along with other useful information (powered-by, Slack group, etc.), directly on the landing page.
  • The current process of building, testing, and deploying the docs (i.e., the content powering hudi.apache.org) is mostly manual.

...

To address these gaps, we propose restructuring the docs and automating how they are built and deployed. By migrating to Travis CI, the asf-site branch can be built automatically and post-build scripts executed after each build.

The diagram below shows the new structure and workflow.

[Diagram: Docs workflow]

Versioning docs

The release-specific documentation should evolve with the code. Thus, maintaining the docs on design, implementation details, and API examples on master under /docs makes things easier: code changes and the corresponding docs changes can land in the same PR. The relevant pages include:

  • Quickstart
  • Concepts
  • Writing Data
  • Querying Data
  • Configuration
  • Performance
  • Administering

In addition to these docs, a new set of API docs will be generated by javadoc for each release (similar to this).
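The sketch below illustrates one way a single release's docs under /docs could be built into their own versioned output directory. It assumes a Jekyll site rooted at docs/ (as in the Travis configuration later on this page) and a hypothetical `content/docs/<version>` layout; the function names are illustrative, not an agreed structure.

```shell
#!/usr/bin/env bash
# Sketch: build the release-specific docs under ./docs into a versioned
# output directory. The "content/docs/<version>" layout is a hypothetical
# assumption, not an agreed structure.

# Print the output directory for a release version, e.g. 0.5.0-incubating.
docs_dest_dir() {
  local version="$1"
  if [ -z "$version" ]; then
    echo "usage: docs_dest_dir <version>" >&2
    return 1
  fi
  echo "content/docs/${version}"
}

# Build the Jekyll site rooted at ./docs into the versioned directory.
# Assumes bundler and the site's Gemfile are already set up.
build_version_docs() {
  local version="$1"
  local dest
  dest="$(docs_dest_dir "$version")" || return 1
  # Resolve the destination before changing into docs/
  local abs_dest="${PWD}/${dest}"
  mkdir -p "$abs_dest"
  (cd docs && bundle exec jekyll build --config _config.yml \
    --source . --destination "$abs_dest")
}
```

For example, `build_version_docs 0.5.0-incubating` would write the site into `content/docs/0.5.0-incubating`. Keeping the path logic in its own function makes it easy to reuse when regenerating older releases.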

Content regarding general information about Hudi should remain in the asf-site branch:

  • Use Cases
  • Talks & Powered By
  • Comparison
  • Releases
  • Community
  • Code
  • Developers
  • Feedback

The release-specific docs on master and general content in asf-site are generated separately.  The generated web pages are then uploaded to the hosting server for the Hudi site.  The landing page references the specific version of docs content through links.
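To let the landing page reference each version's docs, the link list could be generated rather than edited by hand. A minimal sketch, assuming a hypothetical `/docs/<version>/quickstart.html` URL scheme; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print one HTML list item per release, linking to that release's docs.
# The "/docs/<version>/quickstart.html" URL scheme is a hypothetical assumption.

version_links() {
  local version
  for version in "$@"; do
    # Link to the quickstart page of this version's docs
    printf '<li><a href="/docs/%s/quickstart.html">%s</a></li>\n' "$version" "$version"
  done
}
```

For example, `version_links 0.5.0-incubating 0.4.7 0.4.6 > _includes/versions.html` would produce a fragment that a Jekyll page can place inside a `<ul>` with `{% include versions.html %}`.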

For backward compatibility, we can manually generate the content for the current release (0.5.0-incubating) and one or two older releases (0.4.6/0.4.7) to fit the new docs structure.

Automating docs update

To make updating and deploying the docs easier, a set of scripts automating the above process will be added. Once changes to /docs on either master or asf-site land, the updated content pages can be uploaded to the hosting server of the Hudi website. The API docs generated from the master branch will be maintained the same way.
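One piece of this automation is deciding whether a push actually touched the docs. A minimal sketch, assuming the changed paths are piped in one per line (for example from `git diff --name-only`); `docs_changed` and the `build_and_upload` step in the usage line are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit status 0) if any changed path read from stdin starts
# with the given prefix, "docs/" by default. The function name is hypothetical.

docs_changed() {
  local prefix="${1:-docs/}"
  local path
  # The "-n" check also handles a last line without a trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}
```

Usage: `git diff --name-only HEAD~1 HEAD | docs_changed && build_and_upload`. For content kept elsewhere on asf-site, pass a different prefix as the first argument.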

Redesigning landing page

The landing page should serve the following purposes, with as few words as possible:

  1. Answer what is Hudi
  2. Provide insights into why someone should use Hudi (some answers in FAQ)
  3. Highlight key features
  4. Have a link to the comparison page for users who want to read more
  5. Direct links/buttons to join the community

Here are the proposed changes corresponding to each item:

  1. Simplify the existing content to 2-3 sentences, with a diagram showing interoperability with the existing data ecosystem
  2. 3-4 sentences on why someone should use Hudi, with a link to the FAQ
  3. Highlight a few key features, each one elaborated with the technical details in 2-3 sentences, something like the following (from this set of slides by Vinoth and Balaji)
    E.g., "Near real-time data ingestion to Cloud storage/DFS: By carefully managing how data is laid out in storage & how it’s exposed to queries, Hudi is able to power a rich data ecosystem where external sources can be ingested in near real-time and made available for interactive SQL Engines like Presto & Spark"
  4. Have a few sentences on the comparison against other solutions.  Have a link to the detailed comparison page.
  5. Direct links for joining the Slack group and mailing list. Right now these links are on the community page and require additional clicks and reading.

For the detailed comparison page, we could have an interactive table comparing Hudi with other solutions across different aspects (each aspect clickable to show a detailed explanation).


The top menu bar on the landing page will be kept.

Rollout/Adoption Plan

  • No impact on existing users' APIs, as this is a docs website change.

Test Plan

...

Key Points

There are several key points to consider:

  • How to migrate to Travis CI
  • How to use git commands safely in Travis CI (important)

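To use git commands safely without exposing the password in the repository, use the form below, where GIT_TOKEN is a GitHub personal access token supplied as an environment variable: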

```shell
git clone https://${GIT_TOKEN}@github.com/${GIT_REPO}/${GIT_PROJECT}.git
```
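Travis CI can keep the token value out of the build log, but it is still safest never to print the authenticated URL at all. A minimal sketch, reusing the GIT_TOKEN, GIT_REPO and GIT_PROJECT variables from the command above; `redact_url` and `safe_clone` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: avoid printing the token-bearing clone URL in build logs.
# Uses GIT_TOKEN, GIT_REPO and GIT_PROJECT as in the command above;
# the helper names are hypothetical.

# Replace any credentials embedded in an https URL with "***".
redact_url() {
  printf '%s\n' "$1" | sed -E 's#^(https://)[^@/]+@#\1***@#'
}

# Log only the redacted URL, then clone without progress output.
safe_clone() {
  local url="https://${GIT_TOKEN}@github.com/${GIT_REPO}/${GIT_PROJECT}.git"
  echo "Cloning $(redact_url "$url")"
  git clone --quiet "$url"
}
```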

Steps to get the GIT_TOKEN

Steps - 01 : Enable two-factor authentication

https://help.github.com/en/github/authenticating-to-github/about-two-factor-authentication



Steps - 02 : Generate personal access tokens

https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line


Steps to migrate to Travis CI

Steps - 01 : Activate the repository in Travis CI


Steps - 02 : Add GIT_TOKEN to the repository's environment variables in Travis CI


Steps - 03 : Add a .travis.yml file to the asf-site branch

```yml
language: ruby
rvm:
  - 2.6.3

git:
  clone: false

env:
  global:
    - GIT_USER="CI BOT"
    - GIT_EMAIL="cibot@test.com"
    - GIT_REPO="apache"
    - GIT_PROJECT="incubator-hudi"
    - GIT_BRANCH="asf-site"
    - DOCS_ROOT="`pwd`/${GIT_PROJECT}/docs"

before_install:
  # Quote the values: GIT_USER contains a space
  - git config --global user.name "${GIT_USER}"
  - git config --global user.email "${GIT_EMAIL}"
  # GIT_TOKEN comes from the repository settings, never from this file
  - git clone https://${GIT_TOKEN}@github.com/${GIT_REPO}/${GIT_PROJECT}.git
  - cd ${GIT_PROJECT} && git checkout ${GIT_BRANCH}
  - gem install bundler:2.0.2

script:
  - pushd ${DOCS_ROOT}
  - bundle install
  - bundle update --bundler
  # Pass the config file with --config rather than as a positional argument
  - bundle exec jekyll build --config _config.yml --source . --destination _site
  - popd

after_success:
  - \cp -rf ${DOCS_ROOT}/_site/* test-content
  - git add -A
  # [skip ci] keeps this push from triggering another asf-site build
  - git commit -am "Travis CI build asf-site [skip ci]"
  - git push origin asf-site --force

branches:
  only:
    - asf-site
```
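The after_success step can also be sketched as two shell functions. It differs from the configuration above in a few assumed ways, not requirements: `cp -R src/. dest/` copies hidden files that the `_site/*` glob skips, the commit is skipped when nothing changed, and the push omits `--force`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the after_success step as functions; names are hypothetical.

# Copy the generated site into the target directory. "src/." copies the
# directory's contents including hidden files, which a "src/*" glob skips.
sync_site() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  cp -R "${src}/." "${dest}/"
}

# Stage everything, then commit and push only if something actually changed.
# "[skip ci]" keeps the push from triggering another Travis CI build.
publish_if_changed() {
  local branch="$1"
  git add -A
  if git diff --cached --quiet; then
    echo "No docs changes to publish"
    return 0
  fi
  git commit -m "Travis CI build ${branch} [skip ci]"
  git push origin "$branch"
}
```

If these functions lived in a script in the repository, after_success could source it and run `sync_site "${DOCS_ROOT}/_site" test-content` followed by `publish_if_changed asf-site`. Without `--force`, a concurrent update to asf-site makes the push fail instead of being silently overwritten.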


Steps - 04 : Check whether the CI build works


End

Hopefully, this will work well.

Test

This has been tested with https://github.com/lamber-ken/lamber-ken.github.io: on each commit, the CI build generates the docs and pushes the content to the asf-site branch.
