Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Status

Current state: Accepted

Discussion thread

JIRASAMZA-1901

Released: 

Problem

Samza SQL users have long been asking for a one stop shell, in which SQL statements and SQL files can be executed interactively. The requirement can be summarized as following:

  • An interactive command shell

  • Execute SQL queries and non-queries, show tables, functions, schema, etc.

  • Execute SQL files

  • Set environment variables interactively and/or via configuration file to configure behavior of shell

  • Command editing, history, highlight and auto-completion

  • Display SQL query in a streaming friendly way

  • List status of submitted non-query SQL statements and be able to stop them

  • Apart from Samza SQL, support other streaming SQLs and data sources by letting user implement their own plugins

Proposed Changes

We need to implement a Samza SQL shell. 

Terminal: Choice of Programming Language and Libraries

Any shell is essentially a REPL (Repeat-Evaluate-Print Loop), yet any non-trivial one is much more than that. Features like highlighting, history, command line editing, auto completion and rich data presentation require the capability of full control of the terminal. Terminal programming is a messy area, however, with the complexity coming from the long history and the vast variety of terminal types and operating systems. C/C++ is ideal for terminal IO programming, but for easier interoperation with data sources and other systems, Java is chosen.

...

terminal.puts(InfoCmp.Capability.enter_ca_mode); // tput smcup
terminal.puts(InfoCmp.Capability.exit_ca_mode); // tput rmcup

Shell Commands

 Use “HELP” to show all commands and their brief descriptions, or “help <command>” to get more detail about a specific command.

...

Use "CLEAR" command to clear the screen.   
Use "EXIT" or "QUIT" to quit the shell.

Accessing Samza

Samza SQL shells run jobs in Samza standalone mode. Samza executor can run multiple non-query jobs as backend jobs simultaneously, and run a single query job as a front-end job. Job results are stored in a data buffer. Samza SQL shell provides commands for users to manage their jobs easily. 

Configuration

The configuration file is located at "conf/shell-defaults.conf".  Most of the variables set here can also be set in the shell via command "SET". Some of the important configurations are listed below.

shell.executor

Specifies which SQL executor to use. By default org.apache.samza.sql.client.impl.SamzaExecutor is used, a.k.a Samza SQL.

shell.debug

Some of the samza code, though not supposed to, actually prints something to the stdout and stderr when a SQL is being executed. Those messages shall not be seen by the end user, and more importantly, they mess up the terminal display. Imagine you are painting a table on the screen and refreshing it every second, and part of it are replaced by some INFO or WARNING information!

When the shell starts up it takes control of the terminal completely and turns off all output to Java System.out and System.err. To prevent the shell from doing so, set "shell.debug" to true.

Public Interfaces

Samza SQL shell is designed to support other SQLs like Spark SQL. The org.apache.samza.sql.client.interfaces package enables user to implement the SQL executor of their own by implementing the SqlExecutor interface. Other interfaces and classes in the package serves for this purpose.

Implementation

Data Feed Model

Many SQL frameworks today use callback functions to feed data to the client. Our model does NOT allow an executor to print directly to the screen. An executor shall cache the data and wait for the shell to retrieve the data upon necessary (like when the shell decides to refresh the screen).

Error Handling

We believe exceptions are exceptional. When implementing a SqlExecutor, report recoverable errors (like table does not exist in a select statement) by return values, not exceptions. Report EXCEPTIONAL and unrecoverable conditions like unexpected corrupted or inconsistent data structures by thrown an ExecutionException. The shell will report the exception to the user and try to clean up and exit. 

Command Line Editor

The shell supports primitive auto completion and highlighting at the moment. SQL Keywords are highlighted and pressing tab key triggers auto completion. It can NOT handle keywords with spaces (like INSERT INTO) and double quotes though. This will be improved as a future work.

...

As any common shell, CTRL C gives up the current line input, CTRL D ends the session, CTRL A moves cursor to the start of the line and CTRL E moves the cursor to the end of the line.

Displaying the query result

Assume an user tries to execute SQL statement that selects profiles of those who have visited his LinkedIn page in the past 5 minutes. How do we display the result?

When the data volume is huge and comes very fast, a table view is probably the best. No matter how much the data volume is, the screen always displays one page out of maybe 5,000 pages. The page may contain 50 lines, each of which displays some brief information of a record. User can navigate through pages and select the row they are interested in and see the detail of the record.

When the amount of data is much less, however, the table view is not straightforward. A logging view, which continuously displays the newly coming data and scrolls the screen up is more convenient for the user. Of course, the user can still pause and resume the screen so they can examine a specific record that they are interested in.

The shell supports the logging view at the moment. The table view will be supported as a future work.

The SamzaSqlExecutor

SamzaExecutor is the default implementation of SqlExecutor. SamzaExecutor returns job execution results into a data buffer, where the shell retrieves results and then show those results in the terminal with the given format specified by users.

...

Execute query jobs. SamzaExecutor runs a single query job as a front-end job. It stores query results in a data buffer, where Shell retrieves results.

Compile and Debug

Build:

./gradlew clean releaseSqlShellTarGz

...