
Use this page to share code that you have developed for Trafodion and that could be useful for others. This could include, but is not restricted to:

  • UDFs
  • Stored Procedures
  • Scripts to manage a Trafodion instance
  • SQL queries to query Trafodion metadata

If the code is just a few lines, you can list it here. If it spans multiple files, you can publish it on GitHub and put a short description and a link here.

Also, please consider adding a license to your code on GitHub, for example the Apache License, so that it is clear how your code can be used.

Stored Procedures

 

Scalar UDFs

Using Regular Expressions in SQL

This demo UDF is meant as an example of how to write a scalar UDF rather than as a production-level UDF, but it can serve as a starting point to be enhanced into something more useful.

Standard SQL cannot perform some complex matching against a string, such as filtering out a valid phone number, a valid IP address, or an email address. Regular expressions are a powerful tool for this. This demo shows how to write a UDF that enhances/extends Trafodion with regular-expression support for such queries.

The source code can be downloaded from: https://github.com/traflm/trafodion-repos/tree/master/scalar-udf
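
As a hypothetical illustration only (the actual function name and signature depend on how you build and register the UDF from the repository above), such a UDF might be used like this:

-- Hypothetical invocation; regexp_match is an assumed name for a scalar UDF
-- that returns 1 when the input string matches the pattern.
SELECT customer_name, phone
FROM customers
WHERE regexp_match(phone, '[0-9]{3}-[0-9]{3}-[0-9]{4}') = 1;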

 

Simulating the Oracle to_char Function

This demo simulates the to_char function from Oracle.

The source code can be downloaded from: https://github.com/traflm/trafodion-repos/blob/master/scalar-udf/to_date.c
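
A sketch of how such a function might be called once registered, assuming the UDF is named to_char and accepts an Oracle-style format model (both are assumptions for illustration):

-- Hypothetical invocation; the function name and format model are assumptions.
SELECT to_char(order_date, 'YYYY-MM-DD') FROM orders;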

 

Table-Mapping UDFs

Reading Kafka Events

This UDF can read a topic from Apache Kafka. It is a basic serial Kafka consumer.

The source code can be downloaded from https://github.com/esgyn/code-examples/blob/master/src/main/java/org/trafodion/examples/udrs/udfs/table_valued/KafkaConsumerSerial
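
How the UDF is invoked depends on the parameters defined in the Java source; the following is a sketch only, with an assumed function name, broker address, topic name, and row limit:

-- Hypothetical invocation of a Kafka-reading table-mapping UDF;
-- check the Java source for the actual name and parameter list.
SELECT * FROM UDF(kafka_consumer('localhost:9092', 'my_topic', 100));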

Group Concatenation

This UDF groups rows by one column and concatenates the values of another column occurring in each group, similar to the GROUP_CONCAT function in MySQL. It behaves like an aggregate or sequence function, but rather than computing a min or max, etc., it computes a concatenation of the values in the group.

The source code can be downloaded from https://github.com/esgyn/code-examples/tree/master/src/main/java/org/trafodion/examples/udrs/udfs/table_valued/group_concat
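
A hypothetical invocation, assuming the UDF is registered as group_concat and takes a partitioned table input (names and parameters here are for illustration only; see the source for the actual interface):

-- Hypothetical: concatenate employee names per department.
SELECT * FROM UDF(group_concat(TABLE(SELECT dept, name FROM emp PARTITION BY dept)));
-- A group such as ('SALES': 'Ann', 'Bob') might yield a single row ('SALES', 'Ann,Bob').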

JSON Column-izer

When stored in a database, JSON data is typically kept in a character column that holds the entire document, unless the JSON has been parsed beforehand into name-value pairs, which raises different questions of how to store and access it.

A Hive table DDL can depict a JSON layout including its embedded structs and arrays.  For example, this JSON 

{"Employee" : {"Person" : {"First name" : "John", "Middle Name" : "Quincy", "Last Name" : "Smith" }, "Phones" : {"Home" : "555-555-1212", "Mobile" : "888-555-1212"} } }

...would be declared in Hive similar to the following (but column names need adjustment - Hive has problems with backticked column names within a struct):

CREATE TABLE t1 (employee STRUCT <
    person STRUCT <
        `first name`  : string,
        `middle name` : string,
        `last name`   : string >,
    phones STRUCT <
        home   : string,
        mobile : string >
>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

The serde interprets the JSON and provides access to individual JSON elements. Whether you load data into this (INTERNAL) table or define it as an EXTERNAL table linked to file(s) in HDFS, the JSON is physically still a single string of characters - it is not stored as multiple columns. A HiveQL SELECT statement looks like the following, with JsonSerDe resolving the "column" reference to extract the correct value. Note that the hierarchical naming is essential.

SELECT employee.person.`first name` FROM t1;

However, when a variety of JSON formats occur within the same data stream, it becomes very cumbersome to define a single table with all the tags that accommodates all the possible forms.  Challenges such as these in making JSON query-able led to this UDF.

This is a Table-Mapping UDF that expects a single input column containing a JSON document. Parameters passed in the call identify the JSON tags whose values are to be output. Below, we show how to create a suitable table in Hive (a similar one can be created in Trafodion).

> hive

CREATE TABLE t2 (jsondata STRING);

LOAD DATA LOCAL INPATH '<json text input file>' OVERWRITE INTO TABLE t2;

or

> hive

CREATE EXTERNAL TABLE t2 (jsondata STRING) ROW FORMAT DELIMITED STORED AS TEXTFILE LOCATION '<hdfs dir containing data>';

(files in the HDFS directory are implicitly linked to the table)

Applying the JSON layout above, one sample invocation of the UDF is

SELECT * FROM udf( unjson( TABLE(select * from hive.hive.t2), 'employee.person.last name', 'employee.person.first name', 'employee.phones.mobile' ));

The SELECT list within the TABLE() specification should resolve to a single column containing the JSON. The referenced table can be of any Trafodion-supported type. There is no defined limit on the number of tag parameters included in the call. Note that the tag names must match how they are spelled in the JSON document (case-insensitive). The output will contain one column per tag parameter - three in this example - for each row, holding either the extracted values or nulls.
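
For the sample JSON document shown earlier, the invocation above would produce one row along these lines (the column names are illustrative):

LAST_NAME   FIRST_NAME   MOBILE
---------   ----------   ------------
Smith       John         888-555-1212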

The Java source can be copied from: https://github.com/esgyn/code-examples/blob/master/json_columnizer

Trafodion steps after compiling the Java source, building the jar, and uploading it to every node of the Trafodion cluster:

> sqlci

create library <your library name> file '<the full path name of your jar>';

create table_mapping function <what you want> ()
external name '<your class>'                       -- name of your class
language java
library <your library name>;
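
As a hypothetical, filled-in instance of the template above (the library name, jar path, and class name are assumptions; unjson matches the invocation shown earlier):

create library json_udf_lib file '/opt/trafodion/udfs/json_columnizer.jar';

create table_mapping function unjson ()
external name 'org.trafodion.examples.udrs.udfs.table_valued.json_columnizer'  -- assumed class name
language java
library json_udf_lib;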

Manageability

 

 

Tools

 

SQL Scripts and Queries

 

Miscellaneous

 

 

 
