...

Current state: Under Discussion

Discussion thread: discuss mail

JIRA: KAFKA-9436

...

Code Block
{
   "IP" : "111.61.73.113"
   ,"RemoteUser" : "-"
   ,"AuthedRemoteUser" : "-"
   ,"DateTime" : "08/Aug/2019:18:15:29 +0900"
   ,"Method" : "OPTIONS"
   ,"Request" : "/api/v1/service_config"
   ,"Protocol" : "HTTP/1.1"
   ,"Response" : "200"
   ,"BytesSent" : "-"
   ,"Ms" : "101989"
   ,"Referrer" : "http://local.test.com"
   ,"UserAgent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
}



2. Parsing a Patterned String (such as a URL)


Input

Code Block
https://kafka.apache.org/documentation/#connect

...

name | description | type | default | valid values | importance
--- | --- | --- | --- | --- | ---
regex | String Regex Group Pattern | String | | grouped regular expression string | medium
mapping | Ordered Regex Group Mapping Keys | String | | comma separated names | medium

...

Code Block
"transforms": "RegexTransform",
"transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ParseStructByRegex$Value",

"transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",

"transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms,Referrer,UserAgent"

...

Code Block
"transforms": "RegexTransform",
"transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ParseStructByRegex$Value",

"transforms.RegexTransform.regex": "^(https?):\\/\\/([^/]*)/(.*)",

"transforms.RegexTransform.mapping": "protocol,domain,path"
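
To illustrate how the `regex` and `mapping` settings relate, here is a minimal sketch in plain Java, outside of Kafka Connect. The class `RegexGroupMappingSketch` and its `parse` method are hypothetical names used only for illustration. It assumes that the whole input must match the pattern and that the i-th capturing group is stored under the i-th mapping key; the proposed transformation may differ in details such as partial matches or error handling.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the class proposed in this KIP.
final class RegexGroupMappingSketch {

    // Maps the i-th capturing group of the regex to the i-th mapping key.
    // Assumption: the whole input must match the pattern.
    static Map<String, String> parse(String regex, List<String> keys, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Input does not match regex: " + input);
        }
        if (matcher.groupCount() != keys.size()) {
            throw new IllegalArgumentException(
                "Expected " + keys.size() + " groups but regex defines " + matcher.groupCount());
        }
        // LinkedHashMap keeps the fields in mapping order.
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            fields.put(keys.get(i), matcher.group(i + 1)); // group 0 is the whole match
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse(
            "^(https?):\\/\\/([^/]*)/(.*)",
            Arrays.asList("protocol", "domain", "path"),
            "https://kafka.apache.org/documentation/#connect");
        // prints {protocol=https, domain=kafka.apache.org, path=documentation/#connect}
        System.out.println(fields);
    }
}
```

The same helper can be fed the access-log regex and its twelve mapping keys from the first configuration. Every value comes back as a string, which is why fields such as "Response" appear quoted in the JSON output shown earlier.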

...

Only one main abstract class and one validator class are added: ParseStructByRegex and GroupRegexValidator.

This includes:

  1. describe/declare part
  2. main functions, type case support
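
The source of GroupRegexValidator is not shown on this page. As a rough sketch of the kind of check such a validator might perform, the plain-Java example below rejects values that are not valid regular expressions or that define no capturing group. The class name `GroupRegexCheckSketch` is hypothetical, and the use of `IllegalArgumentException` is an assumption made to keep the example free of Kafka dependencies; a real `ConfigDef.Validator` would normally throw `ConfigException` instead.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for a regex config validator; not the KIP's actual code.
final class GroupRegexCheckSketch {

    // Mirrors the shape of ConfigDef.Validator#ensureValid(String, Object).
    static void ensureValid(String name, Object value) {
        if (!(value instanceof String) || ((String) value).isEmpty()) {
            throw new IllegalArgumentException(name + " must be a non-empty string");
        }
        Pattern pattern;
        try {
            pattern = Pattern.compile((String) value);
        } catch (PatternSyntaxException e) {
            throw new IllegalArgumentException(name + " is not a valid regular expression", e);
        }
        // groupCount() reports how many capturing groups the pattern defines.
        if (pattern.matcher("").groupCount() == 0) {
            throw new IllegalArgumentException(name + " must contain at least one capturing group");
        }
    }

    public static void main(String[] args) {
        ensureValid("regex", "^(https?):\\/\\/([^/]*)/(.*)"); // accepted
        try {
            ensureValid("regex", "no groups here");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating at configuration time means a malformed pattern fails when the connector is configured rather than on the first record.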


1. describe/declare part

Code Block
public abstract class ParseStructByRegex<R extends ConnectRecord<R>> implements Transformation<R> {
    public static final String OVERVIEW_DOC = "Generate key/value Struct objects supported by ordered Regex Group"
        + "<p/>Use the concrete transformation type designed for the record key (<code>" + Key.class.getName() + "</code>) "
        + "or value (<code>" + Value.class.getName() + "</code>).";

    private static final String TYPE_DELIMITER = ":";

    private interface ConfigName {
        String REGEX = "regex";
        String MAPPING_KEY = "mapping";
    }


    public static final ConfigDef CONFIG_DEF = new ConfigDef()
        .define(ConfigName.REGEX, ConfigDef.Type.STRING, ConfigDef.NO_DEFAULT_VALUE, new GroupRegexValidator(), ConfigDef.Importance.MEDIUM,
            "String Regex Group Pattern.")
        .define(ConfigName.MAPPING_KEY, ConfigDef.Type.LIST, ConfigDef.NO_DEFAULT_VALUE, ConfigDef.Importance.MEDIUM,
            "Ordered Regex Group Mapping Keys");


    private static final String PURPOSE = "Parse Struct by regex group mapping";


 ...



2. main functions interface

Detail Code


Code Block
    @Override
    public R apply(R record) {
        if (operatingSchema(record) == null) {
            return applySchemaless(record);
        } else {
            return applyWithSchema(record);
        }
    }

    private R applySchemaless(R record) {
        ...
    }

    private R applyWithSchema(R record) {
        ...
    }

...