Status

Current state: [One of "Under Discussion", "Accepted", "Rejected"]

Discussion thread: 

JIRA or Github Issue: 

Released: <Doris Version>

Google Doc: <If the design in question is unclear or needs to be discussed and reviewed, a Google Doc can be used first to facilitate comments from others.>

Motivation

Apache Doris supports functions (ST_Point, ST_LineFromText, ST_Polygon, etc.) that generate GEOGRAPHY values. These values can be combined with other geographic functions to perform complex geographic analysis. The specific functions are explained in detail below.

To extend this capability, the following work is needed:

  1. Support for constructing GEOGRAPHY values in different ways (WKT, WKB, GEOJSON).
  2. Support for more spatial types. Point, LineString, and Polygon are currently supported; MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection need to be added.
  3. Support for more geographic analysis functions.
  4. Support for GEO functions in the Nereids planner.

Related Research

At present, the geo type in Doris is implemented on top of the S2 library (https://github.com/google/s2geometry). The official S2 documentation (http://s2geometry.io/) gives a comprehensive introduction to the library, so only the necessary background is covered here. We describe a point on Earth as (lng, lat), but S2 represents points internally as unit-length vectors (points on the surface of a three-dimensional unit sphere) rather than as traditional (latitude, longitude) pairs. In other words, S2 treats the Earth as a unit sphere, and spatial calculations on the Earth are translated into vector calculations on that sphere. For our purposes we only need to care about latitude and longitude, so two things matter:

  1. Doris only supports geographic analysis, not spatial analysis. S2 is based on a sphere; the S2Earth radius is the mean radius, which averages the difference between the polar and equatorial radii. It is 6,371.0088 km, the same as the WGS-84 mean radius.
  2. For points, we only need the latitude and longitude, so the data is 2D for us.

The first three items listed in the Motivation section are discussed separately below:
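As a rough illustration of the unit-vector idea, the following Python sketch converts a (lng, lat) pair into a point on the unit sphere and derives a great-circle distance from the angle between two such vectors. It is not Doris or S2 code: the function names are hypothetical, and the only value taken from the text is the 6,371.0088 km mean radius.

```python
import math

# Mean Earth radius quoted in the text (6,371.0088 km), expressed in meters.
EARTH_MEAN_RADIUS_M = 6371008.8


def lnglat_to_unit_vector(lng_deg, lat_deg):
    """Map a (lng, lat) pair in degrees to an (x, y, z) point on the unit sphere."""
    lng = math.radians(lng_deg)
    lat = math.radians(lat_deg)
    cos_lat = math.cos(lat)
    return (cos_lat * math.cos(lng), cos_lat * math.sin(lng), math.sin(lat))


def sphere_distance_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters between two (lng, lat) points."""
    a = lnglat_to_unit_vector(lng1, lat1)
    b = lnglat_to_unit_vector(lng2, lat2)
    # Angle between the vectors via atan2(|a x b|, a . b), which stays
    # accurate for both very small and near-antipodal angles.
    cross = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    cross_norm = math.sqrt(sum(c * c for c in cross))
    dot = sum(x * y for x, y in zip(a, b))
    return EARTH_MEAN_RADIUS_M * math.atan2(cross_norm, dot)
```

Because every point lives on the same unit sphere, the distance is just the angle between two vectors scaled by the radius, which is why only latitude and longitude are needed as input.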

1. Support for constructing GEOGRAPHY values in different ways

  • WKT

       Currently, WKT is parsed by a parser built with the Yacc and Lex tools.

  • WKB(EWKB)

       “Well-known binary” is a scheme for writing a simple features geometry into a platform-independent array of bytes, usually for transport between systems or between programs. This document (https://libgeos.org/specifications/wkb/) explains WKB in detail.

      A few points need to be added. As stated above, for points we only need the latitude and longitude, so the data is 2D for us. Therefore, only conversion of the standard WKB format is strictly required. However, many users use the EWKB format of PostGIS, so EWKB is supported as well for compatibility. In practice, EWKB support only adds an SRID to the WKB format, and its value is always 4326. Consequently, EWKB parsing only accepts data with SRID=4326 and returns NULL for any other SRID. Similarly, the SRID in the result of ST_asEWKB can only be 4326. ISO WKB adds identifiers for additional dimensions, which are not relevant here, so it is not considered.
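To make the SRID rule concrete, here is a minimal Python sketch of encoding and decoding a 2D point. It assumes the PostGIS EWKB layout (a byte-order byte, a 32-bit type word whose 0x20000000 bit signals that an SRID follows, an optional 32-bit SRID, then two doubles). The function names are hypothetical and this is not the Doris implementation, which lives in the pull request linked under Detailed Design; returning None here stands in for SQL NULL.

```python
import struct

WKB_POINT = 1
EWKB_Z_FLAG = 0x80000000     # PostGIS EWKB: geometry has a Z coordinate
EWKB_M_FLAG = 0x40000000     # PostGIS EWKB: geometry has an M coordinate
EWKB_SRID_FLAG = 0x20000000  # PostGIS EWKB: an SRID follows the type word
SRID_WGS84 = 4326


def encode_ewkb_point(lng, lat, srid=SRID_WGS84):
    """Encode a 2D point as little-endian EWKB with an embedded SRID."""
    return struct.pack("<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG, srid, lng, lat)


def decode_point(data):
    """Decode a 2D point from WKB or EWKB; return (lng, lat) or None."""
    if len(data) < 5 or data[0] not in (0, 1):
        return None
    endian = "<" if data[0] == 1 else ">"
    (type_word,) = struct.unpack_from(endian + "I", data, 1)
    offset = 5
    if type_word & (EWKB_Z_FLAG | EWKB_M_FLAG):
        return None  # only 2D data is accepted
    if type_word & EWKB_SRID_FLAG:
        if len(data) < offset + 4:
            return None
        (srid,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
        if srid != SRID_WGS84:
            return None  # any SRID other than 4326 is rejected
    # After masking the EWKB flags, ISO dimension codes such as 1001
    # (Point Z) remain different from 1 and are rejected here.
    if (type_word & 0x1FFFFFFF) != WKB_POINT:
        return None
    if len(data) != offset + 16:
        return None
    return struct.unpack_from(endian + "dd", data, offset)
```

For example, `decode_point(encode_ewkb_point(116.4, 39.9))` round-trips the coordinates, while the same point encoded with `srid=3857` decodes to None.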

  • GEOJSON (todo)


2. Support for more spatial types. Point, LineString, and Polygon are currently supported; MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection need to be added. (todo)


3. Support for more geographic analysis functions.

Done:

    DOUBLE ST_Angle(GEOPOINT point1, GEOPOINT point2, GEOPOINT point3);

    DOUBLE ST_Azimuth(GEOPOINT point1, GEOPOINT point2);

    DOUBLE ST_Area_Square_Meters(GEOMETRY geo);

    DOUBLE ST_Area_Square_Km(GEOMETRY geo);

    DOUBLE ST_Distance_Sphere(DOUBLE x_lng, DOUBLE x_lat, DOUBLE y_lng, DOUBLE y_lat);

    BOOL ST_Contains(GEOMETRY shape1, GEOMETRY shape2); // This function seems incomplete
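To give a feel for what a function such as ST_Azimuth computes, the Python sketch below evaluates the initial bearing from one (lng, lat) point to another on a sphere. It assumes the common convention of an angle in radians measured clockwise from north; whether the Doris function follows exactly this convention and range is an assumption, and the function name is hypothetical.

```python
import math


def azimuth_rad(lng1, lat1, lng2, lat2):
    """Initial bearing from point 1 to point 2, in radians within [0, 2*pi)."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlng = math.radians(lng2 - lng1)
    # Standard spherical initial-bearing formula.
    y = math.sin(dlng) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlng)
    # atan2 yields (-pi, pi]; fold negative angles into [0, 2*pi).
    return math.atan2(y, x) % (2 * math.pi)
```

With this convention, a point due north of the origin gives 0, due east gives pi/2, due south gives pi, and due west gives 3*pi/2.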

ToDo:

    ST_DWITHIN

    ST_INTERSECTS

    ......

Detailed Design

Support WKB: https://github.com/apache/doris/pull/18526


Scheduling

  1. First, refactor the overall geo code into sub-categories: type (one class per geometry type) and parse (the WKT, WKB, and GEOJSON code). The core function code will not be split into categories.
  2. Support for more geometry types: MultiPoint, MultiLineString, MultiPolygon, GeometryCollection.
  3. Support for constructing GEOGRAPHY values from GEOJSON.