Authors: Dian Fu, Jincheng Sun
Status
Current state: ["Under Discussion"]
Discussion thread: here (<- link to https://mail-archives.apache.org/mod_mbox/flink-dev/)
JIRA: here (<- link to https://issues.apache.org/jira/browse/FLINK-XXXX)
...
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
...
- It lets users switch seamlessly between PyFlink and Pandas when processing data in Python, running each step on whichever execution engine suits it. For example, a user who already has a Pandas DataFrame and needs to apply an expensive transformation can convert it to a PyFlink table and leverage the power of the Flink engine. Conversely, a user can convert a PyFlink table to a Pandas DataFrame and transform it with the rich functionality of the Pandas ecosystem.
- No intermediate connectors are needed when converting between them.
...
The signature of `from_pandas` is as follows:
    class TableEnvironment(object):

        def from_pandas(self,
                        pdf: pd.DataFrame,
                        schema: Union[RowType, List[str], Tuple[str], List[DataType], Tuple[DataType]] = None,
                        splits_num: int = None) -> Table:
            pass
The argument `schema` is used to specify the schema of the result table:
...
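As a rough illustration of how the `schema` forms accepted by the signature could be told apart, the sketch below uses stand-in `DataType` and `RowType` classes rather than the real PyFlink types. The helper `resolve_schema` and its resolution rules are assumptions made for illustration (a list of names keeps the inferred types, a list of types keeps the DataFrame's column names, a row type supplies both); it is not the proposed implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DataType:
    """Hypothetical stand-in for a PyFlink data type."""
    name: str


@dataclass(frozen=True)
class RowType:
    """Hypothetical stand-in for a row type: ordered (field name, type) pairs."""
    fields: Tuple[Tuple[str, DataType], ...]


def resolve_schema(pdf_columns: Sequence[str],
                   inferred_types: Sequence[DataType],
                   schema=None) -> Tuple[List[str], List[DataType]]:
    """Return (field_names, field_types) for the result table.

    Assumed rules, for illustration only:
      * None             -> names and types both come from the DataFrame
      * RowType          -> names and types both come from the schema
      * list/tuple[str]  -> names from the schema, types inferred
      * list/tuple[DataType] -> names from the DataFrame, types from the schema
    """
    names, types = list(pdf_columns), list(inferred_types)
    if schema is None:
        return names, types

    if isinstance(schema, RowType):
        names = [field_name for field_name, _ in schema.fields]
        types = [field_type for _, field_type in schema.fields]
    elif isinstance(schema, (list, tuple)) and all(isinstance(s, str) for s in schema):
        names = list(schema)
    elif isinstance(schema, (list, tuple)) and all(isinstance(s, DataType) for s in schema):
        types = list(schema)
    else:
        raise TypeError("schema must be a RowType, or a list/tuple of str or DataType")

    # Every form must describe exactly one entry per DataFrame column.
    if len(names) != len(pdf_columns) or len(types) != len(pdf_columns):
        raise ValueError("schema length does not match the number of DataFrame columns")
    return names, types


# Example: rename the columns while keeping the inferred types.
columns = ["a", "b"]
inferred = [DataType("BIGINT"), DataType("DOUBLE")]
renamed_names, kept_types = resolve_schema(columns, inferred, ["id", "score"])
```

Keeping the dispatch in one helper makes it straightforward to reject mixed or wrongly sized schemas before any data is converted.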
The signature of `to_pandas` is as follows:
    class Table(object):

        def to_pandas(self) -> pd.DataFrame:
            pass
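To show how the two methods might be combined, here is a minimal round-trip sketch. It assumes a PyFlink release in which `from_pandas` and `to_pandas` are available, along with `TableEnvironment.create`, `EnvironmentSettings.in_streaming_mode` and `col` from the released PyFlink API; none of those three are defined by this proposal. The helper name `filter_with_flink` and its parameters are hypothetical.

```python
def filter_with_flink(data, column, threshold):
    """Pandas -> PyFlink table -> filter in Flink -> Pandas.

    `data` is anything accepted by the pandas.DataFrame constructor,
    for example a dict mapping column names to lists of values.
    """
    # Imports are deferred so this module can be loaded where PyFlink is absent.
    import pandas as pd
    from pyflink.table import EnvironmentSettings, TableEnvironment
    from pyflink.table.expressions import col

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Start from a Pandas DataFrame that the user already has at hand.
    pdf = pd.DataFrame(data)

    # Convert to a PyFlink table; with no schema given, column names and
    # types are taken from the DataFrame.
    table = t_env.from_pandas(pdf)

    # Run the transformation on the Flink engine.
    filtered = table.filter(col(column) > threshold)

    # Collect the result back into a Pandas DataFrame.
    return filtered.to_pandas()
```

Calling `filter_with_flink({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]}, "score", 1.0)` should return a DataFrame containing only the rows whose `score` exceeds 1.0. Running it requires a working PyFlink installation and a Java runtime.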
Proposed Changes
from_pandas
...