A sink is the ‘loading’ part of ETL (Extract, Transform, Load) in Flink. It is the last stage of a data pipeline: it stores data in a data lake after the data has been extracted from a source and transformed into a specific format.
This is an example of how you can sink data from a Flink DataStream:
A data stream is a continuous flow of data arriving from multiple sources. It can be an event log, a database log captured internally by CDC (change data capture), or sensor data from IoT devices.
In production, these kinds of streams usually arrive at very high frequency, so we typically connect the sources to a stream processing framework such as Spark or Flink to handle them.
About a year ago, I wrote a post about the BFF (Backend for Frontend) architecture. One of the architecture’s components was a ‘reverse proxy’, and this post describes it in more detail.
Proxy server
In networking, a proxy server is a server application that acts as a relay between a client and a target server.
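A minimal sketch of that relay role, written with only Python's standard library. The names `BackendHandler`, `make_relay_handler`, and `serve_in_background`, the loopback addresses, and the GET-only forwarding are all assumptions for illustration; a production reverse proxy also forwards other methods and headers and handles timeouts and upstream errors.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any http_proxy environment settings so the demo stays on loopback.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class BackendHandler(BaseHTTPRequestHandler):
    """Hypothetical target server: answers every GET with a small text body."""

    def do_GET(self):
        body = f"backend saw {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def make_relay_handler(target_base: str):
    """Return a handler class that relays each GET to target_base + path."""

    class RelayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Forward the client's path to the target server and read its answer.
            with OPENER.open(target_base + self.path) as upstream:
                status = upstream.status
                content_type = upstream.headers.get("Content-Type", "application/octet-stream")
                body = upstream.read()
            # Relay the target server's response back to the client.
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass

    return RelayHandler


def serve_in_background(handler_class) -> ThreadingHTTPServer:
    """Start a server on a free loopback port in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler_class)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    backend = serve_in_background(BackendHandler)
    proxy = serve_in_background(make_relay_handler(f"http://127.0.0.1:{backend.server_port}"))
    # The client only knows the proxy's address; the proxy talks to the backend.
    with OPENER.open(f"http://127.0.0.1:{proxy.server_port}/hello") as resp:
        print(resp.read().decode("utf-8"))  # backend saw /hello
```

The client never learns the backend's address, which is what makes this a reverse proxy: the relay sits in front of the server rather than in front of the client.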
This is just a refresher on the tree data structure, which most developers will know:
Yes, something that looks like this.
In most cases we don’t need to think deeply about the implementation, because most programming languages have library packages for building tree structures.
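For readers who do want to see an implementation, here is a minimal general tree in Python. It is a hedged sketch rather than any particular library's API; `TreeNode`, `preorder`, and `height` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    """A general tree node: one value and any number of child nodes."""
    value: str
    children: List["TreeNode"] = field(default_factory=list)

    def add(self, child: "TreeNode") -> "TreeNode":
        """Attach a child and return it, so deeper levels can be built from it."""
        self.children.append(child)
        return child


def preorder(node: TreeNode) -> List[str]:
    """Depth-first, pre-order traversal: visit the node, then each subtree left to right."""
    result = [node.value]
    for child in node.children:
        result.extend(preorder(child))
    return result


def height(node: TreeNode) -> int:
    """Number of edges on the longest path from this node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


if __name__ == "__main__":
    root = TreeNode("root")
    a = root.add(TreeNode("a"))
    root.add(TreeNode("b"))
    a.add(TreeNode("a1"))
    a.add(TreeNode("a2"))
    print(preorder(root))  # ['root', 'a', 'a1', 'a2', 'b']
    print(height(root))    # 2
```

Each node keeps a list of children, so it can have any number of them; a binary tree would instead keep exactly two child slots (for example `left` and `right`).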
‘Data skewness’ is one of the issues you will often face when working with Spark, and it stems from the way Spark parallelizes computation.
The following image shows what a ‘skewed’ situation looks like.
Why it happens
Spark is a distributed computing system: it splits the data into partitions spread across the cluster and processes those partitions in parallel.
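To illustrate one common way skew arises, the sketch below simulates a hash partitioner in plain Python. It is not the Spark API. Every row with the same key is sent to the same partition, so a single very frequent key overloads one partition while the others stay small. The key names, the row counts, and the use of `zlib.crc32` as the hash function are assumptions for illustration; Spark uses its own hash functions.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def partition_of(key: str, num_partitions: int) -> int:
    """Hash-partition a key. crc32 is a deterministic stand-in, not Spark's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys: Iterable[str], num_partitions: int) -> List[int]:
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes: List[int]) -> float:
    """Largest partition divided by the average partition size (1.0 = perfectly even)."""
    return max(sizes) / (sum(sizes) / len(sizes))


if __name__ == "__main__":
    # Hypothetical data: 9,000 rows share the key "hot"; 1,000 rows use 100 other keys.
    keys = ["hot"] * 9000 + [f"user{i % 100}" for i in range(1000)]
    sizes = partition_sizes(keys, 4)
    print(sizes)
    print(round(skew_ratio(sizes), 2))
```

Because all 9,000 "hot" rows share one partition, that partition holds at least 3.6 times the average load, and the task processing it finishes long after the others.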