
Commit b314d7f

[readme] Update README.md and add CONTRIBUTING.md
1 parent d78069c commit b314d7f

3 files changed

Lines changed: 368 additions & 150 deletions


CONTRIBUTING.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
# How to become a contributor and submit your own code

We'd love to accept your patches!

## Contributing A Patch

1. Submit an issue describing your proposed change to the repo in question.
2. The repo owner will respond to your issue promptly.
3. Fork the desired repo, then develop and test your code changes (a typical git workflow is sketched after this list).
4. Ensure that your code adheres to the existing style of the sample to which you are contributing.
5. Ensure that your code has an appropriate set of unit tests, all of which pass.
6. Submit a pull request.
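
For steps 3 and 6, the workflow usually looks roughly like the sketch below. The fork URL and the branch name are placeholders for illustration, not project conventions:

```shell
# clone your fork (replace <your-account> with your own GitHub user name)
git clone https://github.com/<your-account>/chunjun.git
cd chunjun

# create a topic branch, then develop and test your change
git checkout -b my-fix
./mvnw clean package -DskipTests

# push the branch to your fork, then open a pull request against DTStack/chunjun
git push origin my-fix
```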

README.md

Lines changed: 178 additions & 70 deletions
@@ -1,105 +1,213 @@
-Chunjun
-============
# ChunJun

-[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
<p align="left">
  <img src="https://img.shields.io/github/stars/DTStack/chunjun?style=social" alt="github stars" />
  <img src="https://img.shields.io/github/license/DTStack/chunjun" alt="license" />
  <a href="https://github.com/DTStack/chunjun/releases"><img src="https://img.shields.io/github/downloads/DTStack/chunjun/total" alt="github downloads" /></a>
  <img src="https://img.shields.io/gitlab/coverage/DTStack/chunjun/master" alt="master coverage" />
</p>

-English | [中文](README_CH.md)
[![EN doc](https://img.shields.io/badge/document-English-blue.svg)](README.md)
[![CN doc](https://img.shields.io/badge/文档-中文版-blue.svg)](README_CH.md)

-# Communication
## Introduction

-- We are recruiting **big data platform development engineers**. If you want more information about the position, please add WeChat ID [**ysqwhiletrue**] or email your resume to [sishu@dtstack.com](mailto:sishu@dtstack.com).
ChunJun (formerly known as FlinkX) is a data integration tool based on Flink that is **stable**, **easy to use**, **efficient**, and **integrated with the DataStream/DataSet API**. It enables data synchronization and computation between various heterogeneous data sources, and it has been deployed and running stably in thousands of companies.

-- We use [DingTalk](https://www.dingtalk.com/) to communicate. You can search the group number [**30537511**] or scan the QR code below to join the communication group.
-
-<div align=center>
-  <img src=docs/images/IMG_3362.JPG width=300 />
-</div>
Official website of ChunJun: https://dtstack.github.io/chunjun/

-# Introduction
## Features of ChunJun

-*[Chunjun 1.12 New Features](docs/changeLog.md)*
ChunJun abstracts different databases into reader/source plugins, writer/sink plugins, and lookup plugins, and it has the following features:

-Chunjun is a data synchronization tool based on Flink. Chunjun can collect static data, such as MySQL, HDFS, etc., as well as real-time changing data, such as MySQL binlog, Kafka, etc. **At the same time, Chunjun is also a computing framework that supports all the syntax and features of native FlinkSql**, <big>**and provides a large number of [cases](Chunjun-examples)**</big>. Chunjun currently includes the following features:
- Based on the real-time computing engine Flink; tasks can be configured as JSON templates or as SQL scripts, and the SQL scripts are compatible with Flink SQL syntax (a sketch follows this list);
- Supports distributed operation and submission methods such as flink-standalone, yarn-session, and yarn-per-job;
- Supports one-click Docker deployment and can be deployed and run on Kubernetes;
- Supports a variety of heterogeneous data sources, with synchronization and computation for more than 20 data sources such as MySQL, Oracle, SQLServer, Hive, and Kudu;
- Easy to extend and highly flexible: newly added data source plugins can integrate with existing plugins instantly, and plugin developers do not need to care about the code logic of other plugins;
- Supports not only full synchronization but also incremental synchronization and interval polling;
- Supports offline synchronization and computation, and is also compatible with real-time scenarios;
- Supports dirty-data storage and provides metric monitoring;
- Works with the Flink checkpoint mechanism to support resuming from breakpoints and task disaster recovery;
- Supports not only DML data synchronization but also DDL synchronization, such as 'CREATE TABLE' and 'ALTER COLUMN'.
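
As an illustration of the SQL-script style of configuration, a minimal synchronization task might look like the sketch below. The 'stream-x' connector name and the column definitions are assumptions for illustration only; refer to the chunjun-examples directory for scripts that match your release.

```sql
-- Minimal sketch of a ChunJun SQL-script task (illustrative only).
-- 'stream-x' is assumed here as a demo source/sink connector; use the
-- connector names and options shipped with your ChunJun version.
CREATE TABLE source_demo (
    id   INT,
    name VARCHAR
) WITH (
    'connector' = 'stream-x'
);

CREATE TABLE sink_demo (
    id   INT,
    name VARCHAR
) WITH (
    'connector' = 'stream-x'
);

INSERT INTO sink_demo
SELECT id, name
FROM source_demo;
```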

-- Most plugins support concurrent reading and writing of data, which can greatly improve the speed of reading and writing;
## Build And Compilation

-- Some plugins support failure recovery, which can restore tasks from the failed location and save running time; [Failure Recovery](docs/restore.md)
### Get the code

-- The source plugin for relational databases supports interval polling. It can continuously collect changing data; [Interval Polling](docs/offline/reader/mysqlreader.md)
Use git to clone the ChunJun code:

-- Some databases support opening Kerberos security authentication; [Kerberos](docs/kerberos.md)
```shell
git clone https://github.com/DTStack/chunjun.git
```

-- Limit the reading speed of source plugins and reduce the impact on business databases;
### Build

-- Save the dirty data when writing data;
Execute the command in the project directory:

-- Limit the maximum number of dirty data;
```shell
./mvnw clean package -DskipTests
```

-- Multiple running modes: Local, Standalone, Yarn Session, Yarn Per-Job;
Or execute:

-- **Synchronization tasks support transformer operations that execute FlinkSql syntax;**
```shell
sh build/build.sh
```

-- **SQL tasks support [sharing](docs/conectorShare.md) FlinkSql's own connectors;**
### Multi-platform compatibility

-- The following databases are currently supported:
ChunJun currently supports TDH and open-source Hadoop platforms; different platforms need to be packaged with different Maven commands, as listed in the table below.

-| | Database Type | Source | Sink | Lookup |
-|:----------------------:|:--------------:|:-------------------------------:|:-------------------------------:|:-------------------------------:|
-| Batch Synchronization | MySQL | [doc](docs/connectors/mysql/mysql-source.md) | [doc](docs/connectors/mysql/mysql-sink.md) | [doc](docs/connectors/mysql/mysql-lookup.md) |
-| | TiDB | | reference mysql | reference mysql |
-| | Oracle | [doc](docs/connectors/oracle/oracle-source.md) | [doc](docs/connectors/oracle/oracle-sink.md) | [doc](docs/connectors/oracle/oracle-lookup.md) |
-| | Doris | | [doc](docs/connectors/doris/dorisbatch-sink.md) | |
-| | SqlServer | [doc](docs/connectors/sqlserver/sqlserver-source.md) | [doc](docs/connectors/sqlserver/sqlserver-sink.md) | [doc](docs/connectors/sqlserver/sqlserver-lookup.md) |
-| | PostgreSQL | [doc](docs/connectors/postgres/postgres-source.md) | [doc](docs/connectors/postgres/postgres-sink.md) | [doc](docs/connectors/postgres/postgres-lookup.md) |
-| | DB2 | [doc](docs/connectors/db2/db2-source.md) | [doc](docs/connectors/db2/db2-sink.md) | [doc](docs/connectors/db2/db2-lookup.md) |
-| | ClickHouse | [doc](docs/connectors/clickhouse/clickhouse-source.md) | [doc](docs/connectors/clickhouse/clickhouse-sink.md) | [doc](docs/connectors/clickhouse/clickhouse-lookup.md) |
-| | Greenplum | [doc](docs/connectors/greenplum/greenplum-source.md) | [doc](docs/connectors/greenplum/greenplum-sink.md) | |
-| | KingBase | [doc](docs/connectors/kingbase/kingbase-source.md) | [doc](docs/connectors/kingbase/kingbase-sink.md) | |
-| | MongoDB | [doc](docs/connectors/mongodb/mongodb-source.md) | [doc](docs/connectors/mongodb/mongodb-sink.md) | [doc](docs/connectors/mongodb/mongodb-lookup.md) |
-| | SAP HANA | [doc](docs/connectors/saphana/saphana-source.md) | [doc](docs/connectors/saphana/saphana-sink.md) | |
-| | ElasticSearch7 | [doc](docs/connectors/elasticsearch7/es7-source.md) | [doc](docs/connectors/elasticsearch7/es7-sink.md) | [doc](docs/connectors/elasticsearch7/es7-sink.md) |
-| | FTP | [doc](docs/connectors/ftp/ftp-source.md) | [doc](docs/connectors/ftp/ftp-sink.md) | |
-| | HDFS | [doc](docs/connectors/hdfs/hdfs-source.md) | [doc](docs/connectors/hdfs/hdfs-sink.md) | |
-| | Stream | [doc](docs/connectors/stream/stream-source.md) | [doc](docs/connectors/stream/stream-sink.md) | |
-| | Redis | | [doc](docs/connectors/redis/redis-sink.md) | [doc](docs/connectors/redis/redis-lookup.md) |
-| | Hive | | [doc](docs/connectors/hive/hive-sink.md) | |
-| | Solr | [doc](docs/connectors/solr/solr-source.md) | [doc](docs/connectors/solr/solr-sink.md) | |
-| | File | [doc](docs/connectors/file/file-source.md) | | |
-| | StarRocks | | [doc](docs/connectors/starrocks/starrocks-sink.md) | |
-| Stream Synchronization | Kafka | [doc](docs/connectors/kafka/kafka-source.md) | [doc](docs/connectors/kafka/kafka-sink.md) | |
-| | EMQX | [doc](docs/connectors/emqx/emqx-source.md) | [doc](docs/connectors/emqx/emqx-sink.md) | |
-| | MySQL Binlog | [doc](docs/connectors/binlog/binlog-source.md) | | |
-| | Oracle LogMiner | [doc](docs/connectors/logminer/LogMiner-source.md) | | |
-| | Sqlserver CDC | [doc](docs/connectors/sqlservercdc/SqlserverCDC-source.md) | | |
-| | Postgres CDC | [doc](docs/connectors/pgwal/Postgres-CDC.md) | | |
| Hadoop Platform | Build command | Comment |
|-----------------|----------------------------------------------|--------------------------------------------------------------------|
| tdh | mvn clean package -DskipTests -P default,tdh | Packages the inceptor plugin and the plugins supported by default |
| default | mvn clean package -DskipTests -P default | Packages all plugins except the inceptor plugin |
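
For example, the two profiles from the table above are invoked directly with Maven:

```shell
# package for an open-source Hadoop platform (default profile)
mvn clean package -DskipTests -P default

# package for TDH, which additionally builds the inceptor plugin
mvn clean package -DskipTests -P default,tdh
```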

-# Quick Start
### Common problems

-Please click [Quick Start](docs/quickstart.md)
#### 1. Cannot find dependencies

-# General Configuration
Solution: some driver packages are kept in the directory '$CHUNJUN_HOME/jars'; you can install these dependencies manually or execute the command below:

-Please click [General Configuration](docs/generalconfig.md)
```bash
## windows
./$CHUNJUN_HOME/bin/install_jars.bat

## unix
./$CHUNJUN_HOME/bin/install_jars.sh
```

-# Statistics Metric
-Please click [Statistics Metric](docs/statistics.md)
#### 2. Compiling the module 'ChunJun-core' throws 'Failed to read artifact descriptor for com.google.errorprone:javac-shaded'

-# Iceberg
-Please click [Iceberg](docs/iceberg.md)
Error message:

-# Kerberos
```text
[ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check (spotless-check) on project flinkx-core:
Execution spotless-check of goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check failed: Unable to resolve dependencies:
Failed to collect dependencies at com.google.googlejavaformat:google-java-format:jar:1.7 -> com.google.errorprone:javac-shaded:jar:9+181-r4173-1:
Failed to read artifact descriptor for com.google.errorprone:javac-shaded:jar:9+181-r4173-1: Could not transfer artifact
com.google.errorprone:javac-shaded:pom:9+181-r4173-1 from/to aliyunmaven (https://maven.aliyun.com/repository/public):
Access denied to: https://maven.aliyun.com/repository/public/com/google/errorprone/javac-shaded/9+181-r4173-1/javac-shaded-9+181-r4173-1.pom -> [Help 1]
```

-Please click [Kerberos](docs/kerberos.md)
-# Questions
Solution:
Download 'javac-shaded-9+181-r4173-1.jar' from https://repo1.maven.org/maven2/com/google/errorprone/javac-shaded/9+181-r4173-1/javac-shaded-9+181-r4173-1.jar, then install it locally with the command below:

```shell
mvn install:install-file -DgroupId=com.google.errorprone -DartifactId=javac-shaded -Dversion=9+181-r4173-1 -Dpackaging=jar -Dfile=./jars/javac-shaded-9+181-r4173-1.jar
```

-Please click [Questions](docs/questions.md)
## Quick Start

-# How to contribute Chunjun
The following table shows the correspondence between ChunJun branches and Flink versions. If the versions are not aligned, problems such as serialization exceptions or 'NoSuchMethod' errors may occur in tasks.

-Please click [Contribution](docs/contribution.md)
| Branches     | Flink version |
|--------------|---------------|
| master       | 1.12.7        |
| 1.12_release | 1.12.7        |
| 1.10_release | 1.10.1        |
| 1.8_release  | 1.8.3         |
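
For example, to build against Flink 1.12.7 you would check out one of the matching branches from the table above (the 1.12_release branch is used here; substitute the branch that matches your environment):

```shell
# check out the branch that matches your Flink version, then build
git checkout 1.12_release
./mvnw clean package -DskipTests
```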

-# License
ChunJun supports running tasks in multiple modes; different modes depend on different environments and setup steps, which are described below.

-Chunjun is under the Apache 2.0 license. See the [LICENSE](http://www.apache.org/licenses/LICENSE-2.0) file for details.

### Local

Local mode does not depend on a Flink or Hadoop environment; it starts a JVM process on the local machine to run the task.

#### Steps

Go to the 'chunjun-dist' directory and execute the command below:

```shell
sh bin/chunjun-local.sh -job $SCRIPT_PATH
```

The "$SCRIPT_PATH" parameter is the path where the task script is located. After execution, the task runs locally.
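
For example, pointing the script at the bundled stream example (the same script used in the Standalone and YARN sections below) might look like this:

```shell
# run the bundled stream example locally (path relative to chunjun-dist)
sh bin/chunjun-local.sh -job chunjun-examples/json/stream/stream.json
```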

[Reference video](https://www.bilibili.com/video/BV1mT411g7fJ?spm_id_from=333.999.0.0)

### Standalone

Standalone mode depends on a Flink standalone cluster and does not depend on a Hadoop environment.

#### Steps

##### 1. Start the Flink standalone cluster

```shell
sh $FLINK_HOME/bin/start-cluster.sh
```

After a successful startup, the Flink web UI listens on port 8081 by default (configurable in 'flink-conf.yaml'). Visit port 8081 on the current machine to open the web UI of the standalone cluster.

##### 2. Submit the task

Go to the 'chunjun-dist' directory and execute the command below:

```shell
sh bin/chunjun-standalone.sh -job chunjun-examples/json/stream/stream.json
```

After the command executes successfully, you can observe the task status on the Flink web UI.

[Reference video](https://www.bilibili.com/video/BV1TT41137UV?spm_id_from=333.999.0.0)

### Yarn Session

Yarn Session mode depends on the Flink jars and a Hadoop environment, and a yarn-session must be started before the task is submitted.

#### Steps

##### 1. Start the yarn-session environment

Yarn-session mode depends on Flink and Hadoop. Set $HADOOP_HOME and $FLINK_HOME in advance, and upload 'chunjun-dist' with the yarn-session '-t' parameter:

```shell
cd $FLINK_HOME/bin
./yarn-session -t $CHUNJUN_HOME -d
```

##### 2. Submit the task

Get the application id $SESSION_APPLICATION_ID of the yarn-session from the YARN web UI, then go to the 'chunjun-dist' directory and execute the command below:

```shell
sh ./bin/chunjun-yarn-session.sh -job chunjun-examples/json/stream/stream.json -confProp {\"yarn.application.id\":\"SESSION_APPLICATION_ID\"}
```

'yarn.application.id' can also be set in 'flink-conf.yaml'.
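
A sketch of that 'flink-conf.yaml' entry is shown below; the application id value is a placeholder and should be replaced with the id of your own running yarn-session:

```yaml
# flink-conf.yaml (illustrative; use the id of your running yarn-session)
yarn.application.id: application_1234567890123_0001
```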

After the submission succeeds, the task status can be observed on the YARN web UI.

[Reference video](https://www.bilibili.com/video/BV1oU4y1D7e7?spm_id_from=333.999.0.0)

### Yarn Per-Job

Yarn Per-Job mode depends on Flink and Hadoop environments. Set $HADOOP_HOME and $FLINK_HOME in advance.

#### Steps

Once the configuration is correct, the yarn per-job task can be submitted. Go to the 'chunjun-dist' directory and execute the command below:

```shell
sh ./bin/chunjun-yarn-perjob.sh -job chunjun-examples/json/stream/stream.json
```

After the submission succeeds, the task status can be observed on the YARN web UI.

## Docs of Connectors

For details, please visit: https://dtstack.github.io/chunjun/documents/

## Contributors

Thanks to all contributors! We are very happy that you contribute to ChunJun.

<a href="https://github.com/DTStack/chunjun/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=DTStack/chunjun" alt="contributors"/>
</a>

## License

ChunJun is under the Apache 2.0 license. Please visit [LICENSE](http://www.apache.org/licenses/LICENSE-2.0) for details.
