
Commit d61fe47

Improve example docs around SQLExecuteQueryOperator in Druid/Hive/Impala/Kylin/Pinot (#48856)
* add hive operators doc and example
* add druid operators doc and example
* add impala operators doc and example
* add kylin operators doc and example
* add pinot operators doc and example
* fix lint issue
* fix lint issue #2
1 parent fd241fc commit d61fe47

16 files changed

Lines changed: 856 additions & 117 deletions


providers/apache/druid/docs/operators.rst

Lines changed: 59 additions & 35 deletions
@@ -1,50 +1,74 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
-   or more contributor license agreements. See the NOTICE file
-   distributed with this work for additional information
-   regarding copyright ownership. The ASF licenses this file
-   to you under the Apache License, Version 2.0 (the
-   "License"); you may not use this file except in compliance
-   with the License. You may obtain a copy of the License at
+.. Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements. See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership. The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License. You may obtain a copy of the License at

-.. http://www.apache.org/licenses/LICENSE-2.0
+.. http://www.apache.org/licenses/LICENSE-2.0

-.. Unless required by applicable law or agreed to in writing,
-   software distributed under the License is distributed on an
-   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-   KIND, either express or implied. See the License for the
-   specific language governing permissions and limitations
-   under the License.
+.. Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied. See the License for the
+   specific language governing permissions and limitations
+   under the License.

+.. _howto/operator:DruidOperator:

-Apache Druid Operators
-======================
+SQLExecuteQueryOperator to connect to Apache Druid
+====================================================

-Prerequisite
-------------
+Use the :class:`SQLExecuteQueryOperator<airflow.providers.common.sql.operators.sql.SQLExecuteQueryOperator>` to execute SQL queries against an
+`Apache Druid <https://druid.apache.org/>`__ cluster.

-To use :class:`~airflow.providers.apache.druid.operators.druid.DruidOperator`,
-you must configure a Druid Connection first.
+.. note::
+    Previously, a dedicated operator for Druid might have been used.
+    After deprecation, please use the ``SQLExecuteQueryOperator`` instead.

-DruidOperator
--------------------
+.. note::
+    Make sure you have installed the ``apache-airflow-providers-apache-druid`` package to enable Druid support.

-Submit a task directly to Druid, you need to provide the filepath to the Druid index specification ``json_index_file``, and the connection id of the Druid overlord ``druid_ingest_conn_id`` which accepts index jobs in Airflow Connections.
-In addition, you can provide the ingestion type ``ingestion_type`` to determine whether the job is Batch Ingestion or SQL-based ingestion.
+Using the Operator
+^^^^^^^^^^^^^^^^^^

-There is also a example content of the Druid Ingestion specification below.
+Use the ``conn_id`` argument to connect to your Apache Druid instance where
+the connection metadata is structured as follows:

-For parameter definition take a look at :class:`~airflow.providers.apache.druid.operators.druid.DruidOperator`.
+.. list-table:: Druid Airflow Connection Metadata
+   :widths: 25 25
+   :header-rows: 1

-Using the operator
-""""""""""""""""""
+   * - Parameter
+     - Input
+   * - Host: string
+     - Druid broker hostname or IP address
+   * - Schema: string
+     - Not applicable (leave blank)
+   * - Login: string
+     - Not applicable (leave blank)
+   * - Password: string
+     - Not applicable (leave blank)
+   * - Port: int
+     - Druid broker port (default: 8082)
+   * - Extra: JSON
+     - Additional connection configuration, such as:
+       ``{"endpoint": "/druid/v2/sql/", "method": "POST"}``

-.. exampleinclude:: /../tests/system/apache/druid/example_druid_dag.py
-    :language: python
-    :dedent: 4
-    :start-after: [START howto_operator_druid_submit]
-    :end-before: [END howto_operator_druid_submit]
+An example usage of the SQLExecuteQueryOperator to connect to Apache Druid is as follows:
+
+.. exampleinclude:: /../tests/system/apache/druid/example_druid.py
+    :language: python
+    :start-after: [START howto_operator_druid]
+    :end-before: [END howto_operator_druid]

 Reference
-"""""""""
+^^^^^^^^^
+For further information, see:
+
+* `Apache Druid Documentation <https://druid.apache.org/docs/latest/>`__

-For more information, please refer to `Apache Druid Ingestion spec reference <https://druid.apache.org/docs/latest/ingestion/ingestion-spec.html>`_.
+.. note::
+    Parameters provided directly via SQLExecuteQueryOperator() take precedence over those specified
+    in the Airflow connection metadata (such as ``schema``, ``login``, ``password``, etc).
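
The new page assumes an existing Druid connection named ``my_druid_conn``. As a minimal sketch (the hostname and the ``druid`` URI scheme here are placeholders, not part of this commit), such a connection can be supplied through Airflow's ``AIRFLOW_CONN_<CONN_ID>`` environment-variable convention, with the Extra fields from the table encoded as query parameters:

    import os

    # Hypothetical connection for the docs above: a Druid broker on port 8082
    # with the SQL endpoint from the Extra example. Airflow parses
    # AIRFLOW_CONN_* values as connection URIs; URL-encoded query parameters
    # land in the connection's Extra field.
    os.environ["AIRFLOW_CONN_MY_DRUID_CONN"] = (
        "druid://druid-broker.example.com:8082"
        "?endpoint=%2Fdruid%2Fv2%2Fsql%2F"  # "/druid/v2/sql/" URL-encoded
        "&method=POST"
    )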
providers/apache/druid/tests/system/apache/druid/example_druid.py (new file)

Lines changed: 75 additions & 0 deletions

@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This is an example DAG for the use of the SQLExecuteQueryOperator with Druid.
+"""
+
+from __future__ import annotations
+
+import datetime
+from textwrap import dedent
+
+from airflow import DAG
+from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
+
+DAG_ID = "example_druid"
+
+with DAG(
+    dag_id=DAG_ID,
+    start_date=datetime.datetime(2025, 1, 1),
+    default_args={"conn_id": "my_druid_conn"},
+    schedule="@once",
+    catchup=False,
+) as dag:
+    # [START howto_operator_druid]
+
+    # Task: List all published datasources in Druid.
+    list_datasources_task = SQLExecuteQueryOperator(
+        task_id="list_datasources",
+        sql="SELECT DISTINCT datasource FROM sys.segments WHERE is_published = 1",
+    )
+
+    # Task: Describe the schema for the 'wikipedia' datasource.
+    # Note: This query returns column information if the datasource exists.
+    describe_wikipedia_task = SQLExecuteQueryOperator(
+        task_id="describe_wikipedia",
+        sql=dedent("""
+            SELECT COLUMN_NAME, DATA_TYPE
+            FROM INFORMATION_SCHEMA.COLUMNS
+            WHERE TABLE_NAME = 'wikipedia'
+        """).strip(),
+    )
+
+    # Task: Count rows for the 'wikipedia' datasource.
+    # Here we count the segments for 'wikipedia'. If the datasource is not ingested, it returns 0.
+    select_count_from_datasource = SQLExecuteQueryOperator(
+        task_id="select_count_from_datasource",
+        sql="SELECT COUNT(*) FROM sys.segments WHERE datasource = 'wikipedia'",
+    )
+
+    # [END howto_operator_druid]
+
+    list_datasources_task >> describe_wikipedia_task >> select_count_from_datasource
+
+    # Optional: watcher for system tests
+    from tests_common.test_utils.watcher import watcher
+
+    list(dag.tasks) >> watcher()
+
+from tests_common.test_utils.system_tests import get_test_run  # noqa: E402
+
+test_run = get_test_run(dag)
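
To illustrate the precedence note in the doc change above, here is a sketch (with a hypothetical connection id) of overriding the DAG-level ``default_args`` connection on a single task; arguments passed directly to the operator win over both ``default_args`` and the connection metadata:

    from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

    # Hypothetical task: the conn_id passed here overrides the "my_druid_conn"
    # default set in the DAG's default_args, per the note in the new docs.
    list_datasources_other_cluster = SQLExecuteQueryOperator(
        task_id="list_datasources_other_cluster",
        conn_id="another_druid_conn",  # placeholder connection id
        sql="SELECT DISTINCT datasource FROM sys.segments WHERE is_published = 1",
    )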

providers/apache/hive/docs/operators.rst

Lines changed: 52 additions & 14 deletions
@@ -15,26 +15,64 @@
   specific language governing permissions and limitations
   under the License.

-Apache Hive Operators
-=====================

-The Apache Hive data warehouse software facilitates reading, writing,
-and managing large datasets residing in distributed storage using SQL.
-Structure can be projected onto data already in storage.

-HiveOperator
-------------
+.. _howto/operator:HiveOperator:

-This operator executes hql code or hive script in a specific Hive database.
+SQLExecuteQueryOperator to connect to Apache Hive
+====================================================

-.. exampleinclude:: /../tests/system/apache/hive/example_twitter_dag.py
-    :language: python
-    :dedent: 4
-    :start-after: [START create_hive]
-    :end-before: [END create_hive]
+Use the :class:`SQLExecuteQueryOperator<airflow.providers.common.sql.operators.sql>` to execute
+Hive commands in an `Apache Hive <https://cwiki.apache.org/confluence/display/Hive/Home>`__ database.
+
+.. note::
+    Previously, ``HiveOperator`` was used to perform this kind of operation.
+    After deprecation this has been removed. Please use ``SQLExecuteQueryOperator`` instead.
+
+.. note::
+    Make sure you have installed the ``apache-airflow-providers-apache-hive`` package
+    to enable Hive support.
+
+Using the Operator
+^^^^^^^^^^^^^^^^^^
+
+Use the ``conn_id`` argument to connect to your Apache Hive instance where
+the connection metadata is structured as follows:
+
+.. list-table:: Hive Airflow Connection Metadata
+   :widths: 25 25
+   :header-rows: 1

+   * - Parameter
+     - Input
+   * - Host: string
+     - HiveServer2 hostname or IP address
+   * - Schema: string
+     - Default database name (optional)
+   * - Login: string
+     - Hive username (if applicable)
+   * - Password: string
+     - Hive password (if applicable)
+   * - Port: int
+     - HiveServer2 port (default: 10000)
+   * - Extra: JSON
+     - Additional connection configuration, such as the authentication method:
+       ``{"auth": "NOSASL"}``
+
+An example usage of the SQLExecuteQueryOperator to connect to Apache Hive is as follows:
+
+.. exampleinclude:: /../tests/system/apache/hive/example_hive.py
+    :language: python
+    :start-after: [START howto_operator_hive]
+    :end-before: [END howto_operator_hive]

 Reference
 ^^^^^^^^^
+For further information, look at:
+
+* `Apache Hive Documentation <https://cwiki.apache.org/confluence/display/Hive/Home>`__
+
+.. note::

-For more information check `Apache Hive documentation <https://hive.apache.org/>`__.
+    Parameters provided directly via SQLExecuteQueryOperator() take precedence
+    over those specified in the Airflow connection metadata (such as ``schema``, ``login``, ``password``, etc).
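
As with Druid, the Hive page assumes a connection named ``my_hive_conn``. A sketch under assumed names (the hostname and the ``hiveserver2`` URI scheme are placeholders, not part of this commit) using the same environment-variable convention, with the ``auth`` extra from the table riding along as a query parameter:

    import os

    # Hypothetical HiveServer2 connection matching the metadata table above:
    # default database "default", port 10000, and the NOSASL auth extra.
    os.environ["AIRFLOW_CONN_MY_HIVE_CONN"] = (
        "hiveserver2://hive.example.com:10000/default?auth=NOSASL"
    )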
providers/apache/hive/tests/system/apache/hive/example_hive.py (new file)

Lines changed: 83 additions & 0 deletions

@@ -0,0 +1,83 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This is an example DAG for the use of the SQLExecuteQueryOperator with Hive.
+"""
+
+from __future__ import annotations
+
+import datetime
+
+from airflow import DAG
+from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
+
+DAG_ID = "example_hive"
+
+with DAG(
+    dag_id=DAG_ID,
+    start_date=datetime.datetime(2025, 1, 1),
+    default_args={"conn_id": "my_hive_conn"},
+    schedule="@once",
+    catchup=False,
+) as dag:
+    # [START howto_operator_hive]
+
+    create_table_hive_task = SQLExecuteQueryOperator(
+        task_id="create_table_hive",
+        sql="create table hive_example(a string, b int) partitioned by(c int)",
+    )
+
+    # [END howto_operator_hive]
+
+    alter_table_hive_task = SQLExecuteQueryOperator(
+        task_id="alter_table_hive",
+        sql="alter table hive_example add partition(c=1)",
+    )
+
+    insert_data_hive_task = SQLExecuteQueryOperator(
+        task_id="insert_data_hive",
+        sql="insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3)",
+    )
+
+    select_data_hive_task = SQLExecuteQueryOperator(
+        task_id="select_data_hive",
+        sql="select * from hive_example",
+    )
+
+    drop_table_hive_task = SQLExecuteQueryOperator(
+        task_id="drop_table_hive",
+        sql="drop table hive_example",
+    )
+
+    (
+        create_table_hive_task
+        >> alter_table_hive_task
+        >> insert_data_hive_task
+        >> select_data_hive_task
+        >> drop_table_hive_task
+    )
+
+    from tests_common.test_utils.watcher import watcher
+
+    # This test needs watcher in order to properly mark success/failure
+    # when "tearDown" task with trigger rule is part of the DAG
+    list(dag.tasks) >> watcher()
+
+from tests_common.test_utils.system_tests import get_test_run  # noqa: E402
+
+# Needed to run the example DAG with pytest (see: tests/system/README.md#run_via_pytest)
+test_run = get_test_run(dag)
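
The Hive example inlines its values into the SQL strings. ``SQLExecuteQueryOperator`` also accepts a ``parameters`` argument that is handed to the underlying DB-API hook; a hedged variation on the insert task above (the task name, placeholder style, and values are assumptions, and server-side binding depends on the driver):

    from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

    # Hypothetical parameterized variant of insert_data_hive: values are bound
    # by the DB-API driver (pyformat style here) rather than inlined in the SQL.
    insert_data_hive_parameterized = SQLExecuteQueryOperator(
        task_id="insert_data_hive_parameterized",
        sql="insert into hive_example partition(c=1) values (%(a)s, %(b)s)",
        parameters={"a": "a", "b": 1},  # placeholder values
    )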

providers/apache/impala/docs/index.rst

Lines changed: 9 additions & 0 deletions
@@ -35,6 +35,7 @@
   :caption: Guides

   Connection Types <connections/impala>
+   Operators <operators>

.. toctree::
   :hidden:

@@ -58,6 +59,14 @@

   Detailed list of commits <commits>

+.. toctree::
+   :hidden:
+   :maxdepth: 1
+   :caption: System tests
+
+   System Tests <_api/tests/system/apache/impala/index>
+
+
.. THE REMAINDER OF THE FILE IS AUTOMATICALLY GENERATED. IT WILL BE OVERWRITTEN AT RELEASE TIME!