---
title: Enable Google BigQuery
pcx_content_type: how-to
sidebar:
  order: 62
head:
  - tag: title
    content: Enable Logpush to Google BigQuery
---

import { Render, APIRequest } from "~/components";

Cloudflare Logpush supports pushing logs directly to Google BigQuery (using the Legacy Streaming API) via the Cloudflare API.

## Create and get access to a BigQuery table

Cloudflare uses the Google Application Credentials provided in the Logpush job `destination_conf` to gain write access to your BigQuery table. The provided service account needs write permission for the table.

To enable Logpush to BigQuery:

1. Go to the Google Cloud Console for your account.
2. Go to **IAM & Admin** > **Service Accounts**, and create a new service account.
3. Add the **BigQuery Data Editor** role under **Permissions**.
   - At minimum, the service account requires the `bigquery.tables.updateData` permission.
4. Add a key under **Keys**:
   - Click **Add key**.
   - Click **Create new key**.
   - Select the **JSON** key type.
   - Click **Create**.
   - Save the Application Credentials JSON file. You will need it when setting up a new Logpush job.
5. Go to BigQuery, and create a dataset and table. Refer to the [instructions from BigQuery](https://cloud.google.com/bigquery/docs/tables).
   - For example, using a `schema.json` file and the `bq` command, as shown below:

```bash
gcloud auth activate-service-account --key-file=${KEY_FILE}

PROJECT_ID=<PROJECT_ID>
DATASET_ID=<DATASET_ID>
TABLE_ID=<TABLE_ID>

bq mk --table "${PROJECT_ID}:${DATASET_ID}.${TABLE_ID}" schema.json
```
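
If you plan to use the example fields from the job configuration later in this guide, a minimal `schema.json` could look like the following sketch. The field names and types here are assumptions based on this guide's examples; match them to the dataset fields you actually configure in `output_options`.

```bash
# Sketch: write a minimal schema.json matching the example HTTP request
# fields used later in this guide; adjust names and types as needed.
cat > schema.json <<'EOF'
[
  { "name": "ClientIP", "type": "STRING" },
  { "name": "ClientRequestHost", "type": "STRING" },
  { "name": "ClientRequestMethod", "type": "STRING" },
  { "name": "ClientRequestURI", "type": "STRING" },
  { "name": "EdgeEndTimestamp", "type": "TIMESTAMP" },
  { "name": "EdgeResponseBytes", "type": "INTEGER" },
  { "name": "EdgeResponseStatus", "type": "INTEGER" },
  { "name": "EdgeStartTimestamp", "type": "TIMESTAMP" },
  { "name": "RayID", "type": "STRING" }
]
EOF
```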

## Manage via API

To set up a BigQuery Logpush job:

1. Create a job with the appropriate endpoint URL and authentication parameters.
2. Enable the job to begin pushing logs.

:::note
Unlike configuring Logpush jobs for AWS S3, GCS, or Azure, there is no ownership challenge when configuring Logpush to BigQuery.
:::

<Render file="permissions-log-share" product="logs" />

### 1. Create a job

To create a job, make a `POST` request to the Logpush jobs endpoint with the following fields:

- **name** (optional) - Use your domain name as the job name.
- **destination_conf** - A log destination consisting of a reference to the BigQuery table and credentials in the string format below.
  - **ENCODED_VALUE**: The encoded value of the Application Credentials JSON as `credentials`, either base64-encoded with the `base64:` prefix, or URL-encoded with the `url:` prefix (see the encoding sketch after this list).

```bash
"bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>"
```

- **dataset** - The category of logs you want to receive. Refer to [Datasets](/logs/logpush/logpush-job/datasets/) for the full list of supported datasets.
- **output_options** (optional) - To configure fields, sample rate, and timestamp format, refer to [Log Output Options](/logs/logpush/logpush-job/log-output-options/). For timestamps, Cloudflare recommends using `timestamps=rfc3339`.
  - When including custom formatting options, such as `output_type` or any prefix / suffix / delimiter / template options, make sure to also set `stringify_object` to `true`. Otherwise, fields with the `object` type may not be serialized in a format compatible with the BigQuery Legacy Streaming API.

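For example, a sketch of building the encoded credentials value from the key file saved earlier (the file name `credentials.json` is an assumption):

```bash
# Base64-encode the Application Credentials JSON with the `base64:` prefix
# (GNU coreutils; on macOS, use `base64 -i credentials.json` instead of -w0).
ENCODED_VALUE="base64:$(base64 -w0 credentials.json)"

# The value is then used in destination_conf as:
# bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=${ENCODED_VALUE}
```
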
Example request using cURL:

<APIRequest
  path="/zones/{zone_id}/logpush/jobs"
  method="POST"
  json={{
    name: "<DOMAIN_NAME>",
    destination_conf:
      "bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>",
    output_options: {
      field_names: [
        "ClientIP",
        "ClientRequestHost",
        "ClientRequestMethod",
        "ClientRequestURI",
        "EdgeEndTimestamp",
        "EdgeResponseBytes",
        "EdgeResponseStatus",
        "EdgeStartTimestamp",
        "RayID",
      ],
      timestamp_format: "rfc3339",
    },
    dataset: "http_requests",
    max_upload_bytes: 5000000,
    max_upload_records: 50000,
  }}
/>

Response:

```json
{
  "errors": [],
  "messages": [],
  "result": {
    "id": <JOB_ID>,
    "dataset": "http_requests",
    "max_upload_bytes": 5000000,
    "max_upload_records": 50000,
    "enabled": false,
    "name": "<DOMAIN_NAME>",
    "output_options": {
      "field_names": ["ClientIP", "ClientRequestHost", "ClientRequestMethod", "ClientRequestURI", "EdgeEndTimestamp", "EdgeResponseBytes", "EdgeResponseStatus", "EdgeStartTimestamp", "RayID"],
      "timestamp_format": "rfc3339"
    },
    "destination_conf": "bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>",
    "last_complete": null,
    "last_error": null,
    "error_message": null
  },
  "success": true
}
```

Creating the job triggers a test upload with empty content to verify that Logpush can write to the table, so you may see a row with empty data.
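
As an optional check, you can count the rows in the destination table with the `bq` CLI; note that streamed rows may take a moment to become visible to queries. This sketch reuses the shell variables defined earlier:

```bash
# Count rows in the destination table to confirm Logpush can reach it.
bq query --use_legacy_sql=false \
  "SELECT COUNT(*) AS row_count FROM \`${PROJECT_ID}.${DATASET_ID}.${TABLE_ID}\`"
```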

### 2. Enable (update) a job

To enable a job, make a `PUT` request to the Logpush jobs endpoint. You will use the job ID returned from the previous step in the URL, and send `{"enabled": true}` in the request body.

Example request using cURL:

<APIRequest
  method="PUT"
  path="/zones/{zone_id}/logpush/jobs/{job_id}"
  json={{
    enabled: true,
  }}
/>

Response:

```json
{
  "errors": [],
  "messages": [],
  "result": {
    "id": <JOB_ID>,
    "dataset": "http_requests",
    "max_upload_bytes": 5000000,
    "max_upload_records": 50000,
    "enabled": true,
    "name": "<DOMAIN_NAME>",
    "output_options": {
      "field_names": ["ClientIP", "ClientRequestHost", "ClientRequestMethod", "ClientRequestURI", "EdgeEndTimestamp", "EdgeResponseBytes", "EdgeResponseStatus", "EdgeStartTimestamp", "RayID"],
      "timestamp_format": "rfc3339"
    },
    "destination_conf": "bq://projects/<PROJECT_ID>/datasets/<DATASET_ID>/tables/<TABLE_ID>?credentials=<ENCODED_VALUE>",
    "last_complete": null,
    "last_error": null,
    "error_message": null
  },
  "success": true
}
```
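
As an optional follow-up, you can read the job back and confirm that `enabled` is `true`. A sketch using `curl` (assumes an API token with the required Logpush permissions; `$ZONE_ID`, `$JOB_ID`, and `$API_TOKEN` are placeholders):

```bash
# Fetch the job and inspect the "enabled" field in the response.
curl "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/logpush/jobs/${JOB_ID}" \
  -H "Authorization: Bearer ${API_TOKEN}"
```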

## Limitations

Note the following default quotas and limits, as described in the [BigQuery documentation](https://docs.cloud.google.com/bigquery/quotas#streaming_inserts).

Logpush sends your logs to BigQuery over HTTP using the Legacy Streaming API. Limits per HTTP request are the following:

- Maximum HTTP request size (uncompressed, may include headers): 10 MB
- Maximum row size: 10 MB
- Maximum rows per request: 50,000 rows

These are default quotas and limits. Adjust your Logpush jobs to stay within them, and/or request an increase from Google when needed.
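
For example, a sketch of capping batch sizes on an existing job so each upload stays under the per-request limits above (placeholders as in the previous examples):

```bash
# Cap upload batches at 5 MB / 50,000 records per push.
curl -X PUT "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/logpush/jobs/${JOB_ID}" \
  -H "Authorization: Bearer ${API_TOKEN}" \
  -H "Content-Type: application/json" \
  --data '{"max_upload_bytes": 5000000, "max_upload_records": 50000}'
```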

## Google Cloud Storage integration

Cloudflare Logpush supports pushing logs to [Google Cloud Storage](/logs/logpush/logpush-job/enable-destinations/google-cloud-storage/).

BigQuery supports loading up to 1,500 jobs per table per day (including failures), with up to 10 million files in each load. That means you can load into BigQuery once per minute and include up to 10 million files in a load. For more information, refer to BigQuery's quotas for load jobs.

Logpush delivers batches of logs as soon as possible, which means you could receive more than one batch of files per minute. Ensure your BigQuery job is configured to ingest files on a given time interval, like every minute, as opposed to when files are received. Ingesting files into BigQuery as each Logpush file is received could exhaust your BigQuery quota quickly.
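
For instance, a sketch of a time-based load you might run on a schedule (for example, from cron every minute). The bucket path is an assumption, and a real pipeline should track which files it has already loaded to avoid ingesting duplicate rows:

```bash
# Load newly delivered Logpush files from GCS into BigQuery on a schedule.
bq load --source_format=NEWLINE_DELIMITED_JSON \
  "${PROJECT_ID}:${DATASET_ID}.${TABLE_ID}" \
  "gs://<BUCKET>/http_requests/*.log.gz" \
  schema.json
```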