azureBlobStorage Table Function
Provides a table-like interface to select/insert files in Azure Blob Storage. This table function is similar to the s3 function.
Syntax
- Connection string — credentials are embedded in the connection string, so no separate account_name/account_key is needed.
- Storage account URL — requires account_name and account_key as separate arguments.
- Named collection — see Named Collections below for the full list of supported keys.
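A sketch of the three invocation forms (positional argument order follows the Arguments table below; bracketed arguments are optional):

```sql
azureBlobStorage(connection_string, container_name, blobpath[, format[, compression[, structure]]])
azureBlobStorage(storage_account_url, container_name, blobpath, account_name, account_key[, format[, compression[, structure]]])
azureBlobStorage(named_collection[, option = value [, ...]])
```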
Arguments
| Argument | Description |
|---|---|
| connection_string | A connection string that includes embedded credentials (account name + account key or SAS token). When using this form, account_name and account_key should not be passed separately. See Configure a connection string. |
| storage_account_url | The storage account endpoint URL, e.g. https://myaccount.blob.core.windows.net/. When using this form, you must also pass account_name and account_key. |
| container_name | Container name. |
| blobpath | File path. Supports the following wildcards in read-only mode: *, **, ?, {abc,def} and {N..M}, where N and M are numbers and 'abc', 'def' are strings. |
| account_name | Storage account name. Required when using storage_account_url without SAS; must not be passed when using connection_string. |
| account_key | Storage account key. Required when using storage_account_url without SAS; must not be passed when using connection_string. |
| format | The format of the file. |
| compression | Supported values: none, gzip/gz, brotli/br, xz/LZMA, zstd/zst. By default, compression is autodetected from the file extension (same as setting the value to auto). |
| structure | Structure of the table. Format: 'column1_name column1_type, column2_name column2_type, ...'. |
| partition_strategy | Optional. Supported values: WILDCARD or HIVE. WILDCARD requires {_partition_id} in the path, which is replaced with the partition key. HIVE does not allow wildcards; it assumes the path is the table root and generates Hive-style partitioned directories with Snowflake IDs as filenames and the file format as the extension. Defaults to WILDCARD. |
| partition_columns_in_data_file | Optional. Only used with the HIVE partition strategy. Tells ClickHouse whether to expect partition columns to be written in the data file. Defaults to false. |
| extra_credentials | Uses client_id and tenant_id for authentication. If extra_credentials are provided, they take priority over account_name and account_key. |
Named Collections
Arguments can also be passed using named collections. In this case the following keys are supported:
| Key | Required | Description |
|---|---|---|
| container | Yes | Container name. Corresponds to the positional argument container_name. |
| blob_path | Yes | File path (with optional wildcards). Corresponds to the positional argument blobpath. |
| connection_string | No* | Connection string with embedded credentials. *Either connection_string or storage_account_url must be provided. |
| storage_account_url | No* | Storage account endpoint URL. *Either connection_string or storage_account_url must be provided. |
| account_name | No | Required when using storage_account_url. |
| account_key | No | Required when using storage_account_url. |
| format | No | File format. |
| compression | No | Compression type. |
| structure | No | Table structure. |
| client_id | No | Client ID for authentication. |
| tenant_id | No | Tenant ID for authentication. |
Named collection key names differ from positional function argument names: container (not container_name) and blob_path (not blobpath).
Example:
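A sketch using the named-collection SQL syntax; the collection name azure_conf and all credential values are placeholders:

```sql
CREATE NAMED COLLECTION azure_conf AS
    storage_account_url = 'https://clickhousedocstest.blob.core.windows.net/',
    account_name = 'clickhousedocstest',
    account_key = '<account_key>';

SELECT count(*)
FROM azureBlobStorage(azure_conf, container = 'testcontainer', blob_path = 'test.csv', format = 'CSV');
```

Named collections can also be declared in the server configuration file under `named_collections` instead of via SQL.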
You can also override named collection values at query time:
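For example, assuming a named collection azure_conf has been configured as described above, blob_path and format supplied in the query replace whatever the collection defines (values here are hypothetical):

```sql
SELECT count(*)
FROM azureBlobStorage(azure_conf, container = 'testcontainer', blob_path = 'other_data.csv', format = 'CSVWithNames');
```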
Returned value
A table with the specified structure for reading or writing data in the specified file.
Examples
Reading with storage_account_url form
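A sketch of the storage_account_url form; the account, container, key, and column structure below are placeholder values:

```sql
SELECT *
FROM azureBlobStorage(
    'https://clickhousedocstest.blob.core.windows.net/',
    'testcontainer', 'test_table.csv',
    'clickhousedocstest', '<account_key>',
    'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32');
```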
Reading with connection_string form
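The same read using a connection string with embedded credentials, so no separate account_name/account_key arguments are passed (placeholder values):

```sql
SELECT count(*)
FROM azureBlobStorage(
    'DefaultEndpointsProtocol=https;AccountName=clickhousedocstest;AccountKey=<account_key>;EndpointSuffix=core.windows.net',
    'testcontainer', 'test_table.csv', 'CSV');
```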
Writing with partitions
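With the default WILDCARD strategy, {_partition_id} in the blob path is replaced by the partition key, producing one file per partition. A sketch with placeholder values:

```sql
INSERT INTO TABLE FUNCTION azureBlobStorage(
    'https://clickhousedocstest.blob.core.windows.net/',
    'testcontainer', 'test_{_partition_id}.csv',
    'clickhousedocstest', '<account_key>',
    'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32')
PARTITION BY column3
VALUES (1, 2, 3), (3, 2, 1), (78, 43, 3);
```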
Then read back a specific partition:
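Assuming a prior partitioned write on column3 produced a file such as test_3.csv (hypothetical name), that single partition can be read back directly:

```sql
SELECT *
FROM azureBlobStorage(
    'https://clickhousedocstest.blob.core.windows.net/',
    'testcontainer', 'test_3.csv',
    'clickhousedocstest', '<account_key>',
    'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32');
```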
Virtual Columns
- _path — Path to the file. Type: LowCardinality(String).
- _file — Name of the file. Type: LowCardinality(String).
- _size — Size of the file in bytes. Type: Nullable(UInt64). If the file size is unknown, the value is NULL.
- _time — Last modified time of the file. Type: Nullable(DateTime). If the time is unknown, the value is NULL.
Partitioned Write
Partition Strategy
Supported for INSERT queries only.
WILDCARD (default): Replaces the {_partition_id} wildcard in the file path with the actual partition key.
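For instance, writing with a hypothetical blob path sales_{_partition_id}.csv and PARTITION BY region produces one file per distinct region value (sales_emea.csv, sales_apac.csv, and so on):

```sql
INSERT INTO TABLE FUNCTION azureBlobStorage(
    'https://clickhousedocstest.blob.core.windows.net/',
    'testcontainer', 'sales_{_partition_id}.csv',
    'clickhousedocstest', '<account_key>',
    'CSV', 'auto', 'region String, amount UInt32')
PARTITION BY region
VALUES ('emea', 100), ('apac', 200);
```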
HIVE: implements Hive-style partitioning for reads and writes. It generates files using the following format: <prefix>/<key1=val1/key2=val2...>/<snowflakeid>.<toLower(file_format)>.
Example of HIVE partition strategy
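A sketch of a HIVE-strategy write, assuming a named collection azure_conf holding the credentials (see Named Collections); container, path, and data are hypothetical:

```sql
INSERT INTO TABLE FUNCTION azureBlobStorage(
    azure_conf,
    container = 'testcontainer',
    blob_path = 'hive_data',
    format = 'Parquet',
    structure = 'year UInt16, country String, counter UInt8',
    partition_strategy = 'hive')
PARTITION BY (year, country)
VALUES (2020, 'Russia', 1), (2021, 'Brazil', 2);
```

This would produce files such as hive_data/year=2020/country=Russia/<snowflakeid>.parquet.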
use_hive_partitioning setting
This setting is a hint for ClickHouse to parse Hive-style partitioned paths at read time. It has no effect on writing. For symmetrical reads and writes, use the partition_strategy argument instead.
When the setting use_hive_partitioning is set to 1, ClickHouse detects Hive-style partitioning in the path (/name=value/) and allows using the partition columns as virtual columns in the query. These virtual columns have the same names as in the partitioned path, but start with _.
Example
Use virtual column, created with Hive-style partitioning
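A sketch, assuming a named collection azure_conf (see Named Collections) and a hypothetical Hive-partitioned layout data/date=.../: the path segment date=... becomes the virtual column _date, which can be used for filtering:

```sql
SELECT *
FROM azureBlobStorage(azure_conf, container = 'testcontainer', blob_path = 'data/date=*/*.parquet')
WHERE _date > '2020-01-01'
SETTINGS use_hive_partitioning = 1;
```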
Using Shared Access Signatures (SAS)
A Shared Access Signature (SAS) is a URI that grants restricted access to an Azure Storage container or file. Use it to provide time-limited access to storage account resources without sharing your storage account key. See the Azure Shared Access Signatures documentation for more details.
The azureBlobStorage function supports Shared Access Signatures (SAS).
A Blob SAS token contains all the information needed to authenticate the request, including the target blob, permissions, and validity period. To construct a blob URL, append the SAS token to the blob service endpoint. For example, if the endpoint is https://clickhousedocstest.blob.core.windows.net/, the request becomes:
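A sketch with a placeholder SAS token (the sp=...&sig=... query string generated by Azure) appended to the BlobEndpoint in a connection string; the container and file names are hypothetical:

```sql
SELECT count()
FROM azureBlobStorage(
    'BlobEndpoint=https://clickhousedocstest.blob.core.windows.net/;SharedAccessSignature=<sas_token>',
    'testcontainer', 'test.csv', 'CSV');
```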
Alternatively, users can use the generated Blob SAS URL:
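A sketch passing the generated Blob SAS URL in place of the plain storage account URL (placeholder token, container, and file names):

```sql
SELECT count()
FROM azureBlobStorage(
    'https://clickhousedocstest.blob.core.windows.net/?<sas_token>',
    'testcontainer', 'test.csv', 'CSV');
```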