transform_concat_columns
TransformConcatColumnsAction
Bases: PipelineAction
Concatenates the specified columns in the given DataFrame.
Example
Source code in src/cloe_nessy/pipeline/actions/transform_concat_columns.py
run(context, *, name='', columns=None, separator=None, **_)
Concatenates the specified columns in the given DataFrame.
Warning
Null Handling Behavior
The behavior of null handling differs based on whether a separator is provided:
- When `separator` is specified: The function uses Spark's `concat_ws`, which skips `NULL` values. A `NULL` contributes neither text nor a separator, so the result looks as if that column had been left out.
- When `separator` is not specified: The function defaults to Spark's `concat`, which returns `NULL` if any of the concatenated values is `NULL`. A single `NULL` input therefore makes the entire output `NULL`.
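The difference between the two modes can be illustrated with a small pure-Python model. This is only a sketch of the null semantics described above, not the PySpark calls themselves: the functions `concat` and `concat_ws` below are stand-ins written for this example, and Python's `None` plays the role of `NULL`.

```python
from typing import Optional, Sequence


def concat(values: Sequence[Optional[str]]) -> Optional[str]:
    """Model of Spark's `concat`: any None input makes the result None."""
    if any(v is None for v in values):
        return None
    return "".join(values)


def concat_ws(separator: str, values: Sequence[Optional[str]]) -> str:
    """Model of Spark's `concat_ws`: None inputs are skipped entirely."""
    return separator.join(v for v in values if v is not None)


# A row in which the middle column is NULL.
row = ["Ada", None, "Lovelace"]

without_separator = concat(row)       # separator not specified
with_separator = concat_ws("-", row)  # separator specified

print(without_separator)
print(with_separator)
```

If a `NULL` should not wipe out the whole value, passing a separator (possibly an empty string) is one way to get the skipping behavior, assuming the action forwards an empty separator to `concat_ws` unchanged.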
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `PipelineContext` | The context in which this Action is executed. | required |
| `name` | `str` | The name of the new concatenated column. | `''` |
| `columns` | `list[str] \| None` | A list of columns to be concatenated. | `None` |
| `separator` | `str \| None` | The separator used between concatenated column values. | `None` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no name is provided. |
| `ValueError` | If no columns are provided. |
| `ValueError` | If the data from context is None. |
| `ValueError` | If 'columns' is not a list. |
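The four documented error conditions can be mirrored in a caller-side check before invoking the action. The following is a hypothetical sketch: `check_concat_arguments` and `error_for` are names invented for this example, the error messages are placeholders, and the order of the checks is an assumption rather than the order used in `run`.

```python
from typing import Any, Optional


def check_concat_arguments(data: Optional[Any], name: str, columns: Any) -> None:
    """Raise ValueError for each condition listed in the Raises table.

    The check order and messages here are assumptions for illustration.
    """
    if not name:
        raise ValueError("No name provided.")
    if not columns:
        raise ValueError("No columns provided.")
    if data is None:
        raise ValueError("The data from context is None.")
    if not isinstance(columns, list):
        raise ValueError("'columns' is not a list.")


def error_for(data: Optional[Any], name: str, columns: Any) -> Optional[str]:
    """Return the error message for the given arguments, or None if valid."""
    try:
        check_concat_arguments(data, name, columns)
    except ValueError as exc:
        return str(exc)
    return None


# `object()` stands in for a DataFrame in this sketch.
print(error_for(object(), "", ["first", "last"]))
print(error_for(object(), "full_name", ("first", "last")))
print(error_for(object(), "full_name", ["first", "last"]))
```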
Returns:
| Type | Description |
|---|---|
| `PipelineContext` | The context after the execution of this Action, containing the DataFrame with the concatenated column. |