transform_with_column
Transform action to add or update a column using a SQL expression.
TransformWithColumnAction
Bases: PipelineAction
Add or update a column in the DataFrame using a SQL expression.
This action uses PySpark's expr() function to evaluate SQL expressions and create or update columns in the DataFrame.
Examples:
Source code in src/cloe_nessy/pipeline/actions/transform_with_column.py
run(context, *, column_name='', expression='', **_)
Add or update a column using a SQL expression.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | PipelineContext | The pipeline context containing the DataFrame | required |
| column_name | str | Name of the column to create or update | '' |
| expression | str | SQL expression to evaluate for the column value | '' |
| **_ | Any | Additional unused keyword arguments | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| PipelineContext | PipelineContext | Updated context with the modified DataFrame |
Raises:
| Type | Description |
|---|---|
| ValueError | If column_name is not provided |
| ValueError | If expression is not provided |
| ValueError | If context.data is None |
| Exception | If the SQL expression is invalid |
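
The three `ValueError` conditions listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: `FakeContext` is a hypothetical stand-in for `PipelineContext` that models only the `data` attribute, `check_run_arguments` is a made-up name, and the order of the checks and the error messages are guesses.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeContext:
    """Hypothetical stand-in for PipelineContext; only `data` is modelled."""

    data: Optional[Any] = None


def check_run_arguments(
    context: FakeContext,
    *,
    column_name: str = "",
    expression: str = "",
    **_: Any,
) -> None:
    """Raise ValueError for the documented invalid inputs (assumed order)."""
    if not column_name:
        raise ValueError("column_name must be provided")
    if not expression:
        raise ValueError("expression must be provided")
    if context.data is None:
        raise ValueError("context.data is None; nothing to transform")


if __name__ == "__main__":
    # Both keyword arguments default to '', so omitting one triggers an error.
    try:
        check_run_arguments(FakeContext(data=object()), column_name="total")
    except ValueError as err:
        print(f"rejected: {err}")
```

Since `column_name` and `expression` both default to an empty string, a caller that forgets either one gets a `ValueError` rather than a silently unchanged DataFrame.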