transform_json_normalize
TransformJsonNormalize
Bases: PipelineAction
Normalizes and flattens the DataFrame by exploding array columns and flattening struct columns.
The method performs recursive normalization on the DataFrame present in the context, ensuring that the order of columns is retained and new columns created by flattening structs are appended after existing columns.
Example
Example Input Data:
| id | name | coordinates | attributes |
|---|---|---|---|
| 1 | Alice | [10.0, 20.0] | {"age": 30, "city": "NY"} |
| 2 | Bob | [30.0, 40.0] | {"age": 25, "city": "LA"} |
Example Output Data:
| id | name | coordinates | attributes_age | attributes_city |
|---|---|---|---|---|
| 1 | Alice | [10.0, 20.0] | 30 | NY |
| 2 | Bob | [30.0, 40.0] | 25 | LA |
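The struct flattening shown in the tables above can be sketched in plain Python. This is only an illustration on dictionaries, not the Spark-based implementation of `TransformJsonNormalize`; the helper name `flatten_structs` is hypothetical, and the `parent_child` naming is taken from the example output.

```python
def flatten_structs(row: dict) -> dict:
    """Flatten one level of dict-valued ("struct") fields in a single row.

    Hypothetical helper for illustration; the real action works on a DataFrame.
    """
    kept = {}      # non-struct columns, in their original relative order
    appended = {}  # columns produced by flattening, placed after the kept ones
    for name, value in row.items():
        if isinstance(value, dict):
            for key, inner in value.items():
                appended[f"{name}_{key}"] = inner
        else:
            kept[name] = value
    return {**kept, **appended}


rows = [
    {"id": 1, "name": "Alice", "coordinates": [10.0, 20.0],
     "attributes": {"age": 30, "city": "NY"}},
    {"id": 2, "name": "Bob", "coordinates": [30.0, 40.0],
     "attributes": {"age": 25, "city": "LA"}},
]

flat_rows = [flatten_structs(row) for row in rows]
print(list(flat_rows[0]))
# ['id', 'name', 'coordinates', 'attributes_age', 'attributes_city']
```

The existing columns keep their relative order, and the flattened `attributes_*` columns come last, matching the example output.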
Source code in src/cloe_nessy/pipeline/actions/transform_json_normalize.py
run(context, *, exclude_columns=None, **_)
Executes the normalization process on the DataFrame present in the context.
Columns retain their relative order during normalization, and new columns created by flattening structs are appended after the existing columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | PipelineContext | The pipeline context that contains the DataFrame to be normalized. | required |
| exclude_columns | list[str] \| None | A list of column names to exclude from the normalization process. These columns will not be exploded or flattened. | None |
| **_ | Any | Additional keyword arguments (not used). | {} |
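The effect of `exclude_columns` on recursive normalization can be sketched on plain Python rows. This is a minimal illustration under stated assumptions, not the library's code: the functions `normalize` and `normalize_row` are hypothetical, lists stand in for array columns (exploded into one row per element), and dicts stand in for struct columns.

```python
def normalize_row(row: dict, excluded: set) -> list[dict]:
    """Recursively explode list fields and flatten dict fields of one row."""
    for name, value in row.items():
        if name in excluded:
            continue  # excluded columns are neither exploded nor flattened
        if isinstance(value, list):
            # Explode: one output row per element, column position unchanged.
            # Assumption: an empty list yields no rows at all.
            out = []
            for item in value:
                out.extend(normalize_row({**row, name: item}, excluded))
            return out
        if isinstance(value, dict):
            # Flatten: drop the struct column, append its fields at the end.
            flat = {k: v for k, v in row.items() if k != name}
            flat.update({f"{name}_{k}": v for k, v in value.items()})
            return normalize_row(flat, excluded)
    return [row]  # nothing left to explode or flatten


def normalize(rows: list[dict], exclude_columns=None) -> list[dict]:
    """Normalize every row, skipping the columns named in exclude_columns."""
    excluded = set(exclude_columns or [])
    return [out for row in rows for out in normalize_row(row, excluded)]


row = {"id": 1, "tags": ["a", "b"], "raw": {"x": [1, 2]}, "attributes": {"age": 30}}
result = normalize([row], exclude_columns=["raw"])
for r in result:
    print(r)
```

Here `tags` is exploded into two rows and `attributes` is flattened into `attributes_age`, while the excluded `raw` column passes through untouched.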
Returns:
| Type | Description |
|---|---|
| PipelineContext | A new pipeline context with the normalized DataFrame. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the DataFrame in the context is |