Index
Pipeline
¶
Bases: LoggerMixin
A Pipeline represents the logical unit of one ETL process.
This class manages a directed acyclic graph (DAG) of steps, ensuring that each step is executed in the correct order based on dependencies.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
The name of the pipeline. |
steps |
OrderedDict[str, PipelineStep]
|
An ordered dictionary of PipelineSteps that are part of the pipeline. |
Source code in src/cloe_nessy/pipeline/pipeline.py
13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 | |
graph
property
¶
Get the pipeline graph.
plot_graph(save_path=None)
¶
Generates a visual representation of the pipeline steps and their dependencies.
This method uses the PipelinePlottingService to create a plot of the pipeline graph. If a save path is specified, the plot will be saved to that location; otherwise, it will be displayed interactively.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
save_path
|
str | None
|
Optional; the file path where the plot should be saved. If None, the plot is displayed interactively. |
None
|
Source code in src/cloe_nessy/pipeline/pipeline.py
run(until=None)
¶
Executes the pipeline steps in the correct order based on dependencies.
This method creates a directed acyclic graph (DAG) of the pipeline steps and, if specified, trims the graph to only include steps up to the given 'until' step (excluding: the step specified as 'until' will not be executed). It then concurrently executes steps with no pending dependencies using a ThreadPoolExecutor, ensuring that all steps are run in order. If a cyclic dependency is detected, or if any step fails during execution, the method raises an error.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
until
|
str | None
|
Optional; the identifier of the step up to which the pipeline should be executed. |
None
|
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If a cyclic dependency is detected. |
Exception
|
Propagates any error raised during the execution of a step. |
Source code in src/cloe_nessy/pipeline/pipeline.py
PipelineAction
¶
Bases: ABC, LoggerMixin
Models the operation being executed against an Input.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
The name of the action. |
Source code in src/cloe_nessy/pipeline/pipeline_action.py
__init__(tabular_logger=None)
¶
Initializes the PipelineAction object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tabular_logger
|
Logger | None
|
The tabular logger to use for dependency injection. |
None
|
Source code in src/cloe_nessy/pipeline/pipeline_action.py
PipelineBuilder
¶
Fluent API builder for creating Nessy pipelines programmatically.
This class provides a chainable interface for building pipelines using method calls instead of YAML configuration. It dynamically creates methods for all available PipelineActions.
Example
Source code in src/cloe_nessy/pipeline/pipeline_builder.py
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 | |
__getattr__(name)
¶
Dynamically create methods for pipeline actions.
This method is called when an attribute that doesn't exist is accessed.
It converts method calls like read_files() into the corresponding PipelineAction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
The method name being called. |
required |
Returns:
| Type | Description |
|---|---|
Callable[..., PipelineBuilder]
|
A callable that adds the corresponding pipeline step. |
Raises:
| Type | Description |
|---|---|
AttributeError
|
If the method name doesn't correspond to a known action. |
Source code in src/cloe_nessy/pipeline/pipeline_builder.py
__init__(name)
¶
Initialize the pipeline builder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name
|
str
|
The name of the pipeline. |
required |
Source code in src/cloe_nessy/pipeline/pipeline_builder.py
build()
¶
Build the pipeline from the configured steps.
Returns:
| Type | Description |
|---|---|
Pipeline
|
A Pipeline object ready for execution. |
run()
¶
Build and run the pipeline immediately.
This is a convenience method equivalent to calling build().run().
PipelineContext
¶
A class that models the context of a pipeline.
The context consists of Table Metadata (the Table definition) and the actual data as a DataFrame.
Attributes:
| Name | Type | Description |
|---|---|---|
table_metadata |
The Nessy-Table definition. |
|
data |
The data of the context. |
|
runtime_info |
Additional runtime information, e.g. streaming status. |
|
status |
The status of the context. Can be "initialized", "successful" or "failed". |
Note
This is not a pydantic class, because Fabric does not support the type ConnectDataFrame.
Source code in src/cloe_nessy/pipeline/pipeline_context.py
from_existing(table_metadata=None, data=None, runtime_info=None)
¶
Creates a new PipelineContext from an existing one.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
table_metadata
|
Table | None
|
The metadata of the new context. |
None
|
data
|
DataFrame | None
|
The data of the new context. |
None
|
runtime_info
|
dict[str, Any] | None
|
The runtime_info of the new context. |
None
|
Returns:
| Type | Description |
|---|---|
PipelineContext
|
The new PipelineContext. |
Source code in src/cloe_nessy/pipeline/pipeline_context.py
PipelineParsingService
¶
A service class that parses a YAML document or string into a Pipeline object.
Source code in src/cloe_nessy/pipeline/pipeline_parsing_service.py
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 | |
parse(path=None, yaml_str=None)
staticmethod
¶
Reads the YAML from a given Path and returns a Pipeline object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
path
|
Path | None
|
Path to the YAML document. |
None
|
yaml_str
|
str | None
|
A string that can be parsed in YAML format. |
None
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If neither 'path' nor 'yaml_str' has been provided. |
Returns:
| Name | Type | Description |
|---|---|---|
Pipeline |
Pipeline
|
The resulting Pipeline instance. |
Source code in src/cloe_nessy/pipeline/pipeline_parsing_service.py
register_pipeline_action(pipeline_action_class)
staticmethod
¶
Registers a custom pipeline action class.
Note
Registering an action enables the custom action to be used in the pipeline YAML definition. This is automatically called, when the PipelineParsingService is instantiated with (a list of) custom actions.
Source code in src/cloe_nessy/pipeline/pipeline_parsing_service.py
PipelineStep
dataclass
¶
A PipelineStep is a logical step within a Pipeline.
The step stores the PipelineContext and offers an interface to interact with the Steps DataFrame.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
The name of the step. |
action |
PipelineAction
|
The action to be executed. |
env |
dict[str, str]
|
The step environment variables. |
is_successor |
dict[str, str]
|
A boolean indicating if the step is a successor and takes the previous steps context. |
context |
PipelineContext
|
The context of the step. |
options |
dict[str, Any]
|
Additional options for the step |
_predecessors |
set[str]
|
A list of names of the steps that are predecessors to this step. |
_context_ref |
str | None
|
Reference to the previous steps context |
_table_metadata_ref |
str | None
|
Reference to the previous steps metadata |