Error Handling in Script Engine
Error Row Handling
Different errors can occur while a script runs. The error_rows object lets you capture a failed row at the point where it is processed in the script, in any of the following ways:
> error_rows.capture(failed_row, "error message")
> error_rows.capture(failed_row, "error message", "debug message")
> error_rows.capture(http_status_code, row, "error message")
> error_rows.capture(http_status_code, row, "error message", "debug message")
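The four call forms differ only in whether an HTTP status code comes first and whether a debug message comes last. The sketch below uses a hypothetical stand-in class, ErrorRowsStub, so the calls can be tried outside the script engine. It is not the engine's object, and treating a leading integer as the status code is an assumption made for illustration.

```python
class ErrorRowsStub:
    """Hypothetical stand-in for the engine-provided error_rows object."""

    def __init__(self):
        self.captured = []

    def capture(self, *args):
        # Assumption: a leading integer is the HTTP status code.
        if isinstance(args[0], int):
            status, row, message, *rest = args
        else:
            status = None
            row, message, *rest = args
        debug = rest[0] if rest else None
        self.captured.append(
            {"status": status, "row": row, "error": message, "debug": debug}
        )


error_rows = ErrorRowsStub()
failed_row = {"id": 1, "Total Profit": "n/a"}  # illustrative row

error_rows.capture(failed_row, "error message")
error_rows.capture(failed_row, "error message", "debug message")
error_rows.capture(400, failed_row, "error message")
error_rows.capture(500, failed_row, "error message", "debug message")

for entry in error_rows.captured:
    print(entry["status"], entry["error"], entry["debug"])
```

In a real script the error_rows object is supplied by the engine, so only the capture calls themselves would appear.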
HTTP Status Code
An HTTP status code is a 3-digit integer that a server returns in response to a client request. Status codes fall into five classes, as follows:
Code | Description |
---|---|
1XX | Informational: the request has been received and processing is continuing |
2XX | Success: the request was received, understood, and accepted |
3XX | Redirection: further action is needed to complete the request |
4XX | Client error: the request contains incorrect syntax or cannot be fulfilled |
5XX | Server error: the server failed to fulfill an apparently valid request |
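Because the class is determined by the first digit, a status code can be classified with integer division. The helper names below (status_class, is_error) are hypothetical and are not part of the script engine; this is only a small sketch of the table above.

```python
def status_class(code: int) -> str:
    """Return the class label (1XX to 5XX) for a 3-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return f"{code // 100}XX"


def is_error(code: int) -> bool:
    """True for client (4XX) and server (5XX) error codes."""
    return status_class(code) in ("4XX", "5XX")


print(status_class(404))
print(is_error(200))
```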
Examples to capture 4XX and 5XX error status codes
In the following two examples, the error_rows object captures rows whose requests returned a 5XX and a 4XX error:
error_rows.capture(500, row, "message", "debug message")
error_rows.capture(400, row, "message")
Error Handling Mechanism in Test Run Pipeline
To avoid data loss or memory issues, data is inserted into a pipeline in batches. However, not all rows are clean or contain accurate values. As a result, scripts often hit errors caused by missing or incorrect values, and these errors disrupt the pipeline flow. The script engine provides error handling mechanisms that you can use to handle pipeline errors during a test run.
Different error handling mechanisms lead to different statuses for the Pipeline.
1. Fail Pipeline
Fail Pipeline is the riskier choice: when one row batch fails, the pipeline skips the remaining row batches, even those that would have succeeded, and reports the entire pipeline as failed.
2. Fail Current Batch
Fail Current Batch skips the row batch that contains errors, lets the error-free batches pass, and keeps the entire pipeline from failing.
Error Handling Mechanism | Description |
---|---|
Fail Pipeline | If one row batch fails, the overall pipeline fails |
Fail Current Batch | A failed row batch does not affect the overall pipeline. The pipeline reports success. |
Suppose you have four row batches of data to sync to a data warehouse. The first three row batches are accurate, and the last batch contains incorrect values.
With Fail Current Batch, the first three batches run successfully and the last one fails, but the failure doesn't affect the pipeline as a whole.
With Fail Pipeline, the error in the last batch is enough to interrupt the entire pipeline.
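The four-batch scenario can be sketched as a toy model. The function run_pipeline and the mode names "fail_pipeline" and "fail_current_batch" are hypothetical, and treating any non-integer value as a batch failure is an assumption; this is not how the engine is implemented, only an illustration of the two outcomes.

```python
def run_pipeline(batches, mode):
    """Toy model of the two mechanisms; not the engine's implementation.

    Assumption: a batch fails if any of its values is not an int.
    """
    results = []
    for batch in batches:
        ok = all(isinstance(v, int) for v in batch)
        results.append("success" if ok else "failed")
        if not ok and mode == "fail_pipeline":
            # Remaining batches are skipped and the whole run is a failure.
            skipped = len(batches) - len(results)
            results.extend(["skipped"] * skipped)
            return "failed", results
    # In fail_current_batch mode, failed batches do not fail the pipeline.
    return "success", results


batches = [[1, 2], [3, 4], [5, 6], [7, "bad"]]  # last batch has a bad value
print(run_pipeline(batches, "fail_current_batch"))
print(run_pipeline(batches, "fail_pipeline"))
```

If the bad batch came first instead of last, the "fail_pipeline" mode in this model would mark every later batch as skipped.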
Example 1:
In the following example, we select Fail Current Batch to keep the entire pipeline from failing. Then we write a script that calculates Profit Per Unit from Total Profit and Unit Sold. The input contains three rows. In the first and third rows, the Total Profit column holds string values, but it must hold integers, and arithmetic operations cannot be applied to a string value. So, when the script runs, it reports an error for the first and third rows and passes the second row, whose Total Profit is an integer. In the same example, if we choose Fail Pipeline as the error handling mechanism, the entire pipeline fails.
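A minimal sketch of what such a script might look like follows. The column names come from the description above, but the sample values, the status code 400, the debug message, and the ErrorRowsStub class are assumptions added so the sketch runs outside the script engine, where error_rows is normally provided for you.

```python
class ErrorRowsStub:
    """Hypothetical stand-in for the engine-provided error_rows object."""

    def __init__(self):
        self.captured = []

    def capture(self, status, row, message, debug=None):
        self.captured.append((status, row, message, debug))


error_rows = ErrorRowsStub()  # the real object is supplied by the script engine


def process(rows, context):
    processed_rows = []
    for row in rows:
        try:
            # Dividing a string by a number raises TypeError.
            row["Profit Per Unit"] = row["Total Profit"] / row["Unit Sold"]
            processed_rows.append(row)
        except (TypeError, ZeroDivisionError) as e:
            error_rows.capture(400, row, str(e), "Total Profit must be numeric")
    return processed_rows


rows = [
    {"Total Profit": "1200", "Unit Sold": 10},  # string value: fails
    {"Total Profit": 900, "Unit Sold": 30},     # integer value: passes
    {"Total Profit": "450", "Unit Sold": 15},   # string value: fails
]
passed = process(rows, context=None)
print(passed)
print(len(error_rows.captured))
```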
Example 2:
The simplest example to understand error handling:
Assume a small Food Sales dataset of 10 rows with columns such as Total Revenue, Unit Price, Sales Channel, and so on. The Sales Channel column has the value Offline in some rows. We write a script that raises an exception if Sales Channel is Offline, and select Fail Current Batch as the error handling mechanism.
def process(rows, context):
    # error_rows is provided by the script engine at run time.
    processed_rows = []
    for row_no, row in enumerate(rows):
        try:
            if row['Sales Channel'] == 'Offline':
                raise Exception("Sales Channel is Offline")
            else:
                processed_rows.append(row)
        except Exception as e:
            # Record the failed row with a 400 status, the error, and a debug hint.
            error_rows.capture(400, row, str(e), 'Correct Sales Channel')
    return processed_rows
When we test the script, the rows that have Sales Channel = Offline fail and the rest pass successfully.
To identify the root cause of the error, you can either click Download Error File or click the Error text in the Test Result window.
If you download the Bolt with the exception, you can see how many rows failed and how many were executed successfully.
Any Question? 🤓
We are always an email away to help you resolve your queries. If you need any help, write to us at - 📧 support@boltic.io