Wednesday 21 October 2020

Reuse logic/code in Databricks Notebook

Databricks is great tool and it make processing of large amount of data very easy. While designing Databricks workflow, I came across the need to reuse the logic and business rules across many notebooks. I do not want to create jar or python wheel as it will create dependency on another tool. My team primarily being consist of data engineers from SQL and ETL background, I don’t not want them to learn new things and they were also least interested going out of Databricks Notebook. 

After research I came up with following solution to include the reusable logic from one notebook to another notebook.

%run ./pyclass

Let me explain you in detail. I have created one notebook with python class including all the reusable logic and included that class in another notebook using %run magic command. 
Once the class is included, I can simply create instance of the class and reuse it. In the example I created notebook name pyclass with class having two methods:

    1. msg: displays message
    2. fab_num : calculates Fibonacci numbers

Databricks-Include-Notebook-Class

In another notebook I created the instance of class and reused the logic. 

Databricks-Include-Notebook


Happy Coding!

Sunday 4 October 2020

ADF Copy Activity to Oracle Error-"Error tolerance exceeded. Bulk load operation terminated"

Recently I have have been copying data from Azure Data Factory Copy activity to Oracle (on-premise) using Self-Hosted IR. After few days starting getting below error. 

{     
"errorCode": "BadRequest",     
"message": "Operation on target CD_SQL_ORA failed: Failure happened on 'Sink' side. ErrorCode=UserErrorOdbcOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ERROR [HY000] [Microsoft][ODBC Oracle Wire Protocol driver]Error tolerance exceeded. Bulk load operation terminated. 
Error in parameter 4.\r\nERROR [HY000] [Microsoft][ODBC Oracle Wire Protocol driver][Oracle]Invalid Position. Error in row 10001. Error in parameter 4.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=Microsoft.DataTransfer.ClientLibrary.Odbc.Exceptions.OdbcException,Message=ERROR [HY000] [Microsoft][ODBC Oracle Wire Protocol driver]Error tolerance exceeded. Bulk load operation terminated. Error in parameter 4.\r\nERROR [HY000] [Microsoft][ODBC Oracle Wire Protocol driver][Oracle]Invalid Position. Error in row 10001. Error in parameter 4.,Source=msora28.dll,'",     "failureType": "UserError",     "target": "PL_COPY_CUST",     "details": "" }

After googling about the error and even contacting Oracle DBA there was not solution found. Contacted Microsoft support as last option. After few trail and error it worked out after changing 'writeBatchSize' from 10,000 (default value) to 50,000. 

Again not sure of the reason and there is not much information around it in the documentation. Please let me know if anyone know exact cause of the issue.


Happy Coding!