Databricks info

Databricks is a Spark-in-the-cloud platform that has been gaining a lot of popularity lately.

On Azure it is provided by Microsoft with an Azure subscription; it uses Spark for computation and Azure storage for data. Storage comes in DBFS and Blob Storage varieties, with Azure Data Lake Storage Gen2 being the modern service built on top of Blob Storage.
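To make the storage varieties concrete, here is a minimal sketch of the path conventions seen from Databricks. The account, container, and file names are made up for illustration:

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an ADLS Gen2 URI (abfss:// scheme) as referenced from Spark code."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

# DBFS paths are rooted at dbfs:/ and abstract over the underlying storage,
# e.g. dbfs:/FileStore/jars/app.jar.
# Classic Blob Storage uses the older wasbs:// scheme:
#   wasbs://<container>@<account>.blob.core.windows.net/<path>
uri = abfss_uri("raw", "mystorageacct", "/landing/data.csv")
```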

Testing and proof-of-concept work is done in notebooks, which support Scala, Python, and SQL. Scala jars or Python wheels and eggs can be uploaded and executed in Databricks via REST API calls, either with curl or through interfaces that wrap the REST API. For an application to run, a cluster needs to be created and started; this can be done in advance or as part of the REST API call.
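As a hedged sketch of the "cluster created as part of the call" path, the following builds a request body for the Jobs API runs/submit endpoint (POST /api/2.0/jobs/runs/submit). The jar path, main class, runtime version, and VM type are illustrative assumptions; check what is available in your workspace:

```python
import json

def submit_jar_run_payload(jar_dbfs_path: str, main_class: str) -> str:
    """Build a runs/submit request body that creates a new cluster for the run."""
    body = {
        "run_name": "jar run",
        "new_cluster": {
            "spark_version": "7.3.x-scala2.12",  # assumption: pick a runtime your workspace offers
            "node_type_id": "Standard_DS3_v2",   # assumption: an Azure VM type enabled for you
            "num_workers": 2,
        },
        "libraries": [{"jar": jar_dbfs_path}],
        "spark_jar_task": {"main_class_name": main_class},
    }
    return json.dumps(body)
```

The resulting JSON can then be posted with curl (or any REST wrapper) to `https://<workspace-url>/api/2.0/jobs/runs/submit` with a bearer token.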

General description

Working with notebooks

  1. Sign in with Azure ID
  2. To create a new notebook, select New from the navigation menu, then select Notebook
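Notebooks can also be pushed into the workspace programmatically. A minimal sketch of the request body for the Workspace import endpoint (POST /api/2.0/workspace/import), where the notebook source must be base64-encoded; the target path is a made-up example:

```python
import base64
import json

def import_notebook_payload(target_path: str, source_code: str, language: str = "PYTHON") -> str:
    """Build a workspace/import request body; source is sent base64-encoded."""
    body = {
        "path": target_path,               # e.g. /Users/someone@example.com/demo (illustrative)
        "format": "SOURCE",
        "language": language,
        "content": base64.b64encode(source_code.encode("utf-8")).decode("ascii"),
        "overwrite": True,
    }
    return json.dumps(body)
```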

Uploading a jar

After a jar has been built and tested on-prem, it can be uploaded to Databricks.
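One way to upload is the DBFS API. A sketch of the request body for POST /api/2.0/dbfs/put, which accepts base64-encoded file contents (suitable for small files; larger uploads go through the streaming create/add-block/close endpoints). The DBFS path is an illustrative assumption:

```python
import base64
import json

def dbfs_put_payload(jar_bytes: bytes, dbfs_path: str) -> str:
    """Build a dbfs/put request body for uploading a small jar to DBFS."""
    return json.dumps({
        "path": dbfs_path,  # e.g. /FileStore/jars/app.jar (illustrative)
        "contents": base64.b64encode(jar_bytes).decode("ascii"),
        "overwrite": True,
    })
```

Once uploaded, the jar can be referenced from a job or cluster library as dbfs:/FileStore/jars/app.jar.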

Java Storage SDK

To download files from Azure storage, an Azure Storage SDK is available.
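The section refers to the Java SDK; to keep the examples in one language, here is the equivalent sketch using the Python azure-storage-blob package (its availability is an assumption, so the import is kept inside the download function). The account, container, and blob names are made up:

```python
def blob_url(account: str, container: str, blob_name: str) -> str:
    """Public endpoint URL for a blob in Azure Blob Storage."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob_name}"

def download_blob(connection_string: str, container: str, blob_name: str, dest: str) -> None:
    """Download one blob to a local file using the azure-storage-blob SDK."""
    from azure.storage.blob import BlobClient  # third-party package, assumed installed
    client = BlobClient.from_connection_string(connection_string, container, blob_name)
    with open(dest, "wb") as f:
        f.write(client.download_blob().readall())
```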

Using the REST API to run an application

REST API calls can be used to create clusters and run applications.
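For creating a cluster in advance, a sketch of the request body for the Clusters API (POST /api/2.0/clusters/create). The runtime version and node type are illustrative assumptions; the valid values for a workspace can be listed via the spark-versions and node-types endpoints:

```python
import json

def create_cluster_payload(name: str, workers: int = 2) -> str:
    """Build a clusters/create request body."""
    return json.dumps({
        "cluster_name": name,
        "spark_version": "7.3.x-scala2.12",  # assumption: a runtime offered by the workspace
        "node_type_id": "Standard_DS3_v2",   # assumption: an Azure VM type enabled for you
        "num_workers": workers,
    })
```

The body can be posted with curl to `https://<workspace-url>/api/2.0/clusters/create` with a bearer token; the response contains the cluster_id to use in later job calls.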