Saturday, 18 September 2021

Using Python with Tableau Prep Builder

Connecting to TabPy from Prep Builder

ü  Open Tableau Prep Builder

ü  Open the Superstore Sample flow

ü  Click the plus icon to the right of the "Clean 2" step

Select Add Script


ü  In the Settings panel at the left edge of the Prep builder UI, Select "Tableau Python (TabPy) Server" as the Connection type

    Click "Connect to Tableau Python (TabPy) Server"



ü  Enter "localhost" as the server and port 9004

Write the Python script

ü  Open Visual Studio Code

Exercise scenario: Sales management would like to gain insight on the revenue effect if discounting. To support this, we will use Python to add a field to our flow called "non-discounted sales".

In this context, our Python script must be written with a function that can be called from Prep Builder. We choose the name of this function. It will accept a structure called a dataframe, and will return a dataframe.

ü  Create a new file called nds.ps (nds = non-discounted sales).

First we'll simply print the dataframe to get a look at what our function will receive from Prep Builder.

ü  Add the following Python code to nds.py:

def nds(df):

  print(df)

  return(df)

ü  Save the file to your Documents directory. (The location does not matter as long as you can find it later.)

ü  In the Prep Builder configuration panel for the script added above, browse to the nds.py file to connect this script step to your new Python script.

ü  Look at the command prompt running TabPy. You should see the results of the print statement.



ü  Let's add our calculation. Replace the print(df) statement with the following:

  df['Non Discounted Sales'] = df['Sales'] / (1-df['Discount'])

This line creates a new column in the dataframe called "Non Discounted Sales".

ü  Be sure the two leading spaces are included. The Python specification requires the contents of a function to be indented.

If the schema of the returned dataframe does not exactly match that of the calling dataframe, we need to include an additional function called "get_output_schema".

In this case we will return our new "Non Discounted Sales" column along with the "Row ID" column so that we can join this column back into the flow.

ü  Add the following to nds.ps:

def get_output_schema():

return pd.DataFrame({

  'Row ID': prep_int(),

  'Non Discounted Sales': prep_decimal()

})

ü  Save the file.

ü  In Prep Builder, in the flow, click off of the script step icon then back on it. This will cause Prep Builder to re-execute the Python script.

You should see the result of the script in Prep Builder.



Join the result of the script to the flow before the "Create Superstore Sales.tde" output step by dragging the “Clean 2” step icon over the “Non Discounted Sales” script step icon, and dropping it on the “Join” area that appears


Thanks Guru's,
Prazwal Ks

DAX Time Intelligence Functions

   Thanks Guru's, Prazval.Ks