Connecting to TabPy from Prep Builder
ü Open
Tableau Prep Builder
ü Open the
Superstore Sample flow
ü Click
the plus icon to the right of the "Clean 2" step
Select Add Script
ü In the Settings panel at the left edge of the Prep builder UI, Select "Tableau Python (TabPy) Server" as the Connection type
Click "Connect to Tableau Python (TabPy) Server"
ü Enter
"localhost" as the server and port 9004
Write the Python script
ü Open
Visual Studio Code
Exercise
scenario: Sales management would like to gain insight on the revenue effect if
discounting. To support this, we will use Python to add a field to our flow
called "non-discounted sales".
In
this context, our Python script must be written with a function that can be
called from Prep Builder. We choose the name of this function. It will accept a
structure called a dataframe, and will return a dataframe.
ü Create a
new file called nds.ps (nds = non-discounted sales).
First
we'll simply print the dataframe to get a look at what our function will
receive from Prep Builder.
ü Add the
following Python code to nds.py:
def nds(df):
print(df)
return(df)
ü Save the
file to your Documents directory. (The location does not matter as long as you
can find it later.)
ü In the Prep
Builder configuration panel for the script added above, browse to the nds.py
file to connect this script step to your new Python script.
ü Look at
the command prompt running TabPy. You should see the results of the print
statement.
ü Let's
add our calculation. Replace
the print(df) statement with the following:
df['Non Discounted Sales'] = df['Sales'] / (1-df['Discount'])
This
line creates a new column in the dataframe called "Non Discounted
Sales".
ü Be sure
the two leading spaces are included. The Python specification requires the
contents of a function to be indented.
If
the schema of the returned dataframe does not exactly match that of the calling
dataframe, we need to include an additional function called
"get_output_schema".
In
this case we will return our new "Non Discounted Sales" column along
with the "Row ID" column so that we can join this column back into
the flow.
ü Add the
following to nds.ps:
def get_output_schema():
return pd.DataFrame({
'Row ID':
prep_int(),
'Non Discounted
Sales': prep_decimal()
})
ü Save the
file.
ü In Prep
Builder, in the flow, click off of the script step icon then back on it. This
will cause Prep Builder to re-execute the Python script.
You
should see the result of the script in Prep Builder.
Join the result of the script to the flow before the "Create Superstore Sales.tde" output step by dragging the “Clean 2” step icon over the “Non Discounted Sales” script step icon, and dropping it on the “Join” area that appears