Creating a custom recipe in our Advanced Editor requires a specific syntax for calling the steps that build your projects.

You can reach the Advanced Editor in three ways:

1.- In the Custom Recipe Creator, click Open Advanced Editor at any point

2.- While creating a project, click Configure Recipe

3.- While creating a project from the dataset preview, select a Recipe and go to Editor

HOW TO EDIT A RECIPE:

While a recipe is not ready to be executed, the EXECUTE button stays disabled. Once the recipe is ready, the button activates.

As you type, the editor suggests step names that match what you have typed.

When typing inside a step, you can get suggestions for the column names of a dataset by typing the name of the dataset followed by "." (e.g. ds.)

A subset of the dataset's columns can be selected using the ds[["column1", "column2", ...]] syntax. The same syntax can also exclude specific columns: ds[!["column1", "column2", ...]] keeps every column except the ones listed.

To get column name suggestions in this syntax, type ds["] and press Ctrl+Space.
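As an illustration of both forms, the sketch below passes a column subset to link_similar_rows, a step described later in this article. The column names ("price", "value", "id") are placeholders assumed for the example, and the two calls are alternatives rather than steps meant to run together.

```
# Keep only the listed columns.
link_similar_rows(ds[["price", "value"]]) -> (links)

# Alternative: keep every column except the listed ones.
link_similar_rows(ds[!["id"]]) -> (links)
```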

HOW TO USE STEPS IN A RECIPE:

Steps are functions that form the Recipe. They can have the following uses:

  1. Transform the original dataset:
       - Prepare Data: filter rows, wrangle columns or aggregate your dataset
       - Hydrate Data: add new columns with data from the outside world, calling our built-in models or third-party APIs
  2. Create and Analyse the Networks: choose the definition of your similarity function to create the network
  3. Set the visualization options of the graph

Minimal recipe

A minimal recipe that lets Graphext create a project can omit all steps from the first section (Transform the original dataset) and the third section (Set the visualization options of the graph), but it must include at least the following steps from section two:

# Create network links calculating similarity between multidimensional and multitype documents.

link_similar_rows(ds[["price", "value"]]) -> (links)

# Compute a force-directed graph layout with a fast forceAtlas2 implementation.

layout_network(links, {"gravity": 0.03, "avoidHubs": true, "scalingRatio": 3}) -> (ds.x, ds.y)

# Detect clusters (communities) of nodes in the network and store them in a column.

cluster_network(links) -> (ds._cluster)

# Export data for use in N5 (nodes.json, links.json, columns.json)

data_export(links, ds)

To customize the Recipe further, add more steps that transform the dataset before the network is created. Let's see how:

1. Transform the original dataset:

In a Recipe, we first execute the data preparation and hydration steps.

Steps are executed in order, so each step has access to every column created by the steps above it. In the following example, infer_sentiment uses the column ds.lang, which was created in the previous step:

# Detect the languages used in a specific text column

infer_language(ds.caption) -> (ds.lang)

# Parse text and calculate the overall positive<->negative sentiment polarity (in [-1, 1]).

infer_sentiment(ds.caption, ds.lang) -> (ds.sentiment)

Some steps, such as the filter steps, output a whole new dataset. The output dataset must have a new name, different from that of the input dataset, and any later steps must refer to the new dataset. For example:

# Filter rows where column matches explicit values.
filter_values(ds, {"column": "salary", "values": ["low", "high"]}) -> (ds_filtered)
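As a sketch of how a recipe might continue after the filter: the network steps now take ds_filtered rather than ds. The step calls mirror the minimal recipe above, and the "price" and "value" columns are assumed to exist in your dataset.

```
# Filter rows where column matches explicit values.
filter_values(ds, {"column": "salary", "values": ["low", "high"]}) -> (ds_filtered)

# Later steps read from, and write to, the new dataset.
link_similar_rows(ds_filtered[["price", "value"]]) -> (links)
layout_network(links, {"gravity": 0.03, "avoidHubs": true, "scalingRatio": 3}) -> (ds_filtered.x, ds_filtered.y)
data_export(links, ds_filtered)
```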

2. Create and Analyse the Networks:

Network creation steps define the similarity function used to create the network. The input to these functions is a dataset made of rows and columns, and the output is the set of links (source, target and weight columns) between similar rows. Want to learn more about networks? You can start here

Choose the most appropriate step to create your network. If your dataset does not have sources and targets, use link_similar_rows. If it already has source and target columns, use link_rows_by_id. For more complex analyses, such as text networks with embeddings, please refer to the documentation.

Once the links are created, we call the layout functions to build the network from them:

# Create network links calculating similarity between multidimensional and multitype documents.

link_similar_rows(ds[["author_name", "lang"]]) -> (links)

# Compute a force-directed graph layout with a fast forceAtlas2 implementation.

layout_network(links, {"gravity": 0.03, "avoidHubs": true, "scalingRatio": 3}) -> (ds.x, ds.y)

# Export data for use in N5 (nodes.json, links.json, columns.json)
data_export(links, ds)

If you want to modify the default parameters of the links and layout steps, you can find out how in the documentation sections for those steps.
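Putting sections 1 and 2 together, a full recipe might look like the sketch below. It only reuses steps and parameters shown earlier in this article. The column names (caption, lang, sentiment) come from the examples above and are assumed to fit your dataset.

```
# 1. Hydrate: detect the language, then the sentiment (which needs ds.lang).
infer_language(ds.caption) -> (ds.lang)
infer_sentiment(ds.caption, ds.lang) -> (ds.sentiment)

# 2. Create and analyse the network from the new columns.
link_similar_rows(ds[["lang", "sentiment"]]) -> (links)
layout_network(links, {"gravity": 0.03, "avoidHubs": true, "scalingRatio": 3}) -> (ds.x, ds.y)
cluster_network(links) -> (ds._cluster)

# Export data for use in N5 (nodes.json, links.json, columns.json)
data_export(links, ds)
```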

3. Set the visualization options of the graph:

Finally, we can set visualization options in the Graphext interface. You can see the available options in the documentation.
