Plotly Dash is all you need

Jonathan Nye · Engineering @ Tesla · 12 min read

… for enterprise analytics and reporting.1

*This post is inspired by DHH's "merchants of complexity".

Over the years, I've seen many teams over-engineer their dashboards. In this post, I'll discuss my approach to building dashboards that are performant, easy to develop, and easy to maintain.

Introduction

There are many tools available for creating dashboards. They range from easy-to-use, code-free tools to fully custom "solutions" built with significant investment.

The easy-to-use tools are great for simple reports with limited interactivity. They also don't require a development team.

At the other end of the scale, the usual approach is a custom backend that interfaces with various data sources, paired with a custom React frontend. This is great if you have infinite time, resources and relaxed deadlines. But let's be real: these solutions often don't live up to their initial goals, and any change requires input from the entire development team.

I've personally settled on something in the middle: Plotly Dash. It's a versatile and simple solution for creating highly interactive dashboards.

Problem

In the real world, businesses typically need to see a report on something promptly. Time is money, after all, and not having visibility into a critical process costs you money every day. Businesses might also not know exactly what they need to see; they just know that they need to see something, and something is better than nothing. You will have a lot of back and forth and a lot of iteration before settling on a final solution. The key point is to provide value along the way, not only once everything is complete.

Solution

Plotly Dash lets you develop reports quickly and present them to your users within a few days, if not hours. You can iterate to provide something useful within a very short time frame. Version control and multiple environments can be set up so that any change can easily be rolled back. To sum it up, you can move quickly and break things.

In addition:

  • Dash looks good enough with sensible defaults.
  • Custom components let your app follow your company's design guidelines.
  • It is highly performant, especially with caching.

Dash allows you to build interactive dashboards quickly, at scale, and to provide benefits incrementally as you build.

While Dash requires someone who can program in Python and has a basic understanding of web development, it is easy to learn and get started with. It can be as complex as needed and doesn't require a huge amount of boilerplate to get started. You can, however, use all your web development skills if you happen to have them. In my opinion, it is a huge benefit when a tool satisfies both the beginner and the expert, with both feeling challenged and productive.

Typical project layout

Let's get into how to structure your Dash project and go through a few common issues that you might encounter.

While a Plotly Dash project is simple to set up, it still requires some overhead. A typical (maybe slightly more advanced) layout might look something like this:

assets
components
db
config
pages
redis
- Dockerfile
- redis.conf
prometheus
schemas
tests
utils
.dockerignore
.gitignore
.pre-commit-config.yaml
.python-version
config.env
docker-compose.yml
Dockerfile
gunicorn.conf.py
main.py
odbcinst.ini
poetry.toml
poetry.lock
pyproject.toml
README.md
secrets.env (Don't add to git)

Configuration

For all my projects, I follow the Twelve Factor App principles.

Application config and secrets

For any Python application, Pydantic's settings management (pydantic-settings) works wonderfully for loading your config from environment variables and validating that it is correct. This helps debug deployment issues and also documents what should go into each environment variable.
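As a rough sketch, the settings class could look like the one below; the field names mirror the connection details used later in this post, while the defaults and the sqlalchemy_binds attribute are illustrative assumptions.

from pydantic import SecretStr
from pydantic_settings import BaseSettings


class DashboardConfig(BaseSettings):
    """Application config, loaded and validated from environment variables."""

    # MySQL connection details (SecretStr keeps secrets out of logs)
    mysql_host: str
    mysql_port: int = 3306
    mysql_name: str
    mysql_user: str
    mysql_pass: SecretStr

    # ClickHouse connection details
    clickhouse_host: str
    clickhouse_port: int = 8123
    clickhouse_name: str
    clickhouse_username: str
    clickhouse_secret: SecretStr

    cache_timeout_seconds: int = 300

    # Populated after initialisation with the SQLAlchemy binds (see below)
    sqlalchemy_binds: dict = {}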

Managing data sources and database connections

Dash allows you to use data from any source. Databases, APIs, CSV files - it doesn't matter.

In some applications I've developed, it's been necessary to connect to many different databases. This can get quite overwhelming, but a simple pattern can help manage all the connections.

Firstly, the database connection details are defined in your Pydantic config. After the config has been initialised, you can set up the various connections using connection strings. In this case I'm using SQLAlchemy, which allows connecting to many databases with a common interface.

To standardize the connection, create a simple Pydantic model.


from pydantic import BaseModel


class SQLAlchemyBind(BaseModel):
    """SQLAlchemy bind config.

    This object is a generic representation of SQLAlchemy bind configs.
    """

    url: str
    isolation_level: str | None = None

In the config file, define all your database connection strings.

config = DashboardConfig(_env_file=["config.env", "secrets.env"], _env_file_encoding="utf-8")  # type: ignore

config.sqlalchemy_binds = {
    "mysql_db": SQLAlchemyBind(
        url=f"mysql+pymysql://{config.mysql_user}:{config.mysql_pass.get_secret_value()}@{config.mysql_host}:{config.mysql_port}/{config.mysql_name}",
        isolation_level="READ UNCOMMITTED",
    ),
    "clickhouse_db": SQLAlchemyBind(
        url=f"clickhouse://{config.clickhouse_username}:{config.clickhouse_secret.get_secret_value()}@{config.clickhouse_host}:{config.clickhouse_port}/{config.clickhouse_name}",
        isolation_level=None,
    ),
}
To create and collect all the database sessions, create an init_sessions function that stores the sessions in a dictionary acting as a singleton.

You can then access a particular database session using database_sessions[bind].

In db/__init__.py:

from contextlib import contextmanager
from typing import Any, Generator

from loguru import logger
from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

from config import config  # the initialised DashboardConfig from the configuration section

database_sessions: dict[str, scoped_session] = {}


@contextmanager
def query_db(
    query_path: str, bind: str, params: dict = {}
) -> Generator[list[dict[str, Any]], None, None]:
    """Executes the given query and returns the results.

    Parameters
    ----------
    query_path: str
        Path to the query file
    bind: str
        SQLAlchemy bind id
    params: dict
        Parameters to render in the query

    Yields
    ------
    list[dict[str, Any]]: query results
    """
    try:
        with open(query_path, "r") as f:
            query = f.read()
    except FileNotFoundError as e:
        logger.error(f"Error opening query file {query_path}: {e}")
        raise
    except Exception as e:
        logger.error(f"An unexpected error occurred while opening query file {query_path}: {e}")
        raise

    db_session = database_sessions[bind]
    session = db_session()

    try:
        logger.debug(f"Query {bind}: {query_path} executed with params {params}")
        results = session.execute(text(query), params=params).mappings().all()
        # Convert SQLAlchemy Row to dict for compatibility
        yield [dict(row) for row in results]
    except Exception as e:
        logger.error(f"An error occurred while executing query {query_path}: {e}")
        raise
    finally:
        try:
            db_session.close()
        except Exception as e:
            logger.error(f"An error occurred while closing the database session: {e}")


def init_sessions():
    """Creates a scoped session per configured bind and stores it in the singleton dict."""
    for bind in config.sqlalchemy_binds.keys():
        engine = create_engine(
            url=config.sqlalchemy_binds[bind].url,
            isolation_level=config.sqlalchemy_binds[bind].isolation_level,
            pool_recycle=3600,
            pool_pre_ping=True,
        )
        session_factory = sessionmaker(bind=engine)
        db_session = scoped_session(session_factory)
        database_sessions[bind] = db_session

I personally put all queries into .sql files structured according to the database name. The queries can be linted and formatted with SQLFluff.

You can then run a query from a template like this:

db/clickhouse.py

import polars as pl

from db import query_db  # the context manager defined in db/__init__.py
# `cache` is the Flask-Caching instance and `config` the DashboardConfig from earlier


@cache.memoize(config.cache_timeout_seconds)
def get_ultimate_question(
    parameter_one: str,
) -> pl.DataFrame:
    with query_db(
        query_path="db/queries/clickhouse/some_amazing_query.sql",
        bind="clickhouse_db",
        params={
            "parameter_one": parameter_one,
        },
    ) as results:
        ultimate_question = pl.DataFrame(
            results,
            schema={
                "name": pl.Utf8,
                "date": pl.Date,
                "hour": pl.Int8,
                "answer": pl.Int16,
            },
            orient="row",
        )

    # Any other logic you need in Python...

    return ultimate_question

Caching and background callbacks

Caching is important for any Plotly Dash app to ensure good performance: it avoids recomputing things unnecessarily.

Because Plotly Dash is based on Flask, you can just use the excellent Flask-Caching library.

After setting it up, function calls are cached with the @cache.memoize decorator. In practice, this usually means adding the decorator to any database query function, but any function can be cached using this pattern.
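Setting the cache up takes only a few lines. A minimal sketch, assuming the Redis service from the docker-compose file shown later in this post:

from dash import Dash
from flask_caching import Cache

app = Dash(__name__)

# Back the cache with Redis so cached results are shared across gunicorn workers
cache = Cache(
    app.server,  # Dash exposes the underlying Flask app as .server
    config={
        "CACHE_TYPE": "RedisCache",
        "CACHE_REDIS_URL": "redis://redis:6379/0",
        "CACHE_DEFAULT_TIMEOUT": 300,
    },
)

Any function decorated with @cache.memoize then transparently hits Redis on repeat calls with the same arguments.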

The cached data can then be used in multiple callbacks, without having to make repeat database queries.

Caching goes hand in hand with using background callbacks to execute queries that take longer than a few seconds. Functions that query the database are called first in the background callback. Once the background callback is complete, a subsequent callback can call the same function and get the result almost immediately from the cache.
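Background callbacks need somewhere to run. A minimal sketch of Dash's CeleryManager backed by Redis, matching the dash-celery and redis services in the docker-compose file shown later (the Redis database numbers are illustrative):

from celery import Celery
from dash import CeleryManager, Dash

celery_app = Celery(
    __name__,
    broker="redis://redis:6379/1",
    backend="redis://redis:6379/2",
)

app = Dash(
    __name__,
    background_callback_manager=CeleryManager(celery_app),
)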

This leads on to the next topic...

Background callback signalling

Signalling is the process of letting one callback know that a long-running background callback is complete.

The general pattern, shown in the code below, goes like this:

  1. Add a dcc.Store component to our layout.
  2. Create a background callback that calls the expensive function and outputs a "signalling" value to the dcc.Store when it's complete.
  3. Create another callback that uses the dcc.Store as an input so that it is only triggered once the expensive function is complete.

from datetime import datetime

import dash
from dash import Input, Output, State, callback, dcc

from db.clickhouse import get_clickhouse_data  # hypothetical cached query function

DATE_FORMAT_STRING = "%Y-%m-%d"  # assumed project-wide date format constant

# Add to layout
data_signal_store = dcc.Store(id="data-signal-store")


@callback(
    Output("data-signal-store", "data"),
    Input("date-picker", "date"),
    background=True,
)
def fetch_data(datestr: str) -> str:
    """Fetches the data in the background."""
    _date = datetime.strptime(datestr, DATE_FORMAT_STRING)

    # Takes a long time to execute; populates the cache
    data_df = get_clickhouse_data(date=_date)

    # Signal completion by writing a value to the store
    return f"signal-{_date.strftime(DATE_FORMAT_STRING)}"


@dash.callback(
    Output("output-one", "children"),
    Output("output-two", "children"),
    Input("data-signal-store", "data"),
    State("date-picker", "date"),
)
def update_widget(
    data_signal: str,
    datestr: str,
) -> tuple[str, str]:
    """Updates the widgets with the data."""
    _date = datetime.strptime(datestr, DATE_FORMAT_STRING)

    # Gets fetched from cache
    data_df = get_clickhouse_data(date=_date)

    # Other logic that builds widget_one and widget_two from data_df

    return widget_one, widget_two

In this way, the long-running task always executes before the main callbacks, as Dash will only trigger the update_widget callback once the background callback is complete and has written to the store.

Authentication

As Dash is built on Flask, you can just use Flask-Login to manage your auth.
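A minimal sketch of attaching Flask-Login to the Flask server behind Dash; the User class and the user lookup are placeholders for whatever user store you have:

import flask_login
from dash import Dash

app = Dash(__name__)
server = app.server  # the underlying Flask app used in the routes below

login_manager = flask_login.LoginManager()
login_manager.init_app(server)


class User(flask_login.UserMixin):
    def __init__(self, user_id: str):
        self.id = user_id


@login_manager.user_loader
def load_user(user_id: str) -> User | None:
    # Look the user up in your own session store or database
    return User(user_id)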

Create routes for your particular use case, such as /login, /unauthorized and /auth, directly on the Flask server …


import flask
from flask import flash, redirect

# `server`, `logger` and `validate_next_parameter` are defined elsewhere in the app


@server.route("/login", methods=["GET"])
def login():
    next_url = flask.request.args.get("next")

    # Validate that next is a valid url and has the same host as the server
    valid_next_url = validate_next_parameter(flask.request, next_url)
    if not valid_next_url:
        return "Invalid URL for redirect", 400

    flash("You have to be logged in to access this page.")

    logger.info("Logging user in...")
    # Custom login logic

    # `url` is the base URL of your identity provider / auth flow
    return redirect(f"{url}/?next={next_url}")

... then add authentication to routes by checking whether the user is logged in before every request. Here is an example:

@server.before_request
def before_request_func():
    # Allow the auth and health-check routes through without a login
    allowed_paths = [
        "/alive",
        "/ready",
        "/auth",
        "/unauthorized",
        "/login",
    ]

    if not current_user.is_authenticated and not any(
        flask.request.path.startswith(path) for path in allowed_paths
    ):
        next_url = flask.request.url
        valid_next_url = validate_next_parameter(flask.request, next_url)
        if not valid_next_url:
            return "Invalid URL for redirect", 400

        return redirect(
            f"/login?next={next_url}",
        )

Embedding your application in an iFrame

I sometimes have the case where users would like to embed certain graphs in Confluence or another tool. To achieve this, you need to set X-Frame-Options and Content-Security-Policy in the response headers. Note that X-Frame-Options ALLOW-FROM is deprecated and ignored by modern browsers; the frame-ancestors directive of Content-Security-Policy is what current browsers respect. To read more about these policies, see Content Security Policy and Same-origin policy.

@server.after_request
def add_headers(response):
    response.headers["X-Frame-Options"] = "ALLOW-FROM https://confluence.com/"
    response.headers[
        "Content-Security-Policy"
    ] = "frame-ancestors 'self' https://confluence.com/;"
    return response

Logging and metrics

I like to use Loguru for logging and Prometheus for metrics. You can then use your company's usual monitoring tools, such as Grafana or the ELK stack, to monitor usage of your application.
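A rough single-process sketch of counting requests and exposing a /metrics endpoint; the metric name is illustrative, server is the Flask app from the auth section, and with multiple gunicorn workers you would switch to prometheus_client's multiprocess collector (hence PROMETHEUS_MULTIPROC_DIR in the compose file below):

import flask
from loguru import logger
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

REQUEST_COUNTER = Counter("dashboard_requests_total", "Requests served", ["path"])


@server.before_request
def count_request():
    REQUEST_COUNTER.labels(path=flask.request.path).inc()
    logger.debug(f"Request received for {flask.request.path}")


@server.route("/metrics")
def metrics():
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}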

Customizing components

Many companies have their own design system and typically want any developed applications to match it. With Plotly Dash, you can relatively easily convert your company's components (especially if they are React components) into Dash components. Plotly even provides a template to start from.

Alternatively, if you just want something that looks good without too much customization, Dash Bootstrap Components or Dash Mantine Components are good alternatives to the Dash core components.

Deploying

Deploying a Dash app is pretty easy. Including caching and background jobs, it requires just three containers: one for the Dash app itself, one for the background workers and one for Redis.

Where to deploy is a matter of personal preference. I've always deployed to Kubernetes using a CI pipeline for seamless updates, but Docker Compose would work just as well.

Local testing with Docker

This can all be tested locally with docker.

docker-compose.yaml

services:
  myapp:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 8080:8080
      - 8050:8050
    volumes:
      - ./config.env:/app/config.env
      - ./secrets.env:/app/secrets.env
      - .:/app
      - /app/prometheus/
    environment:
      - PYTHONUNBUFFERED=1
      - PROMETHEUS_MULTIPROC_DIR=./prometheus
      - USE_REDIS=True
      - REDIS_DSN=redis
    command: ["gunicorn", "main:server", "--reload", "--workers=1"]
  dash-celery:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 5555:5555
    volumes:
      - ./config.env:/app/config.env
      - ./secrets.env:/app/secrets.env
      - .:/app
      - /app/prometheus/
    entrypoint: ["celery"]
    command: ['-A', 'main:celery_app', 'worker', '--loglevel=debug', '--concurrency=2', '--without-mingle', '-n', 'worker1']
    environment:
      - USE_REDIS=True
      - REDIS_DSN=redis
  redis:
    build:
      context: ./redis/
      dockerfile: Dockerfile
    ports:
      - 6379:6379

Dash supports multiple pages, so each page gets its own module in the pages folder.

Once the initial overhead of setting up the database connections and deployment is done, and a few sample pages exist, even inexperienced developers and analysts can create new pages using the existing ones as a template.

It's as easy as copying an existing page from your pages folder and editing it as needed; a new page can be as small as the sketch below.
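A minimal sketch of a page module, assuming the app is created with use_pages=True; the path, page name and components are illustrative:

# pages/example_report.py
import dash
from dash import dcc, html

dash.register_page(__name__, path="/example-report", name="Example report")


def layout(**query_params) -> html.Div:
    return html.Div(
        [
            html.H1("Example report"),
            dcc.DatePickerSingle(id="date-picker"),
            html.Div(id="output-one"),
            html.Div(id="output-two"),
        ]
    )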

Sharing views

Sharing reports with others can be done using query parameters, as long as you ensure all callback inputs are parametrized in the URL. This means you can share an exact view simply by copying the URL.


import json

import dash
from dash import Input, Output, dcc

# Add to the header in your main layout
dcc.Location(id="url", refresh=False),


# In each page create the following callback
@dash.callback(
    Output("url", "search", allow_duplicate=True),
    Input("input-one", "value"),
    Input("input-two", "value"),
    prevent_initial_call="initial_duplicate",
)
def update_url_state(
    input_one: list[str],
    input_two: int,
) -> str:
    """Updates the URL query parameters based upon the input values."""
    params = []
    if input_one:
        # Serialize any objects to JSON
        params.append(f"input_one={json.dumps(input_one, separators=(',', ':'))}")
    if input_two:
        params.append(f"input_two={input_two}")

    if params:
        return "?" + "&".join(params)
    return ""


# Define the layout for the page with the parameters and inputs
def layout(
    input_one: str | None = None,
    input_two: int | None = None,
) -> dash.development.base_component.Component:
    # Deserialize the object from JSON in the query parameter
    _input_one = []
    if input_one:
        _input_one = json.loads(input_one)

    # ... other layout

Conclusion

Dash is a great tool for creating maintainable, easy-to-update reports. Once the initial overhead of configuration and deployment is done, adding and updating reports becomes trivial. If someone in your company is proposing to build over-engineered reporting infrastructure, pause for a moment and ask yourself whether it is really needed.

In many cases, something simpler might be all that's required.

You might find that all you need is Dash.


Please let me know if I have missed something or made any mistakes.

Footnotes

  1. This assumes you already have some databases or data sources that you can utilise for your dashboards.