
Plotly Dash is all you need

· 11 min read
Jonathan Nye
Engineering @ Tesla

… for enterprise analytics and reporting.1

*This post is inspired by DHH's merchants of complexity.

Plotly Dash is a versatile and simple solution for creating interactive dashboards. Over the years, I've seen many teams over-engineer their dashboards. The usual approach is a custom backend that interfaces with the various data sources, paired with a custom React frontend. This is great if you have infinite time, resources, and relaxed deadlines.

In the real world, people often need to see a report on something promptly. They might also not know exactly what they need to see; they just know that they need to see something, and something is better than nothing. Expect a lot of back and forth and a lot of iteration.

Plotly Dash lets you develop these reports quickly and present them to your users within a few days, if not hours, once a project is set up. You can then iterate to provide something useful within a very short time frame. You can use version control or multiple environments so that any changes can easily be rolled back. You don't need to be afraid of moving quickly.

Dash also looks good enough with sensible defaults. If you need a highly customized frontend, Dash might not be the right solution, but it can still work with some simple customization to follow your company's design guidelines. It also performs well enough for most use cases, which rarely need to scale to tens of thousands of users.

Typical project layout

Let's get into how to structure your Dash project and go through a few common issues that you might encounter.

While a Plotly Dash project is simple to set up, it still requires some overhead. A typical layout might look something like this:

assets/
components/
db/
config/
pages/
redis/
    Dockerfile
    redis.conf
prometheus/
schemas/
tests/
utils/
.dockerignore
.gitignore
.pre-commit-config.yaml
.python-version
config.env
docker-compose.yml
Dockerfile
gunicorn.conf.py
main.py
odbcinst.ini
poetry.toml
poetry.lock
pyproject.toml
README.md
secrets.env (Don't add to git)

Configuration

For all my projects, I follow the Twelve Factor App principles.

Application config and secrets

For any Python application, Pydantic's settings management (pydantic-settings) works wonderfully to load your config from environment variables and validate that it is correct. This helps debug deployment issues and also documents what should go into each environment variable.
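A minimal sketch of what such a config class might look like, assuming pydantic-settings v2. The field names here are illustrative (they mirror the MySQL settings used in the database section below); adjust them to your own environment variables.

from pydantic import SecretStr
from pydantic_settings import BaseSettings


class DashboardConfig(BaseSettings):
    """Application config loaded from config.env and secrets.env."""

    # Illustrative fields; one per environment variable you expect
    mysql_host: str
    mysql_port: int = 3306
    mysql_user: str
    mysql_pass: SecretStr
    mysql_name: str

    cache_timeout_seconds: int = 300

    # Populated at startup, see the database section below
    sqlalchemy_binds: dict = {}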

Data sources and database connections

Dash allows you to use data from any source. Databases, APIs, CSV files: it doesn't matter.

In some of the applications I've developed, I needed to connect to many databases. This can get quite overwhelming, but a simple pattern can help manage all the connections.

First, the database connection details are defined in your Pydantic config. After the config has been initialised, you can set up the various connections using connection strings. In this case I'm using SQLAlchemy, which allows connecting to many databases through a common interface.

To standardize the connection config, create a simple Pydantic model.


from pydantic import BaseModel


class SQLAlchemyBind(BaseModel):
    """SQLAlchemy bind config.

    This object is a generic representation of SQLAlchemy bind configs.
    """

    url: str
    isolation_level: str | None = None

In the config file, define all your database connection strings.

config = DashboardConfig(_env_file=["config.env", "secrets.env"], _env_file_encoding="utf-8")  # type: ignore

config.sqlalchemy_binds = {
    "mysql_db": SQLAlchemyBind(
        url=f"mysql+pymysql://{config.mysql_user}:{config.mysql_pass.get_secret_value()}@{config.mysql_host}:{config.mysql_port}/{config.mysql_name}",
        isolation_level="READ UNCOMMITTED",
    ),
    "clickhouse_db": SQLAlchemyBind(
        url=f"clickhouse://{config.clickhouse_username}:{config.clickhouse_secret.get_secret_value()}@{config.clickhouse_host}:{config.clickhouse_port}/{config.clickhouse_name}",
        isolation_level=None,
    ),
}

To create and collect all the database sessions, define an init_sessions function that stores them in a dictionary.

You can then access a particular database session using database_sessions[bind].

In db/__init__.py:

from contextlib import contextmanager
from typing import Any, Generator

from loguru import logger
from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

# `config` is the DashboardConfig instance initialised in the configuration step above

database_sessions: dict[str, scoped_session] = {}


@contextmanager
def query_db(
    query_path: str, bind: str, params: dict = {}
) -> Generator[list[dict[str, Any]], None, None]:
    """Executes the given query and returns the results.

    Parameters
    ----------
    query_path: str
        Path to the query file
    bind: str
        SQLAlchemy bind id
    params: dict
        Parameters to render in the query

    Yields
    ------
    list[dict[str, Any]]: query results
    """
    try:
        with open(query_path, "r") as f:
            query = f.read()
    except FileNotFoundError as e:
        logger.error(f"Error opening query file {query_path}: {e}")
        raise
    except Exception as e:
        logger.error(f"An unexpected error occurred while opening query file {query_path}: {e}")
        raise

    db_session = database_sessions[bind]
    session = db_session()

    try:
        logger.debug(f"Query {bind}: {query_path} executed with params {params}")
        results = session.execute(text(query), params=params).mappings().all()
        # Convert SQLAlchemy Row to dict for compatibility
        yield [dict(row) for row in results]
    except Exception as e:
        logger.error(f"An error occurred while executing query {query_path}: {e}")
        raise
    finally:
        try:
            db_session.close()
        except Exception as e:
            logger.error(f"An error occurred while closing the database session: {e}")


def init_sessions():
    for bind in config.sqlalchemy_binds.keys():
        engine = create_engine(
            url=config.sqlalchemy_binds[bind].url,
            isolation_level=config.sqlalchemy_binds[bind].isolation_level,
            pool_recycle=3600,
            pool_pre_ping=True,
        )
        session_factory = sessionmaker(bind=engine)
        db_session = scoped_session(session_factory)
        database_sessions[bind] = db_session

I personally put all queries into .sql files structured according to the database name. The queries can be linted and formatted with SQLFluff.

Then you can execute a query using this kind of template:

db/clickhouse.py

import polars as pl

# query_db is the context manager defined in db/__init__.py above;
# `cache` and `config` come from your application setup.
from db import query_db


@cache.memoize(config.cache_timeout_seconds)
def get_ultimate_question(
    parameter_one: str,
) -> pl.DataFrame:
    with query_db(
        query_path="db/queries/clickhouse/some_amazing_query.sql",
        bind="clickhouse_db",
        params={
            "parameter_one": parameter_one,
        },
    ) as results:
        ultimate_question = pl.DataFrame(
            results,
            schema={
                "name": pl.Utf8,
                "date": pl.Date,
                "hour": pl.Int8,
                "answer": pl.Int16,
            },
            orient="row",
        )

    # Any other logic you need in Python...

    return ultimate_question

Caching and background callbacks

Caching is important for any Plotly Dash app: it ensures good performance and avoids recomputing things unnecessarily.

One can cache function calls with the @cache.memoize decorator. In practice, this usually means adding the decorator to any database query functions to cache the database queries.

This allows you to call the same function in multiple other callbacks, without having to fetch data again.
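The cache object can be whatever caching library you prefer. A minimal sketch, assuming Flask-Caching backed by the Redis container described in the deployment section below (`server` is the Dash app's underlying Flask server):

from flask_caching import Cache

cache = Cache()
cache.init_app(
    server,
    config={
        "CACHE_TYPE": "RedisCache",
        # Host name matches the `redis` service in the docker-compose file below
        "CACHE_REDIS_URL": "redis://redis:6379/0",
    },
)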

This goes hand in hand with using background callbacks to execute queries that take longer than a few seconds. The function that queries the database is called first in the background callback, and once the background callback is complete a subsequent callback can call that same function and get the result from the cache almost immediately.
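Background callbacks need a background callback manager. A minimal sketch using Celery with Redis as both broker and result backend, assuming this lives in main.py (which the docker-compose file below points the Celery worker at):

from celery import Celery
from dash import CeleryManager, Dash

# Host name matches the `redis` service in the docker-compose file below
celery_app = Celery(__name__, broker="redis://redis:6379/1", backend="redis://redis:6379/1")
background_callback_manager = CeleryManager(celery_app)

app = Dash(
    __name__,
    use_pages=True,
    background_callback_manager=background_callback_manager,
)
server = app.server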

This leads onto the next topic...

Background callback signalling

Signalling is the process of letting one callback know that the long running background callback is complete.

The general pattern goes like this:

import dash
from datetime import datetime
from dash import Input, Output, State, callback, dcc

# Add to the layout
data_signal_store = dcc.Store(id="data-signal-store")


@callback(
    Output("data-signal-store", "data"),
    Input("date-picker", "date"),
    background=True,
)
def fetch_data(datestr: str) -> str:
    """Fetches the data in the background."""
    _date = datetime.strptime(datestr, DATE_FORMAT_STRING)

    # Takes a long time to execute
    data_df = get_clickhouse_data(date=_date)

    return f"signal-{_date.strftime(DATE_FORMAT_STRING)}"


@dash.callback(
    Output("output-one", "children"),
    Output("output-two", "children"),
    Input("data-signal-store", "data"),
    State("date-picker", "date"),
)
def update_widget(
    data_signal: str,
    datestr: str,
) -> tuple[str, str]:
    """Updates the widgets with the data."""
    _date = datetime.strptime(datestr, DATE_FORMAT_STRING)

    # Gets fetched from the cache
    data_df = get_clickhouse_data(date=_date)

    # Other logic

    return widget_one, widget_two

In this way, we can ensure the long-running tasks always execute before the main callbacks, as Dash will only execute the update_widget callback once the background callback is complete.

Authentication

As Dash is built on Flask, you can simply use Flask-Login to manage your auth.
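A minimal sketch of wiring Flask-Login into the underlying Flask server. The User class and lookup logic here are illustrative; back them with your own user store.

from flask_login import LoginManager, UserMixin


class User(UserMixin):
    """Illustrative user model; replace with your own."""

    def __init__(self, user_id: str):
        self.id = user_id


login_manager = LoginManager()
login_manager.init_app(server)  # `server` is the Dash app's Flask server
login_manager.login_view = "login"


@login_manager.user_loader
def load_user(user_id: str) -> User:
    # Look the user up however your organisation authenticates users
    return User(user_id)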

Create routes for your particular use case such as /login, /unauthorized, /auth on the Flask server directly …


@server.route("/login", methods=["GET"])
def login():
    next_url = flask.request.args.get("next")

    # Validate that next is a valid URL and has the same host as the server
    valid_next_url = validate_next_parameter(flask.request, next_url)
    if not valid_next_url:
        return "Invalid URL for redirect", 400

    flash("You have to be logged in to access this page.")

    logger.info("Logging user in...")
    # Custom login logic
    # (`url` below is produced by your custom login logic, e.g. your identity provider)

    return redirect(f"{url}/?next={next_url}")

... then add authentication to routes you specify by checking whether the user is logged in before every request. Here is an example:

@server.before_request
def before_request_func():
    # Allow the auth routes
    if request.path in ["/auth", "/unauthorized"]:
        pass
    else:
        excluded_paths = [
            "/_dash-update-component",
            "/assets",
            "/_dash-dependencies",
            "/_dash-layout",
            "/_dash-component-suites",
            "/_reload-hash",
            "/alive",
            "/ready",
        ]
        # Just log the user and request for the main paths in the app and not all the dash paths
        if not any(request.path.startswith(excluded_path) for excluded_path in excluded_paths):
            logger.info(f"Before request checking auth for user: {request.path} {current_user}")
        if not current_user.is_authenticated and request.path not in ["/login"]:
            # Check the request path and method
            if (request.path == "/_dash-update-component") and request.method == "POST":
                # If the user is not logged in, return a JSON response to redirect to /login
                return jsonify({"multi": True, "response": {"url": {"pathname": "/login"}}})
            else:
                # Validate that next is a valid URL and has the same host as the server
                next_url = request.url
                valid_next_url = validate_next_parameter(flask.request, next_url)
                if not valid_next_url:
                    return "Invalid URL for redirect", 400

                return redirect(f"/login?next={next_url}")

Embedding your application in an iFrame

I sometimes have the case where users would like to embed certain graphs in Confluence or some other tool. To achieve this, one needs to add the embedding site to the X-Frame-Options and Content-Security-Policy response headers. To read more about these policies see Content Security Policy and Same-origin policy.

@server.after_request
def add_headers(response):
    response.headers["X-Frame-Options"] = "ALLOW-FROM https://confluence.com/"
    response.headers["Content-Security-Policy"] = "frame-ancestors 'self' https://confluence.com/;"
    return response

Logging and metrics

I like to use Loguru for logging and Prometheus for metrics. You can then use your company's usual monitoring tools such as Grafana or ELK to monitor the usage of your application.
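A minimal sketch of exposing a counter with prometheus_client on the Flask server. The metric name is illustrative, and the multiprocess setup needed under gunicorn is left out for brevity.

from flask import Response, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

# Illustrative metric
PAGE_VIEWS = Counter("dashboard_page_views_total", "Page views by path", ["path"])


@server.route("/metrics")
def metrics() -> Response:
    # Endpoint scraped by Prometheus
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)


@server.before_request
def count_page_views() -> None:
    # Skip Dash's internal endpoints
    if not request.path.startswith("/_dash"):
        PAGE_VIEWS.labels(path=request.path).inc()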

Customizing components

Many companies have their own design system and typically want any developed applications to match their design. With Plotly Dash, you can relatively easily convert your company's components (especially if they are React components) to Dash components. Plotly even provides a template to start with.

Alternatively, if you just want something that looks good without too much customization, Dash Bootstrap Components or Dash Mantine Components are good alternatives to the Dash core components.
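For example, adopting Dash Bootstrap Components is mostly a matter of loading a theme stylesheet and using its components in your layouts. A minimal sketch:

import dash_bootstrap_components as dbc
from dash import Dash, html

app = Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])

app.layout = dbc.Container(
    [
        dbc.Card(dbc.CardBody(html.H4("A themed card", className="card-title"))),
    ],
    fluid=True,
)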

Deploying

Deploying a Dash app is pretty easy. To include caching and background jobs, it requires just three containers: one for the Dash app itself, one for the background jobs, and one for Redis.

Where to deploy is a matter of personal preference. I've always deployed to Kubernetes using a CI pipeline for seamless updates.

Local testing with Docker

This can all be tested locally with Docker.

docker-compose.yml

services:
  myapp:
    platform: linux/arm64
    build:
      context: .
      dockerfile: DockerfileMac
    ports:
      - 8080:8080
      - 8050:8050
    volumes:
      - ./config.env:/app/config.env
      - ./secrets.env:/app/secrets.env
      - .:/app
      - /app/prometheus/
    environment:
      - PYTHONUNBUFFERED=1
      - PROMETHEUS_MULTIPROC_DIR=./prometheus
      - USE_REDIS=True
      - REDIS_DSN=redis
    command: ["gunicorn", "main:server", "--reload", "--workers=1"]
  dash-celery:
    platform: linux/arm64
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 5555:5555
    volumes:
      - ./config.env:/app/config.env
      - ./secrets.env:/app/secrets.env
      - .:/app
      - /app/prometheus/
    entrypoint: ["celery"]
    command: ['-A', 'main:celery_app', 'worker', '--loglevel=debug', '--concurrency=2', '--without-mingle', '-n', 'worker1']
    environment:
      - USE_REDIS=True
      - REDIS_DSN=redis
  redis:
    build:
      context: ./redis/
      dockerfile: Dockerfile
    ports:
      - 6379:6379

Pages

Pages go into the pages folder, as Dash supports multiple pages.

Once the initial overhead of setting up the database connections, the deployment, and a few sample pages is done, even inexperienced developers and analysts can create new pages using the existing ones as a template.

It's as easy as copying an existing page from your pages folder and editing it as needed.
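A minimal sketch of what such a page might look like with Dash Pages (the path, name, and layout contents are illustrative):

import dash
from dash import dcc, html

dash.register_page(__name__, path="/example-report", name="Example report")


def layout(**query_params) -> html.Div:
    """Page layout; Dash Pages passes URL query parameters as keyword arguments."""
    return html.Div(
        [
            html.H2("Example report"),
            dcc.DatePickerSingle(id="date-picker"),
            html.Div(id="output-one"),
            html.Div(id="output-two"),
        ]
    )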

Sharing views

Sharing reports with others can be done using query parameters as long as you ensure all callback inputs are parametrized in the URL. This means you can share the exact view by copying the URL.


# Add to the header in your main layout
dcc.Location(id="url", refresh=False),


# In each page create the following callback
@dash.callback(
    Output("url", "search", allow_duplicate=True),
    Input("input-one", "value"),
    Input("input-two", "value"),
    prevent_initial_call="initial_duplicate",
)
def update_url_state(
    input_one: list[str],
    input_two: int,
) -> str:
    """Updates the URL query parameters based upon the input values."""
    params = []
    if input_one:
        # Serialize any objects to JSON
        params.append(f"input_one={json.dumps(input_one, separators=(',', ':'))}")
    if input_two:
        params.append(f"input_two={input_two}")

    if params:
        return f"?{'&'.join(params)}"
    return ""


# Define the layout for the page with the parameters and inputs
def layout(
    input_one: str | None = None,
    input_two: int | None = None,
) -> dash.development.base_component.Component:
    # Deserialize the object from JSON in the query parameter
    _input_one = []
    if input_one:
        _input_one = json.loads(input_one)

    ... other layout

Conclusion

Dash is a great tool for creating maintainable, easy-to-update reports. Once the initial overhead of configuration and deployment is done, adding and updating reports becomes trivial.

As a reminder, if a team presents a complex solution to a problem, ask yourself whether it is all really needed. Break the problem down into what is necessary and see whether the solution on offer meets those criteria.

You might find that all you need is Dash.


Please let me know if I have missed something or made any mistakes.

Footnotes

  1. This assumes you already have some databases or data sources that you can utilise for your dashboards.