
Local Development

The Nousot Ecosystem includes the Nous CLI, which provides an easy-to-use interface for development, deployment, and management of the Ecosystem. One of the primary functions of the CLI is to provide a development environment for the Data Engineering Platform.

NOTE: The Nous CLI is currently experimental. Commands and functionality are subject to change without notice. The CLI is privately available, so please contact us for access. To run the CLI, you must be authenticated with Nousot's sandbox AWS environment to set up credentials and pull the proper images.

Installing

The Nous CLI is installable through pip. Run the following in your environment to download and install:

pip install nous-cli

Before starting with the CLI, please ensure that you have Docker installed and running on your machine. The CLI uses Docker to run a number of services.

Configuring

The CLI can be configured by running:

nous config set <config name> <config value>

Valid values are included here for reference:

| Config Name | Valid Values | Default Value | Description |
| --- | --- | --- | --- |
| runtime | native, docker | docker | The runtime used for running workloads. The "native" runtime will install and run everything directly on your machine, whereas "docker" will run everything within containers. Regardless of runtime, Docker must be running on your machine to run the Nous CLI. |
| spark_version | 3.1, 3.2, 3.3 | 3.2 | The version of Apache Spark to use when running commands. |
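For example, to switch to the native runtime and Spark 3.3 (both config names and values are taken from the table above):

```shell
# Config names and values come from the configuration table above.
nous config set runtime native
nous config set spark_version 3.3
```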

Starting

Whether using the "native" or "docker" runtime, there are services that need to run in the background as Docker containers. These services are started by running:

nous start

On first startup the "start" command will initialize the environment.

Running Airflow

The CLI can be used to run Airflow DAGs (Directed Acyclic Graphs) for testing and debugging without the need to deploy to a shared environment. Currently, the command must be run from the folder that contains the files where the DAGs are specified. The following command will run the specified DAG:

nous run airflow <Python module> <DAG variable>

Example: if the DAG is constructed as follows:

with DAG(...) as ingest_dag:

And it lives in the file "main_schedule.py", you would run the command:

nous run airflow main_schedule ingest_dag
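As a fuller illustration, a minimal "main_schedule.py" might look like the sketch below. Only the module name, the DAG variable name, and the `with DAG(...) as ingest_dag:` pattern come from this document; the DAG id, schedule, dates, and task are placeholders using standard Airflow 2.x APIs.

```python
# main_schedule.py -- an illustrative DAG definition.
# The module name, DAG variable name, and `with DAG(...)` pattern
# come from the docs above; everything else is a placeholder.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as ingest_dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'extracting...'",
    )
```

With this file in the current folder, running "nous run airflow main_schedule ingest_dag" would pick up and execute ingest_dag.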

The CLI will handle the configuration of Airflow, Spark, dbt, etc.

Running the Lakehouse

When running the CLI with the "docker" runtime, the lakehouse will begin running in the background when you run "nous start".

When running with the "native" runtime, you will need to explicitly run:

nous run lakehouse

With the lakehouse running, you can connect to it as specified in the Running SQL section, using "localhost" as the host name when setting up the connection.

Running dbt

To run a dbt project against the local lakehouse, navigate to the folder containing the dbt project and run:

nous run dbt

The CLI will handle configuration and override any default dbt profiles that may already exist.

You can also run dbt tests with:

nous run dbt --dbt-command test

Stopping

To stop all running containers, regardless of which runtime is configured, run:

nous stop

Resetting

If at any point the CLI is in an unstable state, or you simply want to discard the data in the lakehouse (including any changes you have made) and start over, you can run:

nous reset

After a reset, the start command will need to be rerun and any configuration changes will need to be remade.
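Putting that together, a typical reset cycle looks like the following sketch. The commands are the ones documented above; the spark_version value is just an example from the configuration table, and the ordering follows the description in the preceding sentence.

```shell
# Reset the environment, then reinitialize and re-apply configuration.
nous reset
nous start
nous config set spark_version 3.3
```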

Further Reading