Setting up your Dev Environment

Prerequisites#

In order to contribute to Great Expectations, you will need the following:

Fork and clone the repository#

1. Fork the Great Expectations repo#

  • Go to the Great Expectations repo on GitHub.

  • Click the Fork button in the top right. This will make a copy of the repo in your own GitHub account.

  • GitHub will take you to your forked version of the repository.

2. Clone your fork#

  • Click the green Clone button and choose the SSH or HTTPS URL depending on your setup.

  • Copy the URL and run git clone <url> in your local terminal.

  • This will clone the develop branch of the great_expectations repo. Please use develop (not main!) as the starting point for your work.

  • Atlassian has a nice tutorial for developing on a fork.

3. Add the upstream remote#

  • On your local machine, cd into the great_expectations repo you cloned in the previous step.

  • Run: git remote add upstream git@github.com:great-expectations/great_expectations.git

  • This sets up a remote called upstream that tracks changes in the original Great Expectations repository, so that you can pull new commits from it into your fork.

4. Create a feature branch to start working on your changes.#

  • Example: git checkout -b feature/my-feature-name

  • We do not currently follow a strict naming convention for branches. Please pick something clear and self-explanatory, so that it will be easy for others to get the gist of your work.
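Steps 2 through 4 above can be collected into one small shell function. This is a minimal sketch, not something shipped with the repository: setup_fork is a hypothetical helper name, and both URLs are passed in as arguments so that you can supply the SSH or HTTPS URL of your own fork.

```shell
# Hypothetical helper (not part of Great Expectations):
# clone a fork, register the upstream remote, and start a feature branch.
# Usage: setup_fork <fork_url> <upstream_url> <branch_name>
setup_fork() {
  git clone "$1" great_expectations || return 1   # step 2: clone your fork
  cd great_expectations || return 1               # step 3: enter the clone
  git remote add upstream "$2" || return 1        # step 3: add the upstream remote
  git checkout -b "$3"                            # step 4: create the feature branch
}

# Example call (replace <your-username> with your GitHub account):
# setup_fork git@github.com:<your-username>/great_expectations.git \
#   git@github.com:great-expectations/great_expectations.git \
#   feature/my-feature-name
```

After the function returns, git remote -v should list both origin (your fork) and upstream.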

Install Python dependencies#

5. Create a new virtual environment#

  • Make a new virtual environment (e.g. using virtualenv or conda) and name it “great_expectations_dev” or similar.

  • Ex virtualenv: python3 -m venv <path_to_environments_folder>/great_expectations_dev and then source <path_to_environments_folder>/great_expectations_dev/bin/activate

  • Ex conda: conda create --name great_expectations_dev and then conda activate great_expectations_dev

This is not required, but highly recommended.
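Before installing anything, it can help to confirm that an environment is actually active in the current shell. The sketch below is one way to check, assuming the usual behavior that venv and virtualenv activation scripts export VIRTUAL_ENV and that conda activate exports CONDA_DEFAULT_ENV; in_virtualenv is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if an environment is active in this shell.
# VIRTUAL_ENV is exported by venv/virtualenv activate scripts;
# CONDA_DEFAULT_ENV is exported by `conda activate` (assumed standard setups).
in_virtualenv() {
  [ -n "${VIRTUAL_ENV:-}" ] || [ -n "${CONDA_DEFAULT_ENV:-}" ]
}

if in_virtualenv; then
  echo "Active environment: ${VIRTUAL_ENV:-$CONDA_DEFAULT_ENV}"
else
  echo "No virtual environment is active; pip would install into your system or user Python."
fi
```

Note that conda's base environment also sets CONDA_DEFAULT_ENV, so this check cannot tell base apart from great_expectations_dev.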

6. Install dependencies from requirements-dev.txt#

  • pip install -r requirements-dev.txt -c constraints-dev.txt

  • macOS users can run the above command (with pip or pip3) from within a conda environment. Windows users working in a conda environment will need to install each package listed in requirements-dev.txt individually.

  • This ensures that you have the right libraries installed in your Python environment.

    • Note that you can also substitute requirements-dev-test.txt to install only the requirements needed for testing all backends, and requirements-dev-spark.txt or requirements-dev-sqlalchemy.txt if you would like to add support for Spark or SQLAlchemy tests, respectively. For some database backends, such as MSSQL, additional driver installation may be required in your environment; see below for more information.

    • Installing Microsoft ODBC driver for MacOS

    • Installing Microsoft ODBC driver for Linux
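The choice between the requirements files named in step 6 can be sketched as a small lookup. The helper name requirements_for and the short backend keywords are hypothetical, and reusing constraints-dev.txt with the alternative files is an assumption rather than something this guide states. The script prints the command instead of running it, so you can review it first.

```shell
# Hypothetical helper: map a short keyword to one of the requirements
# files named in step 6. Unknown keywords fall back to the full dev set.
requirements_for() {
  case "$1" in
    test)       echo "requirements-dev-test.txt" ;;
    spark)      echo "requirements-dev-spark.txt" ;;
    sqlalchemy) echo "requirements-dev-sqlalchemy.txt" ;;
    *)          echo "requirements-dev.txt" ;;
  esac
}

# Print (do not run) the install command for the Spark test requirements.
# Assumption: the same constraints file is used for every requirements file.
echo "pip install -r $(requirements_for spark) -c constraints-dev.txt"
```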

7. Install great_expectations from your cloned repo#

  • pip install -e .

    -e installs Great Expectations in “editable” mode. This is not required, but is often very convenient as a developer.

(Optional) Configure resources for testing and documentation#

Depending on which features of Great Expectations you want to work on, you may want to configure different backends for local testing, such as PostgreSQL and Spark. Also, there are a couple of extra steps if you want to build documentation locally.

If you want to develop against local PostgreSQL:#

  • To simplify setup, the repository includes a docker-compose file that can stand up a local PostgreSQL container. To use it, you’ll need to have Docker installed.

  • Navigate to assets/docker/postgresql in your great_expectations repo and run docker-compose up -d

  • Within the same directory, you can run docker-compose ps to verify that the container is running. You should see something like:

             Name                        Command              State           Ports
    -------------------------------------------------------------------------------------------
    postgresql_travis_db_1   docker-entrypoint.sh postgres   Up      0.0.0.0:5432->5432/tcp

  • Once you’re done testing, you can shut down your PostgreSQL container by running docker-compose down from the same directory.

  • Caution: If another service is using port 5432, Docker may start the container but silently fail to set up the port. In that case, you will probably see errors like this:

    psycopg2.OperationalError: could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5432?
    could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?

  • Or this…

    sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  database "test_ci" does not exist
    (Background on this error at: http://sqlalche.me/e/e3q8)
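One way to catch the port conflict described above is to probe the port before starting the container. The following is a rough sketch, assuming a bash build with the /dev/tcp pseudo-device enabled; port_in_use is a hypothetical helper name. It only probes 127.0.0.1, so a service listening solely on IPv6 would go unnoticed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if something already accepts TCP
# connections on the given local port. Relies on bash's /dev/tcp
# redirection, so run it with bash rather than plain sh.
port_in_use() {
  local port="$1"
  # The subshell opens (and on exit closes) a connection to the port.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

# Run this before docker-compose up -d; once the container is running,
# the port is expected to be in use.
if port_in_use 5432; then
  echo "Port 5432 is already taken; stop the other service first."
else
  echo "Port 5432 looks free."
fi
```

The same function works for any other port by changing the argument.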

If you want to develop against local MySQL:#

  • To simplify setup, the repository includes a docker-compose file that can stand up a local MySQL container. To use it, you’ll need to have Docker installed.

  • Navigate to assets/docker/mysql in your great_expectations repo and run docker-compose up -d

  • Within the same directory, you can run docker-compose ps to verify that the container is running. You should see something like:

          Name                   Command             State                 Ports
    ------------------------------------------------------------------------------------------
    mysql_mysql_db_1   docker-entrypoint.sh mysqld   Up      0.0.0.0:3306->3306/tcp, 33060/tcp

  • Once you’re done testing, you can shut down your MySQL container by running docker-compose down from the same directory.

  • Caution: If another service is using port 3306, Docker may start the container but silently fail to set up the port.

If you want to develop against local Spark:#

  • In most cases, pip install -r requirements-dev.txt should set up pyspark for you.

  • If you don’t have Java installed, you will probably need to install it and set your PATH or JAVA_HOME environment variables appropriately.

  • Official installation instructions for Spark are available in the Apache Spark documentation.
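A quick way to see whether Java is reachable before running Spark tests is sketched below. It assumes the common convention of an executable at $JAVA_HOME/bin/java, falling back to a java command on PATH; java_available is a hypothetical helper name, and the check says nothing about whether the Java version found suits your pyspark release.

```shell
# Hypothetical helper: succeed if a Java runtime is reachable,
# first via JAVA_HOME, then via PATH.
java_available() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]; then
    return 0
  fi
  command -v java >/dev/null 2>&1
}

if java_available; then
  echo "Java found."
else
  echo "Java not found; install it and set JAVA_HOME or PATH."
fi
```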

If you want to build documentation locally:#

  • pip install -r docs/requirements.txt

  • To build documentation, the command is cd docs; make html

  • Documentation will be generated in docs/build/html/, with index.html as the index page.

  • Note: we use autoapi to generate API reference docs, but it’s not compatible with pandas 1.1.0. You’ll need to have pandas 1.0.5 (or a previous version) installed in order to successfully build docs.

Run tests to confirm that everything is working#

  • You can run all tests by running pytest in the great_expectations directory root. Please see Testing for testing options and details.

Start coding!#

At this point, you have everything you need to start coding!