Improving Python Dependency Management With pipx and Poetry
Over time, how I develop applications in Python has changed noticeably. I will divide the topic into three sections and show how they tie into each other at the end.
- Development
- Packaging
- Usage
Development#
Under development, the issues I will focus on are the following:
- Dependency Management
- Virtualenvs and managing them
Historically, the way to do dependency management was through requirements.txt. I found requirements.txt hard to manage. In that setup, adding a dependency and installing it was two steps:
- Add the package bar to requirements.txt
- Either do pip install bar or pip install -r requirements.txt
While focused on development, I would often forget one or both of these steps. Also, the lack of a lock file was a small downside for me (could be a much larger downside for others). The separation between pip and requirements.txt can also easily lead you to accidentally depend on packages installed on your system or in your virtualenv but not specified in your requirements.txt.
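One way to catch this kind of drift is to compare the packages your code relies on against what requirements.txt declares. The sketch below is only an illustration under stated assumptions: the helper names (normalize, requirement_names, undeclared) are made up for this example, and the parsing handles only simple "name plus version specifier" lines, not the full requirements file format.

```python
import re


def normalize(name):
    """Normalize a project name so that Sound_File and sound-file compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements_text):
    """Return the normalized project names declared in requirements.txt text."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options such as -r / -e
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def undeclared(used_packages, requirements_text):
    """Return the packages in used_packages that requirements.txt does not declare."""
    declared = requirement_names(requirements_text)
    return sorted(
        normalize(name) for name in used_packages if normalize(name) not in declared
    )
```

Here used_packages is whatever list of package names you know your project imports; how you collect that list is left open.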
Managing virtualenvs was also difficult. Since a virtualenv is not tied to any project, you need a directory convention; otherwise, you can’t tell which virtualenv is being used for which project. You can share one virtualenv across multiple projects, but that partially defeats the point of virtualenvs and makes requirements.txt more error-prone (higher chances of forgetting to add packages to it). The approach generally used is one of the following two:
foo/
├── foo_src/
└── foo_venv/
or
foo_src/
└── venv/
I preferred the second one as the first one nests the source code one directory deeper.
A new standard - pyproject.toml#
In PEP 518, Python standardized the pyproject.toml file, which allows users to choose alternate build systems for package generation.
One such project that provides an alternate build system is Poetry. Poetry hits the nail on the head and solves my major gripes with traditional tooling.
Poetry and virtualenvs#
Poetry manages virtualenvs automatically and keeps track of which project uses which virtualenv. Working on an existing project which uses Poetry is as simple as this:
$ git clone https://gitlab.com/ceda_ei/verlauf
$ cd verlauf
$ poetry install
The poetry install command sets up the virtualenv, installs all the required dependencies inside it, and sets up any commands accordingly (I will get to this soon). To activate the virtualenv, simply run:
. "$(poetry env info --path)/bin/activate"
I wrap this in a small function which lets me toggle it quickly:
function poet() {
    # Any manual toggle switches auto_poet (defined later) off for this session.
    POET_MANUAL=1
    if [[ -v VIRTUAL_ENV ]]; then
        deactivate
    else
        . "$(poetry env info --path)/bin/activate"
    fi
}
Running poet activates the virtualenv if it is not active and deactivates it if it is active. To make things even easier, I automatically activate and deactivate the virtualenv as I enter and leave the project directory. To do so, simply drop this in your .bashrc.
# Print the nearest directory, from $PWD upward, that contains "$1".
function find_in_parent() {
    local path
    IFS="/" read -ra path <<<"$PWD"
    # Try the full path first, then drop one trailing component per pass.
    for ((i=${#path[@]}; i > 0; i--)); do
        local current_path=""
        for ((j=1; j<i; j++)); do
            current_path="$current_path/${path[j]}"
        done
        if [[ -e "${current_path}/$1" ]]; then
            echo "${current_path}/"
            return
        fi
    done
    return 1
}

function auto_poet() {
    # Remember the exit status of the last command so the prompt still sees it.
    ret="$?"
    if [[ -v POET_MANUAL ]]; then
        return $ret
    fi
    if find_in_parent pyproject.toml &> /dev/null; then
        if [[ ! -v VIRTUAL_ENV ]]; then
            if BASE="$(poetry env info --path)"; then
                . "$BASE/bin/activate"
                PS1=""
            else
                # No usable virtualenv here: stop trying in this session.
                POET_MANUAL=1
            fi
        fi
    elif [[ -v VIRTUAL_ENV ]]; then
        deactivate
    fi
    return $ret
}

PROMPT_COMMAND="auto_poet;$PROMPT_COMMAND"
This ties in well with the poet function; if you use poet anytime in a bash session, activation switches from automatic to manual and changing directories no longer auto-toggles the virtualenv.
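The upward search that find_in_parent performs is easier to see outside of bash. The following is a rough Python sketch of the same idea, not a replacement for the .bashrc function; it assumes only the standard pathlib module, and reusing the name find_in_parent is just a convenience for this example.

```python
from pathlib import Path


def find_in_parent(name, start):
    """Return the nearest directory at or above `start` that contains `name`.

    Returns None when no directory up to the filesystem root contains it.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (start, *start.parents):
        if (directory / name).exists():
            return directory
    return None
```

Calling find_in_parent("pyproject.toml", Path.cwd()) would answer the same question auto_poet asks on every prompt: is the current directory inside a project?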
Poetry and dependency management#
Instead of using requirements.txt, poetry stores the dependencies inside pyproject.toml. Poetry is stricter than pip when resolving version conflicts. Dependencies and dev-dependencies are stored inside tool.poetry.dependencies and tool.poetry.dev-dependencies respectively (Poetry 1.2 and later use tool.poetry.group.dev.dependencies for the latter).
Here is an example of a pyproject.toml for a project I am working on.
[tool.poetry]
name = "bells"
version = "0.3.0"
description = "Bells is a program for keeping track of sound recordings."
authors = ["Ceda EI <ceda_ei@webionite.com>"]
license = "GPL-3.0"
readme = "README.md"
homepage = "https://gitlab.com/ceda_ei/bells.git"
repository = "https://gitlab.com/ceda_ei/bells.git"

[tool.poetry.dependencies]
python = ">=3.7,<3.11"
click = "^8.0.1"
questionary = "^1.10.0"
sounddevice = "^0.4.2"
SoundFile = "^0.10.3"
numpy = "^1.21.2"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

# I will talk about this section soon
[tool.poetry.scripts]
bells = "bells.__main__:main"
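The caret constraints in the example above (such as ^8.0.1) allow any update that does not change the leftmost non-zero version component. As a rough sketch of that rule, and assuming plain numeric versions with at most three components, a hypothetical helper (caret_range is not part of Poetry) could expand a constraint into its lower and upper bounds like this:

```python
def caret_range(constraint):
    """Expand a caret constraint into (inclusive lower, exclusive upper) bounds."""
    parts = [int(part) for part in constraint.lstrip("^").split(".")]
    lower = parts + [0] * (3 - len(parts))  # pad "1.2" to [1, 2, 0]

    # Find the leftmost non-zero component that was written out; if every
    # written component is zero, the last written component is the one bumped.
    for index, value in enumerate(parts):
        if value != 0:
            break
    else:
        index = len(parts) - 1

    upper = lower[:index] + [lower[index] + 1] + [0] * (len(lower) - index - 1)
    return ".".join(map(str, lower)), ".".join(map(str, upper))
```

So numpy = "^1.21.2" accepts versions from 1.21.2 up to, but not including, 2.0.0, while sounddevice = "^0.4.2" stops short of 0.5.0.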
One of the upsides of poetry is that you don’t have to manage the dependencies in the pyproject.toml file yourself. Poetry adds an npm-like interface for adding and removing dependencies. To add a dependency to your project, simply run poetry add bar and it will add it to your pyproject.toml file and install it in the virtualenv as well. To remove a dependency, just run poetry remove bar. For development dependencies, just add the --dev flag to the commands (Poetry 1.2 and later use --group dev instead).
Packaging#
Since poetry replaces the build system, we can now configure the build using poetry via pyproject.toml. Inside pyproject.toml, the tool.poetry section stores all the build info needed: tool.poetry contains the metadata, tool.poetry.dependencies contains the dependencies, and tool.poetry.source contains private repository details (in case you don’t want to use PyPI).
One of the options is tool.poetry.scripts. It contains scripts that the project exposes. This replaces console_scripts in entry_points of setuptools.
For example,
[tool.poetry.scripts]
foobar = "foo.bar:main"
This will add a script named foobar in your PATH. Running that is equivalent to running the following script:
from foo.bar import main

if __name__ == "__main__":
    main()
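To make the "module:function" notation more concrete, here is a small sketch of how such a reference can be resolved at runtime. It is an illustration only, not the code Poetry or pip actually generate; load_entry_point is a name chosen for this example, and standard library modules stand in for foo.bar.

```python
import importlib


def load_entry_point(spec):
    """Resolve a "module:attribute" reference to the object it names."""
    module_name, _, attribute = spec.partition(":")
    target = importlib.import_module(module_name)
    # The part after the colon may be dotted, e.g. "Class.method".
    for part in attribute.split("."):
        target = getattr(target, part)
    return target


# Stand-in for foobar = "foo.bar:main": look up json.dumps and call it.
dumps = load_entry_point("json:dumps")
print(dumps({"a": 1}))
```

The part before the colon is an importable module path and the part after it is the callable inside that module, which is why bells = "bells.__main__:main" points at the main function in bells/__main__.py.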
For further details, check the reference.
Poetry also removes the need for manually doing editable installs (pip install -e .). The package is automatically installed as editable when you run poetry install. Any scripts specified in tool.poetry.scripts are automatically available in your PATH when you activate the venv.1
To build the package, simply run poetry build. This will generate a wheel and a tarball in the dist folder.
To publish the package to PyPI (or another repo), simply run poetry publish.
You can combine the build and publish into one command with poetry publish --build.
Usage#
This part is more user-facing than dev-facing. If you want to use packages globally that expose some scripts to the user (e.g. awscli, youtube-dl, etc.), the general approach is to run something like pip install --user youtube-dl. This installs the package at the user level and exposes the script through ~/.local/bin/youtube-dl. However, this installs all the packages at the same user level. Hypothetically, if you have two packages foo and bar which have conflicting dependencies, this causes an issue. If you run:
$ pip install foo
$ pip install bar
$ bar # works
$ foo # breaks because of dependency mismatch
While installing bar, pip will install the dependencies for bar, which will break foo after warning you.2
To solve this, there is pipx. Pipx installs each package in a separate virtualenv without requiring the user to activate said virtualenv before using the package.3
In the same scenario as before, doing the following works just fine.
$ pipx install foo
$ pipx install bar
$ bar # works
$ foo # also works
In this scenario, both bar and foo are installed in separate virtualenvs so the dependency conflict doesn’t matter.
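The difference between the two setups can be modelled in a few lines. This is a toy model, not how pip or pipx work internally: an environment is just a dictionary of package versions, and the shared dependency baz and its version pins are hypothetical values invented for the illustration.

```python
def install(environment, pins):
    """Install pinned dependencies, overwriting whatever version was there."""
    environment.update(pins)


def is_satisfied(environment, pins):
    """Check that every pinned dependency is present at the pinned version."""
    return all(environment.get(name) == version for name, version in pins.items())


# Hypothetical conflicting requirements on a shared library "baz".
FOO_PINS = {"baz": "1.0"}
BAR_PINS = {"baz": "2.0"}

# pip install --user: both apps share one environment.
shared = {}
install(shared, FOO_PINS)
install(shared, BAR_PINS)  # replaces baz 1.0 with 2.0, breaking foo

# pipx: each app gets an environment of its own.
isolated = {"foo": {}, "bar": {}}
install(isolated["foo"], FOO_PINS)
install(isolated["bar"], BAR_PINS)

print(is_satisfied(shared, FOO_PINS))           # foo in the shared environment
print(is_satisfied(isolated["foo"], FOO_PINS))  # foo in its own environment
```

In the shared environment the second install overwrites the version the first app needed; with one environment per app, each keeps the version it asked for.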
Some more things from my bashrc#
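# Run a command with the virtualenv temporarily deactivated.
function wrapper_no_poet() {
    local last_env
    if [[ -v VIRTUAL_ENV ]]; then
        last_env="$VIRTUAL_ENV"
        deactivate
    fi
    "$@"
    ret=$?
    # Re-activate the virtualenv that was active before the command ran.
    if [[ -v last_env ]]; then
        . "$last_env/bin/activate"
    fi
    return $ret
}

alias wnp='wrapper_no_poet'
alias pm='POET_MANUAL=1'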
Prefixing any command with wnp runs it outside the virtualenv if a virtualenv is active. Running pm turns off automatic virtualenv activation.
1. This also allows for a nice switch between the development and production versions of the app. Essentially, when the virtualenv is active, you are using the development script; when it is deactivated, you are using the global (likely production) version.
2. To be precise, it will warn you that it broke foo but will still continue with the installation.
3. For development, poetry also provides poetry run, which runs a command inside the virtualenv without having to activate it.