Prometheus, Python, Flask

Python-based apps are a significant part of our company's ecosystem. In the era of microservices it does not really matter which technology is used under the hood, but each service should provide its own API and a way to monitor its state.

In our setup we use Prometheus to aggregate metrics from all our services. It works fine, especially with our JVM apps. But until today that was not the case for all our Python apps.

By historical tradition our Python apps are mostly built with Flask, which is a nice framework. In some cases it is also wrapped with Gunicorn. Here is a tutorial on how to make it work with Prometheus:

  1. Add prometheus_client==0.1.1 as a dependency in your project's setup.py.
  2. Inside your application, define the metrics you want to collect, for example: FLASK_REQUEST_LATENCY = Histogram(__name__.replace('.', '_') + '_request_latency_seconds', 'Flask Request Latency')
  3. Decorate each function you want to measure with @FLASK_REQUEST_LATENCY.time()
  4. Add an endpoint to expose the statistics:
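@app.route('/stats', methods=['GET'])
def metrics():
    # `registry` is a module-level (global) registry object;
    # generate_latest comes from `from prometheus_client import generate_latest`
    return generate_latest(registry), 200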
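
Put together, the four steps above might look like the following minimal sketch. The metric name demo_app_request_latency_seconds, the /hello route and the explicit CollectorRegistry are assumptions made for illustration; they are not part of the original setup, which only says that the registry is global.

```python
from flask import Flask, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    Histogram,
    generate_latest,
)

app = Flask(__name__)

# Assumption: an explicit module-level registry, so it is clear which
# metrics the /stats endpoint exposes.
registry = CollectorRegistry()

# Hypothetical metric name, fixed here to keep the example predictable.
FLASK_REQUEST_LATENCY = Histogram(
    'demo_app_request_latency_seconds',
    'Flask Request Latency',
    registry=registry,
)


# app.route must be the outermost decorator so that Flask registers
# the timed wrapper rather than the bare function.
@app.route('/hello', methods=['GET'])
@FLASK_REQUEST_LATENCY.time()
def hello():
    return 'hello'


@app.route('/stats', methods=['GET'])
def metrics():
    # Serialize every metric in the registry to the Prometheus text format.
    return Response(generate_latest(registry), content_type=CONTENT_TYPE_LATEST)
```

After a request to /hello, the /stats output should contain the histogram's bucket, sum and count series for that metric.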

And that’s it, as long as your app is plain Flask without Gunicorn. To make it work in a multiprocess scenario, a few additional steps are needed:

  1. Create a config file for Gunicorn and pass it to the gunicorn CLI with the -c flag:
def worker_exit(server, worker):
    from prometheus_client import multiprocess
    multiprocess.mark_process_dead(worker.pid)

  2. Set the environment variable prometheus_multiproc_dir to a directory where the Prometheus client can temporarily store the metrics. If you use Kubernetes, update the deployment descriptor to mount a volume of type emptyDir for it.

  3. Finally, update the code of your /stats endpoint like this:

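from prometheus_client import CollectorRegistry, generate_latest, multiprocess

@app.route('/stats', methods=['GET'])
def metrics():
    # Build a fresh registry per request and let the collector
    # aggregate the metric files written by all worker processes
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry), 200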
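
One detail about step 2: the prometheus_client documentation advises wiping the metrics directory between Gunicorn runs, so that files left by earlier processes are not reported again. A possible way to do that from the same Gunicorn config file is sketched below. The helper name prepare_multiproc_dir is hypothetical; on_starting is a standard Gunicorn server hook, and the sketch assumes the environment variable is already set when Gunicorn starts.

```python
import glob
import os


def prepare_multiproc_dir(path):
    """Create the metrics directory if needed and delete stale *.db files."""
    os.makedirs(path, exist_ok=True)
    # In multiprocess mode the client stores metrics in *.db files.
    for stale in glob.glob(os.path.join(path, '*.db')):
        os.remove(stale)
    return path


def on_starting(server):
    # Gunicorn calls this hook once, before the master process is initialized.
    # Assumption: prometheus_multiproc_dir is set in the environment.
    prepare_multiproc_dir(os.environ['prometheus_multiproc_dir'])
```

With a Kubernetes emptyDir volume the directory starts empty with every new pod, so the cleanup mainly matters when the directory outlives a restart.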

I hope this short guide is useful and helps you make your apps measurable and your system more reliable.