Scaled Metrics with Thanos
Thanos is an open-source, scalable Prometheus solution that provides features such as a global query view, unlimited retention, downsampling, and compaction. Orchestrator gives large deployments the option to deploy with Thanos for a more robust metrics pipeline.
Deploying Orc8r with Thanos
The Terraform module makes deploying Thanos very easy. In your main.tf you just need to set the following values:
module orc8r {
  thanos_enabled = true
}
module orc8r-app {
  thanos_enabled = true
  thanos_object_store_bucket_name = "<globally-unique-bucket-name>"
}
Choose a value for thanos_object_store_bucket_name that is globally unique (across all of AWS); other than that, the value doesn't matter.
That's all you need to do to deploy with Thanos! All interacting components will be adjusted accordingly, so NMS and Grafana will work the same as before. If you don't care about the internals, you can stop reading here.
If you run kubectl -n orc8r get pods, the output should now look like this:
NAME                                             READY   STATUS    RESTARTS   AGE
fluentd-6fb9f57dff-ljmfw                         1/1     Running   0          24h
fluentd-6fb9f57dff-p54p9                         1/1     Running   0          24h
nms-magmalte-f4bbf4cfb-tqblm                     1/1     Running   0          24h
nms-nginx-proxy-57b8585d6-4ml6s                  1/1     Running   0          24h
orc8r-alertmanager-84d79f774b-4svrs              1/1     Running   0          24h
orc8r-alertmanager-configurer-68d6c55c9c-6q9xg   1/1     Running   0          24h
orc8r-controller-7494c96646-4w4jp                1/1     Running   0          20h
orc8r-controller-7494c96646-7fcg5                1/1     Running   0          20h
orc8r-nginx-5f9d7f4bcc-cz5ld                     1/1     Running   0          20h
orc8r-nginx-5f9d7f4bcc-rszpr                     1/1     Running   0          20h
orc8r-prometheus-5bdd644fd8-mm8gb                2/2     Running   0          24h
orc8r-prometheus-cache-f84884575-7vw8d           1/1     Running   0          24h
orc8r-prometheus-configurer-69df67988-w9dc6      2/2     Running   0          20h
orc8r-thanos-compact-66dd4d974b-jwzjk            1/1     Running   0          20h
orc8r-thanos-query-5d5cb888bd-vm9t8              1/1     Running   0          114m
orc8r-thanos-store-0-7479bf59f6-97wbp            1/1     Running   0          114m
orc8r-user-grafana-bc644b4fc-28nmf               1/1     Running   0          24h
Notice that the prometheus pod now runs a second container: the Thanos sidecar. Another sidecar runs with prometheus-configurer, and three more components run independently: compact, query, and store.
Advanced configuration options
The default infrastructure setup deploys an additional node for Thanos, since one component requires significant on-node ephemeral storage. However, you may want to deploy more nodes to make sure the Thanos components run on different nodes than the rest of orc8r. To do that, override the default value for thanos_worker_groups in the orc8r module.
The default value is:
[
  {
    name = "thanos-1"
    instance_type = "m5d.xlarge"
    asg_desired_capacity = 1
    asg_min_size = 1
    asg_max_size = 1
    autoscaling_enabled = false
    kubelet_extra_args = "--node-labels=compute-type=thanos"
  },
]
To add more workers, either adjust the asg_... values in that object or add another entry to the array of worker groups. To run Thanos components on specific nodes, set the following variables in the orc8r-app module:
thanos_query_node_selector = "thanos"
thanos_store_node_selector = "thanos"
Note: to run a component on a given worker group, set its selector to the same value you used for --node-labels=compute-type=<value> in that group.
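As a rough sketch of how these options might fit together, the fragment below overrides thanos_worker_groups with two entries and points each selector at a different group. The second group's name ("thanos-2") and its label value ("thanos-query") are hypothetical, chosen only for illustration. The other arguments your modules already set (such as the module source) are omitted here and still need to be present in your main.tf.

```hcl
module orc8r {
  # Other arguments already set for this module are omitted from this sketch.
  thanos_enabled = true

  # Override the default: keep the original group and add a second one.
  thanos_worker_groups = [
    {
      name                 = "thanos-1"
      instance_type        = "m5d.xlarge"
      asg_desired_capacity = 1
      asg_min_size         = 1
      asg_max_size         = 1
      autoscaling_enabled  = false
      kubelet_extra_args   = "--node-labels=compute-type=thanos"
    },
    {
      # Hypothetical second group; the name and label value are illustrative.
      name                 = "thanos-2"
      instance_type        = "m5d.xlarge"
      asg_desired_capacity = 1
      asg_min_size         = 1
      asg_max_size         = 1
      autoscaling_enabled  = false
      kubelet_extra_args   = "--node-labels=compute-type=thanos-query"
    },
  ]
}

module orc8r-app {
  # Other arguments already set for this module are omitted from this sketch.
  thanos_enabled                  = true
  thanos_object_store_bucket_name = "<globally-unique-bucket-name>"

  # Each selector must match the compute-type label of the target group.
  thanos_query_node_selector = "thanos-query"
  thanos_store_node_selector = "thanos"
}
```

With this layout, query would be scheduled on the hypothetical "thanos-2" group and store on the default "thanos-1" group. This assumes the labels are applied to the nodes exactly as written in kubelet_extra_args.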
These are advanced configuration options. We don't expect them to be necessary, but they are available to give you more fine-grained control over your deployment.