
Querying multi dimensions in a single job #21

Closed
manvitha9347 opened this issue Apr 20, 2022 · 15 comments · Fixed by #53


manvitha9347 commented Apr 20, 2022

Hi,
I am working on a config to get metrics with multiple dimensions from a single resource type.
My config looks like this:

- job_name: azure-metrics-example-dimensions
  scrape_interval: 1m
  scrape_timeout: 1m
  metrics_path: /probe/metrics/list
  params:
    template:
      - 'azuremetricsexplist_{metric}_{aggregation}_{unit}'
    cache:
      - 5s
    subscription:
      - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    resourceType:
      - Microsoft.DocumentDB/databaseAccounts
    metric:
      - ServerSideLatency
    interval: ["PT1M"]
    timespan: ["PT1M"]
    aggregation:
      - average
    metricFilter:
      - DatabaseName eq 'mydb' and CollectionName eq '*' and OperationType eq '*' and ConnectionMode eq '*'
  static_configs:
    - targets: ["172.17.0.2:8080"]

When I use this config, there is a mismatch between the Azure data and the exporter data.
Is there a way to specify all dimensions (CollectionName, OperationType, ConnectionMode) of a particular metric (ServerSideLatency) in a single job name?
Can you help me with this? @mblaschke


manvitha9347 commented Apr 21, 2022

Also, for the same resourceType, I was not able to query Azure via the resourceGroup scope with the /resource endpoint.
Any idea why this happens?

- job_name: azure-metrics-example-mongodb
  scrape_interval: 1m
  scrape_timeout: 1m
  metrics_path: /probe/metrics/resource
  params:
    template:
      - 'azuremetricsexp_{metric}_{aggregation}_{unit}'
    cache:
      - 5s
    subscription:
      - xxxxxxxxxxxxxxxxxxxxxxxxxxxx
    target:
      - /subscriptions/xxxxxxxxxxxxxxxxxxxxxxxxxxxx/resourceGroups/oscocosmosxxxxxx
    resourceType:
      - Microsoft.DocumentDB/databaseAccounts
    metric:
      - ServerSideLatency
    interval: ["PT1M"]
    timespan: ["PT1M"]
    aggregation:
      - average
    metricFilter:
      - DatabaseName eq 'osco' and CollectionName eq '*'
  static_configs:
    - targets: ["172.17.0.2:8080"]

@mblaschke (Member)

What mismatch do you get? Can you give me more details?

(added code blocks to your posts)


manvitha9347 commented Apr 22, 2022

I need to configure multiple dimensions for multiple metrics in a single job name.
For example:
ServerSideLatency metric - dimensions are DatabaseName, CollectionName, ConnectionMode, OperationType
NormalizedRUConsumption metric - dimensions are DatabaseName, CollectionName, PartitionKey

Everything needs to be configured under a single Prometheus job name.
I tried it like this:

- job_name: azure-metrics-example-dimensions
  scrape_interval: 1m
  scrape_timeout: 1m
  metrics_path: /probe/metrics/list
  params:
    template:
      - 'azuremetricsexplist_{metric}_{aggregation}_{unit}'
    cache:
      - 5s
    subscription:
      - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    resourceType:
      - Microsoft.DocumentDB/databaseAccounts
    metric:
      - ServerSideLatency
      - NormalizedRUConsumption
    interval: ["PT1M"]
    timespan: ["PT1M"]
    aggregation:
      - average
    metricFilter:
      - DatabaseName eq 'mydb' and CollectionName eq '*' and OperationType eq '*' and ConnectionMode eq '*' and PartitionKey eq '*'
  static_configs:
    - targets: ["172.17.0.2:8080"]
This is not working (getting no data). Also, if I try to configure it for only one metric (as mentioned in the previous comment), that shows wrong data (higher values due to duplication).
Is there a way to configure this? @mblaschke


mblaschke commented Apr 26, 2022

Which version are you using?

You should get a warning message in the console/container logs that the metric query wasn't possible:

for ServerSideLatency

Metric: ServerSideLatency does not support requested dimension combination: 
databasename,collectionname,operationtype,connectionmode,partitionkey, 
supported ones are: 
DatabaseName,CollectionName,Region,ConnectionMode,OperationType,PublicAPIType

for NormalizedRUConsumption

Metric: NormalizedRUConsumption does not support requested dimension combination: 
databasename,collectionname,operationtype,connectionmode,partitionkey, 
supported ones are: 
CollectionName,DatabaseName,Region,PartitionKeyRangeId,CollectionRid

@mblaschke (Member)

As a hint: you can try and execute queries via http://azure-metrics-exporter-url/query in your browser if you enable --development.webui (this will always be on with the next update).

@manvitha9347 (Author)

Yes, the combinations were not possible, as the metric does not support the requested dimension combination.
To make it work I am using a different job for each metric and each combination (2 metrics, 2 jobs).

If the combination is different, I need to write a new job.
For example,
for metric ServerSideLatency, I need 2 combinations:
1. DatabaseName eq 'osco' and CollectionName eq '*'
2. DatabaseName eq 'osco' and ConnectionMode eq '*'
For metric NormalizedRUConsumption, I need 2 combinations:
1. DatabaseName eq 'osco' and ConnectionMode eq '*'
2. DatabaseName eq 'osco' and PartitionKey eq '*'

For these combinations I need to write 4 different jobs; only then does the data match.
Is there a way to configure this in a single job, or do we need to use 4 different jobs?
@mblaschke

@manvitha9347 (Author)

The latest Docker image is used.

@mblaschke (Member)

You need at least two jobs because the dimensions are different.

Suggestion: for metric ServerSideLatency, use

DatabaseName eq 'osco' and CollectionName eq '*' and ConnectionMode eq '*'

For metric NormalizedRUConsumption, use

DatabaseName eq 'osco' and ConnectionMode eq '*' and PartitionKey eq '*'

On the Prometheus side you can combine the samples using sum() and avg() or other functions.

azure-metrics-exporter itself is just a client for the Azure Monitor API and doesn't do any additional transformations. It only fetches the metrics and provides them for Prometheus, so you can transform/combine them with PromQL.
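
For example, a rough recording-rule sketch of such a combination, assuming the 'azuremetricsexplist_{metric}_{aggregation}_{unit}' template yields a series roughly named azuremetricsexplist_serversidelatency_average_milliseconds with one label per dimension (the metric name and the databaseName label here are illustrative; check the exporter's /metrics output for the actual names):

groups:
  - name: cosmosdb-combine-dimensions
    rules:
      # average the per-collection/connection-mode samples back into one value per database
      # (metric and label names are assumptions; adjust to what the exporter actually exposes)
      - record: cosmosdb:server_side_latency:avg_by_database
        expr: avg by (databaseName) (azuremetricsexplist_serversidelatency_average_milliseconds)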


manvitha9347 commented May 4, 2022

DatabaseName eq 'osco' and CollectionName eq '*' and ConnectionMode eq '*'

This is actually giving wrong data.

If I use it as
DatabaseName eq 'osco' and CollectionName eq '*'
DatabaseName eq 'osco' and ConnectionMode eq '*'

this gives correct data when compared to Azure.
@mblaschke

@mblaschke (Member)

Are you checking the metrics in Prometheus? And you don't get the combined metrics when you sum() the averages together so that they match the metrics in Azure?

@manvitha9347 (Author)

Hi @mblaschke,
I am using the following Prometheus config:

- job_name: azmetricsexp_ServerSideLatency
  scrape_interval: 5m
  scrape_timeout: 5m
  metrics_path: /probe/metrics/list
  params:
    template:
      - 'azuremetricsexplist_{metric}_{aggregation}_{unit}'
    cache:
      - 5s
    subscription:
      - xxxxxxxxxxxxxxxxxxxxxxxxxx
    resourceType:
      - Microsoft.DocumentDB/databaseAccounts
    metric:
      - ServerSideLatency
    interval: ["PT5M"]
    timespan: ["PT5M"]
    aggregation:
      - average
      - count
      - maximum
      - total
    metricFilter:
      - DatabaseName eq 'osco' or DatabaseName eq 'orderff' or DatabaseName eq 'auth' and CollectionName eq '*' and ConnectionMode eq '*' and OperationType eq '*'
  static_configs:
    - targets: ["192.168.0.104:8080"]

This is the error I see:

time="2022-06-14T12:33:12+05:30" level=warning msg="insights.MetricsClient#List: Failure responding to request: StatusCode=529 -- Original
Error: autorest/azure: Service returned an error. Status=529 Code="Unknown" Message="Unknown service error" Details=[{"cost":0,"interval":"PT5M","namespace":"Microsoft.DocumentDb/databaseAccounts","resourceregion":"westus","timespan":"2022-06-14T06:57:12Z/2022-06-14T07:02:12Z","value":[{"displayDescription":"Server Side Latency","errorCode":"Throttled","errorMessage":"Query was throttled with reason: ServerBusy. Requested Metric:CosmosDBCustomer|AzureMonitor|ServerSideLatency. Output Dimensions: collectionname,connectionmode,databasename,operationtype. Dimension Filters: . FirstOutputSamplingType: NullableAverage. Start time: 6/14/2022 6:57:12 AM End time: 6/14/2022 7:01:12 AM. Resolution: 00:05:00, Last Value Mode: False.

Also, I see a lot of gaps in the metrics when viewed in Grafana. I tried decreasing the scrape interval to 1m, but then my jobs go down very quickly showing "context deadline exceeded".
They work better with a 5m scrape interval, which can't be changed.

How can I resolve this?

@mblaschke (Member)

The following error message is coming from the Azure API, not from the exporter itself. The Azure API is failing here, so you might want to contact Azure support.

Error: autorest/azure: Service returned an error. Status=529 Code="Unknown" Message="Unknown service error" Details=[{"cost":0,"interval":"PT5M","namespace":"Microsoft.DocumentDb/databaseAccounts","resourceregion":"westus","timespan":"2022-06-14T06:57:12Z/2022-06-14T07:02:12Z","value":[{"displayDescription":"Server Side Latency","errorCode":"Throttled","errorMessage":"Query was throttled with reason: ServerBusy. Requested Metric:CosmosDBCustomer|AzureMonitor|ServerSideLatency. Output Dimensions: collectionname,connectionmode,databasename,operationtype. Dimension Filters: . FirstOutputSamplingType: NullableAverage. Start time: 6/14/2022 6:57:12 AM End time: 6/14/2022 7:01:12 AM. Resolution: 00:05:00, Last Value Mode: False.

Something in Azure is broken; it's not the exporter. The exporter cannot fix anything if the Azure API is down or not responding (the error message itself is also produced by autorest/azure, which is the azure-sdk-for-go).

For the gaps:

  • Which version are you using?
  • Where is the exporter running? E.g. as a container inside AKS?
  • How many CPU cores are assigned to the exporter, and is the container throttling?
  • Are you using caching? Was the cache set to 5s (which effectively disables the cache if you set the scrape interval to 5m)?
  • How many resources are you requesting in one run?

If, for any reason, the Azure API is failing (see the error message), the exporter cannot do anything and will produce gaps, as the Azure API is not responding. Normally this doesn't happen often, but the Azure API can also be down; for outages check https://status.azure.com/en-us/status

For caching:
If caching is enabled in the exporter (e.g. via the env var ENABLE_CACHING=1) you should set:

scrape_interval: 1m
scrape_timeout: 1m

and set the cache to the same duration as the interval:

cache: ["5m"]

Then the exporter will be queried every minute but will deliver the same metrics until the cache invalidates (after 5 minutes).
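
Putting that together, a minimal sketch of such a job (placeholders and the ServerSideLatency example as in your earlier configs; this assumes the exporter runs with ENABLE_CACHING=1):

- job_name: azure-metrics-example-cached
  scrape_interval: 1m
  scrape_timeout: 1m
  metrics_path: /probe/metrics/list
  params:
    template:
      - 'azuremetricsexplist_{metric}_{aggregation}_{unit}'
    cache:
      - 5m                  # cache matches the 5 minute metric interval/timespan
    subscription:
      - xxxxxxxxxxxxxxxxxxxxxxxxxx
    resourceType:
      - Microsoft.DocumentDB/databaseAccounts
    metric:
      - ServerSideLatency
    interval: ["PT5M"]
    timespan: ["PT5M"]
    aggregation:
      - average
  static_configs:
    - targets: ["192.168.0.104:8080"]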


manvitha9347 commented Jun 15, 2022

Hi @mblaschke, thank you for the consistent responses.

For this dimension filter: DatabaseName eq 'osco' or DatabaseName eq 'orderff' or DatabaseName eq 'auth' and CollectionName eq '*' and ConnectionMode eq '*' and OperationType eq '*'

I am seeing data for osco but very little data for orderff or auth; I am able to see only 1-2 scrapes in a 1 hour interval.
What could be the reason for this? I am able to see an ample amount of data in Azure.

And can you help me understand the difference between these key/value pairs?
interval: ["PT5M"]
timespan: ["PT5M"]
Time interval – the period of time between the gathering of two metric values.
Timespan – the aggregation span (e.g. if it is 1m, there will be one aggregation for every 1m and the data is sent).

Is this understanding correct?

@mblaschke (Member)

If you use dimensions you get the top N results from the API; see https://docs.microsoft.com/en-us/rest/api/monitor/metrics/list (azure-metrics-exporter is just an Azure Monitor Metrics API client).

If you don't specify metricTop, then you get the default top 10 results from the Azure Monitor API (i.e. metricTop: [10]).
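
So if you expect more than 10 dimension value combinations per metric, one option is to raise metricTop in the job's params; a minimal sketch (the value 100 is just an example):

  params:
    metricTop: ["100"]    # request up to the top 100 dimension values instead of the default 10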

For interval and timespan also see https://docs.microsoft.com/en-us/rest/api/monitor/metrics/list

@mblaschke (Member)

Closed due to inactivity.
