Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use in memory cache in parallel with db cache, convert keys to be date specific #8563

Merged
merged 4 commits into from
Mar 27, 2025

Conversation

jmaher
Copy link
Collaborator

@jmaher jmaher commented Mar 11, 2025

I want to figure out how to test this. Also some of this might be a little overkill.

@jmaher jmaher force-pushed the cachewrite branch 8 times, most recently from 355088b to fab8e01 Compare March 12, 2025 19:19
@jmaher jmaher self-assigned this Mar 12, 2025
@jmaher
Copy link
Collaborator Author

jmaher commented Mar 12, 2025

I have tests- ready for review.

@jmaher jmaher requested a review from Archaeopteryx March 12, 2025 19:24
@jmaher jmaher force-pushed the cachewrite branch 2 times, most recently from 82ab491 to ef049b5 Compare March 12, 2025 20:22
Copy link
Collaborator

@Archaeopteryx Archaeopteryx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to migrate the old line cache?

@jmaher
Copy link
Collaborator Author

jmaher commented Mar 18, 2025

I have updated the code- when finding no _keys, we create an empty list and move forward. Now when we get that we look for the old key, and then split existing data into keys; finally delete the old key format from the db.

this is simple and lightweight.

@jmaher
Copy link
Collaborator Author

jmaher commented Mar 18, 2025

hmm, somehow I got a bunch of additional files/changes mixed in with this; no idea why

@jmaher
Copy link
Collaborator Author

jmaher commented Mar 18, 2025

all ready for review!

Copy link
Collaborator

@Archaeopteryx Archaeopteryx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved but will push it to prototype instance before merging.

@jmaher jmaher force-pushed the cachewrite branch 2 times, most recently from 8e9fd77 to f0c52f4 Compare March 21, 2025 23:41
@jmaher
Copy link
Collaborator Author

jmaher commented Mar 21, 2025

with the latest commit to the stack a few things are solved:

  1. db flush is properly handled
  2. when annotating we actually decrement and allow for future failures to be new
  3. for treeherder RDBS schemas #2 there is a test
  4. an issue with cache key names existed when using get_cache(), this is now consistent.

Copy link
Collaborator

@Archaeopteryx Archaeopteryx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been pushed to prototype.

@Archaeopteryx
Copy link
Collaborator

This broke bug suggestions on prototype:

{
  "textPayload": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/site-packages/celery/app/trace.py\", line 453, in trace_task\n    R = retval = fun(*args, **kwargs)\n  File \"/usr/local/lib/python3.10/site-packages/newrelic/hooks/application_celery.py\", line 123, in wrapper\n    return wrapped(*args, **kwargs)\n  File \"/usr/local/lib/python3.10/site-packages/celery/app/trace.py\", line 736, in __protected_call__\n    return self.run(*args, **kwargs)\n  File \"/app/treeherder/workers/task.py\", line 65, in inner\n    raise task_func.retry(exc=e, countdown=timeout, throw=False)\n  File \"/usr/local/lib/python3.10/site-packages/celery/app/task.py\", line 736, in retry\n    raise_with_context(exc)\n  File \"/app/treeherder/workers/task.py\", line 41, in inner\n    return f(*args, **kwargs)\n  File \"/app/treeherder/log_parser/tasks.py\", line 76, in parse_logs\n    raise first_exception\n  File \"/app/treeherder/log_parser/tasks.py\", line 58, in parse_logs\n    parser(job_log)\n  File \"/app/treeherder/log_parser/tasks.py\", line 112, in post_log_artifacts\n    store_job_artifacts(serialized_artifacts)\n  File \"/app/treeherder/etl/artifact.py\", line 95, in store_job_artifacts\n    store_text_log_summary_artifact(job, artifact)\n  File \"/app/treeherder/etl/artifact.py\", line 38, in store_text_log_summary_artifact\n    bugs = error_summary.get_error_summary(job, queryset=log_errors)\n  File \"/app/treeherder/model/error_summary.py\", line 161, in get_error_summary\n    summary, line_cache = bug_suggestions_line(\n  File \"/app/treeherder/model/error_summary.py\", line 220, in bug_suggestions_line\n    counter += line_cache[day].get(cache_clean_line, 0)\nAttributeError: 'NoneType' object has no attribute 'get'",
  "insertId": "w8081dikttev746m",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "project_id": "moz-fx-treeherde-nonprod-34ec",
      "namespace_name": "prototype-treeherder",
      "location": "us-west1",
      "cluster_name": "treeherder-nonprod-v1",
      "container_name": "treeherder",
      "pod_name": "treeherder-log-parser-68dc55d96f-9bj2p"
    }
  },
  "timestamp": "2025-03-25T10:00:34.081113714Z",
  "severity": "ERROR",
  "labels": {
    "k8s-pod/fullname": "treeherder-log-parser",
    "k8s-pod/jenkins-build-id": "319",
    "k8s-pod/app_kubernetes_io/part-of": "treeherder",
    "k8s-pod/app_kubernetes_io/version": "1.0.0",
    "logging.gke.io/top_level_controller_type": "Deployment",
    "compute.googleapis.com/resource_name": "gke-treeherder-nonpro-default-pool-v4-88dddb70-7gzf",
    "k8s-pod/app_kubernetes_io/managed-by": "jenkins",
    "k8s-pod/app_kubernetes_io/instance": "prototype",
    "k8s-pod/pod-template-hash": "68dc55d96f",
    "k8s-pod/app_kubernetes_io/component": "app",
    "k8s-pod/app_kubernetes_io/name": "treeherder-log-parser",
    "logging.gke.io/top_level_controller_name": "treeherder-log-parser"
  },
  "logName": "projects/moz-fx-treeherde-nonprod-34ec/logs/stderr",
  "receiveTimestamp": "2025-03-25T10:00:34.898659602Z",
  "errorGroups": [
    {
      "id": "CPr0--i2kYWH8QE"
    }
  ]
}

@jmaher
Copy link
Collaborator Author

jmaher commented Mar 25, 2025

my most recent commit does:

  • fixes the case where a migration created mc_error_lines_mc_error_lines_<date>
  • fixed issue with line_cache[date] == None on prototype (same as ^)
  • added test to reproduce failure on prototype and ensure this "migrated" state is reliable

Copy link
Collaborator

@Archaeopteryx Archaeopteryx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed to prototype one minute ago.

@jmaher
Copy link
Collaborator Author

jmaher commented Mar 25, 2025

I still see this error AttributeError: 'NoneType' object has no attribute 'get', is it possible the code hasn't been deployed yet? Maybe there are tests required or another service to restart?

@Archaeopteryx
Copy link
Collaborator

Jenkins says the top of this patch stack got deployed shortly before 20:12 UTC. There have been issues with older pushes getting deployed in the past but this was fixed a few weeks ago.

The issue was some keys had duplicate root prefixes. I have deleted all caching data on prototype and the cache gets populated again. The spinner in the failure lines are persists because of an internal issue with bug-job-map from a migration out of sync with production. Lets keep it as it is to verify the cache code works correct, afterwards the prototype database can be reset to resolve the issue.

@jmaher jmaher merged commit 3bb3bb7 into mozilla:master Mar 27, 2025
4 checks passed
@jmaher jmaher deleted the cachewrite branch March 27, 2025 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants