Exception occurred in retry method that was not classified as transient #536

lamstutz · 2019-07-23T13:49:09Z

Related issues

[REQUIRED] Version info

  "dependencies": {
    "@google-cloud/firestore": "^2.2.1",
    "firebase-admin": "^8.2.0",
    "firebase-functions": "^3.0.1",
  },
  "engines": {
    "node": "8"
  }

node: 8

firebase-functions: 3.0.1

firebase-tools: 7.0.1

firebase-admin: 8.2.0

Steps to reproduce

import * as admin from 'firebase-admin';

admin.initializeApp();
const db = admin.firestore();
db.settings({ timestampsInSnapshots: true });


const users = db.collection('users');

users.doc('myUserId').update({ fieldToUpdate: 'newValue' })

The update method throws this error:

{ Error
at Http2CallStream.call.on (/srv/node_modules/@grpc/grpc-js/build/src/client.js:101:45)
at emitOne (events.js:121:20)
at Http2CallStream.emit (events.js:211:7)
at process.nextTick (/srv/node_modules/@grpc/grpc-js/build/src/call-stream.js:71:22)
at _combinedTickCallback (internal/process/next_tick.js:132:7)
at process._tickDomainCallback (internal/process/next_tick.js:219:9)
code: 13,
details: '',
metadata: Metadata { options: undefined, internalRepr: Map {} },
note: 'Exception occurred in retry method that was not classified as transient' }

Were you able to successfully deploy your functions?

the deployment displays no errors


google-oss-bot · 2019-07-23T13:49:10Z

I found a few problems with this issue:

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.
This issue does not seem to follow the issue template. Make sure you provide all the required information.

thechenky · 2019-07-24T22:14:02Z

Hi @lamstutz does this occur on every function invocation? Would you mind pasting more logs? Also, does this happen when updating other collections?

audkar · 2019-07-25T07:54:27Z

Having same error

does this occur on every function invocation?

No. (The following statement might be wrong.) This error happens only when an idle function is invoked after a longer period of time.

thechenky · 2019-07-25T19:56:42Z

From the stack trace and the fact that it happens on an idle function, it looks like a gRPC-related error that happens on a cold start. Does this happen on other collection updates? Are you properly handling promises in this function? Because the stack trace mentions retries, it may be happening because the value you're trying to write into Firestore is somehow problematic. Are you able to replicate this with another collection with a very simple interface, something like updating one field that is a string?

lamstutz · 2019-07-26T08:25:19Z

The error occurs regularly but not systematically. It occurs only with the "update" function (whether with a simple object or an increment), and only with triggers, not with an HTTP function. Promises are handled correctly and rejections are caught.

ex:

const categoryIds = ['a','b'];
  return users.doc(userId).update({ categoryIds });

Or

return statistics.doc('users').update({ users: gcFieldValue.increment(-1) });

thechenky · 2019-07-26T16:37:20Z

Hmm, it seems that something is going wrong with that particular Firestore update call - @schmidt-sebastian @hiranya911 I wonder if you have ideas on what could be going wrong here?

McGroover-Bottleneck · 2019-07-29T13:37:32Z

I am having the exact same problem

spoxies · 2019-07-29T14:23:01Z

I have to add that, in my experience, the problem also occurs with 'http' calls and not only with 'triggers'. It seems to occur more often when an instance/function went to sleep and indeed has a 'cold start'.

stshelton · 2019-07-29T18:56:26Z

I'm having the same problem. I'm receiving this error.

Error
at Http2CallStream.call.on (/srv/node_modules/@grpc/grpc-js/build/src/client.js:96:45)
at emitOne (events.js:121:20)
at Http2CallStream.emit (events.js:211:7)
at process.nextTick (/srv/node_modules/@grpc/grpc-js/build/src/call-stream.js:71:22)
at _combinedTickCallback (internal/process/next_tick.js:132:7)
at process._tickDomainCallback (internal/process/next_tick.js:219:9)

Mine seems to happen randomly in onCreate and onWrite functions. The affected functions are triggered daily and the error has occurred only once; they have fired multiple times since, and the error has not returned. The errors started appearing after I updated firebase-functions from version 2.3.0 to 3.0.1 and firebase-admin from 7.0.0 to 8.0.0.

jaycosaur · 2019-07-30T02:06:17Z

Just thought I'd give some input here. My first instance of this error was on the 15th of July and I now get it regularly (but not consistently) across all our functions.

We have a logging system on our functions that tells us whether a given invocation was a cold start (we ping the functions every minute to keep them warm), and I have logs going back to 2017. Before the 15th of July, when these errors started for me, cloud functions would delete themselves approximately 3-5 minutes after first creation, making the next invocation a cold start. Since the 15th of July this has increased substantially, to more than 5 hours, and today we saw a function stay warm for 28 hours (causing a lot of issues for our caching). My guess is that a previously short-lived connection now has to cope with these much longer alive periods.

Now unfortunately we do not ping over the weekends, for cost reduction reasons, but on the 12th (and for the last 12+ months) it was cold starting every 3-5 minutes, and on the 15th it now doesn't cold start for 5+ hours. If this is a new 'feature' of cloud functions it is amazing btw! Almost makes them never have to hit cold starts if the keep warm invocations are done right.

McGroover-Bottleneck · 2019-07-30T11:33:48Z

I am getting it with Pub/Sub functions. I also think it is related to cold starts

damienix · 2019-07-30T11:46:55Z

Also started getting these recently :(

schmidt-sebastian · 2019-07-30T21:00:19Z

Sorry for all the trouble this is causing! While we are currently looking into this, we don't have a strong lead as to what is going on. Please do bear with us.

schmidt-sebastian · 2019-07-30T21:27:06Z

@damienix, @bottleneck-admin, @jaycosaur, @spoxies, @lamstutz:

Would you mind sending your project IDs and an approximate time window for these errors (including your timezone) to [email protected]? Thanks

steren · 2019-07-31T15:44:57Z

This also affects Cloud Run.

For googlers on this thread, 138705198 is the internal issue

McGroover-Bottleneck · 2019-07-31T15:47:37Z

How do I private message you @schmidt-sebastian

schmidt-sebastian · 2019-07-31T16:18:00Z

Thanks for sending us your project info. Our backend team will look into the errors. While they do, can you quickly confirm whether your GCF instances are all running in Europe and where your Firestore project is located?

Thanks!

@bottleneck-admin If you need to send project-specific or confidential data to us for issue triage, the recommended way is to open a Support ticket via https://support.google.com/

McGroover-Bottleneck · 2019-07-31T16:31:26Z

Mine are in europe-west2, and the project's Google Cloud Platform (GCP) resource location is europe-west2.

ltomes · 2019-07-31T21:14:06Z

@steren @schmidt-sebastian

I came across this issue doing a google search.

I am seeing the same behavior surface from GRPC calls made by @grpc/grpc-js which is a dependency of @google-cloud/storage in my case.

We see intermittent failures in k8s pods when a large number of files are being streamed to
storage objects via @google-cloud/storage.

If those logs might be useful let me know and I can provide a project id.

{ Error: The caller does not have permission
at Http2CallStream.call.on (/home/app/node_modules/@grpc/grpc-js/build/src/client.js:101:45)
at Http2CallStream.emit (events.js:194:15)
at process.nextTick (/home/app/node_modules/@grpc/grpc-js/build/src/call-stream.js:71:22)
at process._tickCallback (internal/process/next_tick.js:61:11)
code: 7,
details: 'The caller does not have permission',
metadata: Metadata { options: undefined, internalRepr: Map {} },
note: 'Exception occurred in retry method that was not classified as transient' }

Note we receive a code 7 instead of 13 like @lamstutz
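For readers comparing the numeric codes posted in this thread, here is a minimal sketch that maps them to their standard gRPC status names. The numeric values are defined by gRPC; the table name and the `describeGrpcError` helper are hypothetical and exist only for illustration.

```typescript
// Standard gRPC status codes that appear in this thread. The numeric values
// are defined by gRPC itself; the names of this table and helper are made up.
const GRPC_STATUS_NAMES: Record<number, string> = {
  2: 'UNKNOWN',
  4: 'DEADLINE_EXCEEDED',
  5: 'NOT_FOUND',
  7: 'PERMISSION_DENIED',
  13: 'INTERNAL',
  14: 'UNAVAILABLE',
};

// Builds a readable label for an error object shaped like the ones logged above.
function describeGrpcError(err: { code?: number; details?: string }): string {
  const name =
    err.code !== undefined ? (GRPC_STATUS_NAMES[err.code] ?? 'OTHER') : 'NO_CODE';
  const suffix = err.details ? ': ' + err.details : '';
  return `${err.code ?? '?'} ${name}${suffix}`;
}
```

A code 7 therefore points at permissions rather than at the idle-connection problem that a code 13 or 14 may indicate, even though both carry the same `note`.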

jaycosaur · 2019-07-31T23:08:28Z

Our project is us-central1 for functions and project.

schmidt-sebastian · 2019-08-01T21:17:22Z

Our backend team believes that they know what the root cause is, but it might take quite a while for the issue to be fixed in all production environments.

damienix · 2019-08-07T09:28:31Z

Any progress on that? This is a really severe issue ;/

As per docs https://firebase.google.com/docs/functions/retries

Cloud Functions guarantees at-least-once execution of a background function for each event emitted by an event source.

Which is no longer true. On my backend, this leads to more and more inconsistent data, as I'm getting random errors from triggered functions that would normally run without any problems :(

Has anyone tried enabling retries on a function to defend against this error? Will it work for system-level errors?

schmidt-sebastian · 2019-08-07T15:52:03Z

With errors like these, your request will likely succeed if you retry. Our client only retries in a couple of cases where we know it is safe (we can always retry a get() request, but we cannot retry writes as we don't know whether there are any side effects). If you know that you can always retry (based on your data model), then I would recommend that as a solution.

You could also wrap your writes in a transaction, which the client retries.
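As a rough illustration of the application-level retry suggested above, the sketch below wraps an operation in a retry loop with exponential backoff. The helper name `retryIdempotent`, the choice of retryable codes (4, 13, 14), and the default attempt count and delay are all assumptions for illustration, not part of any Firebase or gRPC API.

```typescript
// gRPC codes treated as retryable here are an assumption for illustration:
// 4 DEADLINE_EXCEEDED, 13 INTERNAL, 14 UNAVAILABLE.
const RETRYABLE_CODES = new Set<number>([4, 13, 14]);

// Hypothetical helper: runs `operation` and retries with exponential backoff
// when it rejects with a retryable code. Only use it for writes you know are
// idempotent in your data model.
async function retryIdempotent<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const code = (err as { code?: number } | null)?.code;
      const retryable = code !== undefined && RETRYABLE_CODES.has(code);
      // Give up on non-retryable errors or when attempts are exhausted.
      if (!retryable || attempt >= maxAttempts) throw err;
      // Delay doubles each time: baseDelayMs, 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

With the original repro this might be called as `retryIdempotent(() => users.doc('myUserId').update({ fieldToUpdate: 'newValue' }))`. Setting a field to a fixed value is safe to repeat; an increment is not, so it would not be a good candidate for this wrapper.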

manwithsteelnerves · 2020-07-13T16:08:50Z

We got this error today
"Exception occurred in retry method that was not classified as transient"

Update:
This error seems to happen with Cloud Pub/Sub and the @grpc/grpc-js library.
Fix: Unfortunately, I needed to add the Pub/Sub Editor role to my Firebase Admin SDK service account, and then it started working again. I'm not sure why the role is suddenly required, as it was not before. Did fixing the earlier issue lead to this stricter check, or is this a new issue? @thechenky @mdietz94

damienromito · 2020-07-27T14:29:03Z

I have the same problem on different endpoints.
Here are the logs if it helps:

funnierinspanish · 2020-08-03T20:07:27Z

Everything was working fine with my functions in the emulator for several hours, and suddenly I got that same error:

{
  code: 2,
  details: '',
  metadata: Metadata {
    internalRepr: Map(1) { 'content-type' => [Array] },
    options: {}
  },
  note: 'Exception occurred in retry method that was not classified as transient'
}

Using node 14 without any problem. Tried using 10 and nothing changed.

I've been using the Functions emulator with Firestore triggers. The data will be written regardless, but no logs are shown (and also the error we're all having is thrown).

EDIT:

I found the origin of the problem for my particular case: a trailing / :

functions.firestore.document('users/{userId}/')

I figured it out while trying to upload the function when I gave up and was going to test it live. I ended up getting to this thread that led me to the solution: https://stackoverflow.com/questions/46818082/error-http-error-400-the-request-has-errors-firebase-firestore-cloud-function
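Since a malformed trigger path can surface as this unrelated-looking error, a small check run before deploying might catch it earlier. The following is only a sketch: `checkTriggerPath` is a hypothetical helper, and the rules it applies (no trailing slash, an even number of segments, well-formed `{wildcard}` segments, no `$`) are assumptions based on the cases reported in this thread, not an official validation routine.

```typescript
// Hypothetical pre-deploy check for Firestore trigger document patterns.
// Returns a list of problems; an empty list means none of these checks failed.
function checkTriggerPath(path: string): string[] {
  const problems: string[] = [];
  if (path.endsWith('/')) {
    problems.push('trailing slash');
  }
  if (path.includes('$')) {
    problems.push("contains '$' (looks like template-literal syntax)");
  }
  const segments = path.split('/').filter((s) => s.length > 0);
  // Document paths alternate collection/document, so the count must be even.
  if (segments.length % 2 !== 0) {
    problems.push('odd number of path segments');
  }
  for (const segment of segments) {
    const hasBrace = segment.includes('{') || segment.includes('}');
    // A wildcard segment must be exactly "{name}" with balanced braces.
    if (hasBrace && !/^\{[^{}]+\}$/.test(segment)) {
      problems.push(`malformed wildcard segment: ${segment}`);
    }
  }
  return problems;
}
```

For example, `checkTriggerPath('users/{userId}/')` reports the trailing slash, while `checkTriggerPath('users/{userId}')` returns an empty list.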

kythin · 2020-08-09T03:31:09Z

This error was driving me nuts, but it turned out to be some form of permission thing for me. After updating the service account to just have 'Project -> Editor' access the firestore writes started working again.

Obviously not ideal to give the service account such wide access, but it's a start!

jonrandahl · 2020-08-14T19:17:22Z

Just adding my own two pence as I've landed on this thread too many times now not to ...

I can concur with @manwithsteelnerves in that adding the Pub/Sub Editor role to the service account has somehow stopped the DEADLINE EXCEEDED errors occurring in one of our cloud functions on one of our instances.

However, on another of our instances where the Service account does not have the Pub/Sub Editor role, the same cloud function is working fine via both the local emulator and deployed to that same non-local instance which the emulator was connected to, and the cloud function does not error on that instance.

Thanks to everyone nonetheless for all your hard work to resolve this issue, and for all the comments that have helped others including myself previously!

Please let me know if you would like further information. 🙏

fabiank0 · 2020-11-06T13:03:56Z

Got it today while handling files that were in total over 3MB. No Triggers are used.

Error: 14 UNAVAILABLE: No connection established
at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:30:26)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:175:52)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:341:141)
at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:304:181)
at Http2CallStream.outputStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:116:74)
at Http2CallStream.maybeOutputStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:155:22)
at Http2CallStream.endCall (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:141:18)
at Http2CallStream.cancelWithStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:457:14)
at ChannelImplementation.tryPick (/workspace/node_modules/@grpc/grpc-js/build/src/channel.js:237:32)
at Object.updateState (/workspace/node_modules/@grpc/grpc-js/build/src/channel.js:106:26)

Node 10,
"@google-cloud/storage": "^4.3.1",
"firebase-admin": "^8.9.2",
"firebase-functions": "^3.8.0",

Pixelwelder · 2021-03-20T13:52:00Z

I cannot tell you how many days I burned on this issue. In my case, I left a curly bracket out of a document address.

.document('a/{docA}/chat/{docB') // missing final bracket

The error was occurring in a totally different function.

MattGoldwater · 2021-04-01T20:52:24Z

Is anyone else still getting this error?

Dric0 · 2021-04-01T21:35:47Z

Is anyone else still getting this error?

I did, yesterday. But I only get these once every 2 months

MattGoldwater · 2021-04-02T02:08:47Z

Is anyone else still getting this error?

I did, yesterday. But I only get these once every 2 months

Thanks for answering. Yeah it turned out I got it because I had a syntax error I wasn't aware of.

calclavia · 2021-05-07T15:23:16Z

I'm getting the same issue when I have to write a lot of documents in many async calls (500+ async writes).

inlined · 2021-05-28T03:59:24Z

Hi there, it seems like the original bug has been fixed and we're getting more reports of possibly more than one bug. Collecting them in the wrong old issue can hurt our ability to triage and get your issues taken care of, so I'm going to close it and let you open new bugs that can be resolved with more specific conversations.

The original bug was due to a networking issue that can happen when a server is idle: the connection gets reset and the next request may fail. Originally this was a problem with the gRPC library because it wasn't handling a clean connection reset. This problem also happens more generally when the FIN packet isn't sent across the internet due to a number of reasons involving performance and security. If the library isn't already aware the connection is invalid (e.g. the FIN packet was dropped or the library isn't handling FIN correctly) the next request will fail. Thanks to the Two Generals Problem, it's impossible to know if a request failed before or after the server got your request. The library can retry if it knows the request is idempotent (e.g. GET) but it can't necessarily retry if the request isn't (e.g. POST). Fortunately, you might know that your code is idempotent. In fact, our guidance is that all cloud functions should be idempotent because you may get more than one invocation. So a retry at the application level should be safe.

Normally you can retry with a simple try/catch. Diving through some of this bug and the internal ones as well, it looks like you couldn't always catch an error in the gRPC library. If that's (still) the case, it's an issue and someone should file a new bug against the gRPC repo or possibly gax-nodejs that exceptions cannot be caught. A foolproof way to handle exceptions anywhere in your codebase is to turn on retries in your functions. This adds a risk that a crash loop will cause indefinite executions, so you'll need to find some way to drop events-of-death.
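One possible way to drop events-of-death when function retries are enabled is to stop retrying once an event is older than some limit. The sketch below assumes the event's creation time is available as an ISO 8601 string (for example from the event context); `isEventTooOld` and the age limit are hypothetical choices for illustration.

```typescript
// Hypothetical guard for a function with retries enabled: give up on events
// older than `maxAgeMs` so a crash loop cannot retry indefinitely.
// `eventTimestamp` is assumed to be an ISO 8601 string.
function isEventTooOld(
  eventTimestamp: string,
  maxAgeMs: number,
  nowMs: number = Date.now(),
): boolean {
  const eventAgeMs = nowMs - Date.parse(eventTimestamp);
  // An unparseable timestamp yields NaN; treat it as too old rather than
  // retrying blindly.
  return Number.isNaN(eventAgeMs) || eventAgeMs > maxAgeMs;
}
```

A handler would call this first and return early (without throwing) when it yields true, so that the platform stops retrying that event. Combined with idempotent writes, this keeps retries bounded.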

I can guarantee you that the event type you're listening to has no impact on this issue. It's happening because your function was idle this whole time and we didn't garbage collect the container so that you could avoid a cold start. Crashes popping up in the Firestore/Datastore library should probably be filed against those SDKs (nodejs-firestore and nodejs-datastore). If you get an obvious networking error, you could also consider filing a bug against the gRPC library instead. You can of course file a bug against this repo as a starting point, but you just might have a slower response as we find the right people and move your bug to the right location. You're our customers and we care about your experience; this repo just isn't where the exception lies so it's not where the fix will come.

MNorgren · 2021-12-10T16:42:25Z

I cannot tell you how many days I burned on this issue. In my case, I left a curly bracket out of a document address.

.document('a/{docA}/chat/{docB') // missing final bracket

The error was occurring in a totally different function.

Wow! Thanks. I doubt I would have found this if you hadn't mentioned it!!! In my case, I was using a dollar sign in my path that caused this error... Seriously.

.document('a/{docA}/chat/${docB'})

radhikadeo · 2022-02-12T17:56:21Z

I am still getting this same issue, can someone help?

Pixelwelder · 2022-02-14T14:34:08Z

@radhikadeo Just to verify, you've checked all your paths?

moritzmorgenroth · 2023-10-12T07:21:59Z

I ran into this problem when working with Firestore Point-in-time recovery (PITR) (which is an awesome beta feature! 🎉).

For me, the solution was to specify a timestamp in the transaction that actually resolves to a whole hour exactly, i.e.

const q = firestore.collectionGroup("trips");
  const querySnapshot = await firestore.runTransaction(
    (t) => t.get(q),
    { readOnly: true, readTime: new Timestamp(1696827600, 0) }
  );

✅ works, but

const q = firestore.collectionGroup("trips");
  const querySnapshot = await firestore.runTransaction(
    (t) => t.get(q),
    { readOnly: true, readTime: new Timestamp(1696827601, 0) }
  );

❌ fails.

This is not mentioned in the docs, will leave a comment there. ⛑️
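Going only by the observation above, a read time could be rounded down to an hour boundary before it is passed to the transaction. This is a minimal sketch under that assumption; `floorToWholeHour` is a hypothetical helper, and the granularity Firestore actually requires may differ from a whole hour.

```typescript
// Rounds epoch seconds down to the start of the hour (a 3600-second boundary).
// Hypothetical helper, based only on the behavior reported in this comment.
function floorToWholeHour(epochSeconds: number): number {
  return Math.floor(epochSeconds / 3600) * 3600;
}
```

Applied to the failing value, `floorToWholeHour(1696827601)` gives `1696827600`, the timestamp that worked in the first snippet.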

hkchakladar · 2023-12-26T16:17:56Z

The issue arises randomly for me. I'm seeing it once/twice a month (out of ~ 10k in a month).

vikasdduc · 2024-02-10T06:01:00Z

I'm getting an error when trying to delete a topic and subscription with the delete method. Can anyone help?
code: 7,
details: 'User not authorized to perform this action.',
metadata: Metadata { internalRepr: Map(0) {}, options: {} },
note: 'Exception occurred in retry method that was not classified as transient'

sandhya1349 · 2024-02-13T11:23:45Z

@vikasdduc , can you please let us know the module and node versions?

trevor-rex · 2024-03-29T15:50:51Z

@taeold Could we please re-open this issue? It appears to still be occurring for users. It happened to me yesterday on node 18 and [email protected]

olboghgc · 2024-04-04T14:57:43Z

@taeold Could we please re-open this issue? It appears to still be occurring for users. It happened to me yesterday on node 18 and [email protected]

same here

nojaf · 2024-06-01T19:17:37Z

Got this today with node v20.11.0 and firebase-functions 5.0.1

at async runHTTPS (/home/nojaf/.bun/install/global/node_modules/firebase-tools/lib/emulator/functionsEmulatorRuntime.js:531:5)\n    at async /home/nojaf/.bun/install/global/node_modules/firebase-tools/lib/emulator/functionsEmulatorRuntime.js:694:21 {\n  code: 2,\n  details: '',\n  metadata: Metadata {\n    internalRepr: Map(1) { 'content-type' => [Array] },\n    options: {}\n  },\n  note: 'Exception occurred in retry method that was not classified as transient'\n}"}

haayhappen · 2024-06-21T10:08:11Z

Still happening to me as well cc @schmidt-sebastian

dschnare · 2024-06-27T16:14:23Z

We are getting this error as well. It was during a WriteBatch.commit when we were updating one document. We have been using @google-cloud/[email protected].

EDIT
We did receive this error beforehand. We have exponential backoff as a retry strategy when errors like this are caught. So we proceeded to retry our update call, but then the "Exception occurred in retry method that was not classified as transient" error occurred.

firestoreRetryable retrying due to 13 INTERNAL Error: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)

spock123 · 2024-08-15T19:13:36Z

@dschnare did you try to use the preferRest setting in the configuration?

SoufianeBenbah · 2024-12-18T10:48:54Z

Personally, I have had this problem several times, and each time it has something to do with my code.
Either I forgot a { somewhere or something like that.

The last one was because I tried to connect to a database that didn't exist.

const db = admin.firestore();
db.settings({ databaseId:"non-existante-db", ignoreUndefinedProperties: true })

Hope this helps some of you

google-oss-bot added the needs-triage label Jul 23, 2019

thechenky added api: firestore Needs: Author Feedback Issues awaiting author feedback and removed needs-triage labels Jul 24, 2019

thechenky self-assigned this Jul 24, 2019

google-oss-bot added Needs: Attention and removed Needs: Author Feedback Issues awaiting author feedback labels Jul 26, 2019

thechenky removed their assignment Feb 22, 2021

inlined closed this as completed May 28, 2021

evengul mentioned this issue Mar 10, 2022

Improve error message when invalid Firestore document path is specified in Firestore trigger definitions firebase/firebase-tools#4284

Open

dandv mentioned this issue Oct 21, 2024

Make the "5 NOT FOUND" / "Exception occurred in retry method that was not classified as transient" error message specific and useful firebase/firebase-admin-node#2732

Open
