Exception occurred in retry method that was not classified as transient #536
Comments
I found a few problems with this issue:
|
Hi @lamstutz does this occur on every function invocation? Would you mind pasting more logs? Also, does this happen when updating other collections? |
I'm having the same error.
No. [The following statement might be wrong.] This error happens only when an idle function is invoked after a longer period of time. |
From the stack trace and the fact that it happens on an idle function, it looks like a gRPC-related error that is happening on a cold start. Does this happen on other collection updates? Are you properly handling promises in this function? Because the stack trace mentions retries, it may be happening because the value you're trying to write into Firestore is somehow problematic. Are you able to replicate this with another collection with a very simple interface, something like updating one field that is a string? |
The error occurs regularly but not systematically. It only occurs with the "update" function (whether with a simple object or an increment), and only with "triggers", not with an http function. Promises are used correctly and errors are caught, e.g.:
Or
|
Hmm, it seems that something is going wrong with that particular Firestore update call - @schmidt-sebastian @hiranya911 I wonder if you have ideas on what could be going wrong here? |
I am having the exact same problem |
I have to add that in my experience the problem is also present with 'http' calls and not only with 'triggers'. It seems it is (extra) present when an instance/function went to sleep and has indeed a 'cold start'. |
I'm having the same problem. I'm receiving this error.
Mine seems to happen randomly in onCreate and onWrite functions. The functions with this error are triggered daily and the error has only occurred once. I've had them fire multiple times since the error occurred and it has yet to return. These errors started appearing once I updated firebase functions from version 2.3.0 to 3.0.1 and firebase admin from 7.0.0 to 8.0.0 |
Just thought I'd give some input here. My first instance of this error was on the 15th of July, and I now get it regularly (but not consistently) across all our functions. We have a logging system on our functions that tells us whether a function was cold started or not (we ping them every minute to keep them warm). Prior to the 15th of July, when these errors started happening to me (and I have logs on this going back to 2017), cloud functions would delete themselves approximately 3-5 minutes after first creation, making the next invocation a cold start. Since the 15th of July this has increased substantially to more than 5 hours(!!!), and today we saw a function stay warm for 28 hours (causing a lot of issues for our caching). My guess is that a previously short-lived connection now has to cope with these much longer alive periods. Unfortunately we do not ping over the weekends, for cost reasons, but on the 12th (and for the preceding 12+ months) functions were cold starting every 3-5 minutes, and since the 15th they don't cold start for 5+ hours. If this is a new 'feature' of cloud functions, it is amazing btw! It almost means they never have to hit cold starts if the keep-warm invocations are done right. |
I am getting it with Pub/Sub functions. I also think it is related to cold starts |
Also started getting these recently :( |
Sorry for all the trouble this is causing! While we are currently looking into this, we don't have a strong lead as to what is going on. Please do bear with us. |
@damienix, @bottleneck-admin, @jaycosaur, @spoxies, @lamstutz: Would you mind sending your project IDs and an approximate time window for these errors (including your timezone) to [email protected]? Thanks |
This also affects Cloud Run. For googlers on this thread, 138705198 is the internal issue |
How do I private message you @schmidt-sebastian |
Thanks for sending us your project info. Our backend team will look into the errors. While they do, Thanks! @bottleneck-admin If you need to send project-specific or confidential data to us for issue triage, the recommended way is to open a Support ticket via https://support.google.com/ |
Mine are in europe-west2 and project is -> Google Cloud Platform (GCP) resource location |
I came across this issue doing a google search. I am seeing the same behavior surface from GRPC calls made by We see intermittent failures in k8s pods when a large number of files are being streamed to If those logs might be useful let me know and I can provide a project id.
Note we receive a code 7 instead of 13 like @lamstutz |
Our project is us-central1 for functions and project. |
Our backend team believes that they know what the root cause is, but it might take quite a while for the issue to be fixed in all production environments. |
Any progress on that? This is a really severe issue ;/ As per docs https://firebase.google.com/docs/functions/retries
Which is no longer true. On my backend, this leads to more and more inconsistent data, as I'm getting random errors from triggered functions that would normally run without any problems :( Has anyone tried to enable retries of a function to defend from this error, will it work for system-level errors? |
With errors like these, your request will likely succeed if you retry. Our client only retries in a couple of cases where we know it is safe (we can always retry a get() request, but we cannot retry writes as we don't know whether there are any side effects). If you know that you can always retry (based on your data model), then I would recommend that as a solution. You could also wrap your writes in a transaction, which the client retries. |
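A minimal sketch of the application-level retry suggested above, for writes you know are safe to repeat. The helper name `withRetry`, its options, and the choice of which gRPC status codes to treat as retryable (13 INTERNAL and 14 UNAVAILABLE, the codes reported in this thread) are assumptions made for illustration, not part of any SDK.

```javascript
// Hypothetical helper, not part of firebase-admin or @google-cloud/firestore.
// Assumption: errors carry a numeric gRPC status in `err.code`.
const RETRYABLE_CODES = new Set([13, 14]); // INTERNAL, UNAVAILABLE

async function withRetry(operation, { maxAttempts = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const retryable = RETRYABLE_CODES.has(err.code);
      // Give up on non-retryable errors or once the attempt budget is spent.
      if (!retryable || attempt >= maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage (only if repeating the write is safe for your data model):
//   await withRetry(() => docRef.update({ status: 'done' }));
```

This only makes sense for idempotent writes. Something like an increment could be applied twice if the first attempt actually reached the server, so for those a transaction is likely the safer route.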
We got this error today Update : |
Everything was working fine with my functions in the emulator for several hours, and suddenly I got that same error:
Using node 14 without any problem. Tried using 10 and nothing changed. I've been using the Functions emulator with Firestore triggers. The data will be written regardless, but no logs are shown (and also the error we're all having is thrown). EDIT: I found the origin of the problem for my particular case: a trailing slash in the document path, functions.firestore.document('users/{userId}/'). I figured it out while trying to upload the function, when I gave up and was going to test it live. I ended up at this thread, which led me to the solution: https://stackoverflow.com/questions/46818082/error-http-error-400-the-request-has-errors-firebase-firestore-cloud-function |
This error was driving me nuts, but it turned out to be some form of permission thing for me. After updating the service account to just have 'Project -> Editor' access the firestore writes started working again. Obviously not ideal to give the service account such wide access, but it's a start! |
Just adding my own two pence as I've landed on this thread too many times now not to ... I can concur with @manwithsteelnerves in that adding the However, on another of our instances where the Service account does not have the Thanks to everyone nonetheless for all your hard work to resolve this issue, and for all the comments that have helped others including myself previously! Please let me know if you would like further information. 🙏 |
Got it today while handling files that were in total over 3MB. No Triggers are used.
Error: 14 UNAVAILABLE: No connection established
    at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:30:26)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:175:52)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:341:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:304:181)
    at Http2CallStream.outputStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:116:74)
    at Http2CallStream.maybeOutputStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:155:22)
    at Http2CallStream.endCall (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:141:18)
    at Http2CallStream.cancelWithStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:457:14)
    at ChannelImplementation.tryPick (/workspace/node_modules/@grpc/grpc-js/build/src/channel.js:237:32)
    at Object.updateState (/workspace/node_modules/@grpc/grpc-js/build/src/channel.js:106:26)
Node 10, |
I cannot tell you how many days I burned on this issue. In my case, I left a curly bracket out of a document address.
The error was occurring in a totally different function. |
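Since several reports in this thread trace back to a malformed trigger path (a trailing slash, a missing curly bracket), here is a rough sketch of a sanity check that could be run over path strings before deploying. The function name `checkDocumentPath` and the set of checks are assumptions for illustration. It is not how the SDK validates paths, and it relies only on the fact that a Firestore document path has an even number of segments.

```javascript
// Hypothetical pre-deploy check; returns a list of human-readable problems.
function checkDocumentPath(path) {
  const problems = [];
  if (path.startsWith('/') || path.endsWith('/')) {
    problems.push('leading or trailing slash');
  }
  if (path.includes('//')) {
    problems.push('empty segment');
  }
  const segments = path.split('/').filter((s) => s.length > 0);
  for (const segment of segments) {
    const opens = (segment.match(/\{/g) || []).length;
    const closes = (segment.match(/\}/g) || []).length;
    if (opens !== closes) {
      problems.push(`unbalanced braces in "${segment}"`);
    }
  }
  // Document paths alternate collection/document, so the count must be even.
  if (segments.length % 2 !== 0) {
    problems.push('odd number of segments');
  }
  return problems;
}

// Example: log anything suspicious for the paths used by your triggers.
for (const p of ['users/{userId}', 'users/{userId}/', 'users/{userId']) {
  console.log(p, checkDocumentPath(p));
}
```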
Is anyone else still getting this error? |
I did, yesterday. But I only get these once every 2 months |
Thanks for answering. Yeah it turned out I got it because I had a syntax error I wasn't aware of. |
I'm getting the same issue when I have to write a lot of documents in many async calls (500+ async writes). |
Hi there, it seems like the original bug has been fixed and we're getting more reports of possibly more than one bug. Collecting them in the wrong old issue can hurt our ability to triage and get your issues taken care of, so I'm going to close it and let you open new bugs that can be resolved with more specific conversations.
The original bug was due to a networking issue that can happen when a server is idle: the connection gets reset and the next request may fail. Originally this was a problem with the gRPC library because it wasn't handling a clean connection reset. This problem also happens more generally when the FIN packet isn't sent across the internet due to a number of reasons involving performance and security. If the library isn't already aware the connection is invalid (e.g. the FIN packet was dropped or the library isn't handling FIN correctly) the next request will fail.
Thanks to the Two Generals Problem, it's impossible to know if a request failed before or after the server got your request. The library can retry if it knows the request is idempotent (e.g. GET) but it can't necessarily retry if the request isn't (e.g. POST). Fortunately, you might know that your code is idempotent. In fact, our guidance is that all cloud functions should be idempotent because you may get more than one invocation. So a retry at the application level should be safe. Normally you can retry with a simple
I can guarantee you that the event type you're listening to has no impact on this issue. It's happening because your function was idle this whole time and we didn't garbage collect the container so that you could avoid a cold start.
Crashes popping up in the Firestore/Datastore library should probably be filed against those SDKs (nodejs-firestore and nodejs-datastore). If you get an obvious networking error, you could also consider filing a bug against the gRPC library instead.
You can of course file a bug against this repo as a starting point, but you just might have a slower response as we find the right people and move your bug to the right location. You're our customers and we care about your experience; this repo just isn't where the exception lies so it's not where the fix will come. |
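To illustrate the idempotency guidance above, here is one possible sketch of a handler that does its work at most once per event ID, so that a duplicate invocation or an application-level retry does not repeat side effects. The name `handleOnce` and the in-memory `Map` are hypothetical stand-ins. A real function would need a durable store (for example a document keyed by the event ID, such as `context.eventId` in firebase-functions), because memory is not shared across instances.

```javascript
// Hypothetical in-memory record of processed events. Assumption: in production
// this would be replaced by durable storage checked inside a transaction.
const processedEvents = new Map();

async function handleOnce(eventId, work) {
  if (processedEvents.has(eventId)) {
    // Already handled: return the recorded result without redoing the work.
    return processedEvents.get(eventId);
  }
  const result = await work();
  // Record only after success, so a failed attempt can be retried.
  processedEvents.set(eventId, result);
  return result;
}

// Usage inside a trigger (sketch):
//   return handleOnce(context.eventId, () => doTheWrite(snapshot));
```

Note that this simple version does not guard against two deliveries of the same event running concurrently. That is the part a transaction against the durable store would have to cover.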
Wow! Thanks. I doubt I would have found this if you hadn't mentioned it!!! In my case, I was using a dollar sign in my path that caused this error... Seriously.
|
I am still getting this same issue, can someone help? |
@radhikadeo Just to verify, you've checked all your paths? |
I ran into this problem when working with Firestore Point-in-time recovery (PITR) (which is an awesome beta feature! 🎉). For me, the solution was to specify a timestamp in the transaction that actually resolves to a whole hour exactly, i.e.
✅ works, but
❌ fails. This is not mentioned in the docs, will leave a comment there. ⛑️ |
The issue arises randomly for me. I'm seeing it once/twice a month (out of ~ 10k in a month). |
I'm getting this error when trying to delete a topic and subscription with the delete method. Can anyone help? |
@vikasdduc , can you please let us know the module and node versions? |
@taeold Could we please re-open this issue? It appears to still be occurring for users. It happened to me yesterday on node 18 and |
same here |
Got this today with
|
Still happening to me as well cc @schmidt-sebastian |
We are getting this error as well. It was during a WriteBatch.commit when we were updating one document. We have been using @google-cloud/[email protected]. EDIT
|
@dschnare did you try to use the |
Personally, I have had this problem several times, and each time it has something to do with my code. The last one was because I tried to connect to a database that didn't exist.
Hope this helps some of you |
Related issues
#522
[REQUIRED] Version info
node: 8
firebase-functions: 3.0.1
firebase-tools: 7.0.1
firebase-admin: 8.2.0
Steps to reproduce
The update method throws this error:
Were you able to successfully deploy your functions?
the deployment displays no errors