Pushpin worker crashes often with assertion error

I'm running Pushpin in Kubernetes. The Pushpin pod crashes often with the error below:

[INFO] 2024-04-26 13:12:30.487 [handler] control: POST /publish/ code=200 10 items=1
[INFO] 2024-04-26 13:12:30.487 [handler] publish channel=2d7adb71-de95-4824-9b38-516732b8805b receivers=1
thread 'server-worker-0' panicked at 'assertion failed: `(left == right)`
  left: `601`,
 right: `16985`', src/websocket.rs:1187:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[INFO] 2024-04-26 13:12:30.545 [handler] control: POST /publish/ code=200 10 items=1
[INFO] 2024-04-26 13:12:30.545 [handler] publish channel=52a09fd1-f880-483e-9805-b477a6c824d1 receivers=0
[ERR] 2024-04-26 13:12:30.618 condure: Exited unexpectedly
[INFO] 2024-04-26 13:12:30.618 [zurl] stopping...
[INFO] 2024-04-26 13:12:30.618 [handler] stopping...
[INFO] 2024-04-26 13:12:30.619 [zurl] stopped
[INFO] 2024-04-26 13:12:30.618 [proxy] stopping...
[INFO] 2024-04-26 13:12:30.621 [handler] stopped
[INFO] 2024-04-26 13:12:31.126 [proxy] stopped
[INFO] 2024-04-26 13:12:31.128 stopped

I analysed the memory and CPU consumption, but they look normal, and there are no limits configured for the Pushpin pod. From the assertion error, I'm not able to figure out what exactly it's asserting. Could someone help?

Have you tried setting the environment variable listed in the error message? The backtrace could be quite useful.
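
If it helps, in Kubernetes that usually just means adding the variable to the container spec in the pod template, along these lines (a sketch only; the container and image names below are placeholders for whatever you already deploy):

    spec:
      containers:
        - name: pushpin            # placeholder: your container name
          image: fanout/pushpin    # placeholder: your existing image
          env:
            - name: RUST_BACKTRACE # enables the panic backtrace in the logs
              value: "1"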

Yes. However, I'm not sure how to interpret it:

thread 'server-worker-1' panicked at 'assertion failed: `(left == right)`
  left: `1065`,
 right: `17449`', src/websocket.rs:1187:13
stack backtrace:
   0:     0x5b88cc02b38f - <unknown>
   1:     0x5b88cc048d3e - <unknown>
   2:     0x5b88cc00fcd5 - <unknown>
   3:     0x5b88cc02b145 - <unknown>
   4:     0x5b88cc018d3f - <unknown>
   5:     0x5b88cc0189f8 - <unknown>
   6:     0x5b88cc01934b - <unknown>
   7:     0x5b88cc02b6e7 - <unknown>
   8:     0x5b88cc02b4dc - <unknown>
   9:     0x5b88cc018ef2 - <unknown>
  10:     0x5b88cbe36a33 - <unknown>
  11:     0x5b88cc04824b - <unknown>
  12:     0x5b88cbe303eb - <unknown>

It appears that the line number has shifted a bit, but the relevant code is here. This code validates that the size of a data item matches the expected size, because (based on the comments) a failure to get the proper size is a serious error.
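
For anyone not used to reading Rust panics: the message itself comes from an assert_eq! check, which panics with exactly that left/right format when the two values differ. A minimal illustrative sketch (not the actual Pushpin source; the names here are made up) would be:

    // Sketch only: mimics the shape of the failing check, not websocket.rs itself.
    fn validate_size(expected_size: usize, actual_size: usize) {
        // On mismatch this panics with:
        //   assertion failed: `(left == right)`
        //     left: `<actual_size>`,
        //    right: `<expected_size>`
        assert_eq!(actual_size, expected_size);
    }

    fn main() {
        validate_size(16985, 601); // panics, mirroring the numbers in the log above
    }

So the two numbers are the mismatched sizes; which side is the expected value in the real code is something the surrounding source would have to confirm.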

Unfortunately we’ll need someone familiar with the code to help figure this out… maybe @jkarneges can find some time to help.

The log output shows a published message received right before the crash. Is that common or does it happen without such log lines too?

@jkarneges The logs are always the same whenever this restart happens.

Did you increase the client_buffer_size above the 8192 default? Do your messages usually exceed this buffer size? Does it seem possible that you send enough data in a short time to the same client such that its buffer fills and possibly its TCP send buffer in the OS also fills? Maybe this happens during some bursty moment. Just trying to think about which code paths are being taken.
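
For reference, if you haven't overridden it, the setting would look roughly like this in pushpin.conf (a sketch from memory; I'd expect it in the [runner] section with 8192 as the default mentioned above, so adjust to wherever your config actually sets it):

    [runner]
    # per-client buffer size in bytes; 8192 is the default
    client_buffer_size=16384

Raising it only matters if your messages or bursts really do exceed the current buffer; treat it as a data point for narrowing down the code path rather than a confirmed fix.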