Hello<p>We are facing an intermittent issue in our web application: for some users, for reasons we don't yet understand, HTTP requests fail with errors (400s), especially during token refresh with the authentication server.<p>Normally, we would ask the user to generate a HAR (HTTP Archive) file, which we then inspect to find the root cause. However, it is currently challenging to collect the HAR file manually because the error is not consistent. Sometimes it seems to go away, then it suddenly reappears, causing a bad user experience.<p>It is also hard to add logs etc. because the token refresh happens on the client side, from the browser, so technically there are no traces of it on the server side.<p>I am looking into ways to automate generating the HAR file, but it does not seem straightforward.<p>If any of you have faced a similar issue in the past and found a way to add such error logging to a web service, let me know. Any other thoughts and suggestions are highly appreciated.<p>Thank you in advance.
This isn't a direct answer to your question, but be very careful with asking for HAR files. They're super convenient, but if your tech support doesn't understand that HAR files are the worst kind of PII, you can get into big trouble.<p>I've seen HAR files containing Google account session tokens attached in plain text to Jira tickets. If you end up leaking those tokens, your customers will <i>not</i> be amused.<p>See the Okta breach:<p><a href="https://www.rezonate.io/blog/har-files-attack-okta-customers/" rel="nofollow">https://www.rezonate.io/blog/har-files-attack-okta-customers...</a>
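If HAR files do end up being collected, one option is to redact them before they are attached anywhere. Below is a minimal sketch, assuming a HAR 1.2-style object that has already been parsed from JSON. The interfaces cover only the fields touched here, and the header deny-list and the `redactHar` name are assumptions for illustration. Bodies are dropped wholesale because they can carry tokens; an allow-list of known error fields would keep more diagnostic value.

```typescript
// Minimal subset of the HAR 1.2 structure; real files carry many more fields.
interface NameValue { name: string; value: string }
interface HarMessage { headers: NameValue[]; cookies?: NameValue[] }
interface HarEntry {
  request: HarMessage & { url: string; postData?: { text?: string } };
  response: HarMessage & { content?: { text?: string } };
}
interface Har { log: { entries: HarEntry[] } }

// Assumed deny-list; extend it with any custom auth headers your app uses.
const SENSITIVE_HEADERS = ["authorization", "proxy-authorization", "cookie", "set-cookie"];
const REDACTED = "[REDACTED]";

// Redacts every pair when redactAll is true, otherwise only deny-listed names.
function redactPairs(pairs: NameValue[], redactAll: boolean): NameValue[] {
  return pairs.map((p) =>
    redactAll || SENSITIVE_HEADERS.indexOf(p.name.toLowerCase()) !== -1
      ? { name: p.name, value: REDACTED }
      : { name: p.name, value: p.value }
  );
}

// Returns a redacted copy; the input object is not modified.
function redactHar(har: Har): Har {
  return {
    log: {
      entries: har.log.entries.map((e) => ({
        request: {
          url: e.request.url.split("?")[0], // query strings can carry tokens
          headers: redactPairs(e.request.headers, false),
          cookies: redactPairs(e.request.cookies ?? [], true),
          ...(e.request.postData ? { postData: { text: REDACTED } } : {}),
        },
        response: {
          headers: redactPairs(e.response.headers, false),
          cookies: redactPairs(e.response.cookies ?? [], true),
          ...(e.response.content ? { content: { text: REDACTED } } : {}),
        },
      })),
    },
  };
}
```

A redacted copy still shows URLs, status-relevant headers, and ordering, which is often enough to spot a malformed refresh request.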
What was the body of the HTTP 400? You should log that. Depending on the implementation, there may be a refresh token grace period.<p>I'd sooner test in a lab environment, recording a pcap on both sides and trying to get the client's TLS session to break, before I'd want a client's confidential credential flow sent to me. I don't like to bother people. I've always hated refresh tokens, at least OAuth's design of them. Is sending a client's decrypted MITM logs around really safer?
How intermittent is the issue? I don't think collecting client-side HAR files from real customers is the way to go, even if they're willing. What happens when the next weird error shows up? More HAR files?<p>Echoing some other suggestions, but going further: increase logging in the problem areas, both client-side and server-side. The issue might be directly related to the token refresh, since it only happens there, so that functionality is a great place to start. Log the entire connection's info on both sides (front-end and back-end logging), and if users are manually submitting tickets you should be able to track them down by userID / IP in the logs.<p>Also extend the fuzzing capabilities of your tests with browser automation (potentially headless, depending on the issue) that authenticates and uses the app "normally". Keep it using the app on repeat, and when token refresh time comes, see if the error pops up. Throw some extra variables in there: make sure it runs off the corporate network, or route it through DCs farther away, to see if it's a latency issue somewhere else. You could record a HAR file for these runs.<p>Multiple versions of the tests might need to run in parallel with different modifiers, such as one allowed to communicate directly w/ the origin vs. another going through the CDN like a standard customer would.<p>This is also an edge case, but I've seen it pop up sometimes: make sure no other required variables are missing during the refresh process. Sometimes specific functionality in an app is tied to a custom header, and sometimes its value isn't updated to what the app expects. Things like that could throw the process off from another angle.
HAR files are big, and it seems like overkill to send them every time. Can't you just make a client-side fetch to an error reporting service? I.e., if the app detects a 400, it sends a (no auth required) payload of the failed request & response, with secrets sanitized, to another error reporting endpoint.
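The sanitized payload described above could be built along these lines. This is a sketch under stated assumptions, not a definitive implementation: the `FailedExchange` shape, the `SECRET_KEYS` list, and the function names are all hypothetical, and the code only builds the payload string so that it stays independent of how the app sends it.

```typescript
// What the app captured about the failed call (hypothetical shape).
interface FailedExchange {
  url: string;
  method: string;
  status: number;
  requestHeaders: Record<string, string>;
  responseBody: string;
}

// Assumed list of secret-bearing names; adjust to your auth server's fields.
const SECRET_KEYS = ["authorization", "cookie", "access_token", "refresh_token", "id_token", "client_secret"];

function isSecret(key: string): boolean {
  return SECRET_KEYS.indexOf(key.toLowerCase()) !== -1;
}

// Redacts secret fields in a JSON body; non-JSON bodies are omitted entirely,
// since they cannot be scrubbed field by field.
function scrubBody(body: string): string {
  try {
    return JSON.stringify(JSON.parse(body), (key, value) => (isSecret(key) ? "[REDACTED]" : value));
  } catch {
    return `[non-JSON body omitted, ${body.length} chars]`;
  }
}

function buildErrorReport(x: FailedExchange): string {
  const headers: Record<string, string> = {};
  for (const name of Object.keys(x.requestHeaders)) {
    headers[name] = isSecret(name) ? "[REDACTED]" : x.requestHeaders[name];
  }
  return JSON.stringify({
    url: x.url.split("?")[0], // drop the query string, which can carry tokens
    method: x.method,
    status: x.status,
    requestHeaders: headers,
    responseBody: scrubBody(x.responseBody),
    reportedAt: new Date().toISOString(),
  });
}

// Returns the payload to send for 4xx failures, or null for anything else.
function reportFor(x: FailedExchange): string | null {
  return x.status >= 400 && x.status < 500 ? buildErrorReport(x) : null;
}
```

In the browser, the returned string could be posted with `fetch` (optionally with `keepalive: true`) or `navigator.sendBeacon` to whatever reporting endpoint you set up; the endpoint itself is left out because it depends on your service.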
Is this a CSP thing? Can you get away with <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Reporting-Endpoints" rel="nofollow">https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Re...</a> and window.onerror?<p>Also, do you actually need the HAR file, or just a log of your servers' inputs/outputs from the clients' perspective? You can get that The Boring Way if you don't have a CSP issue, so maybe solve that issue?
> the token refresh happens on the client side from the browser<p>You can totally add logging for that. If you don't have an existing service that can handle it, you can create a logging-only endpoint for that purpose and send the event async to not block other work.
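A minimal sketch of the "send the event async" idea, assuming the token refresh is already an async function in your client. `withLogging`, `LogEvent`, and `LogSink` are hypothetical names; the sink stands in for whatever posts to the logging-only endpoint.

```typescript
type LogEvent = { name: string; ok: boolean; durationMs: number; error?: string };
type LogSink = (event: LogEvent) => Promise<void>;

// Wraps an async operation (e.g. the token refresh) so every outcome is handed
// to the sink without being awaited; a rejecting sink never affects the caller.
function withLogging<T>(name: string, op: () => Promise<T>, sink: LogSink): () => Promise<T> {
  return async () => {
    const started = Date.now();
    const emit = (ok: boolean, error?: string): void => {
      // Fire and forget: deliberately not awaited, and sink rejections are swallowed.
      sink({ name, ok, durationMs: Date.now() - started, error }).catch(() => {});
    };
    try {
      const result = await op();
      emit(true);
      return result;
    } catch (err) {
      emit(false, err instanceof Error ? err.message : String(err));
      throw err; // the caller still sees the original failure
    }
  };
}
```

Wrapping the existing refresh call this way leaves its behavior unchanged for the rest of the app, while each success or failure produces one event on the server side.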
I don’t remember how we debugged it at the time, but I’ve run into very similar symptoms that were caused by clock skew between client & servers. Increasing the validity window to both past & future by a longer period helped resolve it.
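To illustrate the validity-window idea, here is a small sketch assuming JWT-style `nbf` and `exp` claims expressed in seconds since the Unix epoch. The function names and the leeway value are assumptions, not something taken from any particular auth server.

```typescript
interface TokenTimes {
  nbf: number; // "not before", seconds since the Unix epoch
  exp: number; // expiry, seconds since the Unix epoch
}

// Accepts the token if `nowSec` falls inside [nbf - leeway, exp + leeway),
// i.e. the window is widened into both the past and the future.
function isWithinValidity(t: TokenTimes, nowSec: number, leewaySec: number): boolean {
  return nowSec >= t.nbf - leewaySec && nowSec < t.exp + leewaySec;
}

// Rough client clock offset in seconds, from a server timestamp such as the
// HTTP Date response header. Positive means the client clock is ahead.
function estimateSkewSec(serverDate: string, clientNowMs: number): number {
  return Math.round((clientNowMs - Date.parse(serverDate)) / 1000);
}
```

Logging the estimated skew alongside each failed refresh would show fairly quickly whether the affected users share badly set clocks.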
commendable that you wanna go this way honestly. i see a lot of companies just push bullshit back onto users in the face of this type of intermittent client side issue. repeating the same dumb questions until you give up.<p>as some other commenter said, automating har files might not be ideal as it could collect much too much info, and browsers will make this very difficult to automate.<p>perhaps you can add client side logging and automate gathering that or ask users for that rather than a har file. like if xyz happens again please send us the log from location yzw. not sure if that is possible but it would at least unburden users from running devtools on an intermittent issue. if it happens only to a few users you can add it optionally to their clientside like a debug/trace mode. if it is widespread id say add it for all users.<p>good luck and happy to see ur not giving up just yet :D these issues can be quite frustrating to get good data on. keep at it and ull find it eventually.<p>it might also be possible to automate a client at your own side and run it until it hits the issue. no guarantee it will actually hit it though. you can run it from office, home, and try to have many colleagues / people run it in different (maybe personal) setups.
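a rough sketch of what such an opt-in debug/trace mode could look like: a bounded in-memory log that only records when enabled and can be dumped as text for a ticket. `RingLog` and its methods are made-up names, and how you turn it on per user (a flag, a query param, etc.) is left open.

```typescript
interface LogLine { at: string; level: "debug" | "error"; message: string }

// Keeps only the most recent `capacity` lines so memory use stays bounded.
class RingLog {
  private lines: LogLine[] = [];
  private capacity: number;
  private enabled: boolean;

  constructor(capacity: number, enabled: boolean) {
    this.capacity = capacity;
    this.enabled = enabled;
  }

  // Callers should pass messages without tokens or other secrets.
  log(level: LogLine["level"], message: string): void {
    if (!this.enabled) return;
    this.lines.push({ at: new Date().toISOString(), level, message });
    if (this.lines.length > this.capacity) this.lines.shift(); // drop the oldest line
  }

  // Plain text the user can copy into a support ticket.
  dump(): string {
    return this.lines.map((l) => `${l.at} [${l.level}] ${l.message}`).join("\n");
  }
}
```

with something like this the ask to the user becomes "click export and paste the text" instead of "open devtools and record a har".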
I haven't used it, but you can check whether it works for you. It has custom dev tools.<p><a href="https://eruda.liriliri.io/" rel="nofollow">https://eruda.liriliri.io/</a>