Tuesday 22 August 2023

502s, Massive Headers and So Many Cookies

We recently came across an (initially very confusing) issue where, on our cloud environments, a couple of fairly inconspicuous pages were consistently making the server return a 502 (or 520) status code. These environments had Cloudflare in front of them, which was showing us the 502 response, but on our development environments (without Cloudflare) the same pages and content loaded just fine.

Now I'll preface this by saying it was certainly something we (now know we) were doing incorrectly, but for the sake of anyone else who happens to come across the same thing I thought I'd share the debugging, findings, and fix.

So the natural first step to narrow down the cause of most things is to remove stuff from the page until it starts working again! By first removing components from the page(s), and then removing particular portions/lines from those components, we found that the culprit seemed to be the next-auth related lines where we were fetching session info using useSession() (frontend) and/or getServerSession() (backend). At this point we were pretty stumped: these same calls were working just fine on some pages, but on a couple of particular pages they seemed to be causing issues... time for a web search to see if we can at least narrow things down!

A quick search for "502" and "next-auth" brings up a few useful results, but in particular this Stack Overflow answer seemed to indicate it might be something to do with headers. At least we had somewhere to start! Hosting for this project was in AWS, with EC2 instances behind a load balancer, all of which sits behind Cloudflare. As Cloudflare is the front endpoint showing the error, the natural instinct is to pin it on something to do with Cloudflare. However, hitting the endpoint from another server within the same network (bypassing Cloudflare) showed that the 502 was still returned, which pointed at the load balancer or the app instead. After that, the infra team provided us with the info we needed about header sizing (from the AWS load balancer limits):

Name                      Default   Adjustable
Request line              16 K      No
Single header             16 K      No
Entire response header    32 K      No
Entire request header     64 K      No
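As a rough way to compare a set of headers against these limits, here is a minimal sketch. The function names are made up for illustration, and it assumes ASCII header names and values (so string length is close enough to byte length) and counts each header as "name: value" plus a line ending.

```typescript
// "Entire response header" limit from the table above, in bytes.
const RESPONSE_HEADER_LIMIT_BYTES = 32 * 1024;

// Approximate size of a header block: one "name: value\r\n" line per header.
// Assumes ASCII content, so one character is counted as one byte.
function headerBlockBytes(headers: Array<[string, string]>): number {
  return headers.reduce(
    (total, [name, value]) => total + `${name}: ${value}\r\n`.length,
    0,
  );
}

// True when the headers would not fit within the response header limit.
function exceedsResponseHeaderLimit(headers: Array<[string, string]>): boolean {
  return headerBlockBytes(headers) > RESPONSE_HEADER_LIMIT_BYTES;
}
```

The header pairs can come from whatever you use to inspect the response (browser dev tools, curl output, an HTTP client); the point is just to total them up and compare against the 32 K figure.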

It didn't take long to find that this was indeed likely the issue - a quick look at the header size for one of the pages showed it was certainly above the limit. Next question was obviously: "why"? That's quite a hefty size for headers. Again, taking a look at the headers for one of the pages, it didn't take long to see the issue:

[Screenshot: the response headers for the page, showing a long list of large cookies being set]

...and there was more to it than just what's shown here!

Wow, that's a lot of (very big) cookies being set in the headers... a very big WTF was in order!

Now for those who aren't familiar, these cookies are set by next-auth and split/chunked (session-token.0, session-token.1 etc.) at around 4 KB by default. That's fair enough: we expected to see a couple of them, potentially very large ones, but certainly not a dozen or so.
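To illustrate the chunking idea, here is a simplified sketch. It is not next-auth's actual code: the function name and the exact chunk size are assumptions, chosen only to show how one large value turns into several numbered cookies.

```typescript
// Assumed per-cookie budget: roughly 4 KB, minus some room for the cookie
// name and attributes. The exact number used by next-auth may differ.
const CHUNK_SIZE = 4096 - 160;

interface CookieChunk {
  name: string;
  value: string;
}

// Split one large cookie value into numbered chunks (name.0, name.1, ...).
// A value that already fits is returned as a single, unsuffixed cookie.
function chunkCookie(
  name: string,
  value: string,
  chunkSize: number = CHUNK_SIZE,
): CookieChunk[] {
  if (value.length <= chunkSize) {
    return [{ name, value }];
  }
  const chunks: CookieChunk[] = [];
  for (let i = 0; i * chunkSize < value.length; i++) {
    chunks.push({
      name: `${name}.${i}`,
      value: value.slice(i * chunkSize, (i + 1) * chunkSize),
    });
  }
  return chunks;
}
```

So a session value of around 9 KB would come out as three cookies, which is the "couple of big ones" we were expecting to see.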

The Realisation

What came next was a lot of debugging, stepping through our code, the next-auth code, and to skip to the end: the realisation that every time getServerSession() is called, a new cookie (chunked, so actually multiple cookies) is set!

In our case, we had a few Sitecore headless components which had their own getServerSideProps() (component-level data fetching) and inside this method each component was calling getServerSession() to get the session data it needed to fetch its data. Ergo for each component using the session, a new cookie was created, blowing out the header size (and potentially causing session issues as well).
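To put some rough numbers on that, here is a back-of-the-envelope sketch. All of the figures are hypothetical (a 9000-byte session value, about 100 bytes of attributes per cookie, and the assumed chunk size), but they show how quickly repeated calls add up against the 32 K response header limit.

```typescript
// All numbers here are illustrative assumptions, not measurements.
const ASSUMED_CHUNK_SIZE = 4096 - 160; // assumed bytes of value per cookie
const ASSUMED_ATTRIBUTE_BYTES = 100; // assumed Path/Expires/HttpOnly/etc. per cookie
const RESPONSE_HEADER_LIMIT = 32 * 1024; // 32768 bytes

// Estimated Set-Cookie bytes when the session cookie is written once per call.
function estimateSetCookieBytes(sessionValueBytes: number, calls: number): number {
  const chunksPerCall = Math.ceil(sessionValueBytes / ASSUMED_CHUNK_SIZE);
  const bytesPerCall = sessionValueBytes + chunksPerCall * ASSUMED_ATTRIBUTE_BYTES;
  return bytesPerCall * calls;
}

const oneCall = estimateSetCookieBytes(9000, 1); // 9300, well under the limit
const fourCalls = estimateSetCookieBytes(9000, 4); // 37200, over the limit

console.log(oneCall <= RESPONSE_HEADER_LIMIT, fourCalls > RESPONSE_HEADER_LIMIT);
```

With these made-up figures, a page with four session-hungry components is already over the limit, while a page with one or two is fine, which matches the "only some pages break" behaviour we were seeing.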

The Solution

So we had pinpointed the issue - now what? These components needed the session info (e.g. the user's account ID) in order to fetch their data, so this wasn't something we could just do without.

The key here is to understand how the component-level data fetching works: if you take a look at the [[...path]].tsx file which you get from a default instantiation of a Sitecore headless project, you will see inside it a getServerSideProps() function, and inside that a call to sitecorePagePropsFactory.create(). This is the starting point for the getServerSideProps() call of each component. 

If you follow the code, you will see the SitecorePagePropsFactory in page-props-factory\index.ts loops through each "plugin" and executes it, using a context, and setting props.  This context is the one which comes through in the getServerSideProps() function at the component level.
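As a mental model of that loop, here is a simplified, self-contained sketch. The types and the createPageProps name are stand-ins made up for illustration rather than the real Sitecore code, but the shape is the same: every plugin is executed in order and receives the same context object.

```typescript
// Simplified stand-ins for the real project types (assumed shapes).
type PageProps = Record<string, unknown>;
type PluginContext = Record<string, unknown>;

interface Plugin {
  order: number;
  exec(props: PageProps, context: PluginContext): Promise<PageProps>;
}

// Run every plugin in order. Each one receives the props built so far
// and the SAME context object that was passed in at the top.
async function createPageProps(
  plugins: Plugin[],
  context: PluginContext,
): Promise<PageProps> {
  let props: PageProps = {};
  const ordered = [...plugins].sort((a, b) => a.order - b.order);
  for (const plugin of ordered) {
    props = await plugin.exec(props, context);
  }
  return props;
}
```

Because the context is shared, anything attached to it before the factory runs is visible to every plugin, and through them to each component's getServerSideProps().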

Using this knowledge, we can fetch our session once in [[...path]].tsx, the starting point for all server-side logic, and pass it through in the "plugin context" to each component, so that we don't have to re-fetch it.

In [[...path]].tsx:

// Inside getServerSideProps(context): fetch the session once, up front...
const session = await getServerSession(context.req, context.res, authOptions);
// ...and attach it to the context that is handed to every plugin/component.
const props = await sitecorePagePropsFactory.create({ ...context, session });

then in each component:

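export const getServerSideProps = async (_rendering, _layoutData, context) => {
  // The session was attached to the context in [[...path]].tsx,
  // so there is no getServerSession() call (and no new cookie) here.
  const fetched = await fetchSomeData(context.session.userId);
  return fetched;
};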
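One wrinkle if you're using TypeScript: the standard context type knows nothing about a session property, and the session can legitimately be missing (e.g. for a logged-out user). A small guard like the sketch below can help; the AppSession shape and the requireSession name are hypothetical, so adjust them to whatever your session actually contains.

```typescript
// Hypothetical session shape; the real one depends on your next-auth setup.
interface AppSession {
  userId: string;
}

// Read the session that was attached to the context in [[...path]].tsx,
// failing loudly if it is missing instead of fetching it again.
function requireSession(context: { session?: AppSession | null }): AppSession {
  if (!context.session) {
    throw new Error("No session on context: was it attached in [[...path]].tsx?");
  }
  return context.session;
}
```

A component can then call requireSession(context).userId, and a forgotten wiring step shows up as a clear error rather than an undefined lookup.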

Hope this helps anyone else who comes across this issue (or something like it)!