Most hotel sites I get handed have a sitemap that nobody has looked at since the web designer flipped on a plugin in 2019. It is one giant file, it lists every URL the CMS has ever generated, half of them redirect or 404, and the lastmod date on all 4,000 of them is “today” because the plugin stamps the current date every single time Google fetches it.
For a five-room boutique with a dozen pages, none of that matters much. For a 200-key resort, a group with eight properties, or a hotel with a sprawling weddings-and-events microsite bolted on, that lazy sitemap is quietly costing you indexed pages and hiding problems you would otherwise catch in an afternoon. So let me walk through how I actually structure sitemaps for big, messy hotel sites, and the part nobody talks about: using the sitemap as a free diagnostic instrument.
Why “one big sitemap” stops working on large sites
A sitemap is two things at once. Officially, it is a list of URLs you want crawled, with optional hints. Unofficially, and more usefully, it is a measurement surface. Google Search Console reports indexing per sitemap. So the design question is not just “did I list every URL?” It is “did I structure these files so the indexing report tells me something I can act on?”
When everything lives in one file, Search Console gives you one number: submitted X, indexed Y. If Y is way below X on a 6,000-URL site, congratulations, you now know something is wrong somewhere across 6,000 pages. That is not a diagnosis, that is a shrug.
There are also two hard ceilings worth knowing. A single sitemap file caps at 50,000 URLs and 50MB uncompressed. Plenty of resort sites with paginated availability, multi-language variants, and event galleries blow past that without anyone noticing, and an oversized file tends to throw an error in Search Console instead of being read in full. The fix for both the diagnostic problem and the ceiling problem is the same structure: a sitemap index.
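A quick way to know whether a file is near either ceiling is to measure it directly. The sketch below is a minimal check, assuming you already have the uncompressed sitemap bytes in hand; the function name `check_sitemap_limits` and the example.com URLs in any test data are illustrative, not part of any tool.

```python
import xml.etree.ElementTree as ET

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file

# Standard sitemap namespace, in ElementTree's {uri}tag notation
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_limits(xml_bytes: bytes) -> dict:
    """Count <url> entries and bytes in one uncompressed sitemap file."""
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{SITEMAP_NS}url"))
    size = len(xml_bytes)
    return {
        "urls": url_count,
        "bytes": size,
        "over_url_limit": url_count > MAX_URLS,
        "over_size_limit": size > MAX_BYTES,
    }
```

Run it against each child sitemap after every regeneration; a segment creeping toward 50,000 URLs is usually a sign that parameter or faceted URLs have leaked in.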
The sitemap index: your table of contents
A sitemap index file is a sitemap of sitemaps. Instead of pointing Google at one file, you point it at a parent that lists several child sitemaps, each segmented by page type. Your robots.txt references the index, you submit the index in Search Console, and Google reads down into the children.
Here is the segmentation I use as a starting point on a larger property or group site:
| Child sitemap | What goes in it | Why it is separate |
|---|---|---|
| sitemap-core.xml | Homepage, about, contact, policies | High-priority, rarely changes, must always be indexed |
| sitemap-rooms.xml | Room and suite type pages | Your money pages; you want these watched closely |
| sitemap-offers.xml | Packages, seasonal deals, promotions | Churns constantly; great lastmod test bed |
| sitemap-amenities.xml | Spa, dining, pool, venues, experiences | Often thin; common indexing trouble spot |
| sitemap-local.xml | Area guides, “things to do”, neighborhood pages | Your local and AEO content; track adoption here |
| sitemap-blog.xml | Articles and guides | Volume varies; isolate so it does not mask room issues |
| sitemap-properties.xml | Per-property landing pages (group sites) | One row per hotel; the spine of a multi-property setup |
The point is not these exact names. The point is that each segment maps to a page template and a content owner, so when one underperforms you know exactly which template, which team, and which fix.
A sitemap segmented by page type turns Search Console into a per-template indexing dashboard. One blob tells you something is wrong. Seven segments tell you the amenities template is the problem and the rooms are fine. That difference is hours versus weeks.
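For anyone generating the index by hand rather than through a plugin, here is a minimal sketch using only the Python standard library. The domain example.com, the function name `build_sitemap_index`, and the dates in the usage line are assumptions for illustration; the element names (`sitemapindex`, `sitemap`, `loc`, `lastmod`) come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url: str, children: Dict[str, Optional[str]]) -> bytes:
    """Build a sitemap index.

    children maps a child sitemap filename to its lastmod
    (a W3C date string such as "2025-01-15"), or None to omit lastmod.
    """
    root = ET.Element("sitemapindex", xmlns=NS)
    for filename, lastmod in children.items():
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url.rstrip('/')}/{filename}"
        if lastmod:  # an absent lastmod is better than an invented one
            ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Calling `build_sitemap_index("https://example.com", {"sitemap-rooms.xml": "2025-01-15", "sitemap-blog.xml": None})` returns the index as UTF-8 bytes, ready to write to disk.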
Multi-property and group sites: one index, clear lanes
Group and multi-property setups are where this really pays off. If you run six hotels under one brand domain, or a flagship resort with several sub-brands, you have a choice: model the structure by property, by page type, or both.
What I land on most often is a two-level approach, with one caveat: the sitemap protocol does not let an index file list other index files, only sitemaps. So the two levels live in the file layout, not in nested indexes. Each property gets its own index, for example boutique-savannah/sitemap-index.xml, which lists that hotel’s rooms, offers, and amenities sitemaps, and each property index is referenced in robots.txt and submitted in Search Console on its own. Now Search Console indexing data slices two ways at once: per property and per page type. You can see at a glance that the Savannah rooms are fully indexed while the new Charleston property’s amenity pages are stuck. That is the kind of clarity that turns “our site feels invisible” into a specific, fixable ticket.
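The two-way slice is easy to reproduce outside Search Console once the sitemap paths follow a convention. This is a sketch that assumes paths shaped like `property-slug/sitemap-pagetype.xml` and per-sitemap submitted and indexed counts that you have copied out by hand; the function names and the sample slugs are hypothetical.

```python
from collections import defaultdict


def split_sitemap_path(path: str) -> tuple:
    """'boutique-savannah/sitemap-rooms.xml' -> ('boutique-savannah', 'rooms')."""
    prop, _, filename = path.rpartition("/")
    page_type = filename.removeprefix("sitemap-").removesuffix(".xml")
    return prop, page_type


def rollup(rows):
    """rows: iterable of (sitemap_path, submitted, indexed).

    Returns two dicts of [submitted, indexed] totals:
    one keyed by property, one keyed by page type.
    """
    by_property = defaultdict(lambda: [0, 0])
    by_type = defaultdict(lambda: [0, 0])
    for path, submitted, indexed in rows:
        prop, page_type = split_sitemap_path(path)
        for bucket in (by_property[prop], by_type[page_type]):
            bucket[0] += submitted
            bucket[1] += indexed
    return dict(by_property), dict(by_type)
```

The naming convention does the work here: as long as every child sitemap encodes both its property and its page type, either view is one grouping away.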
This matters for the OTA conversation too. The whole reason to obsess over indexing is so that when a guest searches your property name, or asks an AI assistant for boutique hotels in your city, your own pages are in the index to be found, instead of ceding that real estate to the listing sites by default. You will never fully escape the OTAs, and you should not try, but getting your own pages reliably indexed is table stakes for winning back a healthier share of direct bookings. I dug into the search side of that in why your hotel ranks below OTAs for your name.
Lastmod hygiene: the signal everyone fakes
Now the part most plugins get spectacularly wrong. The lastmod element is supposed to mean “this URL’s content last changed on this date.” Google has been explicit that it uses lastmod as a crawl-scheduling hint only when the dates are consistently and verifiably accurate. If your CMS stamps today’s date on every URL every time the sitemap regenerates, every page looks freshly updated, the signal is noise, and Google has every reason to stop trusting it across your site.
So lastmod hygiene comes down to a few rules I enforce on every build:
- Lastmod changes only when the page’s main content changes. Editing a room description updates it. A nightly rebuild that touched nothing does not.
- Use a real, valid date. W3C datetime format. If you cannot generate accurate dates, omit the element entirely rather than lie. An absent lastmod is honest; a fake one is corrosive.
- Do not let template-wide changes flip every date. Swapping a footer logo should not mark all 6,000 pages as updated. Tie lastmod to the page’s own content, not the site’s chrome.
- Keep parent index lastmods accurate too. The index entry for each child sitemap should reflect the most recent change inside it.
Why care this much about a timestamp? Because crawl budget on a big site is finite, and accurate lastmod is how you steer Google toward the handful of pages that actually changed instead of making it re-crawl everything. Honest dates mean your new seasonal package gets re-crawled fast; dishonest dates mean it sits in line behind 5,000 pages that lied about being fresh.
The sitemap is the one place on your site where you get to tell Google “look here, this changed.” If you cry wolf on every URL, you lose the ability to point at anything. Treat lastmod like a budget you are spending, not a checkbox.
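One way to keep lastmod honest is to tie it to a fingerprint of the page’s main content, so a rebuild that changed nothing cannot move the date. The sketch below assumes you can extract the main content as text (without header, footer, or other site chrome) and that you store the previous hash and date somewhere; both function names are illustrative, and this is one possible approach, not how any particular CMS does it.

```python
import hashlib
from datetime import date


def content_fingerprint(main_content: str) -> str:
    """Hash the main content, ignoring whitespace-only differences."""
    normalized = " ".join(main_content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_lastmod(stored_hash, stored_lastmod, main_content, today: date):
    """Return (hash, lastmod) to store for this page.

    lastmod only moves when the fingerprint changes. A page seen for the
    first time (stored_hash is None) is dated today, which is an assumption;
    for backfilled old pages you may prefer to omit lastmod instead.
    """
    new_hash = content_fingerprint(main_content)
    if new_hash == stored_hash:
        return stored_hash, stored_lastmod  # nothing changed, keep the old date
    return new_hash, today.isoformat()  # YYYY-MM-DD is valid W3C datetime
```

Because the hash covers only the main content, swapping a footer logo leaves every fingerprint, and therefore every date, untouched.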
Hygiene rules for what belongs in a sitemap
A sitemap is a list of URLs you are vouching for. So only canonical, indexable, 200-status pages you genuinely want ranked belong in it. That sounds obvious and is violated constantly. The usual offenders on hotel sites:
- Expired offers and old seasonal pages that now 404 or redirect. Out.
- Noindex pages like booking-engine steps, thank-you pages, and filtered availability results. Out.
- Redirected URLs still listed at their old address. List the destination, not the redirect.
- Parameter and faceted URLs from date pickers and filters that generate thousands of near-duplicate addresses. Out, and ideally handled with canonicals upstream.
- Non-canonical language or currency variants. List the canonical; handle the rest with hreflang annotations, not by dumping every permutation in.
Every junk URL in your sitemap does two bad things: it wastes the crawl attention you are trying to direct, and it poisons the diagnostic. If 30% of a sitemap is dead URLs, your submitted-versus-indexed ratio is garbage and you cannot tell a real indexing problem from your own mess. Clean first, then measure.
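The “canonical, indexable, 200” rule is simple enough to enforce in code before the sitemap is written. This sketch assumes you already have crawl data per URL (status code, noindex flag, declared canonical) from whatever crawler you use; the `PageRecord` shape and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    url: str
    status: int      # final HTTP status when the URL is fetched
    noindex: bool    # True if a robots meta tag or header says noindex
    canonical: str   # the canonical URL the page declares


def sitemap_eligible(page: PageRecord) -> bool:
    """Only 200-status, indexable, self-canonical pages belong in a sitemap."""
    return page.status == 200 and not page.noindex and page.canonical == page.url


def split_pages(pages):
    """Return (keep, drop) lists of URLs so the rejects can be reviewed."""
    keep = [p.url for p in pages if sitemap_eligible(p)]
    drop = [p.url for p in pages if not sitemap_eligible(p)]
    return keep, drop
```

Keeping the drop list matters as much as the keep list: it is your audit trail of expired offers, redirects, and parameter URLs that need fixing upstream.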
Using the sitemap as an indexing diagnostic
This is the payoff, and the part I genuinely enjoy. Once your sitemaps are segmented by page type and scrubbed of junk, Search Console’s per-sitemap indexing report becomes a precision instrument. Here is the workflow I run on a sprawling site:
- Submit the index and each child sitemap. Yes, submit children individually too; you get cleaner per-segment numbers that way.
- Read submitted-versus-indexed per segment. A core or rooms segment should index at or very near 100%. Anything dragging is a flag.
- Find the laggard segment. Say amenities shows 120 submitted, 41 indexed. Now you are not auditing 6,000 pages; you are auditing one template.
- Open the Page Indexing report for that segment’s pattern. Google tells you the reason bucket: “Crawled - currently not indexed”, “Discovered - currently not indexed”, “Duplicate without user-selected canonical”, and so on.
- Map reason to fix. “Crawled - currently not indexed” on thin amenity pages usually means the content is too sparse to earn a slot, which is a content-and-reputation problem. “Discovered - currently not indexed” at scale often means crawl budget or internal-linking starvation.
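The “find the laggard” step in the workflow above is plain arithmetic, and it is worth scripting so the monthly check takes seconds. In this sketch the amenities figures (120 submitted, 41 indexed) are the ones from the workflow; the 90% threshold and any other segment counts you pass in are assumptions to tune for your own site.

```python
def indexing_report(segments: dict, threshold: float = 0.9) -> list:
    """segments maps segment name -> (submitted, indexed).

    Returns rows of (name, submitted, indexed, ratio, is_laggard),
    sorted so the worst-indexed segment comes first.
    """
    rows = []
    for name, (submitted, indexed) in segments.items():
        # An empty segment gets ratio 0.0 and is flagged, so it gets looked at.
        ratio = indexed / submitted if submitted else 0.0
        rows.append((name, submitted, indexed, ratio, ratio < threshold))
    return sorted(rows, key=lambda row: row[3])
```

With amenities at 41 of 120, the ratio is about 34%, which puts that segment at the top of the list and tells you which template to open first.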
A worked example of how this plays out, with numbers made up to show the shape of the analysis and not drawn from a real result: a hypothetical resort submits 6,000 URLs and sees 5,200 indexed. The single-sitemap version of that story is “87%, eh, probably fine.” Segmented, you might find core, rooms, and blog all at ~100% while the local area-guide segment sits at 40%. Suddenly the diagnosis writes itself: the area guides are thin, templated, and barely internally linked, and that is exactly the content AI assistants lean on when answering “where should I stay near X”. Fixing that segment is both an indexing win and an AEO win.
This is why I treat sitemaps as a reporting layer, not a deploy artifact you generate once and forget. On the content and thin-page side, that work overlaps heavily with content and reputation, and the indexing-versus-AI-answers angle is the whole premise of is your hotel invisible to ChatGPT.
A practical setup checklist
If you want to action this without a full audit, here is the short version I would hand a hotel’s web person:
- Generate a sitemap index referencing page-type-segmented children.
- Keep each child under 50,000 URLs and 50MB uncompressed. Gzip the big ones to save bandwidth, but remember the size limit applies to the unzipped file.
- Reference the index in robots.txt and submit it in Search Console.
- Include only canonical, indexable, 200 URLs; purge dead offers and noindex pages.
- Make lastmod honest, tied to real content changes, or omit it.
- Re-pull per-segment indexing numbers monthly and chase the laggards.
None of this requires exotic tooling. It requires deciding that your sitemap is a measurement tool you maintain, not a checkbox a plugin ticks. On a small site that is a nice-to-have. On a multi-property resort with thousands of URLs, it is the difference between knowing where your indexing is leaking and guessing in the dark. If you want the broader technical-and-strategy frame this sits inside, the hotel SEO 2026 starter guide is the place I would start, and a clean sitemap is the foundation everything in hotel SEO is built on.
If your resort or group site has thousands of URLs and you have no idea how many are actually indexed, that is exactly the kind of thing I pull apart in a technical audit. Book a call and we will look at your sitemap, your indexing report, and where your money pages are quietly falling out of the index.