I want to tell you about the most expensive ghost living in your hotel’s tech stack: the same guest, showing up as four different people.
Here is the scene I run into on almost every audit. A guest named [first name] books a room in 2024 through your booking engine. She checks in, and your PMS quietly creates a fresh profile for her. She joins your email list at the front desk with a slightly different email. Six months later she books again, this time through an OTA, and the channel manager pushes yet another version of her into the PMS. Now you have four records, four partial histories, and zero idea that this is one loyal woman who has paid you thousands of dollars and would happily book direct if you ever bothered to recognize her.
That is the problem a unified guest profile solves. And no, this is not the same conversation as “buy a CDP” or “set up a CRM.” This is the unglamorous plumbing underneath both of those: identity resolution. Stitching. Getting the machine to agree that these four ghosts are one human.
I obsess over this because fragmented identity is one of the quietest reasons independent hotels stay stuck overpaying the OTAs. You cannot win a guest back if you do not know you already had her.
What “unified” actually means (and what it doesn’t)
A unified guest profile is one record per real person that pulls together everything that human has done with your property. Stays, cancellations, room preferences, spend, email opens, the review they left, the support ticket about the noisy AC. All of it hanging off one identity.
What it is NOT:
- It is not just merging duplicates inside your PMS. That is one slice.
- It is not a loyalty program. Loyalty is a layer you can build on top once identity is solved.
- It is not magic. You are going to make rules, and some of those rules will be wrong sometimes, and you will tune them.
The reason this matters for the kind of work I do — SEO, AEO, book-direct conversion — is that all of it eventually points a guest at your website. And the moment they arrive, your ability to recognize and reward them decides whether they convert direct or bounce back to the channel they trust. I wrote about the underlying economics in the book-direct math post, but the short version is this: OTA commissions run roughly 15 to 25 percent, and every returning guest you fail to recognize is a guest you may end up re-acquiring at full commission. Forever.
A returning guest you cannot identify is not a returning guest. To your marketing, she is a brand-new stranger you have to pay an OTA to meet all over again.
The three systems that never talk to each other
For most independents I work with, the guest identity lives in three places that were never designed to agree:
1. The PMS. Source of truth for the actual stay. But PMS records are notoriously dirty — front desk staff fat-finger names, create walk-in profiles with no email, and duplicate guests on every visit.
2. The booking engine. Clean-ish email capture at the point of reservation, but it often knows nothing about the guest’s history unless it is wired back to the PMS.
3. The email / marketing tool. Full of contacts, opens, and clicks, but frequently keyed on a different email than the PMS uses, and full of people who never actually stayed.
Each one has a partial truth. Nobody has the whole picture. Your job is to build the bridge.
Match keys: the heart of identity stitching
Identity resolution comes down to one question asked over and over: are these two records the same person? You answer it with match keys — the shared signals you compare across systems.
Here is how I rank them, strongest to weakest:
| Match key | Reliability | The catch |
|---|---|---|
| Loyalty / membership ID | Very high | Only exists if you have a program and they used it |
| Email address (normalized) | High | People use multiple emails; typos happen |
| Phone number (normalized) | High | Shared family numbers; international formatting |
| Name + date of birth | Medium | Common names collide; DOB rarely captured |
| Name + postal address | Medium | People move; addresses get abbreviated differently |
| Name + stay dates | Low | Useful only as a tiebreaker, never alone |
The trick is that you almost never match on one key in isolation. You build a little hierarchy: if loyalty ID matches, merge with confidence. If not, try normalized email. If email is missing, fall back to phone plus last name. Each step down the ladder, you demand more corroboration before you dare merge.
And “normalized” is doing heavy lifting in that table. Before you compare anything you have to clean it.
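The match-key ladder described above fits in a few lines of Python. This is a minimal sketch. It assumes every field has already been through the normalization pass described next. `GuestRecord` and `match_on_ladder` are hypothetical names, not part of any PMS or CDP API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuestRecord:
    # Hypothetical field names; values are assumed to be normalized already.
    loyalty_id: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    last_name: Optional[str] = None


def match_on_ladder(a: GuestRecord, b: GuestRecord) -> Optional[str]:
    """Return the rung that justified a match, or None to keep records separate."""
    # Rung 1: a shared loyalty ID is enough on its own.
    if a.loyalty_id and a.loyalty_id == b.loyalty_id:
        return "loyalty_id"
    # Rung 2: if both records carry an email, the email decides.
    if a.email and b.email:
        return "email" if a.email == b.email else None
    # Rung 3: email missing on one side, so require phone AND last name together.
    if (a.phone and a.phone == b.phone
            and a.last_name and a.last_name == b.last_name):
        return "phone+last_name"
    return None
```

One design choice worth flagging: when both emails are present but different, this sketch stops rather than falling through to phone. That keeps it on the conservative side. Loosen it only once you trust your data.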
Normalization: the boring step that makes or breaks everything
Two records whose email addresses differ only in capitalization, a stray space, or a dot in a Gmail address belong to the same person, but a naive system sees two strangers. So before matching, I run everything through a normalization pass:
- Emails: lowercase, trim whitespace, and for Gmail specifically, strip dots and anything after a plus sign in the local part.
- Phones: strip spaces, dashes, and parentheses; force a consistent country-code format.
- Names: lowercase, trim, collapse double spaces, and standardize obvious nicknames where you are confident (Bob to Robert is risky; do it carefully or not at all).
- Addresses: expand or abbreviate consistently — Street vs St, Apartment vs Apt.
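Here is a rough sketch of that normalization pass. Several pieces are assumptions:

- The function names are hypothetical.
- The phone rule assumes a North American default (country code 1, ten-digit national numbers).
- The address token map is a tiny example you would extend yourself.

For real-world phone handling, a dedicated library such as the `phonenumbers` package (a Python port of Google's libphonenumber) is a safer bet than hand-rolled rules.

```python
import re

# Hypothetical token map; extend it with the variants your front desk actually types.
ADDRESS_TOKENS = {
    "street": "st", "st.": "st",
    "avenue": "ave", "ave.": "ave",
    "apartment": "apt", "apt.": "apt",
}


def normalize_email(raw: str) -> str:
    email = raw.strip().lower()
    local, at, domain = email.partition("@")
    if not at:
        return email  # not an address; leave it for a human to look at
    if domain == "gmail.com":
        # Gmail ignores dots and anything after '+' in the local part.
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def normalize_phone(raw: str, default_cc: str = "1") -> str:
    # Assumes a North American default (country code 1, ten-digit national numbers).
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    if digits.startswith("00"):  # international prefix used in much of Europe and elsewhere
        return "+" + digits[2:]
    if digits.startswith(default_cc) and len(digits) > 10:
        return "+" + digits  # country code typed without the '+'
    return "+" + default_cc + digits


def normalize_name(raw: str) -> str:
    # Lowercase, trim, and collapse runs of whitespace; no nickname mapping on purpose.
    return " ".join(raw.lower().split())


def normalize_address(raw: str) -> str:
    tokens = raw.lower().replace(",", " ").split()
    return " ".join(ADDRESS_TOKENS.get(t, t) for t in tokens)
```

For example, `normalize_email("  Jane.Doe+spa@Gmail.com ")` and `normalize_email("janedoe@gmail.com")` both come out as `janedoe@gmail.com`. That is exactly the pair a naive system would treat as two strangers.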
Skip this and your match rate quietly collapses, and you will swear the whole project doesn’t work when really you just never washed the data.
Deterministic vs probabilistic matching
There are two philosophies here and you will likely use both.
Deterministic matching says two records are the same only if a key matches exactly (after normalization). Same loyalty ID, same email. It is precise and rarely wrong, but it misses guests whose data drifted — the woman who booked under her work email once and her personal email the next time.
Probabilistic (fuzzy) matching scores similarity. Same last name, same phone, same city, slightly different email spelling? That is a strong probable match even with no exact key. You set a confidence threshold: above it, auto-merge; in a gray zone, flag for a human to review; below it, leave them separate.
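A weighted score with three bands is one common way to implement that auto-merge / review / keep-separate split. The sketch below makes several assumptions, none of them recommendations:

- The weights, the thresholds, and the choice of `difflib.SequenceMatcher` for fuzzy email comparison are all illustrative.
- The field names are hypothetical.
- Only email is compared fuzzily. The other fields must match exactly to count.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds; tune them against pairs you have reviewed by hand.
WEIGHTS = {"last_name": 0.25, "phone": 0.35, "city": 0.10, "email": 0.30}
AUTO_MERGE = 0.85
REVIEW = 0.55


def field_similarity(field: str, x, y) -> float:
    if not x or not y:
        return 0.0  # a missing value adds nothing to the score
    if field == "email":
        return SequenceMatcher(None, x, y).ratio()  # tolerate small typos
    return 1.0 if x == y else 0.0  # exact match for the other fields


def score_pair(a: dict, b: dict) -> float:
    return sum(w * field_similarity(f, a.get(f), b.get(f)) for f, w in WEIGHTS.items())


def decide(a: dict, b: dict) -> str:
    score = score_pair(a, b)
    if score >= AUTO_MERGE:
        return "auto_merge"
    if score >= REVIEW:
        return "human_review"
    return "keep_separate"
```

Consider two records with the same last name, phone, and city, plus `janedoe@gmail.com` vs `janedoe@gmial.com`. They land in the auto-merge band. The same phone and last name in a different city, with no email, fall into the human-review band instead.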
For a small independent, I usually start almost entirely deterministic. It is safe, explainable, and you will not accidentally merge two different guests into a Frankenstein profile — which, trust me, is a far worse outcome than missing a match. A bad merge means one guest sees another guest’s stay history, and that is a privacy incident, not a data hygiene problem. Start conservative.
Deduplication: cleaning the house before you stitch
Before you connect systems, dedupe within each system, starting with the PMS because it is the dirtiest. Run a report of likely duplicates — same email, same phone, same name with overlapping details — and merge them following your match-key hierarchy.
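A likely-duplicates report can start as simple grouping on normalized keys. This sketch assumes the PMS export has already been normalized into dicts with hypothetical `id`, `email`, and `phone` columns. It only produces a report; the merging itself still follows your match-key hierarchy.

```python
from collections import defaultdict


def likely_duplicates(records, keys=("email", "phone")):
    """Group record IDs that share a normalized key value (hypothetical columns)."""
    report = {}
    for key in keys:
        buckets = defaultdict(list)
        for rec in records:
            value = rec.get(key)
            if value:  # walk-in profiles often have no email; skip blanks
                buckets[value].append(rec["id"])
        for value, ids in buckets.items():
            if len(ids) > 1:
                report[(key, value)] = ids
    return report
```

A single record can show up in more than one group, for example via a shared email with one duplicate and a shared phone with another. That overlap is a useful hint that all three may be the same guest.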
When you merge, you need a rule for which value wins on conflict. My defaults:
- Most recent wins for contact info like email and phone (people update them).
- Most complete wins for preferences and notes (never throw away a captured detail).
- Never delete history — every stay, every transaction stays attached to the surviving record.
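Those three conflict rules translate fairly directly into a merge function. This is a sketch only, assuming profiles are dicts with hypothetical `updated`, `email`, `phone`, `preferences`, `notes`, and `stays` fields. It does not reflect any particular PMS's merge behavior.

```python
from datetime import date


def merge_profiles(a: dict, b: dict) -> dict:
    """Merge two duplicate profiles with the conflict rules above (hypothetical fields)."""
    newer, older = (a, b) if a["updated"] >= b["updated"] else (b, a)
    merged = {"updated": newer["updated"]}
    # Most recent wins for contact info, but a blank never overwrites a real value.
    for field in ("email", "phone"):
        merged[field] = newer.get(field) or older.get(field)
    # Most complete wins for preferences: keep the union of everything captured.
    merged["preferences"] = sorted(set(a.get("preferences", [])) | set(b.get("preferences", [])))
    # Notes and stay history are appended, never dropped (dict.fromkeys dedupes, keeping order).
    merged["notes"] = list(dict.fromkeys(a.get("notes", []) + b.get("notes", [])))
    merged["stays"] = list(dict.fromkeys(a.get("stays", []) + b.get("stays", [])))
    return merged


older = {"updated": date(2024, 3, 1), "email": "jane@old.example", "phone": "+15551234567",
         "preferences": ["high floor"], "notes": ["feather allergy"], "stays": ["S1"]}
newer = {"updated": date(2024, 9, 1), "email": "jane@new.example", "phone": None,
         "preferences": ["quiet room"], "notes": [], "stays": ["S2"]}
merged = merge_profiles(older, newer)
```

In the example, the newer email wins. The older phone survives because the newer record left it blank. Both preferences, the feather-allergy note, and both stays end up on the surviving profile.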
Treat this as an ongoing discipline, not a one-time blitz. Dirty data regenerates. New duplicates appear every week from walk-ins and channel imports, so dedup has to become a recurring habit, ideally automated.
The goal is not a perfect database. There is no such thing. The goal is a database that is right often enough that you can confidently recognize a returning guest, reward them, and stop renting them back from an OTA you already paid once.
The architecture: pick a hub, don’t go peer-to-peer
Do not try to sync all three systems to each other directly. Three systems wired peer-to-peer means three two-way links, or six one-way syncs, each with its own idea of who a guest is, and every system you add later makes the hairball worse. Instead, pick a hub: one system that holds the canonical unified profile, and have everything flow through it.
For most independents, one of two shapes works:
- PMS as hub. Your PMS becomes the source of truth for guest identity, and your booking engine and email tool read from and write to it. Works well if your PMS has a decent open API.
- A dedicated layer as hub. A CDP or a lightweight middleware sits in the middle, ingests from all three, runs the stitching logic, and pushes the resolved profile back out. This is where a CDP earns its keep — but you do not need one to start.
Either way, you assign each unified profile a single stable identifier — a master guest ID — that never changes even as the underlying records get merged and updated. That master ID is the spine everything hangs off.
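One way to keep a master ID stable through merges is a small registry. It maps each (system, local ID) pair to a master ID, and when two profiles merge, the retired ID becomes a redirect. The class and system names below are hypothetical. A real hub would persist this in a database rather than in memory.

```python
import uuid
from typing import Dict, Tuple


class MasterIdRegistry:
    """Hypothetical sketch: maps (system, local_id) pairs to a stable master guest ID."""

    def __init__(self) -> None:
        self._by_source: Dict[Tuple[str, str], str] = {}
        self._retired: Dict[str, str] = {}  # retired master ID -> surviving master ID

    def resolve(self, master_id: str) -> str:
        # Follow redirects so IDs stored before a merge still point at the survivor.
        while master_id in self._retired:
            master_id = self._retired[master_id]
        return master_id

    def register(self, system: str, local_id: str) -> str:
        key = (system, local_id)
        if key not in self._by_source:
            self._by_source[key] = str(uuid.uuid4())
        return self.resolve(self._by_source[key])

    def merge(self, keep: str, retire: str) -> str:
        # The surviving ID never changes; the other one becomes a redirect.
        keep, retire = self.resolve(keep), self.resolve(retire)
        if keep != retire:
            self._retired[retire] = keep
        return keep
```

For example, register a PMS guest and an email-tool contact, then merge them once stitching decides they are the same person. From then on, looking up either source record returns the PMS-side master ID. Any system that cached the retired ID still resolves correctly.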
A realistic first 30 days
If you are a hotelier reading this and feeling the panic of “where do I even begin,” here is the sequence I would actually run, no CDP required:
- Week 1 — Audit. Pull a sample of 100 guests you know stayed multiple times. Count how many distinct records each one has across your three systems. This number is your “before” and it will horrify you appropriately.
- Week 2 — Normalize and dedupe the PMS. Clean email, phone, and name fields. Merge obvious duplicates with conservative rules.
- Week 3 — Define match keys. Write down your hierarchy on one page. Loyalty ID, then email, then phone-plus-name. Decide your merge-conflict rules.
- Week 4 — Connect two systems first. Usually PMS and email tool, keyed on normalized email. Prove the stitch works on your sample before you add the third.
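For the week 4 proof, a join on normalized email plus a match rate on your sample may be all you need. This sketch assumes two exports as lists of dicts with hypothetical `guest_id`, `contact_id`, and `email` columns. It inlines only a minimal normalization (trim and lowercase), so swap in your full rules from the normalization pass.

```python
def stitch_on_email(pms_rows, email_rows):
    """Link PMS guests to email-tool contacts on normalized email (hypothetical columns)."""
    def norm(value):
        # Minimal normalization only; replace with your full normalization rules.
        return (value or "").strip().lower()

    index = {}
    for row in email_rows:
        key = norm(row.get("email"))
        if key:  # never index blanks, or every email-less guest would "match"
            index.setdefault(key, []).append(row["contact_id"])

    links, unmatched = {}, []
    for row in pms_rows:
        key = norm(row.get("email"))
        if key in index:
            links[row["guest_id"]] = index[key]
        else:
            unmatched.append(row["guest_id"])
    match_rate = len(links) / len(pms_rows) if pms_rows else 0.0
    return links, unmatched, match_rate
```

Run it on your 100-guest audit sample. The unmatched list is where to look next, for example guests who used a work email in one system and a personal one in the other.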
You will not finish in 30 days. But you will have a working spine and a sample that proves the concept, which is exactly what you need to justify going further.
Why I keep dragging this back to bookings
It would be easy to file identity resolution under “IT housekeeping” and let it rot. I refuse to, because every piece of revenue work I do depends on it.
When your Google Business Profile and your local SEO pull a past guest back to your site, a unified profile lets your book-direct experience greet them as the returning guest they are — with their preferences pre-filled and a reason to book direct instead of bouncing to an OTA. That recognition is a real lever against the way OTAs intercept your search demand. It will not let you escape the OTAs entirely — nothing will, and anyone promising that is lying — but it absolutely helps you shift the mix toward more direct bookings and a healthier channel balance over time.
This is the unsexy foundation that makes the sexy stuff — personalization, win-back campaigns, loyalty — actually function. Build it on sand and everything above it wobbles.
Want help building the spine?
If you are staring at three systems that refuse to agree on who your guests are, that is exactly the kind of plumbing I love untangling. We map your PMS, booking engine, and email tool, define your match keys, dedupe the mess, and stand up one unified profile that your marketing can finally trust. Then we point your book-direct work at it so every recovered guest has a reason to skip the OTA next time. Book a working session with me and let’s get your guests stitched back into the single, valuable humans they actually are.