Cullen Jewellery ◆ Engineering

The CRM

The quiet Rust service that answers the phone,
remembers every customer, and never sleeps.

Where we're going

The tour / 4 stops

1 The lay of the land

What it is, the stack, how it ships, what it actually does all day.

2 Who are you?

Turning a mess of human data into one profile per person.

3 Scale

Doing that matching across hundreds of thousands of records without melting the database.

4 Reliability

Why a redeploy doesn't drop a live phone call.

Stops 2–4 are the real meat. Stop 1 is there so they make sense.

Stop 1 · The lay of the land

What even is the CRM?

It's the comms brain for the business.

☎️ Answers & routes calls - a customer rings, it finds the right available staff member and bridges them in a conference.
💬 Handles SMS - inbound/outbound texting, tied to the right person.
🗂 Builds one profile per customer - from orders, bookings & appointments scattered across 4 systems.
🔌 Glues vendors together - Front, Twilio, SimplyBookMe, Shopify, Salestools.

The staff-facing phone client (desktop app) talks to it over a WebSocket. When your screen lights up with "Sarah is calling - here are her last 3 orders", that's this service.

Stop 1 · The stack

Built on Rust

Language · Rust, edition 2024

Web / async · Axum on Tokio

Database · Postgres via sqlx

WebSockets (phone client) · sqlx (compile-time checked SQL) · Cloudflare R2 (S3) · Infisical (secrets) · Sentry · OpenTelemetry → Axiom · in-process cron jobs · generated OpenAPI clients

The Twilio & Front API clients (twilio-rust, front-rust) are generated from OpenAPI specs - no hand-written HTTP glue, the types come for free.

Stop 1 · Always-on chores

The cron jobs

Eight scheduled jobs run inside the same process (production only):

ClientSync · rebuild profiles
FrontEvents · pull comms
AppointmentReminders · nudge customers
BusinessHours · routing windows
PhoneNumberSync · number cache
TwilioPhoneSync · provisioned numbers
BlockedSync · spam list
FrontCheck · watchdog

FrontCheck is the interesting one - it watches for Front silently disabling our webhook and alerts staff if it ever does. Foreshadowing: reliability.
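A minimal sketch of what "scheduled jobs inside the same process" can look like, assuming a simple interval per job. The `Job` shape, the intervals, and `cron_enabled` are hypothetical, and the real service presumably drives this from Tokio rather than a hand-rolled tick. The `tick` function is deterministic so it can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

/// One in-process scheduled job. Fields and intervals are illustrative,
/// not the real service's configuration.
pub struct Job {
    pub name: &'static str,
    pub every: Duration,
    pub last_run: Option<Instant>,
}

/// The jobs only run in production, per the slide.
pub fn cron_enabled(environment: &str) -> bool {
    environment == "production"
}

/// Returns the names of jobs due at `now` and records that they ran.
/// A driver would call this in a loop, sleeping between ticks.
pub fn tick(jobs: &mut [Job], now: Instant) -> Vec<&'static str> {
    let mut due = Vec::new();
    for job in jobs.iter_mut() {
        let is_due = match job.last_run {
            None => true, // never run yet: run on the first tick
            Some(prev) => now.duration_since(prev) >= job.every,
        };
        if is_due {
            job.last_run = Some(now);
            due.push(job.name);
        }
    }
    due
}
```

Keeping the "what's due" decision separate from the sleeping loop is what makes a schedule like this easy to test.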

Stop 1 · How it ships

Deployment

Build

Multi-stage Docker: rust:alpine → tiny alpine runtime
Dependencies cached in their own layer (fast rebuilds)
objcopy splits debug symbols → uploaded to Sentry, stripped from the binary
Secrets injected at runtime by Infisical

Run

Docker Swarm + Traefik - start-first rollout, auto-rollback
/health check every 10s; the old version only steps down once the new one is healthy
Image tagged by git commit → trivial rollback
Used to run on Fly.io - we moved off because it was too unreliable for always-on calls

Start-first + health checks = a bad deploy never takes traffic. More on why that matters at Stop 4.

Stop 1 · The headline feature

A call, start to finish

Ringing (customer calls in) → FindingStaff (who's free + preferred?) → DialingStaff (ring a person) → Conference (bridge both legs) → Voicemail (nobody home)

Calls are modelled as an explicit state machine - every call is always in exactly one known state. Twilio webhooks drive transitions; the conference is how we bridge customer ↔ staff.
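Here is a rough sketch of "webhooks drive transitions" as a pure function. It uses a simplified subset of the states and a hypothetical `Event` enum standing in for Twilio webhooks; staff ids are `u64` instead of `Uuid` so it has no dependencies. The second function shows the exhaustiveness point: with no `_` arm, adding a state won't compile until the match is updated.

```rust
/// Simplified subset of the call states (not the full production enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CallState {
    Ringing,
    FindingStaff,
    DialingStaff { staff: u64 },
    StaffJoined,
    AllInConference,
    Voicemail,
    Hungup,
}

/// Hypothetical events standing in for Twilio webhooks.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Event {
    Accepted,
    StaffAvailable(u64),
    NoStaffAvailable,
    StaffAnswered,
    CallerAdded,
    Hangup,
}

/// One transition step. `None` means the event makes no sense in this state,
/// so the call keeps its current, known state instead of drifting.
pub fn next(state: CallState, event: Event) -> Option<CallState> {
    use CallState::*;
    match (state, event) {
        (_, Event::Hangup) => Some(Hungup),
        (Ringing, Event::Accepted) => Some(FindingStaff),
        (FindingStaff, Event::StaffAvailable(id)) => Some(DialingStaff { staff: id }),
        (FindingStaff, Event::NoStaffAvailable) => Some(Voicemail),
        (DialingStaff { .. }, Event::StaffAnswered) => Some(StaffJoined),
        (StaffJoined, Event::CallerAdded) => Some(AllInConference),
        _ => None,
    }
}

/// No `_` arm: a new state is a compile error here until someone handles it.
pub fn staff_on_line(state: CallState) -> bool {
    match state {
        CallState::StaffJoined | CallState::AllInConference => true,
        CallState::Ringing
        | CallState::FindingStaff
        | CallState::DialingStaff { .. }
        | CallState::Voicemail
        | CallState::Hungup => false,
    }
}
```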

Stop 1 · Why an enum matters

The states are a type

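use uuid::Uuid;

/// Every call is in exactly one of these states at any moment.
pub enum CallState {
    Ringing,                        // customer calls in
    RingingOutbound,
    GetConferenceDetails,
    FindingStaff,                   // who's free + preferred?
    DialingStaff { staff: Uuid },   // ringing one specific staff member
    WaitingForStaff,
    StaffJoined,
    AddingCallerToConference,
    AllInConference,                // both legs bridged
    Voicemail,                      // nobody home
    Hungup,
}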

The compiler won't let us forget a state. Each call runs its own async task (call.run()) and receives actions over a channel - an actor. No shared mutable mess, no locks around a call.
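The actor pattern in miniature, as a sketch: one owner of the state, one mailbox, messages handled in order. A plain thread and `std::sync::mpsc` stand in for the async task and channel the service uses; the `Action` variants and the three-state enum are hypothetical simplifications.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// A tiny subset of call states, enough to show the pattern.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CallState {
    DialingStaff,
    StaffJoined,
    Hungup,
}

/// Hypothetical messages a call actor accepts.
pub enum Action {
    StaffAnswered,
    Hangup,
    /// Ask for the current state; the answer comes back on the given sender.
    GetState(Sender<CallState>),
}

/// The actor owns its state outright, so nothing else can mutate it: no locks.
pub struct Call {
    state: CallState,
    inbox: Receiver<Action>,
}

impl Call {
    /// Spawns the actor; returns its mailbox and a handle yielding the final state.
    pub fn spawn() -> (Sender<Action>, JoinHandle<CallState>) {
        let (tx, rx) = mpsc::channel();
        let call = Call { state: CallState::DialingStaff, inbox: rx };
        (tx, thread::spawn(move || call.run()))
    }

    /// Handles one action at a time until hangup (or until every sender is dropped).
    fn run(mut self) -> CallState {
        while let Ok(action) = self.inbox.recv() {
            match action {
                Action::StaffAnswered => self.state = CallState::StaffJoined,
                Action::GetState(reply) => {
                    let _ = reply.send(self.state);
                }
                Action::Hangup => {
                    self.state = CallState::Hungup;
                    break;
                }
            }
        }
        self.state
    }
}
```

Because the channel is FIFO and only the actor touches `state`, concurrent webhooks for the same call are serialised for free.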

Deep dive · Stop 2

Who are you?

The hardest problem wasn't code. It was people.

Stop 2 · The mess

Four systems, one human

The same customer shows up as different rows in different tools, none of which agree:

Source	What it is	What's reliable
`order`	Shopify / POS purchases	email-ish, name-ish
`sbm`	SimplyBookMe bookings	name crammed in one text field
`aj`	AppointJet - the primary on a booking	richest source
`aj_partner`	the second person on a couple's booking	partial, often shares email

A couple books one appointment → that's two people we must keep apart, often sharing a single email address. 💍

Stop 2 · Cleaning

Step 1 - make it comparable

Before matching anything, every raw row is normalised:

📧 Email → lowercased, aliases stripped; invalid ones dropped

📞 Phone → parsed to E.164 (+61…), AU default; junk ending in 000000 binned

🔤 Names → trimmed; curly ’ vs straight ' normalised so O'Brien matches

No email AND no phone? → dropped. Nothing to match on.
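A simplified sketch of the cleaning rules above. It assumes "aliases" means a `+tag` in the local part, and it hand-rolls AU-default E.164 parsing for only the common shapes (`04…`, `61…`, `+…`). A real implementation would lean on a proper phone-number library, so treat these function names and rules as illustrative.

```rust
/// Lowercase, strip a "+alias" from the local part, drop obviously invalid addresses.
pub fn normalise_email(raw: &str) -> Option<String> {
    let lower = raw.trim().to_lowercase();
    let (local, domain) = lower.split_once('@')?;
    // jane+rings@x.com → jane@x.com (alias convention is an assumption)
    let local = local.split('+').next().unwrap_or_default();
    if local.is_empty() || domain.contains('@') || !domain.contains('.') {
        return None;
    }
    Some(format!("{local}@{domain}"))
}

/// Parse to E.164 with an Australian default; reject junk ending in 000000.
pub fn normalise_phone(raw: &str) -> Option<String> {
    let has_plus = raw.trim_start().starts_with('+');
    let digits: String = raw.chars().filter(|c| c.is_ascii_digit()).collect();
    let e164 = if has_plus {
        format!("+{digits}")
    } else if let Some(rest) = digits.strip_prefix("61") {
        format!("+61{rest}")
    } else if let Some(rest) = digits.strip_prefix('0') {
        format!("+61{rest}") // national format: 04xx → +614xx
    } else {
        return None;
    };
    let n = e164.len() - 1; // digits after the '+'
    if !(8..=15).contains(&n) || e164.ends_with("000000") {
        return None;
    }
    Some(e164)
}

/// Trim and turn curly apostrophes into straight ones so O’Brien == O'Brien.
pub fn normalise_name(raw: &str) -> String {
    raw.trim().replace('\u{2019}', "'")
}

/// No email and no phone: nothing to match on, so the row is dropped.
pub fn keep_row(email: Option<&str>, phone: Option<&str>) -> bool {
    email.is_some() || phone.is_some()
}
```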

Then split by how much we know:

Full identity (2+ fields) → joins the main matching.
Weak identity (1 lonely field, e.g. a bare email import) → set aside for later.
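The split as a function, as a sketch. It assumes the "fields" being counted are first name, last name, email and phone after cleaning; the `Row` and `Identity` types are hypothetical.

```rust
/// Outcome of the split described above.
#[derive(Debug, PartialEq)]
pub enum Identity {
    Full,
    Weak,
    Dropped,
}

/// A cleaned row (hypothetical shape): every field is optional.
pub struct Row<'a> {
    pub first: Option<&'a str>,
    pub last: Option<&'a str>,
    pub email: Option<&'a str>,
    pub phone: Option<&'a str>,
}

pub fn classify(row: &Row) -> Identity {
    // No contact channel at all: nothing to match on.
    if row.email.is_none() && row.phone.is_none() {
        return Identity::Dropped;
    }
    let fields = [row.first, row.last, row.email, row.phone]
        .iter()
        .filter(|f| f.is_some())
        .count();
    if fields >= 2 { Identity::Full } else { Identity::Weak }
}
```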

Stop 2 · Scoring

Step 2 - weigh the evidence

We don't use rigid rules. We score with Fellegi–Sunter - every field is evidence for or against "same person", summed into a probability.

Field	Exact match	Mismatch	Why
📞 Phone	+10.81	−4.32	hard to fake
📧 Email	+10.81	−4.32	strong
Last name	+10.64	−4.32	surnames rarely collide
First name	+6.12	−3.25	"John" collides constantly

Sum the weights → logistic → probability. Merge only above 0.99. The system would rather leave a duplicate than fuse two people.
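The scoring rule with the weights from the table, as a sketch. Two assumptions: a field missing on either side contributes 0, and the weights are base-2 log Bayes factors (as in many Fellegi–Sunter tools), so the logistic is 1 / (1 + 2^−w) with no prior term. The slide doesn't say which base; for the jane & john case the verdict is the same either way.

```rust
/// How one field compares across two records.
#[derive(Clone, Copy)]
pub enum Cmp {
    Exact,
    Mismatch,
    Missing,
}

// (exact-match weight, mismatch weight), from the table above.
const PHONE: (f64, f64) = (10.81, -4.32);
const EMAIL: (f64, f64) = (10.81, -4.32);
const LAST: (f64, f64) = (10.64, -4.32);
const FIRST: (f64, f64) = (6.12, -3.25);

fn weight(c: Cmp, (agree, disagree): (f64, f64)) -> f64 {
    match c {
        Cmp::Exact => agree,
        Cmp::Mismatch => disagree,
        Cmp::Missing => 0.0, // assumption: no evidence either way
    }
}

/// Sum the evidence, then squash it into a probability (base-2 logistic).
pub fn match_probability(phone: Cmp, email: Cmp, last: Cmp, first: Cmp) -> f64 {
    let w = weight(phone, PHONE) + weight(email, EMAIL) + weight(last, LAST) + weight(first, FIRST);
    1.0 / (1.0 + (-w).exp2())
}

/// Merge only when very sure.
pub fn should_merge(p: f64) -> bool {
    p > 0.99
}
```

For a shared email with both names different and no phone, w = 10.81 − 4.32 − 3.25 = 3.24, which this sketch turns into roughly 0.90: not a merge.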

Stop 2 · The "aha"

The case that breaks naive systems

jane & john @ shared inbox

Same email. Different first and last names.

Naive "match on email" → one merged blob. 💀

Ours: the name mismatches (−3.25 − 4.32 = −7.57) drag the shared email's +10.81 down to +3.24, well short of the 0.99 bar → kept as two people. ✅

a bare email import

One field, no name, no phone.

Can't join main matching (too little info).

Ours: held back, then attached to whichever existing profile already has that exact email on a real record.

A shared email is a hint, not a verdict.
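A sketch of the "attach weak records later" step: index emails from real records by profile, then look each bare email up. The slide doesn't say how a tie between two profiles (say, a couple sharing an inbox) is resolved; this sketch simply leaves those unattached, which is an assumption. Profile ids as `u64` are hypothetical.

```rust
use std::collections::HashMap;

/// For each weak email: the profile that owns that exact email on a full record.
/// `None` if no profile has it, or (this sketch's choice) if several do.
pub fn attach_weak(full_records: &[(u64, &str)], weak_emails: &[&str]) -> Vec<Option<u64>> {
    let mut owner: HashMap<&str, Option<u64>> = HashMap::new();
    for &(profile, email) in full_records {
        owner
            .entry(email)
            .and_modify(|o| {
                if *o != Some(profile) {
                    *o = None; // two different profiles share it: ambiguous
                }
            })
            .or_insert(Some(profile));
    }
    weak_emails.iter().map(|&e| owner.get(e).copied().flatten()).collect()
}
```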

Stop 2 · The whole pipeline

From four messy sources to one profile

flowchart LR
  O[("orders")] --> ING
  S[("SimplyBookMe")] --> ING
  A[("AppointJet")] --> ING
  P[("AJ partner")] --> ING
  ING["ingest"] --> CL["clean and normalise"]
  CL -->|"2+ fields"| F[["full identity"]]
  CL -->|"1 field"| W[["weak, held back"]]
  F --> M["block, score, DSU union<br/>prob over 0.99"]
  M --> PR["build profiles<br/>consensus name / email / phone"]
  PR --> RE["re-match existing<br/>respect rejections"]
  W -.-> AT["attach weak<br/>to best profile"]
  PR --> AT
  RE --> CU["safety nets<br/>orphan cleanup, un-merge over-merged"]
  AT --> CU
  classDef src fill:#1f3a29,stroke:#D3E6D9,color:#fefaf5
  classDef proc fill:#173d24,stroke:#00BB33,color:#fefaf5
  classDef sink fill:#3a3026,stroke:#CDA17F,color:#fefaf5
  class O,S,A,P src
  class ING,CL,M,RE,AT proc
  class PR,CU sink

Continuous: new data flows in, profiles split & merge, "no"s are cached, orphans are swept. How it all stays fast at our size is the next stop. Full write-up: client_sync/MATCHING.md.

Stop 2 · Takeaway

What the data taught us

Garbage is the default. Half the engine is just making rows comparable.
Be probabilistic, be cautious. Score evidence; when unsure, don't merge.
Model the real world. Couples share emails - so a shared email can't be proof.
Make it reversible. Over-merged a profile? There's an un-merge safety valve.

The code is small. The judgement encoded in it is the product.

Learning · Stop 3

Scale

"Just compare everyone to everyone" is a trap.

Stop 3 · The problem

The numbers don't forgive you

Customer records: 100k+ (orders + bookings + appts)

Naive comparisons: ~5B+ (every pair, n²/2)

Sources that disagree: orders · SBM · AJ · partner

To decide "are these two records the same person?" the obvious approach is to compare every record to every other. With n records that's n(n−1)/2 ≈ n²/2 pairs: O(n²).

n²/2 on 100k records is ~5 billion comparisons. Per run. Every few minutes.
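The arithmetic, as a tiny sketch: m records have m(m−1)/2 distinct pairs. That is the same formula whether m is the whole table (naive matching) or the contents of one blocking bucket, which is why a single oversized bucket hurts so much. Function names are illustrative.

```rust
/// Distinct pairs among m records: m(m-1)/2.
pub fn pairs_in(m: u64) -> u64 {
    m * m.saturating_sub(1) / 2
}

/// Total comparisons when only records sharing a bucket are compared.
pub fn blocked_cost(bucket_sizes: &[u64]) -> u64 {
    bucket_sizes.iter().map(|&m| pairs_in(m)).sum()
}
```

`pairs_in(100_000)` is 4,999,950,000: the ~5 billion above.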

Stop 3 · The fix

Trick 1 - Blocking

Two records can't be matched directly unless they share a contact channel - same email or same phone. So we only ever compare records that land in the same bucket.

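use std::collections::{BTreeSet, HashMap};

// build hash indexes once: contact → record indices
let mut email_idx: HashMap<&str, Vec<usize>> = HashMap::new();
let mut phone_idx: HashMap<&str, Vec<usize>> = HashMap::new();
for (i, r) in records.iter().enumerate() {
    if let Some(e) = &r.email { email_idx.entry(e.as_str()).or_default().push(i); }
    if let Some(p) = &r.phone { phone_idx.entry(p.as_str()).or_default().push(i); }
}

// then only compare within a shared bucket: each pair once, never a record with itself
for (a, r) in records.iter().enumerate() {
    let mut candidates: BTreeSet<usize> = BTreeSet::new();
    if let Some(e) = &r.email { candidates.extend(&email_idx[e.as_str()]); }
    if let Some(p) = &r.phone { candidates.extend(&phone_idx[p.as_str()]); }
    for &b in candidates.range(a + 1..) {
        if score(r, &records[b]) > 0.99 { dsu.union(a, b); }   // ← match!
    }
}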

We go from "everyone × everyone" to "only people who already share a phone or email". The ~5 billion pairs collapse to a handful of comparisons per record.
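A self-contained sketch of just the candidate-generation step, with scoring left out. The `Rec` type is hypothetical. Keys are tagged by channel so an email and a phone with the same text can't share a bucket, and a `BTreeSet` dedupes pairs that share both channels.

```rust
use std::collections::{BTreeSet, HashMap};

/// A cleaned record (hypothetical shape).
pub struct Rec {
    pub email: Option<String>,
    pub phone: Option<String>,
}

/// Every (a, b) with a < b that shares an email or phone bucket, each pair once.
pub fn candidate_pairs(records: &[Rec]) -> Vec<(usize, usize)> {
    // (0, email) and (1, phone) keep the two channels in separate namespaces.
    let mut by_contact: HashMap<(u8, &str), Vec<usize>> = HashMap::new();
    for (i, r) in records.iter().enumerate() {
        if let Some(e) = &r.email {
            by_contact.entry((0, e.as_str())).or_default().push(i);
        }
        if let Some(p) = &r.phone {
            by_contact.entry((1, p.as_str())).or_default().push(i);
        }
    }
    let mut pairs = BTreeSet::new();
    for bucket in by_contact.values() {
        for (x, &a) in bucket.iter().enumerate() {
            for &b in &bucket[x + 1..] {
                pairs.insert((a.min(b), a.max(b)));
            }
        }
    }
    pairs.into_iter().collect()
}
```

With four records where 0 and 1 share an email, 0 and 2 share a phone, and 3 shares nothing, only two pairs are scored instead of six. Records 1 and 2 are never compared directly; if they belong together, the union-find step links them through record 0.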

Stop 3 · A spanner in the works

Then we broke our own fix

We started using the order system internally to prep ready-to-ship rings. Every one of those orders was booked under a single internal account: one email, one phone.

real contact: 2 to 3 records

real contact: 1 to 4 records

internal account: thousands of orders, all one bucket

Blocking only helps if contacts are spread out. One bucket holding thousands of records means comparing every pair inside it: O(m²) all over again. We were right back to billions of comparisons on a single run.

A blocking key is only as good as its worst bucket.

Stop 3 · The rework

Trick 2 - Union-Find (DSU)

Matches are transitive: if A and B match, and B and C match, all three are one person, even if A and C share nothing directly.

A (name + phone) ↔ B (phone + email) ↔ C (email + name)

A Disjoint-Set Union groups them in near-constant time per merge, and skips any pair already in the same component. This is what let us drop the explicit edge graph (build every link, then traverse it): union-find never has to materialise a billion-edge graph.

A ↔ B ↔ C ⟹ one profile, found cheaply.
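A textbook Disjoint-Set Union, as a sketch: union by size plus path halving, which gives the near-constant amortised cost per operation. These are standard choices, not necessarily what the CRM's own implementation uses. `union` returns `false` when the two records are already in one component, which is the "skip" mentioned above.

```rust
/// Disjoint-Set Union with union by size and path halving.
pub struct Dsu {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl Dsu {
    pub fn new(n: usize) -> Self {
        Dsu { parent: (0..n).collect(), size: vec![1; n] }
    }

    /// Root of x's component; points nodes at their grandparents on the way up.
    pub fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            let grandparent = self.parent[self.parent[x]];
            self.parent[x] = grandparent; // path halving
            x = grandparent;
        }
        x
    }

    /// Joins a and b. Returns false if they were already one component.
    pub fn union(&mut self, a: usize, b: usize) -> bool {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        if self.size[ra] < self.size[rb] {
            std::mem::swap(&mut ra, &mut rb); // attach the smaller tree under the larger
        }
        self.parent[rb] = ra;
        self.size[ra] += self.size[rb];
        true
    }
}
```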

Stop 3 · The rework

Trick 3 - Remember your "no"s

The sync runs continuously, so that giant internal bucket means the same expensive comparisons every single run. Most pairs that could match (because they share a contact) actually don't. Re-deciding that forever is wasted work.

Without

Re-compare the same household pair forever. Same answer. Same cost. Every run.

With a rejection cache

A "no" is written to profile_merge_rejection. We skip that pair - until new evidence arrives for one of them, then we re-check.
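A sketch of a rejection cache with an invalidation rule. It assumes each record carries a version that changes when new evidence arrives (for example an updated_at column); that detail, and keeping the cache in memory rather than in `profile_merge_rejection`, are simplifications. A "no" only stands while both sides are unchanged.

```rust
use std::collections::HashMap;

/// (record id, record version): the version is a hypothetical "has it changed?" marker.
type Rec = (u64, u64);

/// Order a pair canonically so (a, b) and (b, a) hit the same entry.
fn ordered((a, va): Rec, (b, vb): Rec) -> ((u64, u64), (u64, u64)) {
    if a <= b { ((a, b), (va, vb)) } else { ((b, a), (vb, va)) }
}

/// Remembers "not the same person", plus the versions it was decided on.
#[derive(Default)]
pub struct RejectionCache {
    seen: HashMap<(u64, u64), (u64, u64)>,
}

impl RejectionCache {
    pub fn reject(&mut self, a: Rec, b: Rec) {
        let (key, versions) = ordered(a, b);
        self.seen.insert(key, versions);
    }

    /// Skip only if this exact pair was rejected and neither side has changed since.
    pub fn can_skip(&self, a: Rec, b: Rec) -> bool {
        let (key, versions) = ordered(a, b);
        self.seen.get(&key) == Some(&versions)
    }
}
```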

Caching the negative result is as valuable as computing the positive one.

Stop 3 · Takeaway

What scale taught us

▣

Block first
narrow the candidates before you compute anything

⋃

Right structure
DSU makes each transitive merge near-O(1)

⊘

Cache the "no"
with an invalidation rule, not forever

A correct algorithm that's O(n²) is a wrong algorithm at our size.

Learning · Stop 4

Reliability

You can't ask a customer mid-call to "hold while we redeploy".

Stop 4 · The stakes

Live calls vs. shipping code

We deploy whenever. But at any moment there might be live phone calls in progress. A process restart can't just vaporise them.

❌ Naive: call state lives in memory → redeploy → call drops, customer hears silence.
✅ Ours: every call's state is persisted to Postgres as it transitions.

The in-memory actor is a cache of a row in the phone_call table - not the source of truth.
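One way to honour "the row is the truth", as a sketch: write-through, where the actor only updates its in-memory state after the durable write succeeds. `CallStore` is a hypothetical trait standing in for the Postgres write, with an in-memory implementation so the example runs; the ordering is an illustration, not a description of the production code.

```rust
use std::collections::HashMap;

/// Hypothetical persistence interface (in production: the phone_call table).
pub trait CallStore {
    fn save_state(&mut self, call_id: u64, state: &str, completed: bool) -> Result<(), String>;
}

/// In-memory stand-in so the sketch runs on its own.
#[derive(Default)]
pub struct MemStore {
    pub rows: HashMap<u64, (String, bool)>,
}

impl CallStore for MemStore {
    fn save_state(&mut self, call_id: u64, state: &str, completed: bool) -> Result<(), String> {
        self.rows.insert(call_id, (state.to_string(), completed));
        Ok(())
    }
}

/// The actor's in-memory state is only a cache of the stored row.
pub struct CallActor {
    pub id: u64,
    pub state: String,
}

impl CallActor {
    /// Persist first, then update memory. If the write fails the actor keeps its
    /// old state, so memory never runs ahead of the database.
    pub fn transition(&mut self, store: &mut impl CallStore, next: &str) -> Result<(), String> {
        let completed = next == "Hungup";
        store.save_state(self.id, next, completed)?;
        self.state = next.to_string();
        Ok(())
    }
}
```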

Stop 4 · The recovery

`resume_state()` - the comeback

On boot, before taking traffic, the service rebuilds reality:

1 · Load: all calls where completed = false

→

2 · Ask Twilio: is this call actually still alive?

→

3 · Reconcile: resume the actor, or clean it up

The magic is step 2: we don't trust our own DB blindly. We ask Twilio - the real source of truth for telephony - whether each call is still live. Dead ones get cleaned up; live ones get their actor + state machine rebuilt and nudged back on track.
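The three steps as a pure planning function, sketched under assumptions: the provider check is passed in as a closure (in reality a Twilio call-status lookup), and `plan_recovery`, `StoredCall` and `Recovery` are hypothetical names. This is not the real `resume_state()`.

```rust
/// A row as loaded from storage (hypothetical shape).
pub struct StoredCall {
    pub id: u64,
    pub state: String,
    pub completed: bool,
}

/// What the telephony provider reports for a call.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Liveness {
    Live,
    Ended,
}

#[derive(Debug, PartialEq)]
pub enum Recovery {
    Resume { id: u64, state: String }, // rebuild the actor from this state
    CleanUp { id: u64 },               // provider says it's gone: close it out
}

pub fn plan_recovery(rows: &[StoredCall], check: impl Fn(u64) -> Liveness) -> Vec<Recovery> {
    rows.iter()
        .filter(|c| !c.completed) // 1. only calls where completed = false
        .map(|c| match check(c.id) {
            // 2. ask the provider, 3. reconcile
            Liveness::Live => Recovery::Resume { id: c.id, state: c.state.clone() },
            Liveness::Ended => Recovery::CleanUp { id: c.id },
        })
        .collect()
}
```

Keeping the decision pure makes the boot path easy to test; the side effects (spawning actors, marking rows complete) happen after the plan is built.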

Stop 4 · Reliability has consequences

Slow isn't slow. It's broken.

Lean on someone else's platform and their limits become your correctness bugs. Being late here doesn't degrade gracefully, it gets you switched off.

Front's 5-second axe

Take longer than 5s to answer a webhook and Front disables it, silently. We simply stop receiving comms.

Our average on that endpoint: 18ms. But network variance occasionally spikes, so FrontCheck watches for a disabled webhook and pings staff the instant it happens.
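A sketch of the watchdog's decision logic. It assumes FrontCheck periodically reads our webhook's status from Front (the `WebhookStatus` enum is hypothetical) and alerts on the transition to disabled. Alerting once per outage rather than on every run is a design choice of this sketch.

```rust
/// Hypothetical snapshot of our webhook as reported by Front.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WebhookStatus {
    Enabled,
    Disabled,
}

/// Remembers the last status so staff are pinged when it flips, not every run.
pub struct FrontCheck {
    last: WebhookStatus,
}

impl FrontCheck {
    pub fn new() -> Self {
        FrontCheck { last: WebhookStatus::Enabled }
    }

    /// Returns an alert message the moment the webhook goes from enabled to disabled.
    pub fn observe(&mut self, now: WebhookStatus) -> Option<&'static str> {
        let flipped = self.last == WebhookStatus::Enabled && now == WebhookStatus::Disabled;
        self.last = now;
        flipped.then_some("Front disabled our webhook: comms are no longer arriving")
    }
}
```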

Twilio's no do-overs

Miss a Twilio webhook and a live call can drop. Miss a single status update and the call's state desyncs, ruining it.

There is no retry budget on a phone call. Late equals lost.

A 5-second budget, and we spend 18 milliseconds of it.

Stop 4 · The supporting cast

Reliability is layered

Deploys can't hurt you

Start-first rollout: new version proves /health before the old one steps down
Auto-rollback on failure
No window where zero healthy instances serve traffic

Syncs heal themselves

Front comms sync is cursor-based - resumes from the last event, never double-counts
Respects API rate limits (parses "retry in N ms" and waits)
FrontCheck pings staff if Front ever disables our webhook
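Two of those habits, sketched. The exact wording of the upstream rate-limit error is an assumption (the parser looks for the phrase "retry in N ms" as quoted above), and the cursor sync assumes events arrive sorted by an increasing id. Both function names are hypothetical.

```rust
use std::time::Duration;

/// Pull a wait time out of a message containing "retry in N ms".
pub fn parse_retry_delay(msg: &str) -> Option<Duration> {
    let after = msg.split_once("retry in ")?.1;
    let digits: String = after.chars().take_while(|c| c.is_ascii_digit()).collect();
    if !after[digits.len()..].trim_start().starts_with("ms") {
        return None;
    }
    digits.parse().ok().map(Duration::from_millis)
}

/// Cursor-based sync: handle only events after the saved cursor and advance it
/// as we go, so a restart resumes where it stopped and never double-counts.
pub fn sync_from(cursor: &mut u64, events: &[(u64, &str)], mut handle: impl FnMut(&str)) {
    let start = *cursor;
    for &(id, body) in events.iter().filter(|(id, _)| *id > start) {
        handle(body);
        *cursor = id; // persist this in real life, after the event is handled
    }
}
```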

Assume the process will die. Make restart boring.

Stop 4 · The receipts

Does it actually work?

Uptime · 2026 YTD: 99.998%

Total downtime: 240 sec (~4 minutes since Jan 1)

Availability tier: 4 nines+

Avg response · across the board: <100 ms

Heaviest endpoint · recording processing: ~6 s, still beats our other tools' averages

240 seconds down so far this year, and we answer in under 100ms on average.

Wrapping up

Three lessons, one service

Messy humans

Real data is the hard part. Score evidence, stay cautious, model couples.

Scale

Don't out-compute a bad shape. Block, pick the right structure, cache the "no".

Reliability

Truth lives in the DB. Reconcile with reality on boot. Make restart boring.

It answers the phone. It remembers the customer. It survives a redeploy. That's the job.

That's the tour ◆

Questions?

Pick a stop and we'll go deeper - calls, matching, deploys, whatever.

Code: crm/src/controllers/ · Deep dive: client_sync/MATCHING.md
