Why we score 40 KPIs per film, not 200 — calibration-honesty in catalogue analysis
Our 200-parameter catalog measures screenplays. Our film catalogue scores releases that have shipped and that we have the receipts for. Conflating the two is the easiest way to lie with confidence.
By Priya Ramanathan, Head of Modeling
Two engines run inside SignalGrid. From a distance they look like the same shape, and we have spent the last quarter making sure they never get conflated: not by the team, not in our marketing copy, not in our customer conversations. This post is the why.
The first engine — the one we have been talking about the longest — is a 200-parameter catalog of screenplay-level lenses. Pacing across the act-2 turn. Dialogue density during interiority beats. The emotional contour of the climax. Mass-moment density in a Tamil masala film. Whether the antagonist's intelligence is shown or asserted. We ask a large language model to score each of those parameters between zero and ten on a screenplay we have actually read, and we aggregate the result into a calibrated probability the film clears its production budget on first window. That engine is real. It works. It lives in src/lib/scoring/parameters.ts. We use it every time someone uploads a screenplay through the s2o flow at /dashboard/upload.
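To give a feel for the shape of that engine, here is a minimal sketch. It is not the contents of src/lib/scoring/parameters.ts; the parameter IDs, the weights, and the logistic mapping below are stand-ins, not our fitted calibration.

```ts
// Illustrative sketch only. The real definitions live in
// src/lib/scoring/parameters.ts and are richer than this.
interface ScreenplayParameter {
  id: string;      // e.g. "pacing.act2Turn" (hypothetical ID)
  prompt: string;  // what the model is asked to judge in the screenplay
  weight: number;  // relative contribution to the aggregate
}

// The model returns a 0-10 score per parameter for the screenplay it read.
interface ParameterScore {
  id: string;
  score: number; // 0-10
}

// Hypothetical aggregation: a weighted mean of normalised scores pushed
// through a logistic curve to yield a probability-shaped output. In the
// real engine the mapping is calibrated against shipped outcomes.
function aggregateToProbability(
  params: ScreenplayParameter[],
  scores: ParameterScore[],
): number {
  const byId = new Map(scores.map((s): [string, number] => [s.id, s.score]));
  let weighted = 0;
  let totalWeight = 0;
  for (const p of params) {
    const s = byId.get(p.id);
    if (s === undefined) continue;   // unscored parameters drop out
    weighted += p.weight * (s / 10); // normalise 0-10 to 0-1
    totalWeight += p.weight;
  }
  const mean = totalWeight > 0 ? weighted / totalWeight : 0.5;
  return 1 / (1 + Math.exp(-8 * (mean - 0.5))); // placeholder slope and midpoint
}
```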
The second engine — the one that landed in this iteration — is a catalogue scorer. It runs on shipped films we have receipts for. We know Soorarai Pottru's box office because we have the BoxOfficeRecord rows. We know the critic-versus-audience delta because we have hundreds of reviews ingested. We know the controversy timeline because we have the article corpus and the decode pipeline that clusters claims into themes. We know the music-director lift because we have the crew metadata and the comparable-film catalogue. There is real evidence for forty distinct measurements per film, give or take, depending on how much coverage exists. We call those forty measurements realistic KPIs.
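The data shape behind those measurements is deliberately boring. Roughly, and with field names that are assumptions rather than our real schema, one catalogue KPI looks like this:

```ts
// Illustrative shape of one catalogue KPI: the measurement never travels
// without the evidence it was read from. Field names are assumptions.
type ConfidenceLevel = "high" | "medium" | "low" | "unknown";

interface EvidenceRef {
  kind: string;     // e.g. "BoxOfficeRecord", "review", "article" (illustrative kinds)
  sourceId: string; // the row or document the number was read from
  url?: string;     // citation when one exists
}

interface RealisticKpi {
  filmId: string;
  kpi: string;             // e.g. "criticAudienceDelta" (hypothetical KPI name)
  value: number | null;    // null when we report a gap instead of a number
  confidence: ConfidenceLevel;
  evidence: EvidenceRef[]; // the receipts behind the measurement
}
```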
Here is the thing we kept tripping over: those two engines do not score the same thing.
The 200-parameter engine scores a screenplay. To run it, you need a screenplay. We do not have screenplays for shipped films. There is no reasonable path by which we will obtain the screenplay of every Tamil release of the last twenty-five years. The studios will not give them to us. The writers will not give them to us. They are not licensed for digital distribution. Even if we paid every rate-card in the industry, we would have permission gaps. Most of the films in our catalogue today, we will never have a screenplay for.
You can see where this is going. The wrong answer to that constraint is to dress up our catalogue analysis as if it were a 200-parameter scoring run. We could pick forty heuristics, stamp each one with an ID borrowed from the parameter catalog, and market the result as a 200-parameter analysis. The marketing copy gets cleaner. The product looks more impressive. The competitor comparison chart looks even better. And the entire thing is a lie of categorisation. The 200-parameter engine is a screenplay-prediction system. It knows what it knows because it read a screenplay. A 40-KPI catalogue scorer knows what it knows because it read the receipts. Pretending the two are the same thing is the easiest way to lose customer trust the moment a sophisticated buyer asks a follow-up question.
The right answer is to ship two engines, label them clearly, and let each be honest about what it is. The 200-parameter engine is a prediction. It produces a score, drivers, confidence intervals, and analog films. The 40-KPI catalogue is an investigation. It produces measurements grounded in cited evidence, each with a confidence chip telling you how much evidence we actually have.
The confidence chip is the thing that took the longest to land, and it is the thing we are most proud of. Every KPI carries one of four levels. High means multiple independent sources are cited and they agree. Medium means we have evidence but it is partial: maybe two sources, maybe three articles. Low means we have one weak signal, and we surface the measurement at all only because it is part of the catalogue. Unknown means we tried to score the KPI and the evidence was insufficient, so we did not score it; we report the gap.
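In sketch form, with cutoffs that are illustrative rather than our production rules, the assignment reads something like this:

```ts
// Illustrative assignment of the four confidence levels from evidence counts.
// The thresholds below are stand-ins; the real rules weigh source
// independence and agreement per KPI.
type ConfidenceLevel = "high" | "medium" | "low" | "unknown";

function confidenceFor(independentSources: number, sourcesAgree: boolean): ConfidenceLevel {
  if (independentSources === 0) return "unknown";              // insufficient evidence: report the gap
  if (independentSources >= 3 && sourcesAgree) return "high";  // multiple cited sources that agree
  if (independentSources >= 2) return "medium";                // evidence exists but is partial
  return "low"; // one weak signal, surfaced only because the KPI is in the catalogue
}
```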
This last category is the one almost no incumbent ships. Most signal-intelligence vendors are structurally allergic to admitting they do not know something. The product looks worse with empty cells in it. The buyer sees the gap and asks an awkward question. So the vendors fill the cells with whatever heuristic gets close enough, label it "AI-powered", and ship. The buyer never asks the question, but they also never trust the system when it matters, because they have no way to tell the strong measurements from the weak ones.
We made the opposite call. Empty cells are good. They are the truth. Filling them with confident-looking nonsense is worse than leaving them empty. The confidence chip is how we ship that truth in a UI without breaking the layout.
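At render time the rule is small. A minimal sketch, with illustrative names and copy:

```ts
// Illustrative rendering rule: an unscored KPI shows as a labelled gap,
// never as a synthetic number dressed up to fill the cell.
function kpiCellLabel(
  value: number | null,
  confidence: "high" | "medium" | "low" | "unknown",
): string {
  if (confidence === "unknown" || value === null) {
    return "not scored: insufficient evidence";
  }
  return `${value.toFixed(1)} (${confidence} confidence)`;
}
```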
The number forty is not magic. It is the number of catalogue-level measurements where we have plausible-to-strong evidence at the moment of writing. Three months ago, the number was eighteen. Six months from now, with deeper scrapes and better entity resolution, the number will probably be sixty. The number floats with the data we have. What does not float is the discipline: we do not score a KPI we cannot ground in cited evidence, and we do not pretend a screenplay-prediction parameter applies to a film whose screenplay we have never seen.
If you came to SignalGrid expecting "200-parameter scoring of every film in the catalogue", you came expecting a thing that does not exist anywhere in the industry, and that we could only sell you by lying. We would rather sell you the engine that actually works, label it accurately, and surface the gaps where they live. The reason customers stay is the same reason buyers leave the incumbents: when the system says "high confidence" and bets on the prediction, it is right. And when it says "we don't know", it is honest about that too.