The case for cross-vertical entity graphs

Most signal-intelligence tools resolve entities inside their own vertical and stop. We argue the interesting predictions live across the seams.

When we started SignalGrid, the easiest thing to ship would have been ten separate products. One for box office. One for celebrity equity. One for music virality. Each with its own entity table, its own ingest pipeline, its own scoring head. That is how most of the incumbents are built, and it is how most of our pilot customers expected us to build too.

We did not do that, and the reason is worth writing down: the most useful predictions are the ones that depend on entities that live in different verticals.

Consider a question a studio executive actually asks: "If we attach this lead to this director on this script, what does the opening weekend look like?" To answer that, you need to score the script, the lead's current commercial pull, the director's recent track record, and the genre-fit of the combination. The lead is a celebrity-equity entity. The director is too. The script is a script-intelligence entity. The opening-weekend number is a box-office prediction about the film itself. Four entities, three verticals, one decision. If your system stores those entities in separate tables that do not share a key, you cannot answer the question. You can only answer four smaller questions and ask the executive to integrate them in their head.
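To make the shared-key point concrete, here is a minimal sketch of assembling one feature row for the studio's question. It assumes every vertical's feature store is keyed by the same entity ID. The function name, store layout, IDs, and numbers are all hypothetical placeholders, not SignalGrid's actual schema or scores.

```python
from typing import Dict, Tuple

# Assumed layout: vertical name -> entity ID -> feature name -> value.
FeatureStores = Dict[str, Dict[str, Dict[str, float]]]


def assemble_features(
    package: Dict[str, Tuple[str, str]], stores: FeatureStores
) -> Dict[str, float]:
    """Build one feature row from entities that live in different verticals.

    `package` maps a role ("lead", "director", ...) to a (vertical, entity ID)
    pair. A KeyError here means a vertical does not know the shared ID, which
    is exactly the failure a siloed design cannot avoid.
    """
    row: Dict[str, float] = {}
    for role, (vertical, entity_id) in package.items():
        for name, value in stores[vertical][entity_id].items():
            row[f"{role}.{name}"] = value
    return row


# Placeholder values for illustration only.
stores: FeatureStores = {
    "celebrity_equity": {
        "ent_000001": {"commercial_pull": 0.5},
        "ent_000002": {"track_record": 0.25},
    },
    "script_intelligence": {
        "ent_000003": {"script_score": 0.75},
    },
}
package = {
    "lead": ("celebrity_equity", "ent_000001"),
    "director": ("celebrity_equity", "ent_000002"),
    "script": ("script_intelligence", "ent_000003"),
}
features = assemble_features(package, stores)
```

The resulting row is what a box-office scoring head would consume. Without a shared ID there is nothing to look up by, and the join falls back to the executive.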

Cross-vertical entity resolution is not about elegance. It is about which questions you can answer at all.

The implementation cost is real. Building a single entity graph that spans films, talent, songs, athletes, brands, creators, and political races means agreeing on identity at a level that most companies never have to agree on. Two crawlers can pull the same press release and call the lead by two different names. The Wikipedia infobox and the IMDb credits list can disagree on whether a director is also credited as a writer. When you resolve across verticals, every disagreement compounds.
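The two-crawlers problem can be sketched as an alias index that maps surface forms of a name to one stable ID. This is a deliberately naive illustration: it assumes exact matching after normalization. The class, the ID format, and the example name are hypothetical, and a production resolver would need far more than string folding.

```python
import unicodedata
from typing import Dict, Iterable, Optional


def normalize_name(name: str) -> str:
    """Fold case, accents, and punctuation so surface variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(cleaned.lower().split())


class AliasIndex:
    """Maps normalized name variants to a single stable entity ID."""

    def __init__(self) -> None:
        self._by_alias: Dict[str, str] = {}
        self._next_id = 1

    def register(self, canonical_name: str, aliases: Iterable[str] = ()) -> str:
        """Mint a new ID and attach the canonical name plus any known aliases."""
        entity_id = f"ent_{self._next_id:06d}"  # hypothetical ID format
        self._next_id += 1
        for name in (canonical_name, *aliases):
            self._by_alias[normalize_name(name)] = entity_id
        return entity_id

    def resolve(self, mention: str) -> Optional[str]:
        """Return the stable ID for a mention, or None if it is unknown."""
        return self._by_alias.get(normalize_name(mention))


index = AliasIndex()
lead_id = index.register("Zoë Marlowe", aliases=["Marlowe, Zoe"])  # fictional name
```

With this in place, `index.resolve("ZOE MARLOWE")` and `index.resolve("Marlowe, Zoe")` return the same ID, so both crawlers' records land on one node instead of two.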

We pay that cost because we believe the alternative is worse. A vertically siloed product can answer only the questions a spreadsheet could. The reason customers tell us they switched from incumbents is not that our box-office model is meaningfully more accurate than the incumbent's box-office model. It is that we can answer the studio's actual question.

The data backbone for this is a single entity graph with stable IDs, full provenance for every assertion (we know which source claimed which fact at which time), and explicit handling of disagreement (we keep the disagreement rather than picking a winner). Predictions in any one vertical can pull features from any other.
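As a rough sketch of what "provenance for every assertion" and "keep the disagreement" could look like in code, the structure below stores every claim alongside its source and timestamp and never overwrites a conflicting one. The class names, field names, and source labels are assumptions made for illustration, not a description of SignalGrid's actual storage layer.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class Assertion:
    """One source's claim about one attribute of one entity at one time."""
    entity_id: str
    attribute: str
    value: str
    source: str
    observed_at: datetime


class EntityGraph:
    """Append-only store: conflicting claims sit side by side."""

    def __init__(self) -> None:
        self._claims: DefaultDict[Tuple[str, str], List[Assertion]] = defaultdict(list)

    def assert_fact(self, assertion: Assertion) -> None:
        """Record a claim without replacing earlier or conflicting ones."""
        self._claims[(assertion.entity_id, assertion.attribute)].append(assertion)

    def claims(self, entity_id: str, attribute: str) -> List[Assertion]:
        """Return every recorded claim, with provenance, for one attribute."""
        return list(self._claims.get((entity_id, attribute), []))

    def is_disputed(self, entity_id: str, attribute: str) -> bool:
        """True when sources have asserted more than one distinct value."""
        return len({a.value for a in self.claims(entity_id, attribute)}) > 1


graph = EntityGraph()
seen = datetime(2024, 1, 1, tzinfo=timezone.utc)  # placeholder timestamp
# Hypothetical source labels echoing the infobox-versus-credits example.
graph.assert_fact(Assertion("ent_000002", "credited_as_writer", "yes", "infobox", seen))
graph.assert_fact(Assertion("ent_000002", "credited_as_writer", "no", "credits_list", seen))
```

Here `graph.is_disputed("ent_000002", "credited_as_writer")` is true and both claims remain queryable. A downstream model can then treat the dispute itself as a feature rather than inheriting whichever source happened to be ingested last.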

The right way to evaluate a system like this is not to evaluate the verticals individually. It is to evaluate the cross-vertical questions: how well does the system answer the studio's question, the agency's question, the campaign's question? Those questions do not have clean public benchmarks, which is part of why most incumbents do not optimize for them. We think they are the only questions worth optimizing for.
