I am trying to query users from a TB-scale cloud table, based on thousands of different segments, and generate a daily feed for each of these segments. At the end of every day we query the table to, for example,
create a subset of all users living in the USA and generate a file for it.<p>We used to query the database every night, but the number of segments has grown so large that it is no longer feasible to batch query this data every night and populate the segment feeds.
I am thinking of moving to a streaming architecture where users are assigned to segments as they come in. For that I am looking to load the segment definitions into a graph data structure and determine which user qualifies for which segment. Would a graph be appropriate for this? Vertices would contain a filter, e.g. country:USA, and a segment_id, and edges would represent an AND between filters.<p>I do realize Amazon Kinesis has a continuous query model on its streams, but due to some constraints I am limited to Google Cloud for this use case.
I will be attempting to do this in a Google Cloud Dataflow pipeline.<p>Any critique or suggestions would be appreciated.
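<p>To make the matching step concrete, here is a minimal sketch of the per-user lookup. It swaps the explicit graph for an inverted index with a counter per segment, which captures the same "AND of filters" idea. It assumes each segment is a pure AND of equality filters and that user attribute values are simple hashable scalars. The class name SegmentIndex, its methods, and the sample segment ids are hypothetical, made up for illustration.

```python
from collections import defaultdict


class SegmentIndex:
    """Inverted index: one (field, value) filter -> segments that require it.

    Assumption: a segment is an AND of equality filters, e.g.
    {"country": "USA", "plan": "premium"}. OR/NOT/range filters are not handled.
    """

    def __init__(self):
        self._by_filter = defaultdict(set)  # (field, value) -> {segment_id}
        self._required = {}                 # segment_id -> number of filters

    def add_segment(self, segment_id, filters):
        pairs = set(filters.items())
        if not pairs:
            raise ValueError("a segment needs at least one filter")
        self._required[segment_id] = len(pairs)
        for pair in pairs:
            self._by_filter[pair].add(segment_id)

    def match(self, user):
        """Return the ids of all segments whose filters the user satisfies."""
        hits = defaultdict(int)
        for pair in user.items():
            # Only segments that mention this exact attribute are touched.
            for segment_id in self._by_filter.get(pair, ()):
                hits[segment_id] += 1
        # A segment qualifies when every one of its filters was hit.
        return {s for s, n in hits.items() if n == self._required[s]}


index = SegmentIndex()
index.add_segment("us_all", {"country": "USA"})
index.add_segment("us_premium", {"country": "USA", "plan": "premium"})
index.add_segment("ca_all", {"country": "CA"})

user = {"id": 1, "country": "USA", "plan": "premium"}
print(sorted(index.match(user)))  # ['us_all', 'us_premium']
```

The cost per user depends on how many segments mention that user's attribute values, not on the total number of segments. In a streaming pipeline this lookup would presumably run once per incoming user record, with the index built from the segment definitions ahead of time.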