Designing a pipeline

What are the elements inside a pipeline? PCollections, PTransforms, and I/O transforms
What does a Runner do? Determines what back-end your pipeline will run on
What are Dataflow transforms called? PTransforms
What is a PTransform's output called? PCollection
Do PTransforms "consume" PCollections? No. A transform considers each individual element of its input PCollection and produces a new PCollection as output, so the same input PCollection can be fed to several different transforms. (Separately, a PCollection does not support random access to individual elements.)
What are tagged outputs? A way for a single transform to emit multiple output PCollections.
What is a Flatten transform? It merges multiple PCollections of the same type into a single PCollection.
What is a CoGroupByKey transform? A relational join of two or more keyed PCollections that share the same key type.
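The Flatten and CoGroupByKey cards above can be illustrated with a minimal sketch in plain Python. This is not the Beam API: the helper names `flatten` and `co_group_by_key`, the tag names, and the sample data are all hypothetical, and ordinary lists stand in for PCollections.

```python
from collections import defaultdict


def flatten(*collections):
    """Merge several same-typed collections into one list (stand-in for Flatten)."""
    merged = []
    for collection in collections:
        merged.extend(collection)
    return merged


def co_group_by_key(**tagged):
    """Group values from several keyed collections by key (stand-in for CoGroupByKey).

    `tagged` maps a tag name to a list of (key, value) pairs. The result maps
    each key to a dict holding, per tag, the list of values seen for that key.
    """
    result = defaultdict(lambda: {tag: [] for tag in tagged})
    for tag, pairs in tagged.items():
        for key, value in pairs:
            result[key][tag].append(value)
    return dict(result)


# Hypothetical sample data: two keyed collections with the same key type (str).
emails = [("amy", "amy@example.com"), ("carl", "carl@example.com")]
phones = [("amy", "111-222-3333"), ("james", "222-333-4444")]

joined = co_group_by_key(emails=emails, phones=phones)
all_pairs = flatten(emails, phones)
```

In this sketch a key that appears in only one input still shows up in the result, with an empty list for the other tag, which is what makes the grouping usable as a join.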
What is a root transform? A root transform creates a PCollection from either an external data source or some local data you specify.
What are the two kinds of root transform? Read and Create. Read transforms read data from an external source, such as a text file or a database table. Create transforms create a PCollection from an in-memory java.util.Collection.
Can pipelines consume batch or streaming data? Both.
What is a bounded PCollection? A dataset of known, fixed size, typically read from a fixed data source such as a file. It can be processed with a batch job.
What is an unbounded PCollection? A dataset of unlimited size, typically read from a continuously updating data source. It is processed with a streaming job.
Can Pipelines share a PCollection? No. Each PCollection is owned by the Pipeline that created it.
Can elements in a PCollection be of different types? No. Every element must be of the same type.
How do you add elements to a PCollection? You can't. PCollections are immutable; a PTransform has to process the PCollection and create a new one.
How does Beam consume streaming data? Beam uses windowing to divide a continuously updating unbounded PCollection into logical windows of finite size. These logical windows are determined by some characteristic associated with a data element, such as a timestamp. Aggregation transforms (such as GroupByKey and Combine) work on a per-window basis — as the data set is generated, they process each PCollection as a succession of these finite windows.
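What is a fixed time window? A fixed time window represents a consistent-duration, non-overlapping time interval in the data stream. Given a timestamped PCollection which might be continuously updating, each window might capture (for example) all elements with timestamps that fall into a 30 second interval.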
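As a rough illustration of the fixed-window idea, here is a small plain-Python sketch that assigns timestamped elements to 30 second windows. It does not use the Beam SDK: the function name `fixed_window`, the integer-second timestamps, and the half-open `[start, end)` interval convention are assumptions made for the example.

```python
from collections import defaultdict


def fixed_window(timestamp, size=30):
    """Return the (start, end) of the fixed window containing `timestamp`.

    Assumes integer timestamps in seconds and half-open windows [start, end).
    """
    start = timestamp - (timestamp % size)
    return (start, start + size)


# Hypothetical (timestamp, value) events.
events = [(3, "a"), (29, "b"), (30, "c"), (61, "d")]

# Group elements per window, as a per-window aggregation would see them.
by_window = defaultdict(list)
for ts, value in events:
    by_window[fixed_window(ts)].append(value)
```

Because the windows do not overlap, every element lands in exactly one window; the element at timestamp 30 belongs to the window starting at 30, not the one ending there.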
What are sliding time windows? A sliding time window also represents a time interval in the data stream; however, sliding time windows can overlap. For example, each window might capture 60 seconds' worth of data, but a new window starts every 30 seconds. The frequency with which sliding windows begin is called the period. Therefore, our example would have a window duration of 60 seconds and a period of 30 seconds. Because the windows overlap, an element can belong to more than one window.
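The 60 second duration and 30 second period example can be sketched in plain Python as follows. This is only a model of the assignment logic, not the Beam SDK: the function name `sliding_windows`, the integer-second timestamps, and the half-open `[start, end)` convention are assumptions.

```python
def sliding_windows(timestamp, size=60, period=30):
    """Return every (start, end) sliding window that contains `timestamp`.

    Assumes integer timestamps in seconds, half-open windows [start, end),
    and window starts aligned to multiples of `period`.
    """
    # Latest window start at or before the timestamp.
    last_start = timestamp - (timestamp % period)
    # Walk backwards one period at a time while the window still covers
    # the timestamp, i.e. while start > timestamp - size.
    starts = range(last_start, timestamp - size, -period)
    return sorted((start, start + size) for start in starts)


# An element at t=45 falls into two overlapping windows.
windows_for_45 = sliding_windows(45)
```

With a duration of 60 and a period of 30, each element falls into two windows; setting the period equal to the duration reduces this to the fixed-window case of one window per element.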