Getting started with Apache Beam for distributed data processing
MapReduce was revolutionary when it was first published in 2004. It provided a programming model for batch processing of terabyte-scale datasets. MapReduce was built on three seemingly simple phases: map, sort, and reduce. For I/O it relied on a general-purpose distributed file system, GFS in Google's original implementation and HDFS (the Hadoop Distributed File System) in open source Hadoop, and was therefore capable of processing almost any kind of data.
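To make the three phases concrete, here is a minimal single-process sketch of the model applied to the classic word count problem. This is an illustration of the programming model, not Hadoop's actual API, and all function names here are invented for the example: map emits a (key, value) pair per word, sort brings equal keys together, and reduce folds each group into one result.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (key, value) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def sort_phase(pairs):
    """Sort: order pairs by key so that equal keys become adjacent."""
    return sorted(pairs, key=itemgetter(0))

def reduce_phase(sorted_pairs):
    """Reduce: fold the values of each key group into a single result."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (key, sum(value for _, value in group))

if __name__ == "__main__":
    lines = ["big data", "big pipelines", "data pipelines"]
    counts = dict(reduce_phase(sort_phase(map_phase(lines))))
    print(counts)  # {'big': 2, 'data': 2, 'pipelines': 2}
```

In a real MapReduce cluster the same three steps run distributed: many workers execute the map phase in parallel, the framework shuffles and sorts pairs by key across the network, and reduce workers each aggregate a disjoint subset of the keys.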