Data Lakehouse

A data lakehouse combines the best bits of data warehouses and data lakes.

Data Lake

Data lake is the name given to a collection of tools that are often used together to process large amounts of data. It typically includes a storage system such as S3 or HDFS and a processing system such as Apache Spark or Hadoop.

A data lake is typically expected to:

Store lots of data, often in its raw, unprocessed form, in near real time
Process a subset of the data in real time or in batch mode
Provide language-agnostic runtimes for data analysis
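
As a rough illustration of storing raw data and then batch-processing a subset of it, the sketch below uses a local temporary directory as a stand-in for object storage such as S3 or HDFS. The directory layout, the function names `land_raw_event` and `batch_total`, and the event fields are all hypothetical; a real lake would use a distributed store and an engine such as Spark.

```python
import json
import tempfile
from pathlib import Path


def land_raw_event(lake_root: Path, day: str, event: dict) -> None:
    """Append an event to the raw zone as-is, partitioned by day (no validation)."""
    partition = lake_root / "raw" / f"day={day}"
    partition.mkdir(parents=True, exist_ok=True)
    with open(partition / "events.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def batch_total(lake_root: Path, day: str) -> float:
    """Batch job over a single partition; structure is only checked at read time."""
    total = 0.0
    path = lake_root / "raw" / f"day={day}" / "events.jsonl"
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            amount = record.get("amount")
            # Raw data may be messy: skip records without a numeric amount.
            if isinstance(amount, (int, float)):
                total += amount
    return total


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    land_raw_event(lake, "2024-01-01", {"user": "a", "amount": 10.5})
    # Malformed record: the lake stores it anyway.
    land_raw_event(lake, "2024-01-01", {"user": "b", "amount": "n/a"})
    land_raw_event(lake, "2024-01-02", {"user": "c", "amount": 4})
    # Process only a subset of the data: one day's partition.
    day_one_total = batch_total(lake, "2024-01-01")
    print(day_one_total)
```

The point of the sketch is that nothing is rejected on write; any clean-up happens when a job reads the data.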

Data Warehouse

A data warehouse is usually where processed data is stored for direct use by downstream applications. Data warehouses typically don't scale as easily as data lakes and have a lot more validation and processing associated with them.
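
To make the validation point concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a warehouse. The `orders` table, its columns, and the `insert_order` helper are hypothetical; the idea is only that a warehouse-style table enforces its rules when data is written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL CHECK (amount >= 0)
    )
    """
)


def insert_order(order_id, customer, amount) -> bool:
    """Try to insert a row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (order_id, customer, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = insert_order(1, "a", 10.5)           # valid row
rejected_negative = insert_order(2, "b", -3.0)  # violates the CHECK constraint
rejected_null = insert_order(3, None, 5.0)      # violates NOT NULL
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(accepted, rejected_negative, rejected_null, row_count)
```

Only the valid row ends up in the table, so downstream applications can rely on the stored data without re-checking it.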

Data Lakehouse

A data lakehouse attempts to combine elements of both a data lake and a data warehouse. Again, it is typically the name given to a group of systems architected together to provide this functionality. It normally supports the Extract, Load, Transform (ELT) paradigm.
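
The ELT ordering can be sketched in a few lines: data is extracted from a source, loaded untouched into a raw table, and only then transformed inside the store itself. This is an illustrative sketch, not how any particular lakehouse product works; the inline CSV, the table names `raw_orders` and `customer_totals`, and the use of SQLite as the store are all assumptions.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source system (here, a small inline CSV).
SOURCE_CSV = "customer,amount\na,10.5\na,4\nb,\nb,2.5\n"
rows = list(csv.DictReader(io.StringIO(SOURCE_CSV)))

conn = sqlite3.connect(":memory:")

# Load: store every row as raw text, including the one with a missing amount.
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (:customer, :amount)", rows)

# Transform: clean and aggregate inside the store, after loading.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT customer, SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    WHERE amount <> ''
    GROUP BY customer
    """
)

raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
totals = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(raw_count, totals)
```

Because the raw table is kept, the transform step can be rewritten and rerun later without going back to the source system.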
