Book contents
- Frontmatter
- Dedication
- Contents
- List of Figures
- List of Tables
- Preface
- Acknowledgments
- 1 Beginning with Machine Learning
- 2 Introduction to Data Mining
- 3 Beginning with Weka and R Language
- 4 Data Preprocessing
- 5 Classification
- 6 Implementing Classification in Weka and R
- 7 Cluster Analysis
- 8 Implementing Clustering with Weka and R
- 9 Association Mining
- 10 Implementing Association Mining with Weka and R
- 11 Web Mining and Search Engines
- 12 Data Warehouse
- 13 Data Warehouse Schema
- 14 Online Analytical Processing
- 15 Big Data and NoSQL
- Index
- Colour Plates
13 - Data Warehouse Schema
Published online by Cambridge University Press: 26 April 2019
- Frontmatter
- Dedication
- Contents
- List of Figures
- List of Tables
- Preface
- Acknowledgments
- 1 Beginning with Machine Learning
- 2 Introduction to Data Mining
- 3 Beginning with Weka and R Language
- 4 Data Preprocessing
- 5 Classification
- 6 Implementing Classification in Weka and R
- 7 Cluster Analysis
- 8 Implementing Clustering with Weka and R
- 9 Association Mining
- 10 Implementing Association Mining with Weka and R
- 11 Web Mining and Search Engines
- 12 Data Warehouse
- 13 Data Warehouse Schema
- 14 Online Analytical Processing
- 15 Big Data and NoSQL
- Index
- Colour Plates
Summary
Chapter Objectives
✓ To understand the concept of dimension, measure and the fact table
✓ To able to apply different schema of data warehouse designs such as Star Schema, Snowflake Schema and Fact Constellation Schema on real world applications.
✓ To understand the differences between these schemas, their strengths and weakness.
Introduction to Data Warehouse Schema
Logical descriptions of database are known as Schema. It is the blueprint of the entire database. It defines how the data are organized and how the relations among them are associated. Data warehouse schema consists of the name and description of records including associated data items and aggregates. A database uses relational models whereas a data warehouse uses different types of schema, namely, Star, Snowflake, and Fact Constellation.
To start discussion on these schemas, it is important to understand the basic terminology used in this process, which is discussed below.
Dimension
The term ‘dimension’ in data warehousing is a collection of reference information about a measurable event. These events are stored in a fact table and are known as facts. The dimensions are generally the entities for which an organization wants to preserve records. The descriptive attributes are organized as columns in dimension tables by a data warehouse. For example, a student's dimension attributes could consist of first and last name, roll number, age, gender, or an address dimension that would include street name, state, and country attributes.
A dimension table consists of a primary key column that uniquely identifies each record (row) of dimension. A dimension is a framework that consists of one or more hierarchies that classify data. Usually dimensions are de-normalized tables and may have redundant data.
Let us take a quick recap of the concepts of normalization and de-normalization, as they will be used in this chapter. Normalization is a process of breaking up a larger table into smaller tables free of any possible insertion, updation or deletion anomalies. Normalized tables have reduced redundancy of data. In order to get full information, these tables are usually joined.
In de-normalization, smaller tables are merged to form larger tables to reduce joining operations. De-normalization is particularly performed in those cases where retrieval is a major requirement and insert, update, and delete operations are minimal, as in case of historical data or data warehouse. These de-normalized tables will have redundancy of data.
- Type
- Chapter
- Information
- Data Mining and Data WarehousingPrinciples and Practical Techniques, pp. 405 - 415Publisher: Cambridge University PressPrint publication year: 2019