Data Warehousing: Tutorial 1 Introduction to Data Warehousing

Originally Posted On: https://durofy.com/data-warehousing-tutorial-1-introduction-to-data-warehousing/

 

Welcome, data enthusiasts! In this digital era where data is the new oil, understanding how to store, analyze, and utilize this precious resource is crucial for every business. That’s where the concept of Data Warehousing comes into play. In this blog post, we’re going to delve into the fascinating world of Data Warehousing, a powerhouse of information storage and management. Whether you’re a seasoned data professional or someone just dipping their toes into the data lake, we’re confident you’ll find value in our discussion. Let’s dive right in!

This data warehousing tutorial covers the definition of a data warehouse, its history and evolution, and the types, characteristics, advantages, and disadvantages of data warehouses.

What is Data Warehousing?

In simple terms, a data warehouse (DW for short) is an organization’s central store for the data it collects. This data is analyzed and turned into the information the organization needs. A data warehouse keeps data for long periods, which makes it a good place to store historical data derived from daily transactions and other data sources. It is an important part of a company’s business intelligence infrastructure.

A data warehouse is a relational database on which query and analysis can be performed. Its data is used to create reports that support the company’s business intelligence and decision-making. Data is extracted, transformed, and loaded (ETL) from operational systems into the warehouse, where it is used for analysis and business reporting. The tools used for this analysis and reporting are also referred to as business intelligence tools.

A number of ETL tools are available for this purpose. The data can be held in a staging area until it is loaded into the data warehouse. The data in the warehouse can be arranged into a combination of facts and dimensions called a star schema. From the warehouse, the data can be further divided into data marts according to business function.
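The arrangement of facts and dimensions can be sketched with Python’s built-in sqlite3 module. This is only a minimal illustration: the table names (fact_sales, dim_product, dim_date), the columns, and the sample rows are all assumptions made for the example, not part of any particular warehouse product.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
# All table names, columns, and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (10, 2023, 1), (11, 2023, 2);
INSERT INTO fact_sales VALUES (1, 10, 5, 10.0), (1, 11, 3, 6.0), (2, 10, 1, 20.0);
""")

# A typical analytical query: join the fact table to a dimension and aggregate.
revenue_by_category = dict(conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
""").fetchall())

print(revenue_by_category)
```

The fact table holds the measurable events (units, revenue), while each dimension table holds the descriptive attributes used to group and filter them.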

History and Evolution of Data Warehouse

The data warehouse took many years to evolve. It grew out of DSS (Decision Support Systems) and information processing, over a period that began around 1960 and ran through the 1980s. Bill Inmon is called the father of data warehousing. In the early 1960s, master files were stored on magnetic tape, punched cards were in use, and applications were built in COBOL. Punched cards, paper tapes, magnetic tapes, and disk storage systems were all used for holding data. The drawback was that the data had to be accessed sequentially, which took more time.

In the mid-1960s, the growth of master files increased data redundancy. This caused a number of problems: programs became more complex to maintain, more hardware was needed to support the master files, and developing new programs became difficult. It also created the need to synchronize data after every update.

Around 1970, disk storage, also known as DASD (Direct Access Storage Device), arrived. The advantage of this technology was that data could be accessed directly rather than sequentially, which took much less time.

DASD led to the development of the DBMS (Database Management System), which made access, storage, and indexing of data on DASD much easier and could locate data quickly. With the DBMS came the idea of the database as the single source of data for all processing.

In the mid-1970s, OLTP (Online Transaction Processing) emerged, giving faster access to data than before. This proved very beneficial for information processing in business. Examples include reservation systems, manufacturing control systems, and bank teller systems.

In the 1980s, PCs and 4GL (Fourth Generation Language) technology were developed. With this technology, the end user had direct control over data and systems. The concept of MIS (Management Information System) emerged; it was used for management decisions and is now known as DSS. At the time, a single database served all purposes.

By 1985, the extract program had appeared. An extract program searches through a file or database according to a particular query or criterion and then transports the matching data to another file or database. It was straightforward to use, delivered high performance, and gave the end user good control over the system.
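The idea of an extract program can be sketched in a few lines of Python. This is a modern illustration under stated assumptions, not how extract programs of that era were written: the function name extract, the CSV layout, and the sample accounts are all hypothetical.

```python
import csv
import io

def extract(source, target, predicate):
    """Copy rows that satisfy predicate from a source CSV to a target CSV.

    Returns the number of rows copied. Hypothetical helper for illustration.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    copied = 0
    for row in reader:
        if predicate(row):          # the search criterion
            writer.writerow(row)    # transport the match to the other file
            copied += 1
    return copied

# In-memory stand-ins for the source file and the extract file.
source = io.StringIO("account,balance\nA1,50\nA2,500\nA3,900\n")
target = io.StringIO()
copied = extract(source, target, lambda row: int(row["balance"]) > 100)
print(copied, "rows extracted")
```

Because the output is itself a file that another extract can read, it is easy to see how chains of extracts could build up.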

Spider Web

Searching a database with an extract program was simple, but it became difficult to process the large number of extracts run each day. There were extracts of extracts of extracts, and so on. The resulting tangle is also known as the naturally evolving architecture.

Drawbacks:

  • 1. Data Credibility: Different extracts were found to contain conflicting data.
  • 2. Productivity: Different skills were required to locate data in multiple files. The same file could exist under different names, and different files could share the same name, which slowed down the production of results. In addition, later reports could not build on the first report, so every report required analysing a large amount of data from scratch.
  • 3. Inability to Transform Data into Information: The data was not integrated, and there was not enough historical data. Thus, it was not possible to transform the data into information.

Then there was a transition from the spider’s web environment to the data warehouse environment. A spider’s web environment can evolve into either an operational database or a DSS-integrated informational data warehouse.

Characteristics of Data Warehouse

A data warehouse has four important characteristics: it is subject-oriented, integrated, non-volatile, and time-variant.

1. Subject-Oriented

The data warehouse of an organization is subject-oriented: it is organized around the organization’s major subject areas rather than around its individual applications. Each company has its own applications and subject areas. For example, an insurance company has customer, policy, premium, and claim among its subject areas. A retailer has product, sale, vendor, and so on as its major subject areas.

2. Integrated

Integration is the most important characteristic of a data warehouse. Data is integrated as it is transferred from the operational environment to the data warehouse environment.

Before transfer, the operational data is not integrated and requires conversion, formatting, summarizing, re-sequencing, and so on. Integration gives an enterprise-wide view of the data, so that it appears to have come from a single well-defined source.
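As a minimal sketch of what conversion and formatting can mean in practice, consider two hypothetical operational applications that record the same customer facts with different encodings. The field names, the gender codes, and the date formats below are assumptions chosen for the example.

```python
from datetime import datetime

# Hypothetical records from two applications that encode the same facts differently.
app_a = [{"cust": "17", "gender": "m", "joined": "2021-03-05"}]
app_b = [{"cust": "0042", "gender": "1", "joined": "05/03/2021"}]

# Assumed code mappings and date formats for each source application.
GENDER_CODES = {"m": "M", "f": "F", "1": "M", "0": "F"}
DATE_FORMATS = {"app_a": "%Y-%m-%d", "app_b": "%d/%m/%Y"}

def integrate(record, source):
    """Convert one source record into the single warehouse format."""
    joined = datetime.strptime(record["joined"], DATE_FORMATS[source]).date()
    return {
        "customer_id": int(record["cust"]),        # one key format
        "gender": GENDER_CODES[record["gender"]],  # one encoding
        "joined": joined.isoformat(),              # one date format
        "source": source,                          # keep lineage
    }

integrated = [integrate(r, "app_a") for r in app_a]
integrated += [integrate(r, "app_b") for r in app_b]
for row in integrated:
    print(row)
```

After integration, a report no longer needs to know which application a row came from in order to interpret its values.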

3. Non-Volatile

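The data in the data warehouse is non-volatile, which means that once data is entered, it is not updated in place. Data is loaded and then read; when something changes in the operational system, the change is recorded as a new entry rather than by overwriting the old one.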

4. Time-Variant

Every unit of data is accurate as of one moment in time. The underlying values keep changing over time, but each record is considered accurate as of the moment it was recorded.
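The non-volatile and time-variant characteristics can be sketched together as an append-only history of dated snapshots. This is an illustration under assumptions: the tuple layout, the helper names load_snapshot and as_of, and the sample cities are hypothetical.

```python
from datetime import date

# Hypothetical snapshot history: (customer_id, snapshot_date, city).
snapshots = [
    (1, date(2022, 1, 1), "Pune"),
    (1, date(2023, 6, 1), "Delhi"),
]

def load_snapshot(history, customer_id, snapshot_date, city):
    """Non-volatile: a change is appended as a new row, never overwritten."""
    history.append((customer_id, snapshot_date, city))

def as_of(history, customer_id, when):
    """Time-variant: return the value that was accurate on a given date."""
    candidates = [row for row in history
                  if row[0] == customer_id and row[1] <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[1])[2]

load_snapshot(snapshots, 1, date(2024, 2, 1), "Mumbai")
print(as_of(snapshots, 1, date(2023, 1, 1)))
print(as_of(snapshots, 1, date(2024, 3, 1)))
```

Because old rows are never changed, a question about the past gives the same answer no matter when it is asked.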

Advantages & Disadvantages of Data Warehouse

Advantages of Data Warehouse

Data Warehouses are a significant asset in today’s data-driven business environment. Here’s why:

  1. Improved Data Quality and Consistency: Data Warehouses allow organizations to clean and standardize their data, leading to enhanced consistency across departments.
  2. Enhanced Business Intelligence: By assembling data from different sources into a single, central repository, Data Warehouses provide valuable insights, improving decision-making processes.
  3. Increased Operational Efficiency: With a Data Warehouse, businesses can swiftly query and analyze large volumes of data, improving operational efficiency and productivity.
  4. Historical Intelligence: Data Warehouses store historical data, enabling businesses to analyze trends and make future predictions.
  5. Scalability and Performance: As businesses grow, so does their data. Data warehouses are designed to scale with that growth, ensuring high performance even as data volume increases. They efficiently handle large amounts of data, ensuring that business processes are not disrupted by data load times or complex queries.

Disadvantages of Data Warehouse

However, like any technology, Data Warehouses come with their own set of challenges:

  1. Costly Implementation and Maintenance: Building and maintaining a data warehouse can be expensive, and may not be feasible for smaller businesses.
  2. Complexity: Data Warehousing involves complex processes and technologies that require skilled technical personnel to manage.
  3. Data Privacy: Aggregating data into one location can increase the risk of data breaches, requiring stringent security measures.
  4. Time-Consuming Data Preparation: Cleaning and transforming data for storage in a Data Warehouse can be a lengthy process, potentially delaying access to real-time insights.
  5. Rigid Structure: Data Warehouses tend to have a rigid structure, making them less flexible to changes. Any modification in data source or business requirements can become a cumbersome process, often requiring substantial time and resources.

A data warehouse also has practical applications: it helps in detecting credit card fraud, it is helpful in inventory management, and it can be used to build customer profiles.

Further disadvantages of a data warehouse are that it requires integrating data from old legacy environments and that it results in large volumes of data.

Kinds of Data Warehouses

There are 4 kinds of data warehouses:

  • 1. Federated Data Warehouse: Unlike traditional data warehouses, Federated Data Warehouses do not store integrated data. Instead, they provide a unified interface for accessing data that remains stored in its original source systems. This approach provides flexibility and can reduce the time and cost associated with data integration. However, it may also lead to challenges with data consistency and quality, as the data is not consolidated and cleaned before being accessed.
  • 2. Active Data Warehouse: This type of data warehouse enables real-time data processing and updating, making it an excellent choice for organizations that require instant insights for quick decision-making. With an active data warehouse, data is continuously updated, allowing for a more reactive approach to business intelligence. However, this real-time processing capability can increase the complexity of data management, and it requires robust systems in place to ensure data quality and integrity.
  • 3. Star Schema Data Warehouse: This type of data warehouse utilizes a design that consists of one or more fact tables referencing any number of dimension tables, thereby creating a star-like schema. This structure facilitates the analysis of large datasets by organizing data into a few clearly defined categories. The central fact table contains the core data that is of most interest (for instance, sales data), with the surrounding dimension tables providing descriptive attributes (like product details or customer information). However, while the star schema is simple to understand and query, it may involve data redundancy, and it’s not ideally suited for handling hierarchical relationships.
  • 4. Datamart Data Warehouse: A Datamart is a smaller, more focused version of a data warehouse that typically addresses a specific area or department (like sales, finance, or marketing) within an organization. It uses Online Analytical Processing (OLAP) to provide multidimensional insights into business operations. With OLAP, users can perform complex analytical queries quickly, and they can analyze data in multiple dimensions, such as viewing sales by region, by product, or over time. While a Datamart is less comprehensive than a full-fledged data warehouse, it is typically quicker to set up and requires less investment, making it a viable option for smaller businesses or for those looking to pilot data warehousing before full-scale implementation. However, care must be taken to ensure that the data in different Datamarts remains consistent and that it can be integrated if a full-scale data warehouse is later implemented.
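The kind of multidimensional analysis described for OLAP, such as viewing sales by region, by product, or over time, can be sketched as a simple roll-up over chosen dimensions. The roll_up function, the field names, and the sample figures are assumptions for illustration; a real OLAP engine does far more.

```python
from collections import defaultdict

# Hypothetical sales records with three dimensions and one measure.
sales = [
    {"region": "North", "product": "Pen", "year": 2023, "amount": 100},
    {"region": "North", "product": "Lamp", "year": 2023, "amount": 250},
    {"region": "South", "product": "Pen", "year": 2023, "amount": 80},
    {"region": "South", "product": "Pen", "year": 2024, "amount": 120},
]

def roll_up(rows, *dimensions):
    """Sum the amount measure grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)

by_region = roll_up(sales, "region")
by_product_year = roll_up(sales, "product", "year")
print(by_region)
print(by_product_year)
```

The same records answer different questions depending on which dimensions are passed in, which is the essence of analysing data along multiple dimensions.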

Basic Architecture of Data Warehouse

A data warehouse has a number of components, such as ETL technology, an ODS, and data marts. A basic data warehouse has data sources, the warehouse itself, and users. Operational data can be placed in a staging area for processing before it is entered into the data warehouse.

To tailor the data in the warehouse to each business function, one can use data marts, for example a sales data mart or an inventory data mart.

ETL Technology

A data warehouse uses ETL technology, which comprises the processes of extracting, transforming, and loading data. ETL takes application data from the legacy source systems and turns it into corporate data.

ETL technology performs a number of processes, such as logical conversion of data, verification of domain, conversion from one DBMS to another, creation of default values when needed, summarization of data, addition of time values to the data key, restructuring of the data key, merging of records, and deletion of extraneous or redundant data.
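A few of the processes listed above, namely creating default values, adding a time value to the data key, and deleting redundant data, can be sketched as one small transform step. The transform function, the record fields, and the "UNKNOWN" default are hypothetical choices for this example, not the behaviour of any specific ETL tool.

```python
from datetime import date

def transform(rows, load_date):
    """Apply a few illustrative ETL transformations to raw source rows."""
    seen = set()
    output = []
    for row in rows:
        # Add a time value to the data key.
        key = (row["id"], load_date.isoformat())
        # Delete redundant data: skip rows whose key was already loaded.
        if key in seen:
            continue
        seen.add(key)
        output.append({
            "key": key,
            "name": row["name"].strip().title(),         # consistent formatting
            "country": row.get("country") or "UNKNOWN",  # default value
        })
    return output

# Hypothetical raw rows from a source system.
raw = [
    {"id": 1, "name": " alice ", "country": "IN"},
    {"id": 1, "name": "Alice", "country": "IN"},
    {"id": 2, "name": "bob", "country": None},
]
warehouse_rows = transform(raw, date(2024, 1, 31))
for row in warehouse_rows:
    print(row)
```

Keeping the load date in the key means that a later load of the same source record produces a new warehouse row instead of replacing the earlier one.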

ODS

ODS stands for Operational Data Store. It is a place where integrated data is updated online with OLTP (Online Transaction Processing) response times. Application data is transferred to the ODS using ETL, which produces data in an integrated format. The ODS involves high-performance processing, including update processing.

Data Mart

A data mart is where end users have direct access to, and control of, their analytical data. There are different data marts for different departments, such as a finance data mart, a marketing data mart, and a sales data mart. A data mart holds less data than a data warehouse, and it contains a significant amount of aggregated and summarized data.

Continue reading the next tutorial for the architecture environment, monitoring, structure, granularity, and components of a data warehouse.

Conclusion

In the complex world of data management, having a clear understanding of concepts like ETL, ODS, and Data Marts is crucial. These tools and methodologies enable us to handle vast volumes of data, streamlining the process of extracting, transforming, and loading it where needed. This can lead to more efficient and informed decision-making processes within an organization. In this journey of exploring data warehousing, we’ve scratched the surface of its potential, but there is so much more to discover. As we move deeper into the digital age, the role of data warehousing will continue to evolve, becoming an even more integral part of how businesses operate and strategize. Stay curious and keep learning because the world of data warehousing is vast and continually evolving.
