Data Warehouse – what it is and what it is used for?
Data warehousing is one of the more common topics in the IT industry. The collected data is an important source of valuable information in many companies, thus increasing their competitive advantage. More and more companies use Business Intelligence (BI) systems in their work, which quickly and easily support the analytical processes. BI systems are based on data warehouses and we will talk about them in today’s article.
What is a data warehouse?
A data warehouse is a tool that collects and stores large amounts of data from different, dispersed sources in one place. The collected data is structured and stored in a thematic division. Data warehouses enable reporting, as well as conducting analyses, which constitute an important source of information in enterprises, thus supporting the process of making business decisions. It is worth mentioning that data warehouses are only used to read data, so the user has no possibility to modify them.
Data Warehouse characteristics
Taking into account the definition of a data warehouse we can read out four main features that characterize it. These are:
- subject orientation – the data collected in the data warehouse is organized around main topics such as sales, product or customer;
- integrity – the data stored in the warehouse are uniform, e.g. in terms of format, nomenclature, coding structures (they are standardised before they reach the warehouse);
- timeliness – the data comes from different time frames, the data warehouse contains both historical and current data;
- non-volatile – the data in the warehouse remains unchanged, the user has no possibility to modify it, so we can be sure that we will get the same results every time.
Architecture and operation
In the architecture of a data warehouse, four basic components can be distinguished: data sources, ETL software, the appropriate data warehouse and analytical applications. The following graphic shows a simplified diagram of the data warehouse structure.
As can be seen from the graphic above, the basis for building each data warehouse is data. The sources of this data are dispersed – they include ERP, CRM, SCM, or Internet sources (e.g. statistical data).
The downloaded data is processed and integrated and then loaded into a proper data warehouse. This stage is called ETL process, from the words: extract, transform and load. According to the individual stages of the process, data is first taken from available sources (extract). In the next step, the data is transformed, i.e. processed in an appropriate way (cleaning, filtering, validation, or deleting duplicate data). The last step is to load the data to the target database, i.e. the data warehouse.
As we mentioned earlier, the data collected in the data warehouse is read-only. Users call data from the data warehouse using appropriate queries, thus obtaining data presented in a more friendly form, i.e. reports, diagrams or visualizations.
As the main task of a data warehouse, the analytical data processing (OLAP, On-Line Analytical Processsig) should be distinguished. It allows to make various types of summaries, reports or charts presenting significant amounts of data, e.g. a sales chart in the first quarter of the year, a report of products generating the highest revenue, etc.
The next task that data warehouses perform is decision support in enterprises (DSS, Decision Support System). Taking into account the huge amount of information that is in the data warehouses, they are a part of the decision support system for companies. Thanks to advanced analyses conducted with the use of these databases, it is much easier to search for dominant trends, models or relations between various factors, which may facilitate the management’s decision making.
Another of the tasks of these specific databases is to centralize data in the company. Data from different departments/levels of the company are collected in one place, thanks to which everyone interested has access to them whenever he or she needs them.M/big>
Centralisation is connected with another role of a data warehouse, which is archiving. Due to the fact that the data collected in the warehouse comes from different periods of time and the warehouse is supplied with new, current data on an ongoing basis, it also becomes an archive of data and information about the company.
Data warehousing is undoubtedly a useful and functional tool that brings many benefits to companies. Implementation of this database into your company may facilitate and speed up some of the processes taking place in companies. Taking into account the enormous amount of data and information that is generated every day, data warehouses are a perfect answer to store this information in one, safe place, accessible to every employee.
- [graphics] https://pixabay.com/photos/archive-boxes-documents-folders-1850170/