Data is one of a company's most important resources. It serves as a basis for strategic decisions and offers huge potential to increase sales and profits. However, this treasure is often neglected and leads a shadowy existence.
Only through efficient data management in the company can it be successfully salvaged and put to good use.
In the context of digitization, companies are faced with increasingly complex and extensive IT systems to support their business operations. Data and information have become critical to successful business operations and form the basis of technological trends such as Industry 4.0. The growing volume of data gives companies new opportunities to realize business models and to learn systematically from data for faster and better decision-making and adjustment processes. However, data must not only be collected for this purpose but must also be processed into high-quality information and translated into decisions. The ability to analyze data automatically is essential for deriving cause-and-effect relationships from data from different sources and for predicting future events.
However, in many companies there are hurdles that stand in the way of the profitable use of the existing data stock in day-to-day operations. Heterogeneous system landscapes, which spread company data across many software applications in different formats, structures, and storage locations, are the rule rather than the exception in practice. Data management is often limited to handling problems caused by incorrectly recorded and outdated information and to dealing with duplicate data. Business decisions are then based on manually generated reports that summarize information from various data sources and systems. Comprehensive, automated value creation, which requires an integrated and cleansed data stock, cannot be readily achieved in such a scenario.
A data-centric transformation of an organization requires basic knowledge of data classification, modeling, and integration as well as knowledge of different practices and tools for managing and analyzing data. Acquiring and building up this knowledge in the company is an essential prerequisite for being able to assess and analyze the company's data landscape. Based on this, measures can then be taken to consolidate the data stock and increase its quality, and automated data management processes and tools can be established in the company. The goal should be to obtain direct benefits for existing and new business models from the data stock developed in this way. The following sections describe the aspects that play a major role in managing data efficiently in a company.
Database management systems, data models and metadata management
The vast majority of all (structured) data in companies today can be found in relational databases, which are managed by corresponding relational database management systems. The salient characteristics of these systems are:
- Independence of data from applications. In the early days of data storage, data was kept in simple operating system files. The internal structure of these files varied from program to program, depending on the format and character set that the programmer came up with (e.g. byte position 1: last name; byte position 17: first name; byte position 37: street, etc.). For other programs or programmers who did not know this format, the data was nothing more than a string of zeros and ones. During the Apollo moon missions of the 1960s, when hundreds of suppliers and tens of thousands of parts had to be managed, it was eventually recognized that this form of programming could no longer be mastered. The result of this development was database management systems that manage not only the actual data but also the metadata (data about the data), which describes the structure of the managed data. Since then, any number of programs can be written "against" a database that contains all the data those programs need.
- Relational data model. The data is stored in tabular form, with each row corresponding to a record (such as a person with an address) and each column to an attribute (such as a house number). The data types of the attributes are precisely defined (for example, character strings of a specified length or numbers with a specified number of decimal places). Each table has a unique key (e.g. an employee or article number) consisting of an attribute or set of attributes whose values uniquely identify each data record. The relationships between tables are then created using what are called foreign keys. A relationship between the department table and the employee table, for example, is created by having each employee record contain, as an attribute, the key of the department to which the employee belongs. The structure of all tables and relationships in the database (the "database schema") can then be shown very clearly in a so-called entity relationship diagram, which depicts the "entities" stored in a database (departments, employees, etc.) and connects them to each other with relationship arrows.
- The SQL query language (Structured Query Language). It allows programs or programmers to create database tables ("data definition") and write data to them ("data manipulation") in a uniform way. Both the tables (structures) and the data they contain can of course also be changed or deleted via SQL.
- The ACID principle (atomicity, consistency, isolation, durability). The database management system ensures that
- Atomicity: database operations are always carried out either completely or not at all.
- Consistency: the database is always in a consistent state.
- Isolation: parallel accesses do not interfere with each other.
- Durability: changes are saved permanently.
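The characteristics listed above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration, not a recommendation for a particular system: the table and column names (department, employee, dept_id, and so on) and the sample records are assumptions chosen to match the department/employee example in the text.

```python
import sqlite3

# In-memory database for illustration; all table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Data definition: two tables, each with a key, linked by a foreign key.
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        last_name TEXT NOT NULL,
        dept_id   INTEGER NOT NULL REFERENCES department(dept_id)
    )
""")

# Data manipulation: "with conn" commits the transaction if no error occurs.
with conn:
    conn.execute("INSERT INTO department (dept_id, name) VALUES (1, 'Sales')")
    conn.execute("INSERT INTO employee (emp_id, last_name, dept_id) VALUES (100, 'Meyer', 1)")

# Atomicity: the second insert references a department that does not exist,
# so the whole transaction is rolled back, including the first, valid insert.
try:
    with conn:
        conn.execute("INSERT INTO employee (emp_id, last_name, dept_id) VALUES (101, 'Schmidt', 1)")
        conn.execute("INSERT INTO employee (emp_id, last_name, dept_id) VALUES (102, 'Weber', 99)")
except sqlite3.IntegrityError:
    pass  # the database is left in its previous, consistent state

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# Query across the relationship: each employee with the name of their department.
rows = conn.execute("""
    SELECT e.last_name, d.name
    FROM employee AS e
    JOIN department AS d ON d.dept_id = e.dept_id
""").fetchall()
```

After the failed transaction, only the employee from the first, committed transaction remains in the table, which is the behavior the atomicity and consistency guarantees describe.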
Questions that the company should ask itself in this context include, for example:
Does the data already exist in a relational database system?
Even today, there are still legacy systems in which data is managed in simple files that can only be queried using, for example, the old programming language COBOL. Retired programmers who have to be brought back from retirement because they are the only ones who still know the data structure and also master COBOL are one example of the consequences.
Does the database schema meet current requirements?
Databases are often managed via, for example, an ERP or CRM system that assumes a specific data structure. If a company’s business processes change, the systems used must adapt to the new processes. With older systems, the required functionality often cannot be implemented because it was not taken into account in the underlying database schema. The staff then has to work around the system (e.g. in office documents) in order to record the required data. The only way out here is usually to switch to a new system.
Are the business rules that govern certain data formats (such as article numbers), i.e. the metadata, documented, and are they actually observed?
One often finds, for example, tables with very creative key attributes (such as keys that encode certain subsets, properties, etc.), which make automated processing difficult.
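Whether a documented format rule is actually observed can be checked automatically. The following is a minimal sketch under a purely hypothetical rule (two uppercase letters, a hyphen, six digits); the rule, the function name violates_rule, and the sample article numbers are assumptions made for illustration only.

```python
import re

# Hypothetical business rule (an assumption for illustration):
# an article number is two uppercase letters, a hyphen, and six digits.
ARTICLE_NUMBER_RULE = re.compile(r"[A-Z]{2}-\d{6}")


def violates_rule(article_number: str) -> bool:
    """Return True if the article number does not match the documented format."""
    return ARTICLE_NUMBER_RULE.fullmatch(article_number) is None


# Sample keys, including "creative" variants that encode extra properties.
article_numbers = ["AB-123456", "ab-123456", "AB-12345-RED", "CD-000042"]

# Collect every key that breaks the rule so it can be reviewed and corrected.
violations = [number for number in article_numbers if violates_rule(number)]
```

Running such a check regularly against the key columns of a table gives a simple measure of how consistently the documented rules are followed.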
Data lifecycle management, data quality, and data management
Data lifecycle management means (very roughly) managing the processes by which data is created, processed, and archived or deleted. The questions that arise in this context include, for example:
- How is the data collected (e.g. by one or more persons or automatically)? Is this collection done efficiently, i.e. are incorrectly formatted or doubly entered items intercepted automatically? If not, the result is, for example, inconsistent formats or spellings in telephone numbers that make automated processing (e.g. by a telephone system) more difficult. Duplicates can also arise here if no check is made at entry time as to whether the data record already exists in the system.
- What happens to the data while it is being used? Is missing information added and incorrect information corrected? For example, address data can become outdated over time if address changes are not recorded. Is the data needed for the analysis available at all?
- What happens to the data at the end of its life? Are, for example, the legal requirements that mandate deletion of data after a certain operation or a certain period of time complied with? Or does old data still haunt the database, for example because it is merely marked with a "deleted" flag, so that the data stock keeps growing and becomes confusing?
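The checks at data entry mentioned in the first question above can be sketched as follows. This is a deliberately simplified illustration: the function names normalize_phone and add_customer, the default country code "49", and the sample records are assumptions, and real telephone numbers have many more variants than this sketch handles.

```python
import re


def normalize_phone(raw: str, default_country_code: str = "49") -> str:
    """Bring a phone number into one format: '+', country code, digits only.

    Simplified assumption: numbers start with '+', '00', or a national '0'.
    """
    digits = re.sub(r"[^\d+]", "", raw.strip())  # drop spaces, hyphens, slashes, etc.
    if digits.startswith("00"):
        return "+" + digits[2:]
    if digits.startswith("0"):
        return "+" + default_country_code + digits[1:]
    if digits.startswith("+"):
        return digits
    return "+" + default_country_code + digits


def add_customer(customers: dict, name: str, phone: str) -> bool:
    """Insert a record unless the same person and number already exist.

    Returns True if the record was added, False if it was rejected as a duplicate.
    """
    key = (name.strip().casefold(), normalize_phone(phone))
    if key in customers:
        return False
    customers[key] = {"name": name.strip(), "phone": key[1]}
    return True


customers = {}
results = [
    add_customer(customers, "Anna Meyer", "030 1234567"),
    add_customer(customers, "anna meyer ", "+49 30 1234567"),  # same person, other format
    add_customer(customers, "Jonas Weber", "0049-30-7654321"),
]
```

Normalizing before comparing is the key design choice here: without it, the second entry would look like a new record even though it describes the same person and number.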