DATA AND ITS USES
Topics Covered:
- Where does business data come from?
- How is business data stored and organized?
- How is business data used in decision-making?
- Business intelligence, data analysis, and data visualization
Modern business thrives on the collection and use of data. Data is pervasive throughout the business process, from completing payroll to informing strategic planning. But what is data? Where does data come from? How do we collect it, and how can we use it? In this chapter, we’ll explore answers to these questions and more.
While there are many definitions available, here we’ll consider data as simply values representing a set of facts or concepts (Glossary: Data vs. Information, n.d.). Given this, it’s no wonder data use is ubiquitous: it covers virtually everything! Every purchase, every message, every process, and every click, all create data that we can use.
To understand how data supports business activity, it is useful to think in terms of the data lifecycle. The data lifecycle is simply the broad sequence of stages data passes through from the moment it is created to the moment it is archived or deleted. Although terminology varies across industries, the core elements are fairly standard, and consist of creation and/or acquisition, storage and organization, analysis and interpretation, and use and feedback. Data is generated through transactions, sensors, systems, and human activities. It is then stored in files and/or databases. Next, it is analyzed to produce insight. Finally, we interpret those analyses and use them to inform decisions (which, in turn, shape the next round of data creation). This cyclical structure reminds us that data is never static, because its meaning and value evolve over time as organizations interact with it.
Importantly, the data lifecycle underscores an idea that will thread throughout this chapter: data is not inherently valuable on its own. Its value depends entirely on the accuracy of its creation, the quality of its storage and organization, the methods used to analyze it, and the decisions and actions it ultimately informs. Poorly collected or poorly interpreted data can mislead as easily as good data can enlighten. In this sense, each stage of the data lifecycle shapes the next, forming a continuous loop where businesses learn, act, and generate new information. The structure of this chapter follows this flow, beginning with where data comes from, moving through storage and analysis, and ultimately ending with how data supports decisions.
3.1 Sources of Data in Business
Businesses collect and utilize data from a variety of different sources, which can easily be as unique as the businesses themselves. Understanding these data sources is often the first step to using the data to guide decision-making. As such, we’ll discuss several common data sources and provide examples for each.
All businesses have access to data produced by the company itself, which we’ll call internal data. There are many examples of this type of data. Consider that each transaction a business completes provides a new data point, which can cover sales, payroll, and even inventory management. Much of this transaction data is recorded automatically, allowing for complete and real-time data relating to the operations of a company. In addition to simple transactional data, companies also gather information regarding their overall finances, such as revenues, assets, and liabilities. This information can be used for internal decision-making, but is also sometimes reported externally to investors and/or required to be tracked by regulators. In a different vein, data on internal structure, such as management structure, employee information, job titles, salaries, performance evaluations, and more, are meticulously recorded and tracked within each company. While this information may come in many formats, all of it is useful data that can assist the company in many of its human resource initiatives. As we said before, these are only a few examples of the many types of internal operations data that many companies collect on a daily basis.
In addition to data regarding internal operations, many companies track information on the interactions that they have with their customers, providing another source of data. A company may track customer data including their name, address, amount spent, which items were purchased, time of purchase, and more. In addition to direct sales data, companies also track data on customer outreach, such as marketing or engagement campaigns. Each engagement with an e-mail campaign, mailed coupon, and web advertisement is tracked and recorded as data, providing insight into what material customers engage with. Costs of outreach are also recorded, providing information on the return on investment (ROI) of varied engagement strategies.
As computers have continued to advance, they continue to provide evolving sources of new data. For example, web analytics can now provide data on not only page views, but also on navigation paths, average session duration, scroll rates, and more. Computer systems themselves can report on server performance, peak times of activity, and even store data related to cybersecurity (such as repeated password violations, access to restricted systems, and more).
While the types of data described above are recorded directly by a company (or an entity contracted by a company), companies also utilize external data, or data that is acquired from outside the company. This can be as simple as accessing census information provided by the government to provide context about the demographics of a target population, or as complicated as arranging data sharing agreements or purchasing data from third-party brokers to supplement the data collected internally. As this data is available to all companies, it is especially useful when combined with internal data to guide strategic decision-making.
For planning and decision-making purposes, businesses often seek to incorporate many sources of data. For example, taking advantage of a new market requires knowledge of the market opportunities (informed by external data) as well as capacity and tracking of how well the company can address the need (informed via internal data). This combination of information can be complex, requiring the combination of data from multiple sources utilizing a variety of analysis techniques. Data quality, or the accuracy and completeness of the information for its intended use, also plays a large role when aggregating data from so many potential sources, and must be evaluated for each new data source and use case. As you can imagine, this myriad of challenges can easily be overwhelming! In spite of the complexities, companies continue to seek more and more sources of data to guide strategy and incorporate into their business practice.
3.2 Storage and Organization
It may surprise you to hear, but there are almost as many ways for data to be stored as there are sources from which we pull data! In this section, we’ll discuss several common data storage solutions and how they relate to one another. We’ll also discuss how several of these are implemented in an industrial context. First, though, we’ll need to discuss data and its structure.
3.2.1 Data Structures and Flat Files
Often when we think of data, we think of something like a list of numbers or a table in Microsoft Excel. And rightly so! But data can also include things such as an audio recording from a podcast, a video or image, text from an email or website, or a data stream from a sensor. While both kinds of examples are considered “data”, there seems to be something different about the way information is stored in each. In our terms, we can refer to the first examples as structured data. Again, this is what many typically think of as data, and typically is information that can fit neatly into tables. The second type of data is called unstructured data, and can be somewhat more complicated to capture and analyze, as it doesn’t necessarily follow any predetermined rules (such as having a certain number of rows or columns, or even having rows or columns to begin with!). While unstructured data is still considered data, by its very nature it is difficult to systematically organize. Because of this, as we discuss data storage, we’ll primarily focus on examples of storage for structured data (though we’ll have some examples for unstructured data as well!).
The most basic type of data storage consists simply of a table of information (think: a sheet in Excel). This format of data organization is often called a flat file. These files can store data in different ways, using different techniques to mark where data are located within them. For instance, a delimited file uses a character (e.g., a comma to create a comma-separated values (CSV) file, or a tab for tab-delimited data) to differentiate between one data value and another, while a fixed-width dataset differentiates data values by ensuring that they are always a certain length (e.g., name is always 20 characters, age is always 3 characters). These formats can also take on more complicated forms, such as a spreadsheet, which can track not only the data values but also the formats applied to them (e.g., bold, highlighted).
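As a rough sketch (with invented names and values), here is how the same small table might be read from a delimited file and from a fixed-width layout using Python’s standard library. Note how the comma marks field boundaries in the first format, while character position does the job in the second:

```python
import csv
import io

# The same two records stored two ways; names and values are invented.
csv_text = "name,age\nAda,36\nGrace,45\n"

# Fixed-width layout: name padded to 10 characters, age padded to 3.
fixed_text = "Ada       36 \nGrace     45 \n"

def read_delimited(text):
    """Delimited data: the comma marks where one value ends and the next begins."""
    return [row for row in csv.reader(io.StringIO(text))]

def read_fixed(line):
    """Fixed-width data: position, not a delimiter, marks each field."""
    return {"name": line[0:10].strip(), "age": int(line[10:13])}

rows = read_delimited(csv_text)
records = [read_fixed(line) for line in fixed_text.splitlines()]
```

Either way, the result is the same structured table; the formats differ only in how they mark where one value stops and the next begins.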
As data grows larger and more complicated, with many flat files covering the many uses and collections of data, flat files become more and more difficult to maintain. Businesses began to utilize databases, or collections of related data stored at one or more locations. Databases provided a logical framework that companies could use to not only record information, but also link it to previous records, update information in a single location, and retrieve it much faster than was previously possible. While flat files are still in common use (often for viewing and analysis purposes), many companies have turned to databases as the primary organizational structure for their data.
3.2.2. Databases
While databases can theoretically consist of any type of logically structured data organization, most discussions of databases revolve around the software that allows users to interact with the database, called a database management system (DBMS). While there are many examples of DBMS (e.g., MySQL, PostgreSQL, MongoDB), these are largely united by their utility in creating, storing, maintaining, and providing access to data stored in their software.
Many database types utilize similar language when referring to the information stored within them. Here are a few common terms…
- Tables – Similar to flat files, tables (sometimes called relations) consist of rows and columns of data (e.g., customer contact information)
- Records (row) – Indicate distinct entries (e.g., different customers)
- Fields (column) – Indicate different attributes, categories, or variables each record could have (e.g., name, e-mail address)
- Primary key – A field containing a unique value (e.g., id number) for every row in the table. There can only be one primary key per table.
- Foreign key – A field containing values that refer to a primary key for another table. There can be multiple foreign keys per table (as each record can connect to multiple other tables).
While there are several types of database organization, a common method of organization is through the use of a relational database. A relational database stores structured data into a series of tables which may have relationships between them (defined using foreign keys). Most DBMS utilize a form of structured query language (SQL) to interact with data in relational databases managed by the system. SQL allows users to interact with the database by searching, updating, and/or retrieving information stored within it. Though SQL itself is a simple language, queries built with SQL can become complex in execution.
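To make these ideas concrete, here is a minimal sketch using Python’s built-in sqlite3 module (a lightweight relational DBMS). The table and column names are invented for illustration: each table has a primary key, orders.customer_id is a foreign key referring back to customers.id, and a SQL query joins the two tables through that relationship:

```python
import sqlite3

# An in-memory database; table and column names are invented for illustration.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Each table has a primary key; orders.customer_id is a foreign key
# referring back to customers.id, linking the two tables.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE orders (
                   id INTEGER PRIMARY KEY,
                   customer_id INTEGER REFERENCES customers(id),
                   amount REAL)""")

cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(101, 1, 19.99), (102, 1, 5.00), (103, 2, 12.50)])

# A SQL query that joins the tables through the foreign key and
# retrieves each customer's total spending.
cur.execute("""SELECT c.name, SUM(o.amount)
               FROM customers AS c
               JOIN orders AS o ON o.customer_id = c.id
               GROUP BY c.name
               ORDER BY c.name""")
totals = [(name, round(total, 2)) for name, total in cur.fetchall()]
```

Notice that the query never duplicates customer information inside the orders table; the foreign key lets us record each fact once and link it wherever needed.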
While relational databases can work well for structured data, when storing unstructured data or dealing with data that are particularly large or rapidly changing, non-relational databases may be appropriate. While these designs may lose some structure and integrity, they provide a flexible storage solution that may scale to large or fluid datasets better than their relational counterparts. These databases are often referred to as NoSQL (i.e., not only SQL) databases, and store information in non-tabular forms. This loosening of structure works particularly well with unstructured data, though can come at the cost of reliability. Because of this, database administrators at companies often select the database structure that best optimizes the use-cases for that particular company’s needs.
3.2.3 Database Storage
As data technology has continued to evolve, so too have the needs for secure and accessible data storage (think: the physical computers the data will be stored on, versus the above section, which discussed the software and logical structure). Traditionally, companies were required to maintain databases (and backups) themselves, storing information on local servers and performing all organization, maintenance, and security in-house. Now, cloud-based storage systems offer additional options for companies to consider.
There are many considerations companies can make when determining physical vs cloud database storage. Physical storage offers direct access to information, relatively low latency, and higher access speeds, but requires companies to be in complete control of their information, including performing maintenance, creating backups, and protecting information security. This control can be either a pro or a con, depending on the infrastructure companies have to support these tasks. Cloud storage, on the other hand, offers to provide many of these services for an ongoing fee. Because cloud storage companies (e.g., Amazon AWS, Google Cloud, Microsoft Azure) specialize in this, they have a virtually unlimited amount of resources available to support database storage, which allows companies to scale data storage as needed without purchasing additional servers or hiring staff (simply paying an increased fee). Also, given their specialization, these companies are experienced in maintaining security and redundancy within their solutions. Note that these solutions are not mutually exclusive. Businesses can also implement a hybrid model, storing the bulk of information in the cloud while maintaining local storage of particularly sensitive or timely data.
In the end, the “best” data structure and storage solutions depend on the needs of the company. This decision can be a make-it or break-it foundation for data use in the company, as overly simplified structures will not provide the data capacity and information robustness needed for large-scale data, while overly complex structures can create unnecessary barriers to data access and utility. Such mismatches exemplify how the alignment between data solutions and institutional need is a critical consideration in positioning data as a usable resource for decision-makers.
3.3 Decision-Making with Data
Incorporating data into decision-making can be one of the most valuable cultural shifts a company can make. As companies seek to take advantage of this valuable resource, however, confusion can arise about how to make decisions based upon data. In this section, we’ll discuss the process of making decisions using data, and how to avoid several pitfalls that may arise along the decision-making process.
3.3.1 The Value of Data
To start, we must first consider how data can provide value to companies. Consider the following example: Company A has been told that “data” is extremely valuable for their business, and so has spent the last three years collecting vast amounts of data on everything from inventory to employee device usage to web traffic. An examination of data access records, however, reveals that none of the data has ever been accessed after it was stored.
In this case, would you consider the data to be valuable to Company A? As for me, the answer is a firm: heck no! The data has cost money to obtain and store, but has never been used in a way that helps to inform or improve the business practice, and thus has never brought value to the business itself. I love this example for two reasons. First, because this example (farfetched as it may seem) is actually quite commonplace. It has evolved from a broad understanding that data is “valuable”, but a lack of understanding of the mechanism by which we turn data into value (and lack of knowledge that this process is necessary). Second, I love this example because it illustrates the crux of the issue: data, in isolation, is not valuable. The true value of data is in the decisions that you can improve through the use of that data.
3.3.2 The Decision-Making Process
Through this lens, data becomes a tool to support human decisions. That is to say, data (as valuable as it is) should not be the only factor in a decision! No matter how good your data are, they cannot fully capture the complete context, implications, or trade-offs of a decision. Results from data simply bring numerical summaries of patterns and/or relationships that can (depending on the accuracy of the data) improve precision and reduce subjectivity. Thus, as we seek to make sound, data informed decisions, we combine the experience, intuition, and contextual understanding of decision-makers with the precision and (relative) objectivity that data can provide.
As we discuss decision-making with data, it is important to be clear that this is a process rather than an event. Though each company may have a unique approach, an example process is presented below.
Problem is identified. Every decision begins with a problem/question/possibility to be investigated. In order for data to have the highest impact, these problems are clearly identified ahead of time. Decision-makers must determine specifically what they are trying to decide, what metrics they believe to be relevant, any particularly salient contextual factors that should be accounted for, and more. Poor definition of a question can undercut the power of data to inform the decision.
Data are collected/aggregated. After we are firm in the question and metrics, we obtain the relevant data as it relates to the problem. This process could be as simple as accessing data from our databases (i.e., we already have the data), or as complex as designing an entire study and data collection protocol to gather the extensive data necessary. Of course, companies need to weigh the importance of the decision (and the data needed to inform it) when balancing the time/resources they are willing to allocate to obtaining data against the relevance, precision, accuracy, and objectivity the data will bring. For low importance decisions, we may simply have to use what information we have (even if not perfect), and such a tradeoff would be considered when making a final decision.
Data are prepared/analyzed. After the data have been obtained, they are cleaned (i.e., prepared for the next step) and analyzed (i.e., meaningfully summarized). This process may involve the use of various statistics and/or graphics to find patterns that relate to the guiding questions. In this step, it is important to acknowledge that there is rarely a single approach to be taken in analyzing data. Judgement calls play an important role in decisions such as balancing statistical complexity with understandability, determining which patterns to highlight graphically (and how to do so), and determining the correct interpretation of results in light of the broader context.
Results are shared. Once the data have been analyzed, the results of that analysis are communicated to decision-makers. This step ensures that decision-makers are aware of all parts of the process, including what data were collected (and from where), how they were analyzed, and what the results indicate. It is important that this information is communicated clearly, as the error margins, limitations, and uncertainties that remain are critical for decision-makers to understand as they weigh the relevance of the data results to the decision. The importance of this step in the process cannot be overstated, as miscommunication can lead to ill-informed decisions just as readily as misinterpretation of an analysis or statistic.
Decisions are made. After results have been shared, decision-makers integrate this information into their broader contextual knowledge/experience to determine a course of action. Again, these discussions incorporate the data and analyses (in light of strengths and weaknesses) while evaluating other factors (potential risks vs rewards, market conditions, etc) to make a decision.
Situation is monitored. After the decision is made and implemented, we can continue to collect data on the impact that the decision has had. This monitoring step is critical in ensuring that decisions have had the intended effects, and allows for early mitigation of unintended consequences should they arise. An iterative approach like this is focused on continuous improvement regarding the issue at hand, but can also be used for more. As we reflect on the way in which this decision was made and the impact that it had, we can identify the strengths and weaknesses of the process and use them to improve when making future decisions.
The process outlined above illustrates how decision-making using data is a process (rather than a fixed event). Though companies vary in specifics, these generic steps show many of the key components in data informed decision-making.
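As a small illustration of the “prepared/analyzed” step, consider hypothetical daily sales figures exported with some blank or malformed entries, as raw business data often has. A minimal Python sketch (all values invented) might clean and summarize them as follows:

```python
from statistics import mean

# Hypothetical exported daily sales figures; some entries are blank or
# malformed, as raw business data often is.
raw = ["120.50", "98.00", "", "103.00", "n/a", "110.50"]

def clean(values):
    """Preparation: keep only the entries that parse as numbers."""
    out = []
    for v in values:
        try:
            out.append(float(v))
        except ValueError:
            pass  # drop blanks and non-numeric codes such as "n/a"
    return out

sales = clean(raw)
summary = {"n": len(sales), "mean": round(mean(sales), 2)}
```

Even this tiny example involves a judgement call: dropping malformed entries is only one option, and an analyst might instead investigate or impute them depending on the question at hand.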
3.3.3 Where Data Improves Decisions
There are a wide variety of types of decisions that data can help inform. To determine if a particular situation is one that can be data informed, the situation must have data that are both relevant and available. This means that the data must exist, there must be time and resources available to gather/analyze it, and the results must be such that they contribute to the understanding of the situation. Cases that meet this definition are as variable as data themselves (so we won’t try to list them comprehensively here), but below we’ll discuss a few broad ways we can think about the data and situation to help get you started thinking.
It can be helpful to understand the ways in which you would like data to support a particular decision. Broadly speaking, data can support decisions in three ways: describing what has happened, explaining what has happened, or predicting what will happen. Descriptive analysis describes what has happened in order to provide information on where things stand, while explanatory analysis takes this a step further, assessing patterns and relationships to explain the factors that contributed to a certain outcome. Predictive analysis goes even further, using these patterns to predict what will happen in the future (based on what has happened previously). Each of these approaches can provide valuable insight into decisions.
In addition to considering the ways we would like to use data, it can also be helpful to examine the specific events in which data can be useful (or, said another way, we can consider when we make business decisions). As implied above, data can be used when considering specific strategic or opportunity driven decisions, especially when comparing between multiple well defined options. This can also arise at specific points in time, when an event (positive or negative) causes a re-evaluation of the status quo to be necessary. With all this said, data can also be useful for decision-making even when there isn’t an explicit decision to make! Consider the use of a monitoring report containing basic descriptive analysis: such a report may form the basis for a variety of continuing operational decisions (or even major overhauls, depending on the nature of the results) even if there was no inciting question to begin with.
3.4 Business Intelligence, Data Analysis, and Data Visualization
In the previous section, we discussed how organizations can implement data-based decisions by combining stakeholder experience/judgement with empirical results based on data. In this section, we’ll address a part of that process in more detail. Namely, how do we make sense of data once we have access to it? In the process of decision-making, many of these operations and techniques are hidden behind the scenes. With this said, an understanding of the process can help analysts and stakeholders alike improve the utility of data and their summaries.
3.4.1 Data Analysis
Core to all of this is the concept of data analysis, a term which we’ve used previously but have yet to formally define. Data analysis is simply the process of organizing and interpreting data to address a question. While this term can sound intimidating, it really is just about “making sense” of the numbers you have in a way that ultimately connects to the problem!
For the simplest example possible, consider how we could summarize earnings at a lemonade stand. We could calculate the average daily earnings (calculating a mean, a descriptive analysis), we could see if earnings change based on weekday vs weekend (calculating two means, an explanatory analysis), and we could use the information to predict what we’ll earn tomorrow (a predictive analysis)! All of these could be done by you. Today. And all of these count as “data analysis”!
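Sketched in Python (with invented daily figures), the lemonade-stand analyses might look like this:

```python
from statistics import mean

# Hypothetical earnings for one week at the lemonade stand.
earnings = {"Mon": 12.0, "Tue": 15.0, "Wed": 11.0, "Thu": 14.0,
            "Fri": 18.0, "Sat": 30.0, "Sun": 26.0}
weekend = {"Sat", "Sun"}

# Descriptive: what happened overall?
overall_mean = mean(earnings.values())

# Explanatory: do weekdays and weekends differ?
weekday_mean = mean(v for d, v in earnings.items() if d not in weekend)
weekend_mean = mean(v for d, v in earnings.items() if d in weekend)

# Predictive (naive): expect tomorrow to look like past days of its type.
def predict(day):
    return weekend_mean if day in weekend else weekday_mean
```

The three analyses use the same data and nearly the same arithmetic; what changes is the question each one answers.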
While access to more advanced analytics and techniques allows us to address more subtle or nuanced questions (e.g., What role would weather or traffic patterns play? Do these roles change based on weekday vs weekend? Given daily variability in earnings, what is a reasonable range of earnings to expect?), it is important to realize that the goal is always the same: to simplify the data in a way that can be used. As George Box said, “All models are wrong, but some are useful.” The goal in analysis isn’t necessarily to be perfectly right (or use the fanciest statistic). The goal is to be useful.
Software for Data Analysis
Today, analysts rely on a wide range of tools to support this work. While simple calculations can be completed by hand or in spreadsheets, most modern business datasets are far too large, complex, or fast-changing to be analyzed manually. For example, a spreadsheet can comfortably handle a few thousand rows of data. Can you imagine typing all of that into your calculator by hand?! Even so, such data would be relatively small-scale for businesses. Many companies now generate millions of records per day from transactions, websites, sensors, or internal systems! Because of this, effective analysis often requires the utilization of software capable of storing, processing, and analyzing large datasets efficiently. Popular options include…
- R – A free, open-source statistical programming language widely used in academia and industry for analysis, visualization, and modeling.
- Python – A general-purpose programming language with powerful data libraries (e.g., pandas, NumPy, scikit-learn), making it a flexible tool for analytics and machine learning.
- Spreadsheet tools (Excel, Google Sheets) – Still heavily used for smaller datasets, early exploration, and basic modeling.
- Statistical software (SPSS, SAS, Stata) – Traditional tools for structured analysis, especially in regulated industries.
These tools differ in flexibility, accessibility, cost, and learning curve. What unites them is that they allow analysts to work at a scale that manual analysis simply cannot handle (e.g., processing large datasets, running complex calculations, producing results quickly). As organizations continue to generate more data, these software tools become essential for transforming raw information into useful insight.
3.4.2 Analytic Concepts
As we discuss maximizing the utility of our analysis, it is important to understand some fundamental verbiage that is used in the field. Let’s start with the distinction between a population and a sample. A population is the entire group of cases of interest for interpretation. For example, to understand how much money we made last quarter, we may gather all records of last quarter’s sales (all relevant records). A sample is a subset of cases drawn from the population, rather than the entire group of interest. For example, we may be interested in collecting information regarding customer sentiment on a particular new product line, but are only able to contact (or get responses from) a portion of our customer base. In this case, the “entire customer base” would be the population, and the individuals we get data from would constitute the sample.
While both populations and samples can be useful, they have differing strengths and weaknesses that are important to consider. By definition, analysis of population data (descriptive or explanatory) does not seek to determine broader predictable trends or patterns: it simply describes what happens. Whether descriptive or explanatory in nature, statistics of this type (describing what happened) are often called descriptive statistics. Conversely, as a sample from a broader group isn’t of interest in-and-of itself (otherwise it would be a population), we use statistics based on samples to try to make connections to broader populations. This process is called inference, and statistics designed for this purpose are often called inferential statistics. While samples can be very useful, they also have unique limitations (e.g., random variability, population representativeness) which arise from not having access to all participants in the population (many inferential methods deal with these through statistical assumptions). Consideration of these limitations is important when determining the information provided by data from samples in the decision-making process.
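For a rough sketch of the distinction, consider hypothetical satisfaction scores from a sample of 20 customers. The sample mean is a descriptive statistic; attaching an approximate 95% interval for the population mean is a simple form of inference. (The data, and the use of 1.96 as a rough critical value, are illustrative simplifications, not a full treatment of inferential assumptions.)

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical satisfaction scores (1-10) from a sample of 20 customers,
# drawn from a much larger customer base (the population).
sample = [7, 8, 6, 9, 7, 8, 8, 5, 9, 7, 6, 8, 7, 9, 8, 7, 6, 8, 7, 9]

n = len(sample)
xbar = mean(sample)            # descriptive: summarizes the sample itself
se = stdev(sample) / sqrt(n)   # standard error: uncertainty due to sampling

# Inferential: an approximate 95% interval for the *population* mean,
# using 1.96 as a rough critical value.
low, high = xbar - 1.96 * se, xbar + 1.96 * se
```

The interval makes the sample’s limitations explicit: rather than claiming the population mean equals the sample mean exactly, we report a range consistent with the random variability of sampling.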
Understanding the mechanics of descriptive and inferential statistics is only part of effective analysis. The next step is turning those results into something meaningful. This is where statistical storytelling becomes essential. Statistical storytelling is the practice of interpreting numerical results in a way that connects them back to the original question, clarifies why they matter, and communicates what can (and can’t) be concluded. Even the most precise statistical output is only useful if it is placed within a narrative that helps stakeholders understand its implications. For example, a descriptive statistic may show that a value increased, but the story explains whether that increase is expected, whether it is meaningful, and what might be driving it. Similarly, an inferential result may indicate a relationship between variables, but the story helps interpret the strength, direction, limitations, and business consequences of that relationship.
Effective statistical storytelling depends on joining two forms of understanding: knowledge of the data and analysis, and knowledge of the context and goals. On one side, analysts must recognize what the data can legitimately support, and deeply understand its strengths, weaknesses, limitations, and the assumptions baked into the methods used (both the sampling and analytic methods). This makes clear how far the results can be trusted and where uncertainty or ambiguity remains. On the other side, analysts must understand the practical situation. They have to know things like what decision is being informed, what the organization is trying to accomplish, and how the results will ultimately be used. Without this context, even correct statistical findings may be framed in ways that are irrelevant or misleading. The value of statistical storytelling lies in bringing these two perspectives together. Effective stories use a realistic understanding of the data and methods to generate conclusions that are not only accurate, but also aligned with the needs, constraints, and goals of the decision-making environment.
3.4.3 Data Visualization
This may catch you by surprise, but long tables of numbers and analytic results aren’t always the most interpretable method of communicating information. Even well-structured analyses can remain opaque if presented only as text or numeric output. Patterns that are immediately obvious when viewed visually, such as an upward trend, an unusual spike, or a relationship between two variables, can remain hidden within a spreadsheet or purely numeric summary. For this reason, data visualization plays a central role in translating analysis into insight.
At its core, data visualization is the practice of converting numerical information into visual forms such as charts, graphs, maps, or dashboards. Visualization serves two main purposes. First, analysts can use visual summaries to reveal patterns in data that may be difficult to identify otherwise. For example, multiple datasets may look extremely similar statistically while remaining visually distinct (e.g., Anscombe’s Quartet). Second, visualization supports the communication of information to others (e.g., in a report or presentation). In these settings, the goal is not decoration but clarity. Good visualizations reduce cognitive load and allow viewers to grasp complex ideas quickly by leveraging their natural ability to perceive visual patterns. Whether we are summarizing changes over time, comparing groups, highlighting distributions, or revealing relationships, visualization can transform otherwise abstract quantities into something more intuitive.
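The claim about Anscombe’s Quartet can be checked directly. The sketch below uses only the Python standard library; the hard-coded numbers are the published quartet values. It computes the mean and Pearson correlation for each of the four datasets, and all four come out nearly identical, even though plotting them reveals four completely different shapes.

```python
from statistics import mean

# Anscombe's Quartet: four (x, y) datasets with nearly identical
# summary statistics but very different shapes when plotted.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8] * 7 + [19] + [8] * 3,
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Rounded to two decimals, every dataset reports the same summary numbers.
stats = {name: (round(mean(x), 2), round(mean(y), 2), round(pearson(x, y), 2))
         for name, (x, y) in quartet.items()}
for name, (mx, my, r) in stats.items():
    print(f"Dataset {name}: mean(x)={mx}, mean(y)={my}, r={r}")
```

Running this shows every dataset with mean(x) = 9.0, mean(y) = 7.5, and r = 0.82; only a plot exposes how different they really are.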
In modern organizations, visualization goes beyond static charts. Many companies employ dashboard systems, which consolidate multiple graphics, metrics, and summaries into a single interactive display. Dashboards allow users to monitor key indicators, filter results, drill down into specific groups, and track changes as information is updated in near real time. Because dashboards emphasize clarity, consistency, and accessibility, they serve as an essential bridge between the technical work of analysis and the everyday work of business decision-making. They also increase access to relevant information, as people without statistical training can still explore data meaningfully through interactive visuals.
3.4.4 Business Intelligence
While data analysis and visualization help transform raw information into insight, business intelligence (BI) refers to the systems that make those insights accessible, organized, and usable across an entire organization. BI emphasizes the infrastructure behind sensemaking, such as the tools and processes that gather data from different sources, standardize it, and deliver it in a consistent, interpretable format. In many ways, BI acts as the “information backbone” of a business, ensuring that the right people receive the right data at the right time.
The most visible component of BI is often a dashboard, which presents key metrics and visual summaries in a centralized location. But it is important to remember that such dashboards are built upon a suite of automated processes which source, validate, clean, analyze, and report data without manual effort. Most BI platforms also provide interactive capabilities that allow users to explore data on their own, filtering results, comparing groups, or drilling down into underlying details.
Modern BI systems increasingly incorporate automation, machine learning, and artificial intelligence to enhance insight generation. Machine learning models can detect patterns that may be difficult to spot manually, identify emerging trends, or generate forecasts used within dashboards. AI-driven features (e.g., automated insights, anomaly detection, and natural-language summaries) help users understand complex information quickly. For example, a BI tool might highlight an unusual spike in customer activity, suggest likely reasons based on past data, and/or summarize the week’s key changes in plain English. While these capabilities do not replace human analysis or judgment, they expand what BI systems can surface, making it easier for teams to recognize when something requires attention.
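To make the anomaly-detection idea concrete: while commercial BI platforms use more sophisticated methods than this, a minimal version can be built from a z-score, flagging any value that sits more than a chosen number of standard deviations from the average. The daily activity counts below are invented for the example.

```python
from statistics import mean, stdev

def flag_anomalies(values, threshold=2.0):
    """Return the indices of values lying more than `threshold`
    standard deviations from the mean -- a minimal spike/dip detector."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > threshold * s]

# Hypothetical daily customer-activity counts; day 5 is an obvious spike.
daily_activity = [102, 98, 105, 101, 99, 240, 103, 97, 100, 104]
print(flag_anomalies(daily_activity))  # prints [5]
```

A real BI system would layer seasonality adjustments, rolling windows, and alerting on top of this, but the core question it answers is the same: which values are far enough from normal to deserve human attention?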
Popular BI platforms (e.g., Power BI, Tableau) differ in features and design philosophy, but all share a common goal: to organize data in a way that supports timely, informed action. When implemented well, BI supports transparency, promotes consistent definitions across teams, and helps reduce the bottlenecks that occur when only a few individuals have access to analytical tools.
Ultimately, business intelligence does not make decisions for an organization. Rather, it ensures that the analytical foundation behind decisions is reliable, accessible, and grounded in shared information. BI systems connect the earlier stages of the data lifecycle to the decision-making process, helping turn data into a practical, everyday resource rather than an isolated technical asset.
Chapter Summary
As we bring this chapter to a close, it’s important to recognize how each piece – from data creation to storage, analysis, visualization, and business intelligence – works together to support thoughtful, informed decision-making in modern organizations. Remember, data alone accomplishes nothing. Rather, it becomes valuable only when it is accurate, organized, interpreted responsibly, and connected to real questions businesses face. Effective use of data requires both technical tools and human judgment, combining what the numbers say with an understanding of context, goals, and implications. As businesses continue to generate and rely on ever larger volumes of information, the ability to navigate this data lifecycle thoughtfully will increasingly distinguish organizations that merely collect data from those that truly benefit from it.
Review Questions
3.1. What is the data lifecycle, and how does each stage contribute to ensuring that data is useful?
3.2. How do internal and external data sources differ, and what are some advantages of combining them for strategic decision-making?
3.3. Compare structured and unstructured data. In what cases would businesses need storage solutions capable of handling both?
3.4. What are the steps in the decision-making process? For each step, who should be involved?
3.5. What is the difference between a population and a sample, and how does this distinction influence the way we interpret analysis of these respective groups?
3.6. Describe the difference between descriptive, explanatory, and predictive analytics. Provide a business example of each.
3.7. Why is statistical storytelling necessary? What can go wrong when results are communicated without appropriate context or limitations?
3.8. What role(s) does data visualization play in the decision-making process?
3.9. How do dashboards support everyday business operations, and how do they differ from static visualizations? Provide an example of when each would be useful.
3.10. What is business intelligence, and how does it relate to both data storage and data analysis?
Projects:
3.1. Mapping the Data Lifecycle in a Real Business: Choose a business you are familiar with (a workplace, campus office, restaurant, or online retailer). Identify one specific process (e.g., ordering supplies, registering students, scheduling appointments) and describe how data moves through the four stages of the data lifecycle: creation, storage, analysis, and use. Produce a one-page diagram or explanation showing what happens at each stage.
3.2. Internal vs. External Data Exercise: Pick a company or organization you know well. Make a two-column chart listing at least five examples of internal data and five examples of external data the company might use. Write a short paragraph explaining how combining these sources could improve one specific decision (e.g., marketing, staffing, pricing, budgeting).
3.3. Structured vs. Unstructured Data Sorting: Create a simple list of 12–15 business-related data items (e.g., “customer birthdate,” “audio from a support call,” “inventory count,” “product photo,” “sales receipt text”). Categorize each as structured, unstructured, or semi-structured. Write 1-2 paragraphs explaining how your classifications affect how each type of data would be stored or analyzed.
3.4. Decision-Making Walkthrough: Invent a small business scenario where a decision must be made (e.g., extending store hours, hiring seasonal staff, adjusting prices). Write a short report walking through the six steps of the decision-making process described in the chapter. Keep the scenario simple but realistic.
3.5. Visualization Sketch and Interpretation: Using data of your choice (real or made-up), sketch two simple graphs by hand (e.g., a bar chart and a line chart). Then write a short explanation addressing four components: why you chose each graph type, what question each graph answers, how the visual makes the pattern easier to understand, and which parts might usefully be made interactive (e.g., in a dashboard).
Coding Activity: The Difference Visualizations (and Computers) Can Make
Instructions
In this activity, you will compare how a trend appears in raw numerical data versus in a graphical visualization. The goal is to understand why data visualization is such a powerful tool for spotting patterns that may not be obvious from tables alone.
Part 1: Inspect the Raw Data
Below is a snippet from the flights dataset (a sample dataset that ships with the seaborn Python library). It shows the number of airline passengers per month in 1949 and 1950.
Raw Data Table (First 24 Rows of the Dataset)
| year | month | passengers |
| ---- | ----- | ---------- |
| 1949 | Jan | 112 |
| 1949 | Feb | 118 |
| 1949 | Mar | 132 |
| 1949 | Apr | 129 |
| 1949 | May | 121 |
| 1949 | Jun | 135 |
| 1949 | Jul | 148 |
| 1949 | Aug | 148 |
| 1949 | Sep | 136 |
| 1949 | Oct | 119 |
| 1949 | Nov | 104 |
| 1949 | Dec | 118 |
| 1950 | Jan | 115 |
| 1950 | Feb | 126 |
| 1950 | Mar | 141 |
| 1950 | Apr | 135 |
| 1950 | May | 125 |
| 1950 | Jun | 149 |
| 1950 | Jul | 170 |
| 1950 | Aug | 170 |
| 1950 | Sep | 158 |
| 1950 | Oct | 133 |
| 1950 | Nov | 114 |
| 1950 | Dec | 140 |
Now, try to determine:
- Is passenger volume rising?
- Are there clear seasonal patterns?
- Are some months consistently higher?
- Is it easy or difficult to see a trend from this table alone?
Also, take a moment to imagine doing this by hand for the full dataset. You would need to scan each number individually, mentally track which months are higher or lower, guess whether the differences are meaningful, and repeat this for each of 144 months. It’s not impossible, but it’s certainly inefficient, subjective, and extremely error-prone. This exercise demonstrates why businesses outgrow paper tables and spreadsheets very quickly.
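By contrast, a computer can scan the whole table in a fraction of a second. The sketch below uses plain Python and the 24 values from the table above to compare each month of 1950 with the same month in 1949. Every difference turns out to be positive, which is exactly the upward trend that is so tedious to confirm by eye.

```python
# Passenger counts from the table above, in calendar order.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
p1949 = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
p1950 = [115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]

# Year-over-year change for each month.
changes = {m: b - a for m, a, b in zip(months, p1949, p1950)}
for m, diff in changes.items():
    print(f"{m}: {diff:+d}")

print("Every month grew:", all(diff > 0 for diff in changes.values()))
```

Twelve comparisons is trivial for a computer, and the same three lines of logic would handle 144 months (or 144,000) without any extra effort, which is the whole point of the exercise.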
Part 2: Generate the Visualization
Now run this code to see the same data presented visually.
import seaborn as sns
import matplotlib.pyplot as plt

# Load a built-in dataset with time-like data
flights = sns.load_dataset("flights")

# Line chart of monthly passengers over years
plt.figure(figsize=(8, 4))
sns.lineplot(data=flights, x="year", y="passengers")
plt.title("Airline Passenger Trend Over Time")
plt.xlabel("Year")
plt.ylabel("Passengers")
plt.show()
Consider the patterns you now see, and compare to what you noticed earlier.
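If you want to dig into the seasonal question specifically, one common follow-up (an optional extension, not part of the required activity) is to reshape the data into a month-by-year grid and draw it as a heatmap. This makes the recurring summer peak visible at a glance, something the single trend line above averages away.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Reshape the flights data into a month (rows) x year (columns) grid.
flights = sns.load_dataset("flights")
grid = flights.pivot(index="month", columns="year", values="passengers")

# Each cell's color encodes that month's passenger count.
plt.figure(figsize=(8, 5))
sns.heatmap(grid, cmap="viridis")
plt.title("Passengers by Month and Year")
plt.show()
```

Reading across a row shows how a single month grows year over year; reading down a column shows the seasonal pattern within a single year.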
Part 3: Reflection
Write a short reflection (i.e., more than a paragraph, less than two pages) answering the following.
- What patterns did you see instantly in the graph that were hard to detect in the raw data table?
- How did the visualization change your understanding of the trend?
- Which format is more useful for identifying changes over time, and why?
- If this were a business dataset, what decisions could the visualization support?
- Say we added data for 10 more years. Which method would handle this better? Why?
- How might a BI dashboard use this type of chart? What interactive components could be added to make it more useful?
