Discussion: Data (300 words)

CHAPTER 5
Database Systems and Big Data

Copyright 2018 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. WCN 02-200-203

Did You Know?

• The amount of data in the digital universe is expected
to increase to 44 zettabytes (44 trillion gigabytes) by
2020. This is 60 times the amount of all the grains of
sand on all the beaches on Earth. The majority of
data generated between now and 2020 will not be
produced by humans, but rather by machines as they
talk to each other over data networks.

• Most major U.S. wireless service providers have
implemented a stolen-phone database to report and
track stolen phones. So if your smartphone or tablet
goes missing, report it to your carrier. If someone else
tries to use it, he or she will be denied service on the
carrier’s network.

• You know those banner and tile ads that pop up on
your browser screen (usually for products and
services you’ve recently viewed)? Criteo, one of
many digital advertising organizations, automates the
recommendation of ads up to 30 billion times each day,
with each recommendation requiring a calculation
involving some 100 variables.

Principles

• The database approach to data management has
become broadly accepted.

• Data modeling is a key aspect of organizing data and
information.

• A well-designed and well-managed database is an
extremely valuable tool in supporting decision making.

• We have entered an era where organizations are
grappling with a tremendous growth in the amount of
data available and struggling to understand how to
manage and make use of it.

• A number of available tools and technologies allow
organizations to take advantage of the opportunities
offered by big data.

Learning Objectives

• Identify and briefly describe the members of the hierarchy
of data.

• Identify the advantages of the database approach to
data management.

• Identify the key factors that must be considered when
designing a database.

• Identify the various types of data models and explain
how they are useful in planning a database.

• Describe the relational database model and its fundamental
characteristics.

• Define the role of the database schema, data definition
language, and data manipulation language.

• Discuss the role of a database administrator and data
administrator.

• Identify the common functions performed by all database
management systems.

• Define the term big data and identify its basic
characteristics.

• Explain why big data represents both a challenge and
an opportunity.

• Define the term data management and state its overall
goal.

• Define the terms data warehouse, data mart, and data
lake and explain how they are different.

• Outline the extract, transform, load process.

• Explain how a NoSQL database is different from an
SQL database.

• Discuss the whole Hadoop computing environment and
its various components.

• Define the term in-memory database and explain its
advantages in processing big data.

Why Learn about Database Systems and Big Data?
Organizations and individuals capture prodigious amounts of data from a myriad of sources every day.
Where does all this data come from, where does it go, how is it safeguarded, and how can you use it to
your advantage? In this chapter, you will learn about tools and processes that enable users to manage
all this data so that it can be used to uncover new insights and make effective decisions. For example,
if you become a marketing manager, you can access a vast store of data related to the Web-surfing
habits, past purchases, and even social media activity of existing and potential customers. You can use
this information to create highly effective marketing programs that generate consumer interest and
increased sales. If you become a biologist, you may use big data to study the regulation of genes and
the evolution of genomes in an attempt to understand how the genetic makeup of different cancers
influences outcomes for cancer patients. If you become a human resources manager, you will be able
to use data to analyze the impact of raises and changes in employee-benefit packages on employee
retention and long-term costs. Regardless of your field of study in school and your future career, using
database systems and big data will likely be a critical part of your job. As you read this chapter, you will
see how you can use databases and big data to extract and analyze valuable information to help you
succeed. This chapter starts by introducing basic concepts related to databases and data management
systems. Later, the topic of big data will be discussed along with several tools and technologies used
to store and analyze big data.

As you read this chapter, consider the following:

• Why is it important that the development and adoption of data management, data modeling, and
business information systems be a cross-functional effort involving more than the IS organization?

• How can organizations manage their data so that it is a secure and effective resource?

A database is a well-designed, organized, and carefully managed collection of
data. Like other components of an information system, a database should help
an organization achieve its goals. A database can contribute to organizational
success by providing managers and decision makers with timely, accurate,
and relevant information built on data. Databases also help companies analyze
information to reduce costs, increase profits, add new customers, track
past business activities, and open new market opportunities.

A database management system (DBMS) consists of a group of programs
used to access and manage a database as well as provide an interface
between the database and its users and other application programs. A DBMS
provides a single point of management and control over data resources,
which can be critical to maintaining the integrity and security of the data. A
database, a DBMS, and the application programs that use the data make up a
database environment.

Databases and database management systems are becoming even
more important to organizations as they deal with rapidly increasing
amounts of information. Most organizations have many databases; however,
without good data management, it is nearly impossible for anyone
to find the right and related information for accurate and business-critical
decision making.

Data Fundamentals

Without data and the ability to process it, an organization cannot successfully
complete its business activities. It cannot pay employees, send out bills, order
new inventory, or produce information to assist managers in decision making.
Recall that data consists of raw facts, such as employee numbers and sales
figures. For data to be transformed into useful information, it must first be
organized in a meaningful way.

database: A well-designed,
organized, and carefully managed
collection of data.

database management
system (DBMS): A group of
programs used to access and manage
a database as well as provide an
interface between the database and its
users and other application programs.

194 PART 2 • Information Technology Concepts

Hierarchy of Data
Data is generally organized in a hierarchy that begins with the smallest piece of
data used by computers (a bit), progressing up through the hierarchy to a
database. A bit is a binary digit (i.e., 0 or 1) that represents a circuit that is
either on or off. Bits can be organized into units called bytes. A byte is typically
eight bits. Each byte represents a character, which is the basic building block of
most information. A character can be an uppercase letter (A, B, C, …, Z), a
lowercase letter (a, b, c, …, z), a numeric digit (0, 1, 2, …, 9), or a special symbol
(., !, +, -, /, etc.).

Characters are put together to form a field. A field is typically a name, a
number, or a combination of characters that describes an aspect of a business
object (such as an employee, a location, or a plant) or activity (such as a sale).
In addition to being entered into a database, fields can be computed from
other fields. Computed fields include the total, average, maximum, and
minimum value. A collection of data fields all related to one object, activity, or
individual is called a record. By combining descriptions of the characteristics
of an object, activity, or individual, a record can provide a complete
description of it. For instance, an employee record is a collection of fields about one
employee. One field includes the employee’s name, another field contains the
address, and still others the phone number, pay rate, earnings made to date,
and so forth. A collection of related records is a file—for example, an
employee file is a collection of all company employee records. Likewise, an
inventory file is a collection of all inventory records for a particular company
or organization.

At the highest level of the data hierarchy is a database, a collection of
integrated and related files. Together, bits, characters, fields, records, files, and
databases form the hierarchy of data. See Figure 5.1. Characters are combined
to make a field, fields are combined to make a record, records are combined to
make a file, and files are combined to make a database. A database houses not
only all these levels of data but also the relationships among them.
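
The levels of the hierarchy can be illustrated with a short Python sketch. This is only an illustration, not how a real DBMS stores data: the `EmployeeRecord` class and the variable names are hypothetical, and the sample values are the ones shown in Figure 5.1.

```python
from dataclasses import dataclass

# Character -> bits: one character stored as one byte (8 bits).
letter = "F"
bits = format(ord(letter), "08b")  # ASCII code of "F" as 8 binary digits

# Characters -> field: a run of characters describing one aspect of an object.
last_name = "Fiske"


# Fields -> record: all fields about one employee (hypothetical class).
@dataclass
class EmployeeRecord:
    employee_number: str
    last_name: str
    first_name: str
    hire_date: str


# Records -> file: a collection of related records.
personnel_file = [
    EmployeeRecord("098-40-1370", "Fiske", "Steven", "01-05-2001"),
    EmployeeRecord("549-77-1001", "Buckley", "Bill", "02-17-1995"),
    EmployeeRecord("005-10-6321", "Johns", "Francine", "10-07-2013"),
]

# Files -> database: a collection of integrated, related files.
project_database = {
    "personnel": personnel_file,
    "department": [],  # left empty in this sketch
    "payroll": [],     # left empty in this sketch
}

print(bits)                                 # bit pattern for "F"
print(len(project_database["personnel"]))   # number of employee records
```

A plain dictionary of lists captures the nesting of the levels but not the relationships among files, which is the part a real database adds.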

Data Entities, Attributes, and Keys
Entities, attributes, and keys are important database concepts. An entity is a
person, place, or thing (object) for which data is collected, stored, and
maintained. Examples of entities include employees, products, and customers.
Most organizations organize and store data as entities.

FIGURE 5.1
Hierarchy of data
Together, bits, characters, fields,
records, files, and databases form
the hierarchy of data.

Hierarchy of data   Example
Database            Project database (Personnel file, Department file, Payroll file)
Files               Personnel file:
                      098-40-1370   Fiske, Steven     01-05-2001
                      549-77-1001   Buckley, Bill     02-17-1995
                      005-10-6321   Johns, Francine   10-07-2013
Records             Record containing employee #, last and first name, hire date:
                      098-40-1370   Fiske, Steven     01-05-2001
Fields              Last name field: Fiske
Characters          Letter F in ASCII: 1000110 (each character is represented as 8 bits)

bit: A binary digit (i.e., 0 or 1) that
represents a circuit that is either on
or off.

character: A basic building block of
most information, consisting of uppercase
letters, lowercase letters, numeric
digits, or special symbols.

field: Typically a name, a number,
or a combination of characters that
describes an aspect of a business
object or activity.

record: A collection of data fields
all related to one object, activity, or
individual.

file: A collection of related records.

hierarchy of data: Bits, characters,
fields, records, files, and databases.

entity: A person, place, or thing for
which data is collected, stored, and
maintained.

An attribute is a characteristic of an entity. For example, employee number,
last name, first name, hire date, and department number are attributes
for an employee. See Figure 5.2. The inventory number, description, number
of units on hand, and location of the inventory item in the warehouse are
attributes for items in inventory. Customer number, name, address, phone
number, credit rating, and contact person are attributes for customers.
Attributes are usually selected to reflect the relevant characteristics of entities
such as employees or customers. The specific value of an attribute, called a
data item, can be found in the fields of the record describing an entity.
A data key is a field within a record that is used to identify the record.

Many organizations create databases of attributes and enter data items to
store data needed to run their day-to-day operations. For instance, database
technology is an important weapon in the fight against crime and terrorism, as
discussed in the following examples:

● The Offshore Leaks Database contains the names of some 100,000
secretive offshore companies, trusts, and funds created in locations
around the world. Although creating offshore accounts is legal in most
countries, offshore accounts are also established to enable individuals and
organizations to evade paying the taxes they would otherwise owe. The
database has been used by law enforcement and tax officials to identify
potential tax evaders.1

● Major U.S. wireless service providers have implemented a stolen-phone
database to report and track stolen 3G and 4G/LTE phones. The providers
use the database to check whether a consumer’s device was reported lost
or stolen. If a device has been reported lost or stolen, it will be denied
service on the carrier’s network. Once the device is returned to the
rightful owner, it may be reactivated. The next step will be to tie foreign
service providers and countries into the database to diminish the export
of stolen devices to markets outside the United States.2

● The Global Terrorism Database (GTD) is a database including data on
over 140,000 terrorist events that occurred around the world from 1970
through 2014 (with additional annual updates). For each terrorist event,
information is available regarding the date and location of the event, the
weapons used, the nature of the target, the number of casualties, and,
when identifiable, the group or individual responsible.3

● Pawnshops are required by law to report their transactions to law
enforcement by providing a description of each item pawned or sold
along with any identifying numbers, such as a serial number. LEADS
Online is a nationwide online database system that can be used to fulfill
this reporting responsibility and enable law enforcement officers to track
merchandise that is sold or pawned in shops throughout the nation. For
example, if law enforcement has a serial number for a stolen computer,
they can enter this into LEADS Online and determine if it has been sold or
pawned, when and where the theft or transaction occurred and, in the
case of an item that was pawned, who made the transaction.4

FIGURE 5.2
Keys and attributes
The key field is the employee
number. The attributes include last
name, first name, hire date, and
department number.

Employee #    Last name   First name   Hire date    Dept. number
005-10-6321   Johns       Francine     10-07-2013   257
549-77-1001   Buckley     Bill         02-17-1995   632
098-40-1370   Fiske       Steven       01-05-2001   598

Rows: ENTITIES (records). Columns: ATTRIBUTES (fields). Employee # is the KEY FIELD.

attribute: A characteristic of an
entity.

data item: The specific value of an
attribute.

As discussed earlier, a collection of fields about a specific object is a record.
A primary key is a field or set of fields that uniquely identifies the record. No
other record can have the same primary key. For an employee record, such as
the one shown in Figure 5.2, the employee number is an example of a primary
key. The primary key is used to distinguish records so that they can be
accessed, organized, and manipulated. Primary keys ensure that each record in
a file is unique. For example, eBay assigns an “Item number” as its primary
key for items to make sure that bids are associated with the correct item. See
Figure 5.3.
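
The uniqueness rule for primary keys can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the table and column names are hypothetical, the first row reuses the employee from Figure 5.2, and the second row ("Smith", "Pat") is invented purely to trigger the conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# employee_number is declared as the primary key, so it must be unique.
conn.execute(
    "CREATE TABLE employee ("
    " employee_number TEXT PRIMARY KEY,"
    " last_name TEXT,"
    " first_name TEXT)"
)
conn.execute("INSERT INTO employee VALUES ('098-40-1370', 'Fiske', 'Steven')")

# A second record with the same primary key is rejected by the DBMS.
try:
    conn.execute("INSERT INTO employee VALUES ('098-40-1370', 'Smith', 'Pat')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_rejected, row_count)
```

The point of the sketch is that the DBMS, not the application program, enforces the rule that no two records share a primary key.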

In some situations, locating a particular record that meets a specific set of
criteria might be easier and faster using a combination of secondary keys rather
than the primary key. For example, a customer might call a mail-order company
to place an order for clothes. The order clerk can easily access the customer's
mailing and billing information by entering the primary key, usually a
customer number. If the customer does not know the correct primary key,
a secondary key such as last name can be used. In this case, the order clerk
enters the last name, such as Adams. If several customers have a last name of
Adams, the clerk can check other fields, such as address and first name, to find
the correct customer record. After locating the correct record, the order can be
completed and the clothing items shipped to the customer.
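
The mail-order lookup above might be sketched as follows. All customer data here is hypothetical, and the two dictionaries stand in for the indexes a real DBMS would maintain; the sketch only shows that a primary key returns exactly one record while a secondary key may return several candidates that must be narrowed down.

```python
from collections import defaultdict

# Hypothetical customer records; customer_number is the primary key.
customers = [
    {"customer_number": 1001, "last_name": "Adams", "first_name": "Joan", "address": "12 Elm St"},
    {"customer_number": 1002, "last_name": "Adams", "first_name": "Raj", "address": "98 Oak Ave"},
    {"customer_number": 1003, "last_name": "Baker", "first_name": "Lee", "address": "5 Pine Rd"},
]

# Primary-key index: each key maps to exactly one record.
by_number = {c["customer_number"]: c for c in customers}

# Secondary-key index: a last name may map to several records.
by_last_name = defaultdict(list)
for c in customers:
    by_last_name[c["last_name"]].append(c)

# The customer does not know the customer number, so search by last name...
candidates = by_last_name["Adams"]

# ...then check another field (first name) to find the correct record.
match = [c for c in candidates if c["first_name"] == "Raj"]

print(len(candidates), match[0]["customer_number"])
```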

The Database Approach
At one time, information systems referenced specific files containing relevant
data. For example, a payroll system would use a payroll file. Each distinct
operational system used data files dedicated to that system.

Today, most organizations use the database approach to data management,
where multiple information systems share a pool of related data.

FIGURE 5.3
Primary key
eBay assigns an Item number as a primary key to keep track of each item in its database.

www.ebay.com

primary key: A field or set of fields
that uniquely identifies the record.

database approach to
data management: An approach
to data management where multiple
information systems share a pool of
related data.

Critical Thinking Exercise

A database offers the ability to share data and information resources. Federal
databases, for example, often include the results of DNA tests as an attribute
for convicted criminals. The information can be shared with law enforcement
officials around the country. Often, distinct yet related databases are linked to
provide enterprise-wide databases. For example, many Walgreens stores
include in-store medical clinics for customers. Walgreens uses an electronic
health records database that stores the information of all patients across all
stores. The database provides information about customers’ interactions with
the clinics and pharmacies.

To use the database approach to data management, additional software—
a database management system (DBMS)—is required. As previously discussed,
a DBMS consists of a group of programs that can be used as an interface
between a database and the user of the database. Typically, this software acts
as a buffer between the application programs and the database itself.
Figure 5.4 illustrates the database approach.

Vehicle Theft Database
You are a participant in an information systems project to design a vehicle
theft database for a state law enforcement agency. The database will provide
information about stolen vehicles (e.g., autos, golf carts, SUVs, and trucks),
with details about the vehicle theft as well as the stolen vehicle itself.
These details will be useful to law enforcement officers investigating the vehicle
theft.

Review Questions
1. Identify 10 data attributes you would capture for each vehicle theft incident.
How many bytes should you allow for each attribute?
2. Which attribute would you designate as the primary key?

FIGURE 5.4
Database approach to data
management
In a database approach to data
management, multiple information
systems share a pool of related data.

[Figure: the database (payroll data, inventory data, invoicing data, and other data) is reached through the database management system, which acts as the interface to the application programs (payroll program, inventory control program, invoicing program, and management inquiries). The application programs produce reports for users.]

Critical Thinking Questions
1. Should the database include data about the status of the theft investigation? If
so, what sort of data needs to be included?
2. Can you foresee any problems with keeping the data current? Explain.

Data Modeling and Database Characteristics

Because today’s businesses must keep track of and analyze so much
data, they must keep the data well organized so that it can be used
effectively. A database should be designed to store all data relevant to
the business and to provide quick access and easy modification. Moreover,
it must reflect the business processes of the organization. When
building a database, an organization must carefully consider the following
questions:

● Content. What data should be collected and at what cost?
● Access. What data should be provided to which users and when?
● Logical structure. How should data be arranged so that it makes sense to
a given user?
● Physical organization. Where should data be physically located?
● Archiving. How long must this data be stored?
● Security. How can this data be protected from unauthorized access?

Data Modeling
When organizing a database, key considerations include determining
what data to collect, what the source of the data will be, who will have
access to it, how one might want to use it, and how to monitor database
performance in terms of response time, availability, and other factors.
AppDynamics offers its i-nexus cloud-based business execution solution
to clients for use in defining the actions and plans needed to achieve
business goals. The service runs on 30 Java virtual machines and eight
database servers that are constantly supervised using database performance
monitoring software. Use of the software has reduced the mean
time to repair system problems and improved the performance and
responsiveness for all its clients.5

One of the tools database designers use to show the logical relationships
among data is a data model. A data model is a diagram of entities and their
relationships. Data modeling usually involves developing an understanding
of a specific business problem and then analyzing the data and information
needed to deliver a solution. When done at the level of the entire organization,
this procedure is called enterprise data modeling. Enterprise data
modeling is an approach that starts by investigating the general data and
information needs of the organization at the strategic level and then moves
on to examine more specific data and information needs for the functional
areas and departments within the organization. An enterprise data model
involves analyzing the data and information needs of an entire organization
and provides a roadmap for building database and information systems by
creating a single definition and format for data that can ensure compatibility
and the ability to exchange and integrate data among systems. See
Figure 5.5.

data model: A diagram of data entities
and their relationships.

enterprise data model: A data
model that provides a roadmap for
building database and information
systems by creating a single definition
and format for data that can ensure
data compatibility and the ability to
exchange and integrate data among
systems.

The IBM Healthcare Provider Data Model is an enterprise data model
that can be adopted by a healthcare provider organization to organize and
integrate clinical, research, operational, and financial data.6 At one time,
the University of North Carolina Health Care System had a smorgasbord of
information system hardware and software that made it difficult to integrate
data from its existing legacy systems. The organization used the IBM
Healthcare Provider Data Model to guide its efforts to simplify its information
system environment and improve the integration of its data. As a result,
it was able to eliminate its dependency on outdated technologies, build an
environment that supports efficient data management, and integrate data
from its legacy systems to create a source of data to support future analytics
requirements.7

Various models have been developed to help managers and database
designers analyze data and information needs. One such data model is an
entity-relationship (ER) diagram, which uses basic graphical symbols to
show the organization of and relationships between data. In most cases,
boxes in ER diagrams indicate data items or entities contained in data tables,
and lines show relationships between entities. In other words, ER diagrams
show data items in tables (entities) and the ways they are related.

FIGURE 5.5
Enterprise data model
The enterprise data model provides
a roadmap for building database
and information systems.

[Figure: three layers (the enterprise, the data model, and systems and data) linked by "Supports" arrows. Benefits listed: enables capture of business opportunities, increases business effectiveness, reduces costs, enables simpler system interfaces, reduces data redundancy, ensures compatible data.]

entity-relationship
(ER) diagram: A data model that
uses basic graphical symbols to show
the organization of and relationships
between data.

ER diagrams help ensure that the relationships among the data entities in a
database are correctly structured so that any application programs developed are
consistent with business operations and user needs. In addition, ER diagrams can
serve as reference documents after a database is in use. If changes are made to
the database, ER diagrams help design them. Figure 5.6 shows an ER diagram for
an order database. In this database design, one salesperson serves many
customers. This is an example of a one-to-many relationship, as indicated by the
one-to-many symbol (the “crow’s-foot”) shown in Figure 5.6. The ER diagram also
shows that each customer can place one-to-many orders, that each order includes
one-to-many line items, and that many line items can specify the same product
(a many-to-one relationship). This database can also have one-to-one
relationships. For example, one order generates one invoice.
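
One way to see how these relationships translate into a database is a small schema sketch using Python's built-in `sqlite3` module. This is an assumption-laden illustration, not the textbook's design: every table name, column name, and sample value is hypothetical. A foreign key on the "many" side expresses a one-to-many relationship, and adding a uniqueness constraint to that foreign key turns it into one-to-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE salesperson (salesperson_id INTEGER PRIMARY KEY, name TEXT);

-- One salesperson serves many customers: the foreign key sits on the "many" side.
CREATE TABLE customer (
    customer_id    INTEGER PRIMARY KEY,
    name           TEXT,
    salesperson_id INTEGER NOT NULL REFERENCES salesperson(salesperson_id)
);

-- One customer places many orders.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);

CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT);

-- One order includes many line items; many line items can specify one product.
CREATE TABLE line_item (
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    line_number INTEGER NOT NULL,
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    PRIMARY KEY (order_id, line_number)
);

-- One order generates one invoice: UNIQUE makes the relationship one-to-one.
CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY,
    order_id   INTEGER NOT NULL UNIQUE REFERENCES orders(order_id)
);
""")

conn.execute("INSERT INTO salesperson VALUES (1, 'Rivera')")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(10, "Acme", 1), (11, "Bolt", 1)])
conn.execute("INSERT INTO orders VALUES (100, 10)")
conn.execute("INSERT INTO invoice VALUES (500, 100)")

# A second invoice for the same order violates the one-to-one rule.
try:
    conn.execute("INSERT INTO invoice VALUES (501, 100)")
    second_invoice_rejected = False
except sqlite3.IntegrityError:
    second_invoice_rejected = True

customers_served = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE salesperson_id = 1"
).fetchone()[0]
print(customers_served, second_invoice_rejected)
```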

Relational Database Model
The relational database model is a simple but highly useful way to organize data
into collections of two-dimensional tables called relations. Each row in the table
represents an entity, and each column represents an attribute of that entity. See
Figure 5.7.

FIGURE 5.6
Entity-relationship (ER)
diagram for a customer order
database
Development of ER diagrams helps
ensure that the logical structure of
application programs is consistent
with the data relationships in the
database.

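[Figure: Salesperson serves Customer (one-to-many); Customer places Orders (one-to-many); Orders includes Line items (one-to-many); Line items specifies Product (many-to-one); Orders generates Invoice (one-to-one).]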

relational database model:
A simple but highly useful way to
organize data into collections of
two-dimensional tables called
relations.

FIGURE 5.7
Relational database model
In the relational model, data is
placed in two-dimensional tables, or
relations. As long as they share at
least one common attribute, these
relations can be linked to output
useful information. In this
example, all three tables include the
Dept. number attribute.

Data Table 1: Project Table

Project   Description    Dept. number
155       Payroll        257
498       Widgets        632
226       Sales manual   598

Data Table 2: Department Table

Dept.   Dept. name      Manager SSN
257     Accounting      005-10-6321
632     Manufacturing   549-77-1001
598     Marketing       098-40-1370

Data Table 3: Manager Table

SSN           Last name   First name   Hire date    Dept. number
005-10-6321   Johns       Francine     10-07-2013   257
549-77-1001   Buckley     Bill         02-17-1995   632
098-40-1370   Fiske       Steven       01-05-2001   598
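
Linking relations through a shared attribute, as the caption of Figure 5.7 describes, can be sketched with Python's built-in `sqlite3` module. The data values come from the three tables in Figure 5.7; the SQL table and column names are assumptions chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (project INTEGER PRIMARY KEY, description TEXT, dept_number INTEGER);
CREATE TABLE department (dept_number INTEGER PRIMARY KEY, dept_name TEXT, manager_ssn TEXT);
CREATE TABLE manager (ssn TEXT PRIMARY KEY, last_name TEXT, first_name TEXT,
                      hire_date TEXT, dept_number INTEGER);

INSERT INTO project VALUES
    (155, 'Payroll', 257), (498, 'Widgets', 632), (226, 'Sales manual', 598);
INSERT INTO department VALUES
    (257, 'Accounting', '005-10-6321'),
    (632, 'Manufacturing', '549-77-1001'),
    (598, 'Marketing', '098-40-1370');
INSERT INTO manager VALUES
    ('005-10-6321', 'Johns', 'Francine', '10-07-2013', 257),
    ('549-77-1001', 'Buckley', 'Bill', '02-17-1995', 632),
    ('098-40-1370', 'Fiske', 'Steven', '01-05-2001', 598);
""")

# Link project -> department on Dept. number, then department -> manager on SSN.
rows = conn.execute("""
    SELECT p.description, d.dept_name, m.last_name
    FROM project AS p
    JOIN department AS d ON d.dept_number = p.dept_number
    JOIN manager AS m ON m.ssn = d.manager_ssn
    WHERE p.project = 226
""").fetchall()

print(rows)
```

The query answers a question that no single table can: which department, and which manager, is responsible for the sales manual project.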

Each attribute can be constrained to a range of allowable values called its
domain. The domain for a particular attribute indicates what values can be
placed in each column of the relational table. For instance, the domain for
an attribute such as type employee could be limited to either H (hourly) or
S (salary). If someone tried to enter a “1” in the type employee field, the data
would not be accepted. The domain for pay rate would not include negative
numbers. In this way, defining a domain can increase data accuracy.
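
A domain can be approximated in practice with a column constraint. The sketch below uses Python's built-in `sqlite3` module and SQL `CHECK` constraints; the table name, column names, and pay-rate values are hypothetical, while the H/S domain and the ban on negative pay rates follow the paragraph above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_number TEXT PRIMARY KEY,"
    " employee_type TEXT CHECK (employee_type IN ('H', 'S')),"  # H = hourly, S = salary
    " pay_rate REAL CHECK (pay_rate >= 0))"                      # no negative pay rates
)


def try_insert(row):
    """Return True if the row is accepted, False if it violates a domain."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted_valid = try_insert(("098-40-1370", "H", 21.50))
accepted_bad_type = try_insert(("549-77-1001", "1", 21.50))  # "1" is outside the domain
accepted_negative = try_insert(("005-10-6321", "S", -5.0))   # negative pay rate

print(accepted_valid, accepted_bad_type, accepted_negative)
```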

Manipulating Data
After entering data into a relational database, users can make inquiries and
analyze the data. Basic data manipulations include selecting, projecting, and
joining. Selecting involves eliminating rows according to certain criteria. Suppose
the department manager of a company wants to use a project table
that contains the project number, description, and department number for all
projects a company is performing. The department manager might want to
find the department number for Project 226, a sales manual project. Using
selection, the manager can eliminate all rows except the one for Project 226
and see that the department number for the department completing the sales
manual project is 598.
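
The selection just described can be sketched in a few lines of Python. The rows are those of the Project table in Figure 5.7; the `select` helper and the dictionary keys are hypothetical names introduced for this illustration.

```python
# Project table from Figure 5.7, one dictionary per row.
project_table = [
    {"project": 155, "description": "Payroll", "dept_number": 257},
    {"project": 498, "description": "Widgets", "dept_number": 632},
    {"project": 226, "description": "Sales manual", "dept_number": 598},
]


def select(table, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in table if predicate(row)]


# Eliminate every row except the one for Project 226.
selected = select(project_table, lambda row: row["project"] == 226)
dept = selected[0]["dept_number"]

print(selected, dept)
```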

Projecting involves eliminating columns in a table. For example,
a department table might contain the department number, department name,
and Social Security number …
