Source:
https://en.wikipedia.org/wiki/Analytics
File: GLOSSARY-BIG DATA
Analytics: https://en.wikipedia.org/wiki/Analytics
Analytics is the discovery,
interpretation, and communication of meaningful patterns in data. Especially
valuable in areas rich with recorded information, analytics relies on the
simultaneous application of statistics, computer programming and operations
research to quantify performance.
In a fast-moving space like big data, it’s critical to
separate the jargon from meaning and (more importantly) to recognize the
difference between the hype and the true value proposition. The following
glossary covers many of the most common – and sometimes misunderstood – big
data terms and concepts.
- Algorithm
- Analytics Platform
- Apache Hive
- Artificial Intelligence (AI)
- Behavioral Analytics
- Big Data
- Big Data Analytics
- Business Intelligence
- Cascading
- Cloud Computing
- Cluster Analysis
- Cognitive Computing
- Comparative Analysis
- Concurrency/Concurrent Computing
- Connection Analytics
- Correlation Analysis
- Data Analyst
- Data Architecture
- Data Cleansing
- Data Gravity
- Data Mining
- Data Model / Data Modeling
- Data Warehouse
- Descriptive Analytics
- ETL
- Exabyte
- Hadoop
- Internet of Things (IOT)
- Machine Learning
- Metadata
- MongoDB
- Natural Language Processing
- Pattern Recognition
- Petabyte
- Predictive Analytics
- Prescriptive Analytics
- R
- Semi-structured Data
- Sentiment Analysis
- Structured Data
- Terabyte
- Unstructured Data
ALGORITHM
An
algorithm is mathematical “logic” or a set of rules used to make calculations.
Starting with an initial input (which may be zero or null), the logic or rules
are coded or written into software as a set of steps to be followed in
conducting calculations, processing data or performing other functions,
eventually leading to an output.
Teradata Take: Within
the context of big data, algorithms are the primary means for uncovering
insights and detecting patterns. Thus, they are essential to realizing the big
data business case.
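To make the definition concrete, here is a minimal Python sketch of an algorithm; the sales figures and the moving-average rule are invented for illustration and are not tied to any particular product or data set.
```python
# Minimal illustration of an algorithm: a fixed set of steps that turns an
# input (a list of daily sales figures) into an output (a simple moving average).
def moving_average(values, window=3):
    """Return the moving average of `values` using the given window size."""
    averages = []
    for i in range(len(values) - window + 1):
        chunk = values[i:i + window]          # step 1: take the next window of inputs
        averages.append(sum(chunk) / window)  # step 2: compute and store the average
    return averages                           # step 3: emit the output

print(moving_average([10, 12, 11, 15, 14]))   # [11.0, 12.67, 13.33] (approximately)
```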
ANALYTICS PLATFORM
An
analytics platform is a full-featured technology solution designed to address
the needs of large enterprises. Typically, it “joins different tools and
analytics systems together with an engine to execute, a database or repository
to store and manage the data, data mining processes, and techniques and
mechanisms for obtaining and preparing data that is not stored. This solution
can be conveyed as a software-only application or as a cloud-based software as
a service (SaaS) provided to organizations in need of contextual information
that all their data points to, in other words, analytical information based on
current data records.” Source: Techopedia
APACHE HIVE
Apache
Hive is an open-source data warehouse infrastructure that provides tools for
data summarization, query and analysis. It is specifically designed to support
the analysis of large datasets stored in Hadoop files and compatible file
systems, such as Amazon S3. Hive was initially developed by data engineers at
Facebook in 2008, but is now used by many other companies.
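For readers who want a feel for how Hive is queried in practice, the sketch below uses the open-source PyHive client from Python; the host, table and column names are placeholders, not part of any real deployment.
```python
# Hypothetical example: running a summarization query against a Hive table
# using the open-source PyHive client. Host, table and column names are placeholders.
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000, username="analyst")
cursor = conn.cursor()

# Aggregate raw clickstream events stored in Hadoop into a per-day summary.
cursor.execute(
    "SELECT event_date, COUNT(*) AS events "
    "FROM clickstream_raw GROUP BY event_date"
)
for event_date, events in cursor.fetchall():
    print(event_date, events)

conn.close()
```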
ARTIFICIAL INTELLIGENCE (AI)
AI is a
long-established branch of computer science concerned with software that simulates human decision-making.
It mimics "learning" and "problem solving" through advanced
algorithms and machine learning. AI has grown popular across many industries,
with use case examples that include personalization of marketing offers and
sales promotions, anti-virus security, equities trading, medical diagnosis,
fraud detection and self-driving cars. Big data coupled with deep neural
networks and fast parallel processing are currently driving AI growth.
Teradata Take: Teradata’s
Sentient Enterprise vision recommends widespread use of automated machine
learning algorithms. Business leaders should focus on specific use cases, not
the term “AI” itself. After all, algorithms are not human: they don’t think and
they are not truly intelligent or conscious. AI applications require fresh data
and ongoing program maintenance to improve accuracy and reduce risk, so it’s
best to be skeptical of Hollywood renderings of AI and of general marketing
hype.
BEHAVIORAL ANALYTICS
Behavioral
Analytics is a subset of business analytics that focuses on understanding what
consumers and applications do, as well as how and why they act in certain ways.
It is particularly prevalent in the realm of eCommerce and online retailing,
online gaming and Web applications. In practice, behavioral analytics seeks to
connect seemingly unrelated data points and explain or predict outcomes, future
trends or the likelihood of certain events. At the heart of behavioral
analytics is such data as online navigation paths, clickstreams, social media
interactions, purchases or shopping cart abandonment decisions, though it may
also include more specific metrics.
Teradata Take:
Behavioral analytics can be more than just tracking people. Its principles also
apply to the interactions and dynamics between processes, machines and
equipment, and even macroeconomic trends.
BIG DATA
“Big data
is an all-encompassing term for any collection of data sets so large or complex
that it becomes difficult to process them using traditional data-processing
applications.” Source: Wikipedia
Teradata take: What is
big data? Big data is often described in terms of several “V’s” – volume, variety,
velocity, variability, veracity – which speak collectively to the complexity
and difficulty in collecting, storing, managing, analyzing and otherwise
putting big data to work in creating the most important “V” of all – value.
BIG DATA ANALYTICS
“Big data
analytics refers to the strategy of analyzing large volumes of data … gathered
from a wide variety of sources, including social networks, videos, digital
images, sensors and sales transaction records. The aim in analyzing all this
data is to uncover patterns and connections that might otherwise be invisible,
and that might provide valuable insights about the users who created it.
Through this insight, businesses may be able to gain an edge over their rivals
and make superior business decisions.” Source: Techopedia
Teradata Take: What is
big data analytics? Big data analytics isn’t one practice or one tool. Big data
visualizations are needed in some situations, while connected analytics are the
right answer in others.
BUSINESS INTELLIGENCE
“Business
intelligence (BI) is an umbrella term that includes the applications,
infrastructure and tools, and best practices that enable access to and analysis
of information to improve and optimize decisions and performance.” Source:
Gartner “Companies use BI to improve decision making, cut costs and identify new
business opportunities. BI is more than just corporate reporting and more than
a set of tools to coax data out of enterprise systems. CIOs use BI to identify
inefficient business processes that are ripe for re-engineering.” Source:
CIO.com
CASCADING
Cascading
is a platform for developing Big Data applications on Hadoop. It offers a
computation engine, systems integration framework, data processing and scheduling
capabilities. One important benefit of Cascading is that it offers development
teams portability so they can move existing applications without incurring the
cost to rewrite them. Cascading applications run on and can be ported between
different platforms, including MapReduce, Apache Tez and Apache Flink.
CLOUD COMPUTING
Cloud
computing refers to the practice of using a network of remote servers to store,
manage and process data (rather than an on-premise server or a personal
computer) with access to such data provided through the Internet (the cloud).
Programs, applications and other services may also be hosted in the cloud,
which frees companies from the task and expense of building and maintaining
data centers and other infrastructure. There are a few types of common cloud
computing models. Private clouds provide access to data and services via
dedicated data centers or servers for specific audiences (e.g., a company’s
employees). They may offer customized infrastructure, storage and networking
configurations. Often used by small and medium-sized businesses with
fluctuating computing requirements, public clouds are typically based on shared
hardware, offering data and services on-demand usually through “pay-as-you-go”
models that eliminate maintenance costs. Hybrid clouds combine aspects of both
private and public clouds. For example, companies can use the public cloud for
data, applications and operations that are not considered mission critical and
the private cloud to ensure dedicated resources are available to support core
processes and essential computing tasks.
Teradata take: Effective
cloud computing capabilities have become essential elements in the most
effective Big Data environments.
CLUSTER ANALYSIS
Cluster
analysis or clustering is a statistical classification technique or activity that
involves grouping a set of objects or data so that those in the same group
(called a cluster) are similar to each other, but different from those in other
clusters. It is essential to data mining and discovery, and is often used in
the context of machine learning, pattern recognition, image analysis,
bioinformatics and other sectors that analyze large data sets.
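A minimal sketch of clustering in Python with scikit-learn is shown below; the customer features and the choice of two clusters are illustrative assumptions only.
```python
# Illustrative sketch: grouping customers into clusters with k-means using
# scikit-learn. The two features (annual spend, visits per month) are invented.
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [200,  2], [220,  3], [250,  2],   # low-spend, infrequent visitors
    [900, 10], [950, 12], [880, 11],   # high-spend, frequent visitors
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)          # cluster assignment for each customer
print(kmeans.cluster_centers_) # the "centroid" that summarizes each group
```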
COGNITIVE COMPUTING
Cognitive
computing is a subset of artificial intelligence. It combines natural language
processing with machine learning, rules, and interactive “stateful”
programming. It is often used in spoken question-and-answer dialogs.
Interactive cognitive systems “remember” the context of the current dialog and
use that information to refine the next answer. Cognitive computing requires
constant program maintenance and new data to improve the knowledge base.
Examples of cognitive technology include Apple Siri, Amazon Alexa and IBM
Watson.
Teradata Take: Cognitive
computing is still in the early stages of maturity. It requires enormous
investment, skill and patience for businesses to apply it effectively.
Cognitive systems typically make many mistakes when interacting with humans. We
expect cognitive computing to mature rapidly for specific tasks in the next
decade. But, again, it’s best to be wary of Hollywood and marketing hype about
cognitive computing.
COMPARATIVE ANALYSIS
Comparative
analysis refers to the comparison of two or more processes, documents, data
sets or other objects. Pattern analysis, filtering and decision-tree analytics
are forms of comparative analysis. In healthcare, comparative analysis is used
to compare large volumes of medical records, documents, images, sensor data and
other information to assess the effectiveness of medical diagnoses.
CONNECTION ANALYTICS
Connection
analytics is an emerging discipline that helps to discover interrelated
connections and influences between people, products, processes, machines and
systems within a network by mapping those connections and continuously
monitoring interactions between them. It has been used to address difficult and
persistent business questions relating to, for instance, the influence of
thought leaders, the impact of external events or players on financial risk,
and the causal relationships between nodes in assessing network performance.
CONCURRENCY/CONCURRENT COMPUTING
Concurrency
or concurrent computing refers to the form of computing in which multiple
computing tasks occur simultaneously or at overlapping times. These tasks can
be handled by individual computers, specific applications or across networks.
Concurrent computing is often used in Big Data environments to handle very
large data sets. For it to work efficiently and effectively, careful
coordination is necessary between systems and across Big Data architectures
relative to scheduling tasks, exchanging data and allocating memory.
CORRELATION ANALYSIS
Correlation
analysis refers to the application of statistical analysis and other
mathematical techniques to evaluate or measure the relationships between
variables. It can be used to define the most likely set of factors that will
lead to a specific outcome – like a customer responding to an offer or the
performance of financial markets.
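As a small illustration, the pandas sketch below computes pairwise Pearson correlations on an invented marketing data set; the column names and values are assumptions for the example.
```python
# Illustrative sketch: measuring how strongly variables move together
# using the Pearson correlation coefficient in pandas. The data is invented.
import pandas as pd

df = pd.DataFrame({
    "discount_pct": [0, 5, 10, 15, 20, 25],
    "units_sold":   [100, 120, 150, 160, 200, 230],
    "return_rate":  [0.05, 0.05, 0.06, 0.04, 0.05, 0.06],
})

# A value near +1 or -1 suggests a strong linear relationship; near 0, a weak one.
print(df.corr(method="pearson"))
```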
DATA ANALYST
The main
tasks of data analysts are to collect, manipulate and analyze data, as well as
to prepare reports, which may include graphs, charts, dashboards and other
visualizations. Data analysts also generally serve as guardians or gatekeepers
of an organization's data, ensuring that information assets are consistent,
complete and current. Many data analysts and business analysts are known for
having considerable technical knowledge and strong industry expertise.
Teradata Take: Data
analysts serve the critical purpose of helping to operationalize big data
within specific functions and processes, with a clear focus on performance
trends and operational information.
DATA ARCHITECTURE
“Data
architecture is a set of rules, policies, standards and models that govern and
define the type of data collected and how it is used, stored, managed and
integrated within an organization and its database systems. It provides a
formal approach to creating and managing the flow of data and how it is
processed across an organization’s IT systems and applications.” Source:
Techopedia
Teradata Take: Teradata
Unified Data Architecture is the first comprehensive big data architecture.
This framework harnesses relational and non-relational repositories via SQL and
non-SQL analytics. Consolidating data into data warehouses and data lakes
enables enterprise-class architecture. Teradata unifies big data architecture
through cross-platform data access for all analytic tools and the ability to
“push-down” functions to the data, rather than moving data to the function. See
data gravity.
DATA CLEANSING
Data
cleansing, or data scrubbing, is the process of detecting and correcting or
removing inaccurate data or records from a database. It may also involve
correcting or removing improperly formatted or duplicate data or records. Data
removed in this process is often referred to as “dirty data.” Data
cleansing is an essential task for preserving data quality. Large organizations
with extensive data sets or assets typically use automated tools and algorithms
to identify such records and correct common errors (such as missing zip codes
in customer records).
Teradata take: The
strongest Big Data environments have rigorous data cleansing tools and
processes to ensure data quality is maintained at scale and confidence in data
sets remains high for all types of users.
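The following pandas sketch illustrates a few common cleansing steps on an invented customer table (stripping whitespace, filling and normalizing zip codes, dropping duplicates); it is a toy example, not a description of any particular cleansing product.
```python
# Illustrative sketch of basic data cleansing with pandas: fixing formatting,
# filling a missing value and dropping a duplicate record. Data is invented.
import pandas as pd

records = pd.DataFrame({
    "customer": [" Alice ", "Bob", "Bob", "Carol"],
    "zip_code": ["02139", None, None, "7301"],
})

records["customer"] = records["customer"].str.strip()      # fix stray whitespace
records["zip_code"] = records["zip_code"].fillna("00000")   # flag missing zip codes
records["zip_code"] = records["zip_code"].str.zfill(5)      # normalize to 5 digits
records = records.drop_duplicates()                         # remove duplicate rows

print(records)
```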
DATA GRAVITY
Data
gravity emerges as the volume of data in a repository grows and the number of
uses for that data also grows. At some point, the ability to copy or migrate data
becomes onerous and expensive. Thus, the data tends to pull services,
applications and other data into its repository. Primary examples of data gravity
are data warehouses and data lakes. Data in these systems has inertia.
Growing data volumes often break existing infrastructure and processes,
requiring risky and expensive remedies. Thus, the best-practice design is to move
processing to the data, not the other way around.
Teradata Take: Data
gravity has affected terabyte- and petabyte-size data warehouses for many
years. It is one reason scalable parallel processing of big data is required.
This principle is now extending to data lakes which offer different use cases.
Teradata helps clients manage data gravity.
DATA MINING
“Data
mining is the process of analyzing hidden patterns of data according to
different perspectives for categorization into useful information, which is
collected and assembled in common areas, such as data warehouses, for efficient
analysis, data mining algorithms, facilitating business decision making and
other information requirements to ultimately cut costs and increase revenue.
Data mining is also known as data discovery and knowledge discovery.” Source:
Techopedia
DATA MODEL / DATA MODELING
“Data
modeling is the analysis of data objects that are used in a business or other
context and the identification of the relationships among these data objects. A
data model can be thought of as a diagram or flowchart that illustrates the
relationships between data.” Source: TechTarget
Teradata Take: Data
models that are tailored to specific industries or business functions can
provide a strong foundation or “jump-start” for big data programs and
investments.
DATA WAREHOUSE
“In
computing, a data warehouse (DW or DWH), also known as an enterprise data
warehouse (EDW), is a system used for reporting and data analysis. DWs are central
repositories of integrated data from one or more disparate sources. They store
current and historical data and are used for creating trending reports for
senior management reporting such as annual and quarterly comparisons. The data
stored in the warehouse is uploaded from the operational systems (such as
marketing, sales, etc.).” Source: Wikipedia
DESCRIPTIVE ANALYTICS
Considered
the most basic type of analytics, descriptive analytics involves the breaking
down of big data into smaller chunks of usable information so that companies
can understand what happened with a specific operation, process or set of
transactions. Descriptive analytics can provide insight into current customer
behaviors and operational trends to support decisions about resource
allocations, process improvements and overall performance management. Most
industry observers believe it represents the vast majority of the analytics in use
at companies today.
Teradata Take: A strong
foundation of descriptive analytics – based on a solid and flexible data
architecture – provides the accuracy and confidence in decision making most
companies need in the big data era (especially if they wish to avoid being
overwhelmed by large data volumes). More importantly, it ultimately enables
more advanced analytics capabilities – especially predictive and prescriptive
analytics.
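A minimal example of descriptive analytics in pandas, using invented sales records, might look like the following; the region and revenue columns are assumptions for illustration.
```python
# Illustrative sketch of descriptive analytics: summarizing "what happened"
# by breaking transactions down per region. The sample data is invented.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "revenue": [1200, 900, 1500, 1100, 1700],
})

summary = sales.groupby("region")["revenue"].agg(["count", "sum", "mean"])
print(summary)   # transactions, total revenue and average order size per region
```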
ETL
Extract,
Transform and Load (ETL) refers to the process in data warehousing that
concurrently reads (or extracts) data from source systems; converts (or
transforms) the data into the proper format for querying and analysis; and
loads it into a data warehouse, operational data store or data mart. ETL
systems commonly integrate data from multiple applications or systems that may
be hosted on separate hardware and managed by different groups or users. ETL is
commonly used to assemble a temporary subset of data for ad-hoc reporting,
migrate data to new databases or convert databases into a new format or type.
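A bare-bones illustration of the extract, transform and load steps in Python is sketched below, using a CSV file as the source and SQLite as a stand-in for a warehouse; the file name, table name and column names are hypothetical.
```python
# Minimal ETL sketch (illustrative only): extract rows from a source CSV,
# transform them into the desired format, and load them into a SQLite table
# standing in for a warehouse. File and column names are placeholders.
import csv
import sqlite3

def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Convert amounts to cents and normalize the country code for querying.
    return [
        (row["order_id"], int(float(row["amount"]) * 100), row["country"].upper())
        for row in rows
    ]

def load(rows, conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount_cents INTEGER, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect("warehouse.db")
load(transform(extract("orders_source.csv")), conn)
```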
EXABYTE
An
extraordinarily large unit of digital data, one Exabyte (EB) is equal to 1,000
Petabytes or one billion gigabytes (GB). Some technologists have estimated that
all the words ever spoken by mankind would be equal to five Exabytes.
HADOOP
Hadoop is
a distributed data management platform or open-source software framework for
storing and processing big data. It is sometimes described as a cut-down
distributed operating system. It is designed to manage and work with immense
volumes of data, and scale linearly to large clusters of thousands of commodity
computers. It was originally developed at Yahoo!, but is now available free
and publicly through the Apache Software Foundation, though it usually requires
extensive programming knowledge to be used.
INTERNET OF THINGS (IOT)
A concept
that describes the connection of everyday physical objects and products to the
Internet so that they are recognizable by (through unique identifiers) and can
relate to other devices. The term is closely identified with machine-to-machine
communications and the development of, for example, “smart grids” for
utilities, remote monitoring and other innovations. Gartner estimates 26
billion devices will be connected by 2020, including cars and coffee makers.
Teradata Take: Big data
will only get bigger in the future and the IOT will be a major driver. The
connectivity from wearables and sensors means bigger volumes, more variety and
higher-velocity feeds.
MACHINE LEARNING
“Machine
learning is a type of artificial intelligence (AI) that provides computers with
the ability to learn without being explicitly programmed. It focuses on the
development of computer programs that can teach themselves to grow and change
when exposed to new data. The process of machine learning is similar to that of
data mining. Both systems search through data to look for patterns. However,
instead of extracting data for human comprehension – as is the case in data
mining applications – machine learning uses that data to improve the program's
own understanding. Machine learning programs detect patterns in data and adjust
program actions accordingly.” Source: TechTarget
Teradata Take: Machine
learning is especially powerful in a big data context in that machines can test
hypotheses using large data volumes, refine business rules as conditions change
and identify anomalies and outliers quickly and accurately.
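To illustrate the idea of a program that learns a rule from examples rather than being explicitly programmed, here is a small scikit-learn sketch; the churn features and labels are invented for the example.
```python
# Illustrative sketch: a model "learns" a rule from labeled examples instead of
# being explicitly programmed. Features and labels here are invented.
from sklearn.tree import DecisionTreeClassifier

# Each row: [hours of product use per week, support tickets filed]
X = [[1, 5], [2, 4], [1, 6], [10, 0], [12, 1], [9, 0]]
y = [1, 1, 1, 0, 0, 0]   # 1 = customer churned, 0 = customer stayed

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# The fitted model now generalizes to data it has not seen before.
print(model.predict([[11, 0], [1, 7]]))   # expected: [0 1]
```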
METADATA
“Metadata
is data that describes other data. Metadata summarizes basic information about
data, which can make finding and working with particular instances of data
easier. For example, author, date created and date modified and file size are
very basic document metadata. In addition to document files, metadata is used
for images, videos, spreadsheets and web pages.” Source: TechTarget
Teradata Take: The
effective management of metadata is an essential part of solid and flexible big
data “ecosystems” in that it helps companies more efficiently manage their data
assets and make them available to data scientists and other analysts.
MONGODB
MongoDB
is a cross-platform, open-source database that uses a document-oriented data
model, rather than a traditional table-based relational database structure.
This type of database structure is designed to make the integration of
structured and unstructured data in certain types of applications easier and
faster.
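A short sketch of MongoDB's document model using the pymongo driver is shown below; the connection string, database, collection and document fields are placeholders for illustration.
```python
# Illustrative sketch of MongoDB's document model using the pymongo driver.
# The connection string, database and collection names are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
events = client["analytics_demo"]["events"]

# Documents in one collection need not share a rigid schema.
events.insert_one({"user": "u123", "action": "click", "page": "/pricing"})
events.insert_one({"user": "u456", "action": "purchase", "amount": 49.0, "items": ["sku-1"]})

for doc in events.find({"action": "purchase"}):
    print(doc)
```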
NATURAL LANGUAGE PROCESSING
A branch
of artificial intelligence, natural language processing (NLP) deals with making
human language (in both written and spoken forms) comprehensible to computers.
As a scientific discipline, NLP involves tasks such as identifying sentence
structures and boundaries in documents, detecting key words or phrases in audio
recordings, extracting relationships between documents, and uncovering meaning
in informal or slang speech patterns. NLP can make it possible to analyze and
recognize patterns in verbal data that is currently unstructured.
Teradata Take: NLP holds
a key for enabling major advancements in text analytics and for garnering
deeper and potentially more powerful insights from social media data streams,
where slang and unconventional language are prevalent.
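As a toy illustration of one small NLP task (keyword extraction from informal text), consider the sketch below; real NLP systems use far richer linguistic models, and the stopword list here is an arbitrary assumption.
```python
# Toy sketch of one small NLP task: pulling out the most frequent key words
# from informal text. This only illustrates the idea, not a production approach.
import re
from collections import Counter

text = "This phone is awesome!! battery life is awesome, camera kinda meh tbh"
stopwords = {"this", "is", "the", "a", "kinda", "tbh"}

tokens = re.findall(r"[a-z']+", text.lower())          # crude tokenization
keywords = Counter(t for t in tokens if t not in stopwords)
print(keywords.most_common(3))   # e.g. [('awesome', 2), ('phone', 1), ('battery', 1)]
```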
PATTERN RECOGNITION
Pattern
recognition occurs when an algorithm locates recurrences or regularities within
large data sets or across disparate data sets. It is closely linked and even
considered synonymous with machine learning and data mining. This visibility
can help researchers discover insights or reach conclusions that would
otherwise be obscured.
PETABYTE
An
extremely large unit of digital data, one Petabyte is equal to 1,000 Terabytes.
Some estimates hold that a Petabyte is the equivalent of 20 million tall filing
cabinets or 500 billion pages of standard printed text.
PREDICTIVE ANALYTICS
Predictive
analytics refers to the analysis of big data to make predictions and determine
the likelihood of future outcomes, trends or events. In business, it can be
used to model various scenarios for how customers react to new product
offerings or promotions and how the supply chain might be affected by extreme
weather patterns or demand spikes. Predictive analytics may involve various
statistical techniques, such as modeling, machine learning and data mining.
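A minimal sketch of predictive analytics in scikit-learn is shown below: it estimates the probability that a customer responds to an offer from invented historical examples; the features and labels are assumptions for illustration.
```python
# Illustrative sketch of predictive analytics: estimating the likelihood of a
# future event (a customer responding to an offer) from past examples.
from sklearn.linear_model import LogisticRegression

# Each row: [past purchases, days since last visit]
history = [[5, 3], [0, 60], [7, 1], [1, 45], [6, 5], [0, 90]]
responded = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(history, responded)

new_customer = [[4, 10]]
print(model.predict_proba(new_customer)[0][1])  # estimated probability of responding
```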
PRESCRIPTIVE ANALYTICS
A type or
extension of predictive analytics, prescriptive analytics is used to recommend
or prescribe specific actions when certain information states are reached or
conditions are met. It uses algorithms, mathematical techniques and/or business
rules to choose among several different actions that are aligned to an
objective (such as improving business performance) and that recognize various
requirements or constraints.
R
R is an
open-source programming language for statistical analysis. It includes a
command line interface and several graphical interfaces. Popular algorithm
types include linear and nonlinear modeling, time-series analysis,
classification and clustering. According to Gartner research, more than 50% of
data science teams now use R in some capacity. R language competes with
commercial products such as SAS and Fuzzy Logix.
Teradata Take: Many R
language algorithms yield inaccurate results when run in parallel. Teradata
partnered with Revolution Analytics to convert many R algorithms to run
correctly in parallel. Teradata Database runs R in-parallel via its scripting
and language support feature. Teradata Aster R runs in-parallel as well. Both
solutions eliminate open source R’s limitations around memory, processing and
data.
SEMI-STRUCTURED DATA
Semi-structured
data refers to data that is not captured or formatted in conventional ways,
such as those associated with traditional database fields or common data
models. It is also not raw or totally unstructured and may contain some data
tables, tags or other structural elements. Graphs and tables, XML documents and
email are examples of semi-structured data, which is very prevalent across the
World Wide Web and is often found in object-oriented databases.
Teradata Take: As
semi-structured data proliferates and because it contains some relational data,
companies must account for it within their big data programs and data
architectures.
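The short Python sketch below illustrates how semi-structured data (here, a JSON order document with invented fields) mixes table-like structure with free-form content.
```python
# Illustrative sketch: semi-structured data (here JSON) carries tags and nesting
# rather than fixed rows and columns, yet parts of it map cleanly to a table.
import json

raw = '''
{"order_id": "A-100",
 "customer": {"name": "Alice", "email": "alice@example.com"},
 "items": [{"sku": "sku-1", "qty": 2}, {"sku": "sku-9", "qty": 1}],
 "note": "gift wrap if possible"}
'''

order = json.loads(raw)
# Flatten the structured parts into row-like records for analysis.
rows = [(order["order_id"], item["sku"], item["qty"]) for item in order["items"]]
print(rows)          # [('A-100', 'sku-1', 2), ('A-100', 'sku-9', 1)]
print(order["note"]) # free-text field left as unstructured content
```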
SENTIMENT ANALYSIS
Sentiment
analysis involves the capture and tracking of opinions, emotions or feelings
expressed by consumers in various types of interactions or documents, including
social media, calls to customer service representatives, surveys and the like.
Text analytics and natural language processing are typical activities within a
process of sentiment analysis. The goal is to determine or assess the sentiments
or attitudes expressed toward a company, product, service, person or event.
Teradata Take: Sentiment
analysis is particularly important in tracking emerging trends or changes in
perceptions on social media. Within big data environments, sentiment analysis
combined with behavioral analytics and machine learning is likely to yield even
more valuable insights.
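As a toy illustration of the idea, the sketch below scores text against a tiny hand-built word list; production sentiment analysis relies on trained NLP models rather than a fixed lexicon, and the words and reviews here are invented.
```python
# Toy lexicon-based sentiment scorer, for illustration only. Production
# sentiment analysis relies on trained NLP models rather than a word list.
POSITIVE = {"love", "great", "fast", "helpful"}
NEGATIVE = {"hate", "slow", "broken", "rude"}

def sentiment_score(text):
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

reviews = [
    "love the new app, support was helpful",
    "checkout is slow and the tracking page is broken",
]
for review in reviews:
    print(sentiment_score(review), review)   # positive > 0, negative < 0
```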
STRUCTURED DATA
Structured
data refers to data sets with strong and consistent organization. Structured
data is organized into rows and columns with known and predictable contents.
Each column contains a specific data type, such as dates, text, money or
percentages. Data not matching that column’s data type is rejected as an error.
Relational database tables and spreadsheets typically contain structured data.
A higher semantic level of structure combines master data and historical data
into a data model. Data model subject areas include topics such as customers,
inventory, sales transactions, prices and suppliers. Structured data is easy to
use and data integrity can be enforced. Structured data becomes big data as
huge amounts of historical facts are captured.
Teradata Take: All
important business processes and decisions depend on structured data. It is the
foundation of data warehouses, data lakes and applications. When integrated
into a data model, structured data provides exponential business value.
TERABYTE
A
relatively large unit of digital data, one Terabyte (TB) equals 1,000
Gigabytes. It has been estimated that 10 Terabytes could hold the entire
printed collection of the U.S. Library of Congress, while a single TB could
hold 1,000 copies of the Encyclopaedia Britannica.
UNSTRUCTURED DATA
Unstructured
data refers to unfiltered information with no fixed organizing principle. It is
often called raw data. Common examples are web logs, XML, JSON, text documents,
images, video, and audio files. Unstructured data is searched and parsed to
extract useful facts. As much as 80% of enterprise data is unstructured. This
means it is the most visible form of big data to many people. The size of
unstructured data requires scalable analytics to produce insights. Unstructured
data is found in most but not all data lakes because of the lower cost of storage.
Teradata Take: There is
more noise than value in unstructured data. Extracting the value hidden in such
files requires strong skills and tools. There is a myth that relational
databases cannot process unstructured data. Teradata's Unified Data Architecture
embraces unstructured data in several ways. Teradata Database and competitors
can store and process XML, JSON, Avro and other forms of unstructured data.
THE V’S:
Big data
– and the business challenges and opportunities associated with it – are often
discussed or described in the context of multiple V’s:
- Value: the most important “V” from the perspective of the business, the value of big data usually comes from insight discovery and pattern recognition that lead to more effective operations, stronger customer relationships and other clear and quantifiable business benefits
- Variability: the changing nature of the data companies seek to capture, manage and analyze – e.g., in sentiment or text analytics, changes in the meaning of key words or phrases
- Variety: the diversity and range of different data types, including unstructured data, semi-structured data and raw data
- Velocity: the speed at which companies receive, store and manage data – e.g., the specific number of social media posts or search queries received within a day, hour or other unit of time
- Veracity: the “truth” or accuracy of data and information assets, which often determines executive-level confidence
- Volume: the size and amounts of big data that companies manage and analyze
*****
BUILDING
A TEAM FOR BIG DATA SUCCESS
Most
business people know that big data success takes more than just the latest
technology. The right big data strategy (aligned to broader bigger-picture
corporate objectives), strong big data processes (in reporting and governance,
for example) and big data cultures (with strong commitments to data-driven
decision making) are critical ingredients, too.
Still, big
data strategy discussions too often focus on – even obsess over – the ginormous data
volumes, the dizzying range of data infrastructure options and the shiny new
technology du jour. And they almost always overlook one crucial variable: the
people who generate the critical insights that reveal game-changing
opportunities.
PEOPLE AS BIG
DATA DIFFERENCE MAKERS
In fact, having
the right people and teams may be the most important big data best practice. Yet,
according to a 2014 IDG Enterprise survey of 750 IT decision makers, 40% of big data
projects are challenged by a skills shortage.
It’s
not just a specific technical big data skill set or single discipline that
companies need, but rather a range of expertise and knowledge. Yes, technical
chops are a must-have. But a broader understanding of big data best practices
in specific operational contexts – from sales and service, to finance and the
supply chain – is also essential.
Required
big data skills and roles for a successful big data strategy and organization:
EXECUTIVE
SPONSORS
Senior leaders who can craft a clear vision and rally the troops as to why
big data is so important, how it can be used to transform the business and what the major impacts will
be; such leaders are necessary to build
data cultures as well.
Because many big data initiatives are every bit as transformative as
other strategic, enterprise-wide change programs, strong senior leadership is
an absolute requirement for success. The potential for disruption – in both the
positive and negative senses of that word – is high.
Therefore, effective executive sponsorship (very much including the
C-suite) may be the biggest and most important big data best practice of them
all.
BUSINESS
ANALYSTS
People who know the right questions to ask
relative to specific operations and functions, with a real focus on performance
trends; they will regularly interrogate big data to identify how specific
metrics fit within the broader strategic context and relate to megatrends.
Who they are and why businesses need them: Even
as big data analytics technologies and platforms have matured, there has been an
increasing recognition that people and skills are just as important in winning
with big data. And the essential resource in big data just might be the business
analyst.
DATA
SCIENTISTS
Viewed in some circles as “the sexiest job of
the 21st century,” data scientists are most likely to have advanced degrees and
training in math and statistics. They will often lead deep data-diving
expeditions and bold explorations into the largest and most diverse data sets,
seeking the subtlest patterns.
MARKETING PROFESSIONALS
Because so much of the potential value of big data comes
from consumer-facing operations, marketing executives can and should get ramped
up (and rapidly) on the full range of big data practices to optimize digital
advertising, customer segmentation and promotional offerings.
Where the big data action (and value) is: Once upon a time, big data was viewed as
the domain of IT. Today, however, delivering on its
game-changing potential means it’s very much a front-line, customer-facing
phenomenon. And that means marketing, sales and service.
*****
How scientists are using big data to discover rare
mineral deposits
Searching the earth for valuable mineral deposits has never been
easy, but big data is allowing scientists to glean the signal from the noise.
Big data is shaking up
the ways our entrepreneurs start their businesses, our healthcare professionals
deliver care, and our financial services render their transactions. Now, big
data’s reach has expanded so far that it’s revolutionizing the way our
scientists search for gas, oil, and even valuable minerals.
Searching under the
surface of the earth for valuable mineral deposits has never been easy, but by
exploiting recent innovations in big data that allow scientists to glean the
signal from the noise, experts are now capable of discovering and categorizing new
minerals more efficiently than ever before.
A new type
of mining
By mining big data, or by
crunching huge sums of numbers to predict trends, scientists are now capable of
mapping mineral deposits in new and exciting ways. Network theory, which has
been used with great success in fields ranging from healthcare to national
security, is one big data tool that scientists are coming to rely on more and
more.
As outlined in
their research paper, researchers recently categorized
minerals as nodes and the coexistence of different types of minerals as
“lines”, or connections. By visualizing their data like this, they created an
extraordinarily useful mapping process which could help determine which areas
had a higher likelihood of possessing large mineral deposits.
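A rough sketch of that node-and-connection idea, using the open-source networkx library and invented mineral co-occurrence pairs (not the researchers' actual data), might look like this:
```python
# Rough sketch of the node-and-connection idea using networkx. The mineral
# pairs below are invented placeholders, not the researchers' actual data.
import networkx as nx

G = nx.Graph()
# Each edge records that two minerals were observed at the same locality.
co_occurrences = [
    ("quartz", "gold"), ("quartz", "pyrite"), ("pyrite", "gold"),
    ("calcite", "quartz"), ("calcite", "fluorite"),
]
G.add_edges_from(co_occurrences)

# Minerals with many connections may point to promising areas to examine.
print(sorted(G.degree, key=lambda pair: pair[1], reverse=True))
```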
While researchers used to
suffer from limitations on how much data they could process in a given time
period, today’s computers can effortlessly handle the math while the
researchers are freed up to focus on more specialized tasks. As minerals often
form in clusters under the surface of the earth, researchers can tap into their
computer’s predictive analytical capabilities to gain a better understanding of
which areas may be dry and which may be literal goldmines.
While geologists used to
often rely on luck when figuring out where mineral deposits lay, they can now
take their fates into their own hands. The benefits of big data aren’t
constrained to mere minerals, either; scientists have successfully used big data
in similar fashions to find deposits of gold and oil, as well as
other resources.
Using data
to lower costs
While geologists, miners,
and virtually everyone else seeking to make a living off of the earth’s
minerals have relied on data in the past, only recently have innovations made
the process of using big data so cheap that it’s available to nearly everyone.
Goldcorp’s CEO stunned the industry in 2000 when he released the company’s
proprietary data to the public in an effort to harness the public’s innovative
capabilities.
By offering a prize of a
little over a million dollars, Goldcorp ended up discovering more than $6 billion in underground deposits,
entirely because of contributors to the competition who relied on big data to
map the area and find the valuable treasures stowed away below. As big data’s
potential continues to grow, crowdsourcing operations like these will become
more commonplace, as companies such as QP software come to realize the incredible
value of their data and understand that they can use the public to make use of
it.
The sophisticated
application of big data to create 3D maps is only one of the ways it’s
fundamentally reshaping the prospecting industry. As companies develop new and
greater abilities to categorize the minerals they detect
underground, mining operations will find it cheaper and easier than ever before
to locate the highly-valued prizes they seek. These kinds of developments will
come to fill in the gaps that exist with current data-analysis techniques, to
the great benefit of the industry and its consumers.
As big data’s ability to
network and visualize huge sums of information continues to grow, more mineral
deposits which have never before been unearthed are likely to be discovered. As
advances in chemistry make it easier to determine the makeup of the minerals
they uncover, companies will rapidly come to discover deposits in areas which
they previously overlooked, or which earlier tests determined to be unworthy of
their time.
Big data is showing no
signs of slowing down as it continues on its crusade to reshape the world as we
know it. By tapping into this wondrous phenomenon, industries of all stripes
are revolutionizing how they collect and use information to their benefit.
While big data doesn’t hold the answer to every problem facing the world, it’s
already made itself invaluable to the public, and will likely continue to grow.
Gary Eastwood has over 20 years' experience as a science and
technology journalist, editor and copywriter; writing on subjects such as
mobile & UC, smart cities, ICT, the cloud, IoT, clean technology,
nanotechnology, robotics & AI and science & innovation for a range of
publications. Outside his life as a technology writer and analyst, Gary is an
avid landscape photographer who has authored two photography books and
ghost-written two others.
****
Data governance in the world of “data everywhere”
With data sources, uses and solutions on the rise, data
governance is becoming even more important. But what are the phases of a
successful and scalable data governance program?
In the world of “data
everywhere”, Data Governance is becoming even more important.
Organizations that develop a data warehouse ‘single source of truth’ need data
governance to ensure that a Standard Business Language (SBL) is developed and
agreed to, and the various sources of data are integrated with consistent and
reliable definitions and business rules. Decisions around who can use
what data, and validation that the data being used and how it’s used meet
regulatory and compliance requirements, are also important.
As enterprise data
management solutions grow and broaden, incorporating Enterprise Application
Integration (EAI), Master Data Management (MDM), increasing use of external
data, real-time data solutions, data lakes, cloud and more, Data Governance becomes even
more important. There may be value in having data, but if it isn’t
accurate, isn’t well managed and no one can use it, then the value of the data,
wherever it resides, diminishes greatly.
The foundational and
implementation activities needed to initiate and successfully scale a Data
Governance capability remain the same:
- a discovery phase to assess sentiment, define the current and future data landscape, identify stakeholders, prioritize opportunities (and business value) and focus areas, and start to develop goals and a Data Governance roadmap
- a foundational implementation phase to put the organization around data governance in place, communicate and educate stakeholders, secure executive support, define metrics for success and begin with an initial project, process or data set
- a scalable implementation that includes tools, workflows and a focus on continuous improvement
Upcoming articles will
describe approaches to each of these phases. Working through these phases
with the desired future state in mind, and with a high level roadmap to get
there, will provide you with a greater probability of establishing a data
governance capability that will scale in the long run.
Nancy Couture has more than 30 years of experience leading enterprise data management
at Fortune 500 companies and midsize organizations in both the healthcare and
financial services industries. Nancy is delivery enablement lead for Datasource Consulting in Denver.
****
By Gary Eastwood, star Advisor, CIO | JUL 18, 2017 6:00 AM PT
Opinions expressed by
ICN authors are their own.
How big data is driving technological innovation
While businesses have analyzed data for decades, recent
developments in computing have opened new doors and unleashed big data’s
potential.
Big data analytics, or
the collection and analysis of huge sums of data to discover underlying trends
and patterns, is increasingly shaking the foundations of the business world. As
the field continues to grow at an explosive rate, many innovators are asking
themselves how they can exploit big data to optimize their businesses.
While businesses have
analyzed data for decades, recent developments in computing have opened new
doors and unleashed big data’s potential. A report from SNS Research
details the breadth of big data’s impact; it’s now a $57 billion market,
and is expected to continue to grow.
So how exactly is big
data driving changes in the marketplace, and what does the future of this
exciting industry hold?
Big data
demands a skilled workforce
Savvy firms are using big
data to foster increased consumer engagement, target new audiences with their
advertisements, and hone the efficiency of their operations. A company can’t
make use of this exciting new technology, however, if they don’t have the
necessary human capital to exploit it.
Businesses are
increasingly looking for skilled workers intimately familiar with data
collection and analysis. These talented data gurus are being scooped up in
droves by firms hoping to one day be on the Fortune 500 list, with some firms
even employing training to ensure their teams are up
to snuff. While college-educated employees are already highly valued, the
workplace of tomorrow will demand even greater academic credentials and
familiarity with tech from its workers.
Consider Teradata’s 2017
Data and Analytics Trend Report, which highlights the fact that
nearly half of global businesses are facing a dearth of employees with data
skills. As the gargantuan big data market continues to grow, firms will need
more innovative workers who aren’t intimidated by disruptive technologies.
As big data’s
capabilities grow to be more impressive, companies will need to ensure their
workforce is up to the task of analyzing it to make better decisions. The last
thing any innovative firm needs is to be left in the dust due to the lackluster
performance of its human employees.
Big data’s
disruptive impact
The disruptive nature of
big data has led it to revolutionize a number of key industries. The financial
industry, which is predicted to become heavily automated in the coming years,
now relies on software which can crunch astonishingly large amounts of data to
predict market trends and detect inefficiencies in companies’ financial
operations.
Emerging industries
like credit services, autonomous vehicles and smart
homes, too, are being fueled by the emergence of big data. The impressive smart
cars of tomorrow rely on the collection and interpretation of localized data to
avoid crashes and optimize their routes, for instance.
Many existing business
behemoths owe their place in the market to big data, as well. Netflix, a
service so ubiquitous it’s almost taken for granted, reshaped the home
entertainment industry largely thanks to its collection and analysis of user
data. The company can determine which of its shows will be the most successful
in any given market, predict which pilots it should fund, and even forecast how
many entertainment awards it may win by crunching ever-growing amounts of data.
Utilizing
big data for business success
As more companies and
governmental organizations see the benefits of big data, there’s no doubt
they’ll pour more funding into it to better exploit it. Insurance companies
eager to determine who among their clients is the most likely to get into
accidents will employ increasingly advanced algorithms to detect risk. Tech
giants like Apple and Google will employ analytics to determine how their
latest gadgets might sell among their existing customers. Big data’s
opportunities are virtually limitless.
IBM’s innovation report points to how emerging
industries like 3D-printing and wearable tech will use big data to detect flaws
in their operations or gauge users’ opinions on the products they buy. One of
the most important factors in a business’s success, they point out, is how CEOs
and CIOs invest early in analytics to better optimize their firms and forecast
the future.
As the internet of things
continues to grow at a dizzying pace, firms will find more sources of valuable
data waiting to be collected and interpreted. There’s an entire marketplace waiting
to be exploited by those companies wise enough to invest now in analytical
forecasting.
While the visions of the
digital world’s future are often grim, highlighting increased levels of
automation and pointing to existing markets which may be disrupted, they seldom
capture the full potential of big data. This extraordinary phenomenon will soon
find itself being used in the manufacturing, marketing, and delivering of
virtually every product and service.
Individuals and companies
who don’t want to be left behind should appreciate the future of the
information marketplace, and prepare for it while they can. There’s a brave new
world waiting to be capitalized on, and it belongs to those who embrace big data.
****
Using Big
Data to Hack Autism
Researchers
scour datasets for clues to autism—needles in a genetic haystack
of 20,000 people
By Simon Makin, Spectrum on July 6, 2017
****
DATA ANALYTICS (DA):
DEFINITION
data analytics (DA)
Contributor(s): Craig Stedman
Data
analytics initiatives can help businesses increase revenues, improve
operational efficiency, optimize marketing campaigns and customer service
efforts, respond more quickly to emerging market trends and gain a competitive
edge over rivals -- all with the ultimate goal of boosting business
performance. Depending on the particular application, the data that's analyzed
can consist of either historical records or new information that has been
processed for real-time
analytics uses. In addition, it can come from a mix of internal
systems and external data sources.
Types
of data analytics applications
At a high
level, data analytics methodologies include exploratory data analysis (EDA),
which aims to find patterns and relationships in data, and confirmatory data
analysis (CDA), which applies statistical techniques to determine whether
hypotheses about a data set are true or
false. EDA is often compared to detective work, while CDA is akin to the work
of a judge or jury during a court trial -- a distinction first drawn by
statistician John W. Tukey in his 1977 book Exploratory Data Analysis.
Data
analytics can also be separated into quantitative data analysis and qualitative
data analysis. The former involves analysis of numerical data with quantifiable
variables that can be compared or measured statistically. The qualitative
approach is more interpretive -- it focuses on understanding the content of
non-numerical data like text, images, audio and video, including common
phrases, themes and points of view.
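A small sketch of the two approaches in Python, using invented spending data: summary statistics and grouping for the exploratory side, and a t-test from SciPy for the confirmatory side.
```python
# Illustrative sketch: exploratory analysis (summaries and grouping) versus
# confirmatory analysis (a formal hypothesis test). The data is invented.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "spend": [20, 22, 19, 24, 21, 30, 28, 33, 29, 31],
})

# EDA: look for patterns without a fixed hypothesis.
print(df.groupby("group")["spend"].describe())

# CDA: test the specific hypothesis that the two groups spend the same on average.
a = df.loc[df["group"] == "A", "spend"]
b = df.loc[df["group"] == "B", "spend"]
t_stat, p_value = stats.ttest_ind(a, b)
print(t_stat, p_value)   # a small p-value argues against the "no difference" hypothesis
```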
At the
application level, BI and reporting provides business executives and other
corporate workers with actionable information about key
performance indicators, business operations, customers and more. In
the past, data queries and reports typically were created for end users by BI
developers working in IT or for a centralized BI team; now, organizations
increasingly use self-service
BI tools that let execs, business analysts and operational
workers run their own ad hoc queries and build reports themselves.
More
advanced types of data analytics include data
mining, which involves sorting through large data sets to identify
trends, patterns and relationships; predictive
analytics, which seeks to predict customer behavior, equipment
failures and other future events; and machine
learning, an artificial intelligence technique that uses automated
algorithms to churn through data sets more quickly than data
scientists can do via conventional analytical modeling. Big
data analytics applies data mining, predictive analytics and
machine learning tools to sets of big data that often contain unstructured and
semi-structured data. Text
mining provides a means of analyzing documents, emails and
other text-based content.
Data
analytics initiatives support a wide variety of business uses. For example,
banks and credit card companies analyze withdrawal and spending patterns to
prevent fraud and identity
theft. E-commerce companies and marketing services providers do clickstream
analysis to identify website visitors who are more likely to
buy a particular product or service based on navigation and page-viewing
patterns. Mobile network operators examine customer data to forecast churn so
they can take steps to prevent defections to business rivals; to boost customer
relationship management efforts, they and other companies also engage in CRM
analytics to segment customers for marketing campaigns and
equip call center workers with up-to-date information about callers. Healthcare
organizations mine patient data to evaluate the effectiveness of treatments for
cancer and other diseases.
Inside
the data analytics process
Data
analytics applications involve more than just analyzing data. Particularly on
advanced analytics projects, much of the required work takes place upfront, in
collecting, integrating and preparing data and then developing, testing and
revising analytical models to ensure that they produce accurate results. In
addition to data scientists and other data analysts, analytics teams often
include data
engineers, whose job is to help get data sets ready for analysis.
Once the data that's needed is in place, the next step is to find and fix data quality problems that could affect the accuracy of analytics applications. That includes running data profiling and data cleansing jobs to make sure that the information in a data set is consistent and that errors and duplicate entries are eliminated. Additional data preparation work is then done to manipulate and organize the data for the planned analytics use, and data governance policies are applied to ensure that the data hews to corporate standards and is being used properly.
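A minimal pandas sketch of the profiling step described above, run on an invented orders table, might look like this:
```python
# Illustrative sketch of the data profiling step: checking a data set for
# missing values, duplicates and out-of-range entries with pandas.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount":   [25.0, None, -10.0, 40.0],
})

print(orders.isnull().sum())                # how many values are missing per column
print(orders.duplicated("order_id").sum())  # duplicate order IDs to investigate
print((orders["amount"] < 0).sum())         # suspicious negative amounts
```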
At that
point, the data analytics work begins in earnest. A data scientist builds an
analytical model, using predictive
modeling tools or other analytics software and programming
languages such as Python, Scala, R and SQL. The model is initially
run against a partial data set to test its accuracy; typically, it's then
revised and tested again, a process known as "training" the model
that continues until it functions as intended. Finally, the model is run in
production mode against the full data set, something that can be done once to
address a specific information need or on an ongoing basis as the data is
updated.
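A hedged sketch of that train, test and deploy loop using scikit-learn and synthetic data (standing in for a real analytics table) might look like the following:
```python
# Illustrative sketch of the workflow described above: train a model on part of
# the data, check its accuracy on held-out data, then score the full data set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# Once accuracy is acceptable, run the model "in production" over the full set.
predictions = model.predict(X)
print(predictions[:10])
```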
In some
cases, analytics applications can be set to automatically trigger business
actions -- for example, stock trades by a financial services firm. Otherwise,
the last step in the data analytics process is communicating the results
generated by analytical models to business executives and other end users to
aid in their decision-making. That usually is done with the help of data
visualization techniques, which analytics teams use to create charts
and other infographics designed to make their findings easier to understand.
Data visualizations often are incorporated into BI
dashboard applications that display data on a single screen and
can be updated in real time as new information becomes available.
***