A

Aggregation

Data aggregation is any process whereby data is gathered and expressed in a summary form. When data is aggregated, atomic data rows – typically gathered from multiple sources – are replaced with ……
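As a minimal illustration of the idea (using pandas as an assumed dependency; the column names are invented):

```python
import pandas as pd

# Atomic rows, as they might arrive from multiple sources (illustrative data)
orders = pd.DataFrame({
    "region": ["EU", "EU", "US", "US", "US"],
    "amount": [120.0, 80.0, 200.0, 50.0, 75.0],
})

# Replace atomic rows with summary statistics per group
summary = orders.groupby("region")["amount"].agg(["sum", "mean", "count"])
print(summary)
```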

Allocation

The concept of data allocation is closely related to the granularity of the data. Data allocation (a technique also referred to as filling gaps) is useful when dealing with data which has a different……

Analysis

Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making. Data analys……

Analytics

Data analytics is the process of examining and interpreting data to extract meaningful insights, draw conclusions, and support decision-making. It involves various techniques, statistical methods, ……

Anonymize

Data anonymization seeks to protect private or sensitive data by deleting or encrypting personally identifiable information from a database. Data anonymization is done for the purpose of protecting……

Appending

Data Appending is the process of filling in the gaps in a given database. It matches contact information of business prospects across various industries and lets you connect with them stra……

Archaeology

The art and science of recovering computer data encoded and/or encrypted in now obsolete media or formats. Data archaeology can also refer to recovering information from damaged electronic formats a……

Architecture

Data architecture is a discipline that documents an organization’s data assets, maps how data flows through its systems and provides a blueprint for managing data. The goal is to ensure that data i……

Archiving

Data archiving moves data that is no longer actively used to a separate storage device for long-term retention. Archive data consists of older data that remains important to the organization or mus……

Assurance

Data Assurance is a scalable, high-volume, configurable data-comparison capability that compares row data and schema between primary (source) and replicate (target) databases, reports discrepancies, a……

Auditing

Data auditing is the process of conducting a data audit to assess how fit a company’s data is for a given purpose. This involves profiling the data and assessing the impact of poor quality data on the ……

B

Backup

Data backup is a process of duplicating data to allow retrieval of the duplicate set after a data loss event. Today, there are many kinds of data backup services that help enterprises and organizat……

Bank

In Database Management and information architecture, a Data Bank or Databank is a repository of information about one or more subjects, that is, a database which is organized in a way that facilita……

Big Data

Big data is an umbrella term used to describe extremely large data sets that are difficult to process and analyze in a reasonable amount of time using traditional methods. Big data consists of stru……

Binning

Data Binning, Bucketing, or Discretization is a data smoothing and pre-processing method to group original continuous data into small, discrete bins, intervals, or categories. Each bin is considere……
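A small sketch using pandas (an assumed dependency; the bin edges and labels are invented for illustration):

```python
import pandas as pd

ages = pd.Series([3, 17, 25, 42, 58, 71, 89])

# Group continuous values into discrete, labeled intervals
bins = pd.cut(ages, bins=[0, 18, 35, 60, 100],
              labels=["child", "young adult", "adult", "senior"])
print(bins.value_counts())
```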

Blending

Data blending is a process whereby big data from multiple sources are merged into a single data warehouse or data set. Data blending allows business analysts to cope with the expansion of data that……

Bucketing

Data bucketing, also known as data clustering or bucket-based partitioning, involves dividing data into smaller, equally-sized units called buckets. Unlike partitioning, which is based on a specifi……

C

Caching

Caching – pronounced ‘cashing’ – is the process of storing data in a cache, which is a temporary storage area that facilitates faster access to data with the goal of improving application and sys……
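A minimal in-process example using Python's standard `functools.lru_cache`; the slow backing store is simulated:

```python
from functools import lru_cache
import time

@lru_cache(maxsize=128)
def expensive_lookup(key: str) -> str:
    time.sleep(1)          # simulate a slow backing store
    return key.upper()

expensive_lookup("alpha")  # slow: computed, then stored in the cache
expensive_lookup("alpha")  # fast: served straight from the cache
```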

Canonical Data Model

A Canonical Data Model (CDM) is a standardized framework used to unify and streamline data integration among multiple systems within a business or organization. It serves as a common reference poin……

Catalog

A Data Catalog is a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need, serves as an inventory of avail……

Categorization

Data categorization refers to the process of organizing and classifying data into predefined categories or groups based on specific criteria. This practice is crucial for data management, analysis,……

Change Data Capture

Change Data Capture (CDC) is a method for capturing and tracking changes made to data in real-time. It allows businesses to identify and capture every individual change made to the data, including ……
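Production CDC tools usually tail a database's transaction log; purely as an illustration of the idea, a naive snapshot-diff sketch might look like this (all names are hypothetical):

```python
def diff_snapshots(old: dict, new: dict) -> list:
    """Naive snapshot comparison: emit insert/update/delete change events."""
    changes = []
    for key, row in new.items():
        if key not in old:
            changes.append(("insert", key, row))
        elif old[key] != row:
            changes.append(("update", key, row))
    for key in old:
        if key not in new:
            changes.append(("delete", key, None))
    return changes

before = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
after  = {1: {"name": "Ada L."}, 3: {"name": "Edsger"}}
print(diff_snapshots(before, after))
```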

Classification

Data classification is the process of organizing data into categories that make it easy to retrieve, sort and store for future use. A well-planned data classification system makes essential data ea……

Cleansing

Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, in……

Composable Data

‘Composable data’ refers to the concept of making data components or elements modular and easily combinable to create flexible and dynamic datasets. The term draws inspiration from the idea of comp……

Compression

Data compression is a reduction in the number of bits needed to represent data. Compressing data can save storage capacity, speed up file transfer and decrease costs for storage hardware and networ……
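A quick standard-library illustration with `zlib`, using deliberately repetitive input:

```python
import zlib

raw = b"data " * 1000                  # highly repetitive input compresses well
packed = zlib.compress(raw, level=9)   # far fewer bytes represent the same data
print(len(raw), "->", len(packed))
assert zlib.decompress(packed) == raw  # lossless: original fully recoverable
```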

Consolidation

Data consolidation refers to the collection and integration of data from multiple sources into a single destination. During this process, different data sources are put together, or consolidated, i……

Corruption

Data corruption refers to errors in computer data that occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Computer, transm……

Curation

Data curation is the process of creating, organizing and maintaining data sets so they can be accessed and used by people looking for information. It involves collecting, structuring, indexing and ……

Customization

If raw data is the clay, data customization is the potter’s wheel that transforms it into unique and useful forms, tailored to your specific needs and goals. It’s about shaping information to fit your exact requirem……

D

DataOps

DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization…….

Datafication

Datafication refers to the collective tools, technologies and processes used to transform an organization into a data-driven enterprise. This buzzword describes an organizational trend of defining th……

Decoding

Data Decoding is a crucial process in the realm of software development, where data encoded in one format is converted back into its original form or into a format that can be understood and utiliz……

Deduplication

In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementation of the technique can improve storage utilization, which may in turn low……
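A toy sketch of block-level deduplication, storing each unique block once under its content hash (the block size and data are invented for illustration):

```python
import hashlib

def dedup_blocks(data: bytes, block_size: int = 4) -> dict:
    """Store each unique block once, keyed by its content hash."""
    store = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        store.setdefault(hashlib.sha256(block).hexdigest(), block)
    return store

data = b"ABCDABCDABCDXYZ!"
store = dedup_blocks(data)
print(f"{len(data)} bytes reduced to {len(store)} unique blocks")
```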

Degradation

Data decay, also known as data degradation, refers to the gradual deterioration of data quality over time. It is a common phenomenon that affects all types of data, including customer data, marketi……

Democratization

Data democratization is the ability for information in a digital format to be accessible to the average end user. The goal of data democratization is to allow non-specialists to gather a……

Denormalization

Data denormalization is the process of restructuring a relational database by adding redundant data to one or more tables. The aim is to improve query performance by reducing the need for joins. Th……

Deserialization

Computer data is generally organized in data structures such as arrays, records, graphs, classes, or other configurations for efficiency. When data structures need to be stored or transmitted to an……

Design

‘Data design’ refers to the process of structuring, organizing, and planning the storage and retrieval of data in a way that meets specific requirements and objectives. It involves making decisions……

Dictionary

A data dictionary is a centralized repository of metadata that provides a comprehensive description of the data used in a database, information system, or project. It serves as a reference for unde……

Discovery

Data discovery refers to the process of identifying, locating, and understanding relevant information within a dataset or a data environment. It involves exploring and analyzing data to uncover pat……

Discretization

Data discretization is defined as a process of converting continuous data attribute values into a finite set of intervals with minimal loss of information and associating with each interval some sp……

Domain

In data management and database analysis, a data domain is the collection of values that a data element may contain. The rule for determining the domain boundary may be as simple as a data type wit……

E

Encryption

Data encryption in data engineering refers to the process of converting plaintext data into a secure and unreadable format (ciphertext) using encryption algorithms. This practice is employed to pro……
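A minimal symmetric-encryption sketch, assuming the third-party `cryptography` package is installed; key management is out of scope here:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # symmetric key; must be stored securely
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"card=4111-1111")  # plaintext -> unreadable ciphertext
plaintext = cipher.decrypt(ciphertext)          # recoverable only with the key
assert plaintext == b"card=4111-1111"
```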

Engineering

Data engineering refers to the building of systems to enable the collection and usage of data. This data is usually used to enable subsequent analysis and data science, which often involves machine……

Enrichment

Data enrichment is the process of improving the accuracy and reliability of your raw customer data. Teams enrich data by adding new and supplemental information and verifying the information agains……

Entry

Data entry is the process of digitizing data by entering it into a computer system for organization and management purposes. It is a person-based process and one of the important basic tasks n……

Exchange

Data exchange is the process of taking data structured under a source schema and transforming it into a target schema, so that the target data is an accurate representation of the source data. Data……

Export

Data export refers to the process of extracting and saving data from a system, database, or application to a file or another external destination. This exported data can then be used for various pu……

Extraction

Data extraction is the act or process of retrieving data out of (usually unstructured or poorly structured) data sources for further data processing or data storage (data migration). The import int……

F

Fabric

Data fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through the use of intelligent and automated systems. Over the last decad……

Federation

Data federation is a technology that virtually unifies data from different sources and makes it accessible under a uniform data model. The underlying data stores in a federated data store continue ……

Filtering

Data filtering is the process of choosing a smaller part of your data set and using that subset for viewing or analysis. Filtering is generally (but not always) temporary – the complete data set is……

Fragmentation

Fragmentation is the process of dividing a database into various subtables or sub-relations so that data can be stored across different systems. The small pieces or sub-relations or subta……

Fusion

Data fusion is the process of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source….

G

Governance

Data governance (DG) is the process of managing the availability, usability, integrity and security of the data in enterprise systems, based on internal data standards and policies that also contro……

H

Harmonization

Data Harmonization is the process of combining data from multiple sources and ensuring that it is consistent and compatible. Harmonization aims to create a unified view of data, regardless of its s……

Hashing

Data hashing is a process that involves converting input data (or a ‘message’) into a fixed-length string of characters, which is typically a hash code or hash value. The primary purpose of data ha……
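A standard-library illustration: any input maps to a fixed-length digest, and even a one-character change produces a completely different value:

```python
import hashlib

# Fixed-length digests; a tiny input change alters the output completely
print(hashlib.sha256(b"alice@example.com").hexdigest())
print(hashlib.sha256(b"alice@example.org").hexdigest())
```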

Homogenization

Data homogenization refers to the process of standardizing or normalizing data to ensure consistency and uniformity across different sources, formats, or representations. The goal is to create a un……

I

Indexing

Data indexing refers to the process of organizing and cataloging data so that it can be quickly retrieved and analyzed. This technique creates a sort of roadmap that lists the location of data on a……

Ingestion

Data Ingestion is the process of obtaining, importing, and processing data for later use or storage in a database. This can be achieved manually, or automatically using a combination of software an……

Integration

Data integration refers to the process of bringing together data from multiple sources across an organization to provide a complete, accurate, and up-to-date dataset for BI, data analysis and other……

Integrity

Data integrity refers to the accuracy, consistency, and completeness of data throughout its lifecycle. It’s a critically important aspect of systems which process or store data because it protects ……

L

Lake

A data lake is a centralized repository designed to store, process, and secure large amounts of structured, semistructured, and unstructured data. It can store data in its native format and process……

Lakehouse

A data lakehouse is a data management architecture that combines the key features and the benefits of a data lake and a data warehouse. Data lakehouse platforms merge the rigorous data management f……

Lifecycle

Data lifecycle management (DLM) is an approach to managing data throughout its lifecycle, from data entry to data destruction. Data is separated into phases based on different criteria, and it move……

Lineage

All the data in an organization has a story; data lineage is about telling the story of that data as it travels through the various systems and platforms. So, data lineage is metadata over time, in……

Linking

Data linking is taking information about a person or an entity from various sources and collating them under different parameters to come up with a trend or pattern. This unique research tool has a……

Loading

Data loading is the process of copying and loading data or data sets from a source file, folder or application to a database or similar application. It is usually implemented by copying digital dat……

M

Management

Data management is the practice of collecting, keeping, and using data securely, efficiently, and cost-effectively. The goal of data management is to help people, organizations, and connected thing……

Mapping

Data mapping is a vital process within data management that involves defining relationships between data elements and facilitating effective data integration, transformation, and analysis. This pra……

Mart

A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area. Data marts make specific data available to a defined group of users, which allows ……

Masking

Data masking is a method of creating a structurally similar but inauthentic version of an organization’s data that can be used for purposes such as software testing and user training. The purpose i……
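A toy sketch of one masking rule (the helper and its policy are invented for illustration): the value keeps its shape while the sensitive part is hidden:

```python
def mask_email(email: str) -> str:
    """Keep the shape of the value while hiding the sensitive part."""
    local, _, domain = email.partition("@")
    return local[0] + "*" * (len(local) - 1) + "@" + domain

print(mask_email("alice@example.com"))  # a****@example.com
```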

Master data

Master data represents ‘data about the business entities that provide context for business transactions’. The most commonly found categories of master data are parties (individuals and organisation……

Master data management

Master data management (MDM) is a process that creates a uniform set of data on customers, products, suppliers and other business entities across different IT systems. One of the core disciplines i……

Matching

Data matching is the process of finding identical entries from one or more collections of data and unifying the data records. It could be performed between datasets to ensure that data from various……

Merging

Data merging is the process of combining two or more datasets into a single dataset. It is a critical step in modern data pipelines when working with data from multiple sources or with different fo……

Mesh

Data mesh is a cultural and organizational shift in data management, focused on federation technology that emphasizes the authority of localized data management. Data mesh is intended to enable ea……

Migration

Data migration is the process of transferring data from one storage system or computing environment to another.

There are many reasons your enterprise might need to undertake a data migr……

Mining

Data mining is the process of analyzing hidden patterns of data according to different perspectives in order to turn that data into useful and often actionable information. Data is collected and as……

Modeling

Data modeling is the process of creating a representation of the data structures in a company’s database tables, and is a very powerful expression of the company’s business requirements. This data model is the guide u……

Monetization

Data Monetization refers to the process of using data to obtain quantifiable economic benefit. Internal or indirect methods include using data to make measurable business performance improvements a……

Monitoring

Data monitoring is the process of analyzing and evaluating data to ensure the availability of high-quality data meeting business purposes and standards. It plays a crucial role in avoiding the degr……

Munging

The process of converting unstructured data into a structured format is typically referred to as ‘data wrangling’ or ‘data munging’. Both terms refer to the process of transforming and mapping data……

N

Normalization

Database normalization is the process of structuring a database according to what’s called normal forms, with the final product being a relational database, free from data redundancy. More specific……

O

Obfuscation

Data obfuscation, also known as data masking or data anonymization, is a process of altering or transforming data in order to make it unintelligible or less meaningful to unauthorized individuals. ……

P

Parsing

Data parsing is a method of restructuring and converting unstructured data. If you’re transforming HTML into plain text, for example, that’s data parsing. It’s a process that transforms unstructure……

Partitioning

Database partitioning (also called data partitioning) refers to breaking the data in an application’s database into separate pieces, or partitions. These partitions can then be stored, accessed, an……

Personalization

Data personalization refers to the customization of content, services, or experiences based on an individual’s preferences, behaviors, demographics, or other relevant data. The aim of data personal……

Pipeline

A data pipeline is a set of network connections and processing steps that moves data from a source system to a target location and transforms it for planned business uses. Data pipelines are common……

Planning

Data planning refers to the strategic process of developing a comprehensive and organized approach to managing and leveraging data within an organization. It involves defining goals, establishing m……

Point

A discrete unit of information is called a data point. Any single fact is a data point, broadly speaking. A data point can be quantitatively or graphically represented and is typically produced fro……

Preprocessing

Data preprocessing, a component of data preparation, describes any type of processing performed on raw data to prepare it for another data processing procedure. It has traditionally been an importa……

Preservation

Data preservation involves formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its meta……

Profiling

Data profiling refers to the process of examining, analyzing, reviewing and summarizing data sets to gain insight into the quality of data. Data quality is a measure of the condition of data based ……

Proliferation

‘Data proliferation’ is an umbrella term concerned with the large number of files and amount of data stored by entities such as governments and businesses. The massive amount of data coming in dail……

Protection

Data protection is the process of safeguarding important information from corruption, compromise or loss.

The importance of data protection increases as the amount of data created and st……

Provenance

Data provenance (also referred to as “data lineage”) is metadata that is paired with records that details the origin, changes to, and details supporting the confidence or validity of data. Data pro……

Pseudonymization

Pseudonymization is the process of replacing personally identifiable fields within a data record with one or more artificial identifiers or pseudonyms. Commonly used to protect the privacy of perso……
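One common approach is a keyed hash, so the same identifier always maps to the same pseudonym. A minimal sketch using the standard library (the secret and truncation length are illustrative):

```python
import hashlib
import hmac

SECRET = b"rotate-me"  # kept separately so pseudonyms can be re-linked if needed

def pseudonymize(value: str) -> str:
    """Replace an identifier with a stable, keyed pseudonym."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:12]

print(pseudonymize("alice@example.com"))  # same input -> same pseudonym
```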

Purging

Data purging is the process of permanently deleting data from a system or database. This process is important in modern data pipelines to ensure that data that is no longer needed or is outdated is……

Q

Quality

Data quality is a measure of the condition of data based on factors such as accuracy, completeness, consistency, reliability and whether it’s up to date. Measuring data quality levels can help orga……

R

Rebalancing

Data rebalancing refers to the process of redistributing data across nodes or partitions in a distributed system to ensure optimal utilization of resources and balanced load. As data is added, remo……

Reconciliation

Data Reconciliation is a process that involves comparing and matching data from different sources, such as databases, files, or systems, to identify discrepancies, resolve conflicts, and ensure con……

Recovery

Data recovery is a software-driven process that enables you to recover and restore lost, deleted, inaccessible, corrupted, or damaged files so you can get back to work quickly….

Redundancy

Accurate and reliable data is vital for productivity and effective decision-making in an organization. Data redundancy can protect and sustain reliable data or present significant drawbacks. Unders……

Remanence

Data remanence is the residual representation of digital data that remains even after attempts have been made to remove or erase the data. This residue may result from data being left intact by a n……

Replication

Data replication is the process of creating and maintaining multiple copies of the same data in different locations as a way of ensuring data availability, reliability and resilience across an orga……

Reporting

Data reporting is the process of gathering data from various sources, analyzing it, and presenting the findings in a concise form. Typically, the purpose of such reports is to allow decision-makers to assess……

S

Sampling

Data sampling is a statistical analysis technique used to select, manipulate and analyze a representative subset of data points to identify patterns and trends in the larger data set being examined……
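A minimal sketch of simple random sampling with the standard library (the population and sample size are invented):

```python
import random

population = list(range(1_000_000))
random.seed(7)                               # reproducible draw
sample = random.sample(population, k=1_000)  # random sample without replacement
print(sum(sample) / len(sample))             # estimate of the population mean
```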

Sandbox

A data sandbox, in the context of big data, is a scalable and developmental platform used to explore an organization’s rich information sets through interaction and collaboration. It allows a compa……

Science

Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover actionab……

Scraping

Data Scraping refers to the process of extracting data from a website or another source, typically to transform and store it in a structured form. While the concept has existed for a long time, it’……

Scrubbing

Data scrubbing refers to the procedure of modifying or removing incomplete, incorrect, inaccurately formatted, or repeated data in a database. The key objective of data scrubbing is to make the dat……

Segmentation

Data Segmentation is the process of taking the data you hold and dividing it up and grouping similar data together based on the chosen parameters so that you can use it more efficiently within mark……

Serialization

Data serialization is the process of converting complex data structures, such as objects or dictionaries, into a format that can be stored or transmitted, such as a byte stream or JSON string. This……
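A standard-library round trip through JSON illustrates both directions:

```python
import json

record = {"id": 7, "tags": ["a", "b"], "active": True}

payload = json.dumps(record)    # serialize: structure -> JSON string
restored = json.loads(payload)  # deserialize: JSON string -> structure
assert restored == record
```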

Set

A data set is a collection of related, discrete items of data that may be accessed individually, in combination, or managed as a whole entity….

Shapes

(Long, Short, Wide, Narrow and Big) - The concept of ‘shapes of data’ refers to different structures and organizations that datasets can take. Understanding the shape of data is fundamental in data……

Sharding

Data Sharding, also known as horizontal partitioning, is a technique used to break down large datasets into smaller, more manageable pieces called shards. Each shard contains a subset of the data, ……
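A minimal sketch of hash-based shard routing (the shard names are hypothetical): the same key deterministically lands on the same shard:

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(key: str) -> str:
    """Hash-based sharding: the same key always routes to the same shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("customer-42"))
```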

Shredding

Data shredding, also known as data partitioning or data fragmentation, is the process of breaking down large datasets into smaller, more manageable pieces for easier processing and analysis. Shredd……

Shuffling

Data shuffling is a process in modern data pipelines where the data is randomly redistributed across different partitions to enable parallel processing and better performance. Shuffling is generall……

Silo

A data silo is a repository of data that’s controlled by one department or business unit and isolated from the rest of an organization, much like grass and grain in a farm silo are closed off from ……

Snapshot

Data Snapshot is a technology that allows businesses to capture and store a static copy of their data at a specific point in time. It provides a reliable and consistent view of data, enabling busin……

Spill

A data spill is a “security incident that results in the transfer of classified information onto an information system not authorized to store or process that information.” Data spills may also be r……

Split

Data splitting is when data is divided into two or more subsets. Typically, with a two-part split, one part is used to train the model and the other to evaluate or test it….
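A minimal sketch of an 80/20 split using the standard library (the ratio and data are illustrative):

```python
import random

data = list(range(100))
random.seed(0)
random.shuffle(data)                  # randomize order before splitting

split = int(len(data) * 0.8)          # 80/20 train/test split
train, test = data[:split], data[split:]
print(len(train), len(test))          # 80 20
```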

Staging

Data Staging refers to the process of preparing and organizing data for further processing and analysis. It involves extracting raw data from various sources, transforming it into a consistent form……

Standardization

Data Standardization involves applying a set of predefined rules and procedures to transform raw data into a consistent and uniform format. This includes cleaning, structuring, and organizing data to……

Stewarding

Data stewarding, also known as data stewardship, is the practice of managing and overseeing an organization’s data assets to ensure their quality, integrity, and proper usage. Data stewards play a ……

Streaming

Data streaming is the continuous transfer of data from one or more sources at a steady, high speed for processing into specific outputs. Data streaming is not new, but its practical applications ar……

Synchronization

Data synchronization is the process of maintaining the consistency and uniformity of data instances across all consuming applications and storing devices. It ensures that the same copy or version o……

T

Tagging

Data tagging is the process of adding metadata to your file data in the form of key-value pairs. These values give context to your data, so that others can easily find it in search and execute acti……

Thinking

Data thinking is a product design framework with a particular emphasis on data science. It integrates elements of computational thinking, statistical thinking, and domain thinking. In the context o……

Tokenization

Data tokenization is a substitution technique in which private or sensitive data elements are replaced with randomly generated alphanumeric strings. These strings or tokens have no value and can’t ……
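A toy sketch of the idea, with a plain dict standing in for the secured token vault (all names are illustrative):

```python
import secrets

vault = {}  # token -> original value; in practice a separate, secured store

def tokenize(value: str) -> str:
    token = secrets.token_urlsafe(12)  # random string with no exploitable meaning
    vault[token] = value
    return token

t = tokenize("4111-1111-1111-1111")
print(t)         # safe to pass through downstream systems
print(vault[t])  # original recoverable only via the vault
```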

Transformation

Data transformation is the process of converting data from one format, such as a database file, XML document or Excel spreadsheet, into another.

Transformations typically involve convert……

V

Validation

Data validation is the practice of checking the integrity, accuracy and structure of data before it is used for a business operation. Data validation operation results can provide data used for dat……
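A minimal sketch of rule-based row validation (the rules and field names are invented for illustration):

```python
def validate(row: dict) -> list:
    """Return a list of rule violations; an empty list means the row passed."""
    errors = []
    if row.get("email", "").count("@") != 1:
        errors.append("email must contain exactly one '@'")
    if not isinstance(row.get("age"), int) or not 0 <= row["age"] <= 130:
        errors.append("age must be an integer between 0 and 130")
    return errors

print(validate({"email": "alice@example.com", "age": 34}))  # []
print(validate({"email": "bad-address", "age": -5}))        # two violations
```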

Verification

Data verification is the process of checking and confirming that the data entered or stored in a system or database is accurate, complete, and consistent with the source data.

The goal o……

Versioning

Data Versioning is a technique used in data management to track changes made to data over time. It involves creating and storing different versions of data, allowing businesses to access and analyz……

Virtualization

Data virtualization is an umbrella term used to describe an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data……

Visualization

Data visualization is the practice of translating information into a visual context, such as a map or graph, to make data easier for the human brain to understand and pull insights from. The main g……

W

Warehouse

A data warehouse is a repository of data from an organization’s operational systems and other sources that supports analytics applications to help drive business decision-making. Data warehousing i……

Wrangling

Data wrangling is a process that data scientists and data engineers use to locate new data sources and convert the acquired information from its raw data format to one that is compatible with autom……
