“Glossary of Metadata Terms” simplifies the language around data, making advanced metadata concepts accessible to professionals and enthusiasts alike.
Annotation
Metadata that provides additional information about a data set, often used for explanation or to give context.
API (Application Programming Interface)
A set of routines, protocols, and tools for building software applications, specifying how software components should interact.
Archival Metadata
Information describing items in an archive, designed to help users find archival materials and to help archivists manage collections.
Big Data
Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.
Business Intelligence (BI)
The strategies and technologies used by enterprises for the data analysis of business information to support better business decision making.
Cataloging
The process of creating metadata representing information resources, such as books, sound recordings, moving images, etc.
Cloud Storage
A model of computer data storage in which the digital data is stored in logical pools, said to be on “the cloud”, representing multiple physical servers.
Controlled Vocabulary
A predefined set of terms offered to users that limits the variability of data entry.
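As a minimal sketch, a controlled vocabulary can be enforced at data entry by checking each value against the allowed set. The vocabulary `RESOURCE_TYPES` and the function `normalize_term` are hypothetical names chosen for illustration, not part of any standard.

```python
# Hypothetical controlled vocabulary for a "resource type" field.
RESOURCE_TYPES = {"text", "image", "sound", "dataset"}


def normalize_term(value: str) -> str:
    """Return the vocabulary term matching `value`, or raise ValueError."""
    term = value.strip().lower()
    if term not in RESOURCE_TYPES:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return term


print(normalize_term("  Image "))  # prints "image"
```

Rejecting unknown values outright, rather than storing them as typed, is what keeps the variability of the field low.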
Crosswalks
Tools that map elements, terms, and vocabularies from one metadata schema to another to facilitate interoperability.
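In its simplest form, a crosswalk can be sketched as a lookup table from one schema's field names to another's. The local field names and the `crosswalk` function below are assumptions made for illustration; the target names (`title`, `creator`, `description`, `date`) are Dublin Core elements.

```python
# Hypothetical mapping from a local schema to Dublin Core element names.
LOCAL_TO_DC = {
    "name": "title",
    "author": "creator",
    "summary": "description",
    "published": "date",
}


def crosswalk(record: dict, mapping: dict) -> dict:
    """Rename fields per `mapping`; fields with no mapping are dropped."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


local_record = {"name": "Field Notes", "author": "A. Writer", "shelf": "B12"}
print(crosswalk(local_record, LOCAL_TO_DC))
# prints {'title': 'Field Notes', 'creator': 'A. Writer'}
```

Real crosswalks often also have to convert values (date formats, vocabularies), not only rename fields; this sketch covers the renaming step alone.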
Curation
The activities associated with maintaining, preserving, and adding value to digital research data throughout its lifecycle.
Data Aggregation
A type of data mining process in which data is searched, gathered, and presented in a summarized, report-based form, either to meet a specific objective or to support human analysis.
Data Anonymization
The process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.
Data Cleansing
The process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
Data Cubes
A multi-dimensional array of values, typically visualized as a three-dimensional block, used in business reporting to analyze and summarize data across multiple dimensions.
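One way to picture a data cube is as a set of facts that can be summed along any combination of dimensions. The sketch below uses invented sales rows and a hypothetical `roll_up` helper; it illustrates the idea only and is not how any particular BI tool stores its cubes.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount).
facts = [
    ("north", "widget", "Q1", 100),
    ("north", "gadget", "Q1", 50),
    ("south", "widget", "Q1", 70),
    ("north", "widget", "Q2", 30),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, keep):
    """Sum amounts over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


print(roll_up(facts, ["region"]))  # prints {('north',): 180, ('south',): 70}
print(roll_up(facts, ["product", "quarter"]))
```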
Data Dictionary
A centralized repository of metadata that defines the attributes and types of data elements within a database or collection.
Data Enrichment
Enhancing existing information by merging with other pieces of data, often from different sources, to add value to the original dataset.
Data Federation
The practice of viewing and managing several separate data sources as if they were a single entity, without physically consolidating the data.
Data Governance
The overall management of the availability, usability, integrity, and security of data used in an organization.
Data Hygiene
The process of cleaning data, removing errors, and ensuring that data is consistent and accurate.
Data Indexing
The process of organizing data according to a specific schema or plan to improve search and retrieval speeds within a database.
Data Integration
The process of combining data from different sources to provide a unified view.
Data Lake
A storage repository that holds a vast amount of raw data in its native format until it is needed.
Data Lifecycle
The sequence of stages that data goes through from creation and initial storage to the time when it becomes obsolete and is deleted.
Data Lifecycle Management (DLM)
The policies, processes, and procedures used to manage data throughout its useful life and ensure its compliance with internal and external regulations and policies.
Data Lineage
The life cycle of data, including its origins, what happens to it, and where it moves over time.
Data Mart
A subset of a data warehouse focused on a particular subject area or line of business.
Data Mart vs. Data Warehouse
A data mart is a subset of a data warehouse often confined to a specific business line or team, whereas a data warehouse is a system used for reporting and data analysis at a central level.
Data Masking
The process of creating a structurally similar but inauthentic version of an organization’s data that can be used for purposes such as software testing and user training.
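A rough sketch of the idea: replace every letter and digit with a random character of the same kind, so the masked value keeps its shape but no longer carries the real content. The function name `mask_value` and the sample string are hypothetical, and production masking tools typically offer stronger guarantees than this.

```python
import random
import string


def mask_value(value: str, rng: random.Random) -> str:
    """Replace each letter/digit with a random one of the same kind."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep separators so the format survives
    return "".join(out)


rng = random.Random(0)  # seeded only to make the demo repeatable
print(mask_value("Jane Doe, 555-0100", rng))
```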
Data Mining
The practice of examining large pre-existing databases in order to generate new information.
Data Model
An abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world.
Data Modeling
The process of creating a data model for an information system by applying formal data modeling techniques.
Data Preservation
The activities, and the supporting information, necessary to maintain and provide access to a wide range of digital resources over time.
Data Profiling
The process of examining the data available in an existing data source and collecting statistics and information about that data.
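A minimal profiling pass might count rows, missing values, and distinct values per column. The `profile` function and the sample rows are illustrative assumptions; dedicated profiling tools collect far more statistics than this.

```python
def profile(rows: list[dict]) -> dict:
    """Return per-column counts of rows, missing values, and distinct values."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


sample = [
    {"id": 1, "country": "FR"},
    {"id": 2, "country": None},
    {"id": 3, "country": "FR"},
]
print(profile(sample))
```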
Data Provenance
Information that helps determine the derivation history of a data record.
Data Quality
The degree to which data is accurate, complete, reliable, and consistent with the intention of use.
Data Scrubbing
The process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated.
Data Silo
A collection of data held by one group that is not easily or safely accessible by other groups.
Data Staging
An area used for data processing (typically within an ETL process) where data is prepared for analysis or loading into a more permanent data store.
Data Stewardship
The oversight responsibility for ensuring that data across an organization is managed appropriately.
Data Transformation
The process of converting data from one format or structure into another format or structure.
Data Visualization
The graphical representation of information and data to provide an accessible way to see and understand trends, outliers, and patterns.
Data Warehousing
A system used for reporting and data analysis that is considered a core component of business intelligence.
Data Warehousing Techniques
Methods and processes involved in designing, implementing, and maintaining a data warehouse.
Descriptive Metadata
Metadata that describes resources for purposes such as discovery and identification; examples include title, abstract, author, and keywords.
Digital Asset Management (DAM)
Practices and tools for managing, storing, organizing, and distributing digital assets.
Digital Object Identifier (DOI)
A persistent identifier or handle used to uniquely identify objects, standardized by the International Organization for Standardization (ISO).
Document Management System (DMS)
Computer systems and software used to manage, track, and store documents electronically and reduce paper.
Dublin Core
A set of vocabulary terms used to describe web resources such as video, images, web pages, etc.
EAD (Encoded Archival Description)
A standard for the encoding of archival finding aids using XML.
Entity-Relationship Model
A data model for describing the data or information aspects of a business domain or its process requirements.
ETL (Extract, Transform, Load)
A process in database usage and especially in data warehousing that involves:
- Extract – The process of reading data from a database.
- Transform – The process of converting the extracted data from its previous form into the form required by the target database.
- Load – The process of writing the data into the target database.
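The three steps above can be sketched with Python's built-in `sqlite3` module. Both databases live in memory, and the table names, column names, and sample rows are invented for the demonstration.

```python
import sqlite3

# Hypothetical source and target databases, both in memory for the demo.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE people (name TEXT, joined TEXT)")
source.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("  jane doe ", "2019-03-14"), ("john smith", "2021-11-02")],
)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE members (name TEXT, join_year INTEGER)")

# Extract: read rows from the source database.
rows = source.execute("SELECT name, joined FROM people").fetchall()

# Transform: tidy the names and keep only the year of the date.
cleaned = [(name.strip().title(), int(joined[:4])) for name, joined in rows]

# Load: write the transformed rows into the target database.
target.executemany("INSERT INTO members VALUES (?, ?)", cleaned)
target.commit()

print(target.execute("SELECT * FROM members ORDER BY join_year").fetchall())
# prints [('Jane Doe', 2019), ('John Smith', 2021)]
```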
FAIR Principles
A set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable.
Folksonomy
A system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content.
Geospatial Metadata
Metadata that describes geographic information and properties, including maps and GPS data.
Granularity
The level of detail contained in a dataset; refers to the extent to which a database is subdivided into smaller pieces.
Hierarchical Data Format (HDF)
A set of file formats (HDF4, HDF5) designed to store and organize large amounts of data.
Interoperability
The ability of different systems and organizations to effectively share, use, and interpret data across varied processes.
ISO 19115
An international standard for describing geographic information and services.
JSON (JavaScript Object Notation)
A lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate.
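For illustration, a small metadata record can be written to JSON text and parsed back with Python's standard `json` module. The record's fields are hypothetical.

```python
import json

# Hypothetical metadata record.
record = {"title": "Field Notes", "keywords": ["metadata", "glossary"], "pages": 12}

text = json.dumps(record, indent=2)  # generate JSON text from Python objects
print(text)

parsed = json.loads(text)  # parse the text back into Python objects
print(parsed == record)  # prints True
```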
Linked Data
A method of publishing structured data so that it can be interlinked and become more useful.
Load
The process of writing the data into the target database.
LOD (Linked Open Data)
Linked data that is released under an open license, so that it can be freely interlinked, reused, and explored through semantic queries.
Machine Learning Metadata
Metadata that describes the data used for training machine learning models, the parameters of the models, and the evaluation of models.
MARC (Machine-Readable Cataloging)
A standard for the representation and communication of bibliographic and related information in machine-readable form.
Master Data Management (MDM)
A technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of the enterprise’s official shared master data assets.
Metadata
Data that provides information about other data. It describes characteristics such as content, format, origin, and structure.
Metadata Extraction
The process of automatically identifying and extracting metadata from within data files or external sources.
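As one simple case, basic technical metadata can be read straight from the file system. The sketch below uses only the Python standard library; the `extract_file_metadata` function and the sample file are assumptions for illustration, and extracting embedded metadata (for example from images or PDFs) would need format-specific tools.

```python
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def extract_file_metadata(path: Path) -> dict:
    """Collect basic technical metadata from the file system."""
    info = path.stat()
    return {
        "name": path.name,
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "media_type": mimetypes.guess_type(path.name)[0],
    }


# Create a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("hello", encoding="utf-8")
    meta = extract_file_metadata(sample)
    print(meta)
```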
Metadata Framework
An organized model that defines and categorizes the types and relationships of metadata to effectively describe data assets.
Metadata Harvesting
The process of collecting metadata from various sources to be stored and managed centrally.
Metadata Injection
The process of dynamically inserting metadata into a data management framework, often to automate processes.
Metadata Management
The administration of data that describes other data, with the objective of providing better control over, and better use of, the main data.
Metadata Model
The abstract framework that describes the structure of metadata, typically specifying types of metadata and their interrelationships.
Metadata Publishing
The process of making metadata available to users and systems, typically through repositories or registries that can be accessed programmatically or through user interfaces.
Metadata Refresh
The process of updating existing metadata to reflect changes in the underlying data or to improve metadata quality.
Metadata Registries
Centralized systems where metadata definitions are stored and maintained, ensuring consistency across data sets and systems.
Metadata Repository
A centralized location where metadata is stored and managed.
Metadata Schema
A structure of metadata attributes and their interrelationships; an example would be Dublin Core, a schema for descriptive metadata.
Metadata Versioning
The practice of keeping multiple versions of metadata records to track changes over time and manage different stages of the data lifecycle.
METS (Metadata Encoding and Transmission Standard)
A standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library.
MODS (Metadata Object Description Schema)
A schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications.