leadership_readme

Thoughts on working together, and supporting organizations

View the Project on GitHub NewAlexandria/leadership_readme

The Structure of Corporate Data Science

For now, this is a quick list of operational practices and skills that relate to data teams and data science.

Team Structures

Data Ops / Engineering

Business Analyst

Data Science

Data Flow Stages

Layers

Sometimes you will see these service or pipeline concepts organized into Physical, Logical, Integration, and Application layers. Observability becomes implicit in this arrangement as well.

Data Science Operation Types

Baked into the above list are several core techniques, each of which comes with its own set of tools and history of research articles:

Data Services

📖 If you’re looking for hands-on examples, I recommend Andreas Kretz’s cookbook for data pipeline processes.

Workflows

Models in the Data Pipeline

ML pipeline flow example, and roles

Roles and Productionalizing Models


CDPs and DMPs

Though these kinds of services and platforms are generally outside the scope of data science, any company that has a robust relationship with data will touch these topics.

Customer Data Platform

Several core concepts of a CDP include:

Data Management Platform

DMPs are primarily for enrichment of data. Commonly, your own data is referred to as ‘1st party’ data, and the enriched data can be ‘2nd party’ or ‘3rd party’. 2nd party data is the 1st party data of another company. 3rd party data has been aggregated from the network and other sources.

Sometimes, fusions and ML extrapolations can be considered part of a company’s ‘1st party’ data, but this should be validated and agreed upon by all partners. The tolerance for uncertainty is a critical factor in downstream use and in the design of experiments.
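One way to make the party distinction and the certainty tolerance concrete is to tag every attribute with where it came from and how sure you are of it. This is a minimal sketch, not a description of any particular DMP. The `Provenance` labels, the `confidence` score, the `partners_agreed` flag, and both helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Provenance(Enum):
    FIRST_PARTY = "1st party"    # your own data
    SECOND_PARTY = "2nd party"   # another company's 1st party data
    THIRD_PARTY = "3rd party"    # aggregated from the network and other sources
    DERIVED = "derived"          # fusions and ML extrapolations (assumed label)


@dataclass(frozen=True)
class Attribute:
    name: str
    value: str
    provenance: Provenance
    confidence: float = 1.0  # assumed 0..1 certainty score


def effective_provenance(attr: Attribute, partners_agreed: bool) -> Provenance:
    """Treat derived data as 1st party only when all partners have agreed to it."""
    if attr.provenance is Provenance.DERIVED and partners_agreed:
        return Provenance.FIRST_PARTY
    return attr.provenance


def select_for_use(attrs: Iterable[Attribute], min_confidence: float) -> List[Attribute]:
    """Keep only attributes that meet the certainty tolerance of a downstream use."""
    return [a for a in attrs if a.confidence >= min_confidence]


# Illustrative records; the values are made up.
profile = [
    Attribute("email", "a@example.com", Provenance.FIRST_PARTY),
    Attribute("household_income", "75-100k", Provenance.THIRD_PARTY, 0.6),
    Attribute("churn_risk", "high", Provenance.DERIVED, 0.8),
]

usable = select_for_use(profile, min_confidence=0.7)
```

A reporting dashboard and a designed experiment would likely call `select_for_use` with different `min_confidence` values, which is the point of keeping the score next to the data instead of filtering once at ingestion.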

FOSS Tools