Chapter 2.2 – Features, Labels, and Models

Understanding features, labels, and models through the lens of automation. Mapping ML concepts to familiar inputs, logic, and outputs.

Decoding the ML Language: Inputs, Logic, and Outputs


After learning about data quality, I wondered: What does a model actually look at, and what is it trying to predict?

Turns out, ML uses the same pattern as automation—Inputs → Logic → Outputs.

  • Features = Inputs (data the model sees)
  • Model = Logic (learned, not hand-coded)
  • Labels = Outputs (what you want to predict)

Key insight: These aren’t new concepts—just new names for a familiar pattern. In ML, the model learns the logic from examples, instead of you writing it.


1. Why Understanding This Helps

Every ML system—spam filter or ChatGPT—breaks down into:

  • Features: What info does the system have?
  • Labels: What is it trying to predict?
  • Model: What logic connects them?

Once you see this, ML problems become easier:

  • Start with “What am I predicting?” (labels)
  • Then, “What inputs help predict it?” (features)
  • Clear labels = clear success criteria

Automation analogy: Like knowing variables, logic, and outputs in code. You can’t build good infrastructure without knowing what inputs drive what outputs.


2. The Fundamental Pattern: Inputs → Logic → Outputs

Here’s how the Inputs → Logic → Outputs pattern looks in both automation and machine learning:

Automation (Terraform)

  • Inputs: variables (instance_type, environment, instance_count)
  • Logic: your code (resource definitions, rules)
  • Outputs: what gets created (infrastructure, computed values)
See Automation Example (Terraform)
# Inputs (variables)
variable "instance_type" { default = "t3.medium" }
variable "environment" { default = "prod" }
variable "instance_count" { default = 3 }

# Logic (your code)
resource "aws_instance" "server" {
  count         = var.instance_count
  instance_type = var.instance_type
  tags = {
    Environment = var.environment
    CostCenter  = var.environment == "prod" ? "production" : "development"
  }
}

# Outputs (what gets created)
output "server_ips" {
  value = aws_instance.server[*].private_ip
}

Machine Learning

  • Features: inputs (files_changed, environment, time_of_day, team)
  • Model: learned logic (patterns from data)
  • Label: output (e.g., “High Risk”)
See Machine Learning Example
# Features (inputs)
features = {
  'files_changed': 150,
  'environment': 'prod',
  'time_of_day': 14,
  'team': 'Platform'
}

# Model (learned logic)
model = trained_deployment_risk_model

# Label (predicted output)
prediction = model.predict(features)
# Output: "High Risk"

Engineering insight: In automation, you write the logic. In ML, the model learns the logic from examples.

No matter the domain, the core pattern is the same: you start with inputs, apply logic (written or learned), and get outputs.


3. Features: The Inputs to Your Model

What is a Feature?

A feature is a measurable property you feed into a model—think Terraform variables, but for ML.

Feature Types

  • Numeric: quantifiable values (e.g., file count, time)
  • Categorical: categories (e.g., environment, team)
  • Boolean: true/false flags (e.g., is_weekend)
See deployment risk feature examples
| Feature Name | Type | Example Value | What It Represents |
| ------------ | ---- | ------------- | ------------------ |
| files_changed | Numeric | 150 | Size of deployment |
| environment | Categorical | "prod" | Where deploying |
| time_of_day | Numeric | 14 (2 PM) | When deploying |
| day_of_week | Numeric | 2 (Tuesday) | Day of deployment |
| team | Categorical | "Platform" | Who is deploying |
| previous_failures | Numeric | 3 | Historical failures |
| deployment_duration_avg | Numeric | 12.5 (minutes) | Team's avg speed |

Each feature is something the model can use to make predictions.

Good Features vs Bad Features

The best features are measurable, available before prediction, and make logical sense for the outcome.

See table: Good Features vs Bad Features
| Feature | Type | Predictive Power | Why/Why Not? |
| ------- | ---- | ---------------- | ------------ |
| files_changed | Numeric | ✅ High | Large deployments are riskier |
| environment | Categorical | ✅ High | Prod deployments need more care |
| previous_failures | Numeric | ✅ High | History predicts future risk |
| developer_shirt_color | Categorical | ❌ None | No relationship to deployment risk |
| commit_message_length | Numeric | ❌ Low | Doesn't indicate actual risk |

Focus on features that are relevant, reliable, and available before prediction.

Automation Analogy: Just as you only use variables in Terraform that actually affect your infrastructure, in ML you want features that truly impact the outcome—not just any data you can collect.

See Quick Checklist: Picking the Right Features

| Principle | Good Example | Bad Example |
| --------- | ------------ | ----------- |
| Available at prediction time? | `files_changed` (known before deploying) | `deployment_outcome` (this is what we're predicting!) |
| Correlated with the outcome? | `time_of_day` (night deployments are riskier) | `developer_coffee_preference` (tempting, but no) |
| Reliable and consistent? | `environment` (always known) | `server_mood` (doesn't exist) |
| Avoids leakage? | `historical_error_rate` (from past deployments) | `post_deployment_error_count` (only known after deployment) |

If your features pass these checks, you’re on the right track.

Warning: Data leakage is when you accidentally include information that wouldn’t be available in production. It’s like using terraform state as an input to terraform plan—you’re basically cheating by looking at the answer. Your model will look amazing in testing and fail miserably in production.
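To make that check concrete, here is a minimal sketch of a leakage guard that rejects label-derived columns before training. The label and column names are hypothetical examples from this chapter, not a standard API:

```python
# Minimal leakage guard (illustrative). Reject any feature that is the label
# itself, or that is only knowable after the deployment finishes.
LABEL = 'risk_level'
POST_HOC_COLUMNS = {'deployment_outcome', 'post_deployment_error_count'}

def check_for_leakage(feature_names):
    """Return the subset of feature names that would leak the answer."""
    return {name for name in feature_names
            if name == LABEL or name in POST_HOC_COLUMNS}

# An empty result means the feature set passes this (very basic) check.
print(check_for_leakage(['files_changed', 'environment',
                         'post_deployment_error_count']))
```

A guard like this is cheap to run in CI, the same way you'd lint a Terraform module before applying it.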


4. Labels: The Outputs You’re Predicting

Your model is only as good as your labels. Labels are the answers you want your model to predict.

What is a Label?

  • The value you want to predict (e.g., “High Risk”).
  • In automation, this is like a Terraform output.
See real-world label examples
| Scenario | Features (Inputs) | Label (Output) |
| -------- | ----------------- | -------------- |
| Email spam detection | Email text, sender, links | "Spam" / "Not Spam" |
| Image classification | Pixel values | "Cat" / "Dog" / "Bird" |
| Deployment risk | Files changed, time, team | "High" / "Medium" / "Low" |
| Server failure prediction | CPU, memory, disk usage | "Will fail" / "Won't fail" |

Label Types (Classification vs Regression)

Labels come in two main types: classification (predicting a category) and regression (predicting a numeric value).

| Label Type | What it Predicts | Example Label(s) | Example Use Case |
| ---------- | ---------------- | ---------------- | ---------------- |
| Classification | Category/class | "High Risk", "Low Risk" | Deployment risk |
| Regression | Numeric value | 23.5 (minutes) | Deployment duration |
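The two label types look almost identical in code. A sketch using scikit-learn's decision trees on tiny made-up data (the numbers are illustrative, not real deployment history):

```python
# Classification vs regression on the same features (sketch).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[150, 1], [25, 0], [200, 1], [50, 0]]    # [files_changed, is_prod]
risk_labels = ['High', 'Low', 'High', 'Low']  # classification target
durations = [23.5, 4.0, 31.0, 8.5]            # regression target (minutes)

clf = DecisionTreeClassifier().fit(X, risk_labels)
reg = DecisionTreeRegressor().fit(X, durations)

print(clf.predict([[175, 1]]))  # a category
print(reg.predict([[175, 1]]))  # a number of minutes
```

Same inputs, same model family; only the type of label changes which question you're answering.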

Automation Analogy: Labels are like Terraform outputs—classification is like conditional outputs, regression is like numeric outputs.

Good vs Bad Labels: Quick Checklist

| Principle | Good Example | Bad Example |
| --------- | ------------ | ----------- |
| Clear & Unambiguous | deployment_succeeded: True/False | deployment_quality: Good/Bad |
| Consistently Defined | All "High Risk" deployments meet same criteria | Different people label "High Risk" differently |
| Verifiable | deployment_failed: True (check the logs) | deployment_felt_risky: True (opinion) |
| Balanced Distribution | 40% Low, 35% Medium, 25% High | 95% Low, 4% Medium, 1% High |


Warning: Bad labels = bad model, just like bad configuration = broken infrastructure.


5. Models: The Learned Logic

What Is a Model?

A model is the learned logic that connects features (inputs) to labels (outputs). Instead of you writing the rules, the model discovers patterns by analyzing examples.

Automation Analogy: Like a Terraform config that transforms variables into outputs, but the logic is learned from data, not hand-coded.

The Mental Model

flowchart LR
  subgraph INPUTS["Features (Inputs)"]
    F1["files_changed = 150"]
    F2["environment = prod"]
    F3["team = platform"]
    F4["time_of_day = 14"]
    style F1 fill:#e1f5ff,stroke:#007acc,stroke-width:2px
    style F2 fill:#e1f5ff,stroke:#007acc,stroke-width:2px
    style F3 fill:#e1f5ff,stroke:#007acc,stroke-width:2px
    style F4 fill:#e1f5ff,stroke:#007acc,stroke-width:2px
  end

  INPUTS --> M["ML Model\n(Learned patterns from history)"]
  style M fill:#fff4e1,stroke:#ff9900,stroke-width:2px
  M --> L["Prediction\nHigh Risk Deployment"]
  style L fill:#d4f7d4,stroke:#228B22,stroke-width:2px

The model learns rules like:

IF files_changed > 100 AND environment == "prod" THEN risk = "High"
IF files_changed < 50 AND environment == "dev" THEN risk = "Low"
IF time_of_day BETWEEN 0 AND 6 THEN increase risk by one level

You don’t write these rules—the model finds them from data.
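For some model types you can actually read the learned rules back out. A sketch using scikit-learn's `export_text` on a toy dataset (the values are illustrative only):

```python
# Inspect the if-then rules a decision tree learned from data.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[150, 1], [25, 0], [200, 1], [50, 0], [5, 0]]  # [files_changed, is_prod]
y = ['High', 'Low', 'High', 'Low', 'Low']

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Prints the learned splits as nested if-then text rules
rules = export_text(model, feature_names=['files_changed', 'is_prod'])
print(rules)
```

This is the closest ML gets to `terraform plan`: a readable view of the logic before you trust it in production.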

Models as Functions

Mathematically, a model is a function:

# Automation: You write the function
def calculate_cost(instance_count, instance_type, hours):
    hourly_rate = get_rate(instance_type)
    return instance_count * hourly_rate * hours

# ML: The model IS the function (learned from data)
def predict_risk(features):
    # Complex learned logic inside
    return risk_level  # "High" / "Medium" / "Low"

Terraform analogy:

# Your explicit logic
locals {
  is_production = var.environment == "prod"
  requires_approval = local.is_production && var.change_size > 100
}

# ML learned logic (conceptual)
model_logic {
  learned_pattern_1 = features.environment == "prod" && features.files_changed > 100
  learned_pattern_2 = features.previous_failures > 3
  learned_pattern_3 = features.time_of_day < 6 || features.time_of_day > 22
  risk = combine(learned_pattern_1, learned_pattern_2, learned_pattern_3)
}

What’s Inside a Model?

Every model type represents learned logic differently. Here’s how some common models work under the hood:

See table: Common Model Types & Automation Analogies
| Model Type | Internal Logic | Automation Analogy |
| ---------- | -------------- | ------------------ |
| Decision Tree | Series of if-then rules | Nested Terraform conditionals |
| Linear Model | Weighted sum of features | Terraform locals with arithmetic |
| Neural Network | Layers of transformations | Pipeline of processing steps |
| Random Forest | Multiple decision trees voting | Multiple validation checks |

Tip: For most use cases, treat the model as a black box that learned logic from data. You don’t need to understand the internals to use it effectively.

Model Artifacts

After training, a model becomes a set of files you can deploy:

See example: Model Artifacts
deployment-risk-model/
├── model.pkl                 # The learned weights/parameters
├── feature_names.json        # Which features it expects
├── label_mapping.json        # How it encodes outputs
└── metadata.json             # Version, training date, etc.
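A sketch of writing and reading such an artifact bundle, using a placeholder dict in place of a real trained model (the file layout mirrors the tree above):

```python
import json
import pickle
import pathlib

model = {'threshold': 100}  # stand-in for a trained model object

artifact_dir = pathlib.Path('deployment-risk-model')
artifact_dir.mkdir(exist_ok=True)

# model.pkl: the learned weights/parameters
with open(artifact_dir / 'model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Sidecar files: what the model expects and where it came from
(artifact_dir / 'feature_names.json').write_text(
    json.dumps(['files_changed', 'environment', 'team']))
(artifact_dir / 'metadata.json').write_text(
    json.dumps({'version': '1.0', 'trained_on': '2026-01-10'}))

# Loading later is the mirror image
with open(artifact_dir / 'model.pkl', 'rb') as f:
    restored = pickle.load(f)
```

In real systems you'd use a registry (MLflow, S3 with versioning, etc.) rather than a bare directory, but the shape is the same.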

Automation Analogy: Like Terraform state files, models capture learned state:

terraform.tfstate    # Current infrastructure state
model.pkl            # Current learned patterns

Best Practice: Both models and Terraform state files need to be versioned, backed up, and managed carefully.


6. Putting It All Together: The Complete Picture

Let’s trace a full example with our deployment risk assessment:

Step 1: Collect Training Data

Historical deployments with features AND labels:

| files_changed | environment | time_of_day | team | previous_failures | risk_level (label) |
| ------------- | ----------- | ----------- | ---- | ----------------- | ------------------ |
| 150 | prod | 14 | Platform | 3 | High |
| 25 | dev | 10 | Frontend | 0 | Low |
| 200 | prod | 3 | Backend | 1 | High |
| 50 | staging | 16 | Platform | 1 | Medium |
| 5 | dev | 11 | Frontend | 0 | Low |
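In code, that table might live in a pandas DataFrame (a sketch; in practice it would be loaded from deployment logs or a CSV):

```python
import pandas as pd

# The historical deployments table, as a DataFrame named `data`
data = pd.DataFrame({
    'files_changed':     [150, 25, 200, 50, 5],
    'environment':       ['prod', 'dev', 'prod', 'staging', 'dev'],
    'time_of_day':       [14, 10, 3, 16, 11],
    'team':              ['Platform', 'Frontend', 'Backend', 'Platform', 'Frontend'],
    'previous_failures': [3, 0, 1, 1, 0],
    'risk_level':        ['High', 'Low', 'High', 'Medium', 'Low'],
})
print(data.shape)  # (5, 6): five historical deployments, six columns
```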

Step 2: Train the Model

from sklearn.tree import DecisionTreeClassifier

# Separate features from labels
features = data[['files_changed', 'environment', 'time_of_day', 'team', 'previous_failures']]
labels = data['risk_level']

# Train model (it learns patterns)
# Note: in real code, categorical columns like 'environment' and 'team'
# must first be encoded as numbers (see feature engineering below)
model = DecisionTreeClassifier()
model.fit(features, labels)

What happens: The model analyzes the training data and discovers patterns, e.g.:

  • Large files_changed + prod environment → “High”
  • Small files_changed + dev environment → “Low”
  • time_of_day < 6 → increased risk
  • High previous_failures → increased risk

Step 3: Use the Model for Predictions

New deployment (no label yet):

# Features for new deployment
new_deployment = {
    'files_changed': 175,
    'environment': 'prod',
    'time_of_day': 22,          # 10 PM
    'team': 'Backend',
    'previous_failures': 2
}

# Model predicts the label. predict() expects a 2-D input,
# so wrap the single example in a one-row DataFrame
prediction = model.predict(pd.DataFrame([new_deployment]))
print(prediction[0])  # Output: "High"

The model applied its learned logic:

  • Large deployment (175 files)
  • Production environment
  • Late evening (22:00)
  • Some previous failures → Prediction: High Risk

The Complete Flow

Here’s how the full ML workflow fits together:

flowchart TB
    A[Historical Deployments<br/>with Known Outcomes] --> B[Extract Features<br/>and Labels]
    B --> C[Train Model<br/>Learn Patterns]
    C --> D[Trained Model<br/>Learned Logic]
    
    E[New Deployment<br/>Features Only] --> D
    D --> F[Prediction<br/>Risk Level]
    
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#ffe1e1
    style D fill:#90EE90
    style E fill:#e1f5ff
    style F fill:#90EE90

7. Feature Engineering Revisited

Now that we understand features, labels, and models, let’s circle back to feature engineering from Chapter 2.1. This is where you can really help your model out (or shoot yourself in the foot—I’ve done both).

Raw Features vs Engineered Features

Raw features are direct measurements from your data—whatever you collect as-is:

raw_features = {
    'timestamp': '2026-01-10 22:30:00',
    'files_changed': 175,
    'team_id': 42
}

Engineered features are derived values that make patterns easier to learn:

engineered_features = {
    'hour': 22,                           # Extracted from timestamp
    'is_late_night': True,                # hour < 6 OR hour > 20
    'is_weekend': False,                  # Derived from timestamp
    'deployment_size': 'Large',           # files_changed > 100
    'team_experience_score': 0.85,        # Looked up from team history
    'files_changed_normalized': 0.7       # Scaled to 0-1 range
}

Why Engineer Features?

Best Practice: Help the model learn faster by making patterns obvious through feature engineering.

Instead of the model learning:

“When hour is less than 6 OR greater than 20, risk increases”

You explicitly create:

is_late_night: True when hour < 6 OR hour > 20
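That derivation is a few lines of ordinary code. A sketch, assuming the raw inputs from the earlier example (the thresholds are the hypothetical ones used throughout this chapter):

```python
from datetime import datetime

def engineer_features(raw):
    """Derive engineered features from raw inputs (illustrative sketch)."""
    ts = datetime.fromisoformat(raw['timestamp'])
    return {
        'hour': ts.hour,
        'is_late_night': ts.hour < 6 or ts.hour > 20,
        'is_weekend': ts.weekday() >= 5,  # Saturday=5, Sunday=6
        'deployment_size': 'Large' if raw['files_changed'] > 100 else 'Small',
    }

print(engineer_features({'timestamp': '2026-01-10 22:30:00',
                         'files_changed': 175}))
```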

Automation Analogy:

# Raw inputs
variable "instance_count" { default = 10 }
variable "instance_vcpus" { default = 4 }

# Engineered values (locals)
locals {
  total_vcpus = var.instance_count * var.instance_vcpus
  needs_reserved_capacity = local.total_vcpus > 100
  estimated_monthly_cost = local.total_vcpus * 50  # $50 per vCPU
}

Automation Analogy: Feature engineering in ML is like creating Terraform locals: you derive values from raw inputs to make downstream decisions easier.

Common Feature Engineering Techniques

| Technique | Example | Purpose |
| --------- | ------- | ------- |
| Binning | files_changed → "Small"/"Medium"/"Large" | Simplify continuous values |
| Encoding | environment → {dev:0, staging:1, prod:2} | Convert categories to numbers |
| Scaling | Normalize files_changed to 0-1 range | Ensure consistent scales |
| Interaction | files_changed * is_prod | Combine related features |
| Aggregation | Average of last 10 deployments | Summarize history |
| Time-based | Extract hour, day, month from timestamp | Capture temporal patterns |
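A few of these techniques in pandas, on made-up values (binning, encoding, and min-max scaling):

```python
import pandas as pd

df = pd.DataFrame({
    'files_changed': [5, 60, 150, 400],
    'environment': ['dev', 'staging', 'prod', 'prod'],
})

# Binning: continuous value -> named buckets
df['size_bin'] = pd.cut(df['files_changed'],
                        bins=[0, 50, 100, float('inf')],
                        labels=['Small', 'Medium', 'Large'])

# Encoding: categories -> numbers
df['env_code'] = df['environment'].map({'dev': 0, 'staging': 1, 'prod': 2})

# Scaling: min-max normalize to the 0-1 range
rng = df['files_changed'].max() - df['files_changed'].min()
df['files_scaled'] = (df['files_changed'] - df['files_changed'].min()) / rng

print(df)
```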

Want to go deeper? See Chapter 3.4 – Feature Engineering: The Real Work Behind ML for hands-on techniques, practical checklists, and real-world examples.


8. Common Pitfalls and How to Avoid Them

Let me save you some pain by sharing the mistakes I made (so you don’t have to):

1. Using Labels as Features (Data Leakage)

This is embarrassingly easy to do, especially when your dataset has the label right next to the features.

See Example: Data Leakage (Using Labels as Features)
features = {
    'files_changed': 150,
    'deployment_succeeded': True   # ← This is the label! Don't use it as a feature!
}

Automation equivalent:

# Wrong: Can't use output to compute the output
output "cost" {
  value = aws_instance.server.cost  # This doesn't exist yet
}

2. Including Irrelevant Features

See Example: Irrelevant Features
features = {
  'files_changed': 150,
  'developer_shoe_size': 10,      # ← Irrelevant
  'commit_hash_length': 40        # ← No predictive power
}

Warning: Noise confuses the model, slows training, and reduces accuracy—avoid irrelevant features.

Automation equivalent: Including unnecessary variables that don’t affect infrastructure

3. Inconsistent Feature Formats

See Example: Inconsistent Feature Formats
# Training data
features_train = {'environment': 'prod'}

# Production data
features_prod = {'environment': 'PRODUCTION'}  # ← Different format!

Warning: Model doesn’t recognize the feature value—ensure consistent feature formats.

Automation equivalent: Inconsistent variable naming breaks your configurations
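One defensive fix is to normalize categorical values at the boundary of both the training and serving code paths. A sketch, with a hypothetical alias map:

```python
# Normalize environment strings so training and production agree.
# The alias map is a hypothetical example for this chapter.
ENV_ALIASES = {'production': 'prod', 'development': 'dev'}

def normalize_env(value):
    v = value.strip().lower()
    return ENV_ALIASES.get(v, v)

print(normalize_env('PRODUCTION'))  # 'prod'
print(normalize_env(' prod '))      # 'prod'
```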

4. Missing Feature Values

See Example: Missing Feature Values
features = {
  'files_changed': 150,
  'environment': None,    # ← Missing value
  'team': ''              # ← Empty string
}

Best Practice: Handle missing values by using defaults, filling with median/mean, creating “is_missing” boolean features, or removing samples with missing critical features.
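A sketch of those strategies in pandas (median fill for numbers, a sentinel value for categories, and an explicit missingness flag):

```python
import pandas as pd

df = pd.DataFrame({
    'files_changed': [150, None, 40],
    'environment': ['prod', None, 'dev'],
})

# Flag missingness first, so the model can learn from the fact itself
df['environment_missing'] = df['environment'].isna()

# Fill numeric gaps with the median, categorical gaps with a sentinel
df['files_changed'] = df['files_changed'].fillna(df['files_changed'].median())
df['environment'] = df['environment'].fillna('unknown')

print(df)
```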


9. Practical Guidelines

Based on automation engineering principles:

1. Document Your Features

Create a feature dictionary (like variable documentation):

See Example: Feature Dictionary
# Deployment Risk Features

## files_changed
- Type: Numeric (integer)
- Range: 1 to 10,000
- Description: Number of files modified in deployment
- Source: Git diff

## environment
- Type: Categorical
- Values: dev, staging, prod
- Description: Target deployment environment
- Source: CI/CD pipeline variable

## risk_level (Label)
- Type: Categorical
- Values: High, Medium, Low
- Description: Deployment risk assessment
- Source: Post-deployment SRE review

2. Version Features with Models

See Example: Versioned Features
model-v1.0/
├── model.pkl
└── features.json       # What features this model expects

model-v2.0/
├── model.pkl
└── features.json       # Added new features, model retrained
Automation Analogy: like pinning your Terraform version:
terraform {
  required_version = ">= 1.0"
}

3. Validate Features at Runtime

See Example: Feature Validation
def validate_features(features):
  """
  Validate features before prediction,
  like validating Terraform variables
  """
  required_features = ['files_changed', 'environment', 'team']
  for feature in required_features:
    if feature not in features:
      raise ValueError(f"Missing required feature: {feature}")
  if features['files_changed'] < 0:
    raise ValueError("files_changed cannot be negative")
  if features['environment'] not in ['dev', 'staging', 'prod']:
    raise ValueError(f"Invalid environment: {features['environment']}")
  return True

4. Monitor Feature Distributions

See Example: Feature Drift Monitoring
# Track feature distributions over time
# Alert if they drift significantly

current_avg_files_changed = 150
historical_avg_files_changed = 75

if current_avg_files_changed > historical_avg_files_changed * 2:
  alert("Feature distribution drift detected!")

Automation Analogy: Monitoring infrastructure drift from desired state is like monitoring feature distribution drift in ML.


What I Wish I Knew Earlier

Practitioner’s Lessons:

  • Features = Inputs: The measurable stuff you feed into your model.
  • Labels = Outputs: What you’re trying to predict.
  • Models = Learned Logic: Patterns discovered from examples, not hand-coded.
  • Feature engineering matters: Good features amplify signal and avoid leakage.
  • Good features: Available, relevant, reliable, and consistent.
  • Good labels: Clear, consistent, verifiable, and balanced.
  • Models are artifacts: Version, deploy, and monitor them like code.

What’s Next?

Series 2 – Chapter 2.3: Training vs Inference

In the next chapter, we’ll explore:

  • The difference between training and inference (execution)
  • How model training works (learning phase)
  • How model inference works (prediction phase)
  • Comparing to terraform apply vs runtime behavior
  • When and why models need retraining

Architectural Question: What are the practical differences between model training and inference, and how do they impact automation workflows?

We understand the building blocks—now we’ll see how they come to life.


This post is licensed under CC BY 4.0 by the author.

© 2026 Ravi Joshi. Some rights reserved. Except where otherwise noted, the blog posts on this site are licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.