DevOps Overview


1. Developers vs. Operations (Ops / DevOps / SRE)

| Category | Developers | Operations (Ops / DevOps / SRE) |
|---|---|---|
| Primary Role | Design, write, test, and refactor application and service code | Provision, deploy, operate, observe, secure, and scale infrastructure and platforms |
| Key Responsibilities | • Implement features<br>• Fix defects<br>• Refactor & improve code quality<br>• Write tests & documentation<br>• Optimize performance at the code level | • Automate infrastructure provisioning (IaC)<br>• Deploy releases & manage environments<br>• Monitor availability, latency, cost, and security posture<br>• Capacity planning & scaling<br>• Incident response & root cause analysis |
| Typical Outputs | Source code, unit/integration tests, build artifacts, API definitions | Infrastructure definitions (Terraform, CloudFormation), deployment pipelines, runbooks, dashboards, alerts |
| Scope & Scale | From small scripts to globally distributed microservices | From a single-VM stack to multi-region, multi-account cloud platforms |
| Collaboration Pattern | Provide deployable, testable artifacts; supply runtime requirements | Provide reliable platforms & feedback loops; enable self-service deployment and observability |
| Example Activities | • Implement a REST endpoint<br>• Optimize an algorithm<br>• Add a feature flag | • Create a blue/green deployment strategy<br>• Configure Prometheus alerts<br>• Harden IAM policies |
| Example Tools | Git, GitHub/GitLab, IDEs (VS Code, IntelliJ, PyCharm), Docker (dev), test frameworks, feature flag systems | AWS / Azure / GCP, Terraform / Pulumi / CloudFormation, Ansible / Puppet / Chef, Kubernetes / EKS, Argo CD / Flux, Prometheus, Grafana, Datadog, security scanners, incident tools (PagerDuty) |

Note: Modern high-performing teams blur these boundaries. Developers own more of delivery and runtime quality, while operations engineers build platforms and guardrails instead of performing every deployment manually.


2. What Is DevOps?

  • DevOps is a cultural philosophy plus a collection of engineering practices that integrate software development and operations to deliver value rapidly, safely, and sustainably.

  • Core dimensions:

    1. Culture & Collaboration (shared goals, reduced friction, psychological safety)
    2. Automation (build, test, provision, deploy, remediation)
    3. Continuous Integration & Continuous Delivery (CI/CD)
    4. Observability & Feedback (metrics, logs, traces, user telemetry)
    5. Lean Flow (small batch sizes, fast feedback loops)
    6. Reliability Engineering (SLOs, error budgets, resilience testing)
    7. Security Shift Left (DevSecOps: integrating security earlier)
    8. Continuous Learning & Improvement (post-incident reviews, experimentation)

3. Silo Mentality

  • Silo mentality is a mindset in which teams optimize locally, hoard information, or guard their own processes, inhibiting organizational learning and speed.

| Aspect | Description |
|---|---|
| Causes | Misaligned incentives, legacy org charts, unclear ownership, fear of blame, tool fragmentation |
| Symptoms | Slow handoffs; "throw it over the wall" deploys; duplicated scripts; inconsistent environments; delayed incident resolution |
| Impacts | Longer lead times, higher change failure rate, brittle releases, shadow tooling, low morale |
| DevOps Remedies | Cross-functional teams, shared KPIs (e.g., DORA metrics), blameless postmortems, self-service platform APIs, documentation as code, inner-source repositories |
| Practical Actions | Standardize pipelines, centralize IaC modules, use ChatOps for transparency, use session tagging & access logs for shared visibility, temporarily embed Ops engineers in feature teams |
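
The DORA metrics named above are straightforward to compute once deployments and incidents are recorded. The sketch below assumes a hypothetical `Deployment` record with commit time, deploy time, a failure flag, and restore time. The field names are illustrative, not a standard schema, and measuring restore time from the deploy is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt to whatever your pipeline actually records.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # only meaningful when failed is True


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics for the deployments in one reporting period."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        # Mean time to restore, measured here from deploy to recovery (an assumption).
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times) if restore_times else None
        ),
    }
```

Publishing these numbers on a shared dashboard gives Dev and Ops the same KPIs, which is the point of the remedy.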

4. Why DevOps?

Pre-DevOps (siloed) approaches often exhibited:

  • Long, serialized phases (code → wait weeks → test → wait → deploy)
  • Manual, error-prone deployments & environment drift
  • Fragile big-bang releases (high blast radius)
  • Limited, delayed feedback (production errors discovered by users)
  • Security & compliance bolted on late
  • Low deployment frequency + high change failure rate

DevOps counters these with:

  • Small, frequent, automated changes
  • Immutable infrastructure & versioned environments
  • Continuous testing & security scanning
  • Observable systems with rapid detection & rollback
  • Shared accountability (you build it, you help run it)
  • Faster mean time to recovery (MTTR)

5. When to Adopt DevOps

Appropriate when (which today is almost always):

  • Rapid iteration is needed (SaaS, mobile backends, data platforms)
  • Microservice, event-driven, or modular architectures are in use
  • Consistency across multiple environments (dev/test/stage/prod) is required
  • Infrastructure elasticity (cloud, containers, serverless)
  • Need to reduce lead time & increase deployment frequency

Special Considerations (NOT blanket "do not use"):

  • Highly regulated or safety-critical domains (finance, healthcare, energy) still benefit, but require:
    • Strong change governance as code (policy as code, approvals embedded in pipelines)
    • Segregation of duties implemented via role-based & attribute-based access controls (not manual silos)
    • Immutable audit trails (CloudTrail, pipeline logs)
    • Automated evidence collection for compliance

Anti-pattern to avoid: using "mission critical" as a justification for retaining manual, brittle processes. Automation with proper controls improves both reliability and auditability.
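
Segregation of duties can be enforced by the pipeline itself rather than by a manual silo. The rule in this minimal policy-as-code sketch is that a production change needs approval from someone other than its author who also holds a hypothetical `release-approver` role. The role names, user names, and data shapes are assumptions. A real setup would usually express this in a policy engine such as OPA or in the CI platform's protected-environment rules.

```python
from dataclasses import dataclass, field

# Hypothetical role assignments; in practice these come from your identity provider / RBAC system.
ROLES: dict[str, set[str]] = {
    "alice": {"developer"},
    "bob": {"developer", "release-approver"},
    "carol": {"release-approver"},
}


@dataclass
class ChangeRequest:
    author: str
    approvers: list[str] = field(default_factory=list)
    target_env: str = "prod"


def check_change(cr: ChangeRequest, min_approvals: int = 1) -> list[str]:
    """Return policy violations; an empty list means the change may deploy."""
    if cr.target_env != "prod":
        return []  # this sketch gates production changes only
    # Count distinct approvers who hold the approver role and are not the author.
    independent = {
        a for a in cr.approvers
        if a != cr.author and "release-approver" in ROLES.get(a, set())
    }
    if len(independent) < min_approvals:
        return [
            f"needs {min_approvals} approval(s) from a release-approver other than the author"
        ]
    return []
```

Logging each decision together with the change ID gives you the automated compliance evidence mentioned above.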


6. DevOps Lifecycle (Conceptual Flow)

  • (Plan) → Code → Build → Test → (Secure) → Package → Release → Deploy → Operate → Monitor → (Learn / Improve)
  1. Continuous Development
  2. Continuous Integration
  3. Continuous Testing
  4. Continuous Deployment/Delivery
  5. Continuous Monitoring & Feedback (+ Continuous Security & Governance embedded in each stage)

7. Stage 1: Continuous Development

  • Activities: Planning, requirements sharing, iterative coding, branching strategies (trunk-based or short-lived feature branches).
  • Tools: Git, GitHub/GitLab, Issue trackers (Jira, GitHub Issues), Architecture-as-code diagrams (Mermaid, PlantUML).
  • Practices:
    • Small commits tied to user stories.
    • Feature flags for progressive delivery.
    • Code review automation (lint, static analysis).

Example:

```bash
# Trunk-based model with a short-lived feature branch
git checkout main && git pull          # start from the latest mainline
git checkout -b feat/payment-retry

# Implement & test, then stage and commit the changes
git add -A
git commit -m "feat: add idempotent payment retry logic"

git push -u origin feat/payment-retry
# Open PR -> CI runs tests & static analysis -> merge quickly
```
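
Feature flags for progressive delivery are often implemented as deterministic percentage rollouts. A stable user ID is hashed together with the flag name into a bucket from 0 to 99, and the flag is enabled when the bucket falls below the rollout percentage. This is a minimal sketch of that idea; the flag name and functions are hypothetical. Flag services and SDKs (for example LaunchDarkly, Unleash, or OpenFeature providers) add targeting rules, kill switches, and audit logs on top.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) deterministically to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Enable the flag for roughly rollout_percent% of users, stably across calls."""
    return bucket(flag, user_id) < rollout_percent
```

Because the bucket never changes, raising the rollout from 10% to 50% only adds users; nobody who already had the feature loses it. Including the flag name in the hash keeps rollouts of different flags independent.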

8. Stage 2: Continuous Integration (CI)

  • Purpose:

    • Ensure that new code integrates cleanly with the mainline several times per day.
  • Triggers:

    • Every push/pull request.
  • Automated Steps:

    • Compile/build, unit tests, static code analysis, security scanning (SAST), dependency checks (SCA).
  • Tools:

    • GitHub Actions, Jenkins, GitLab CI, CircleCI, Azure Pipelines.
  • Pipeline Example:

```text
trigger: push -> build (compile) -> unit tests -> lint -> security scan -> package artifact (container / zip) -> publish to artifact registry
```
  • AWS-relevant examples:
    • Build container images in CodeBuild or GitHub Actions → push to Amazon ECR → sign them with Notation or Cosign.
    • Use OIDC federation so CI jobs assume AWS IAM roles with short-lived credentials instead of static access keys.
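
The pipeline example above is a strictly ordered, fail-fast sequence. Each stage runs only if every earlier stage succeeded, and the first failure stops the run before anything is published. The sketch below models only that control flow, using hypothetical stage functions in place of real build, test, and scan commands.

```python
from typing import Callable

# A stage is a name plus a zero-argument function that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure.

    Returns (succeeded, names_of_stages_that_ran).
    """
    ran: list[str] = []
    for name, step in stages:
        ran.append(name)
        if not step():
            print(f"stage '{name}' failed; skipping remaining stages")
            return False, ran
        print(f"stage '{name}' passed")
    return True, ran
```

Placing "publish" last means a failing security scan guarantees that no artifact reaches the registry.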

9. Stage 3: Continuous Testing

  • Focus: Automated quality gates (unit, integration, contract, performance, security, accessibility).

  • Tools: Selenium / Playwright (UI), TestNG / JUnit (Java), pytest (Python), k6 / JMeter (performance), OPA / Checkov (policy), Trivy / Grype (image scanning).

  • Best Practices:

    • Shift-left performance & security tests (run earlier on critical paths).
    • Contract tests to prevent breaking dependent services.
    • Ephemeral test environments (provision on demand via IaC, tear down post-run).
  • Example:

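```bash
# Performance test using k6 (simplified)
k6 run load-test.js

# Thresholds are declared inside load-test.js, e.g. http_req_duration: ['p(95)<250'].
# If any threshold is crossed, k6 exits with a non-zero code, which fails the pipeline step.
```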
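
k6 evaluates thresholds itself, but the same gate can be expressed anywhere latency samples are available. This minimal sketch uses the nearest-rank percentile definition. The 250 ms default mirrors the glossary's example SLO and is an assumption.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value at 1-based rank ceil(p/100 * n) in sorted order."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_gate(latencies_ms: list[float], threshold_ms: float = 250.0) -> bool:
    """Return True when p95 latency is within the threshold (the step passes)."""
    p95 = percentile(latencies_ms, 95)
    print(f"p95 = {p95:.1f} ms (threshold {threshold_ms:.0f} ms)")
    return p95 <= threshold_ms
```

A CI step could call `latency_gate` on exported results and exit non-zero when it returns False. Nearest-rank always returns an observed sample; interpolating definitions can give slightly different values on small sample sets.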

10. Stage 4: Continuous Deployment/Delivery

  • Continuous Delivery: Code is always in a releasable state; promotions may require an approval gate.
  • Continuous Deployment: Every passing change auto-deploys to production (subject to guardrails).
  • Techniques: Blue/Green, Rolling, Canary, Feature flag rollout, Shadow traffic.
  • IaC & Config: Terraform, CloudFormation, CDK, and Pulumi define environments in code, which keeps them consistent (environment parity).
  • GitOps: Desired state (manifests) stored in Git; controllers (Argo CD, Flux) reconcile cluster state.

| Category | Tools |
|---|---|
| Deployment Orchestrators | Argo CD, Flux, Spinnaker, Harness, Octopus Deploy |
| Containers & Scheduling | Docker, Kubernetes (EKS), ECS |
| Packaging | Helm, Kustomize, OCI Artifacts |
| Config Management | Ansible, Puppet, Chef, Salt |
| IaC | Terraform, Pulumi, CloudFormation, CDK |
| Release Strategies | Flagger (canary), Argo Rollouts |
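
GitOps controllers such as Argo CD and Flux work by repeatedly comparing the desired state declared in Git with the live state and converging the difference. This is a deliberately simplified sketch of one reconciliation pass. Both states are plain dictionaries mapping hypothetical resource names to specs; real controllers diff full Kubernetes objects and apply changes through the API server.

```python
def reconcile(desired: dict[str, dict], live: dict[str, dict]) -> list[tuple[str, str]]:
    """Diff desired (Git) vs live (cluster) state and return the actions needed."""
    actions: list[tuple[str, str]] = []
    for name in sorted(desired):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != desired[name]:
            actions.append(("update", name))  # drift, or a new commit in Git
    for name in sorted(live):
        if name not in desired:
            actions.append(("delete", name))  # comparable to enabling pruning in Argo CD / Flux
    return actions


def apply_actions(actions: list[tuple[str, str]], desired: dict[str, dict],
                  live: dict[str, dict]) -> None:
    """Converge live state toward desired state (a stand-in for real API calls)."""
    for action, name in actions:
        if action == "delete":
            del live[name]
        else:
            live[name] = dict(desired[name])
```

Running `reconcile` again after `apply_actions` returns an empty list. This idempotence is what makes continuous, repeated reconciliation safe.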

Example (GitOps Flow):

```text
Developer merges -> CI builds & pushes image myapp:v1.4.2
CI opens a pull request that updates the image tag in the deployment manifest
Argo CD detects the change -> syncs to the cluster -> progressive rollout via Argo Rollouts or Flagger (canary 10% -> 50% -> 100%)
Metric/error-rate guards trigger a rollback if thresholds are exceeded
```
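
The progressive rollout in the flow above can be modeled as a loop over traffic weights with a health check between steps. This minimal sketch assumes a hypothetical `error_rate_at(weight)` callback that returns the observed canary error rate after each step. Argo Rollouts and Flagger implement this with analysis queries against Prometheus or another metrics provider. The 2% default mirrors the alert policy in the next section.

```python
from typing import Callable


def canary_rollout(
    error_rate_at: Callable[[int], float],
    steps: tuple[int, ...] = (10, 50, 100),
    max_error_rate: float = 0.02,
) -> tuple[str, list[int]]:
    """Shift traffic step by step; roll back on the first unhealthy step."""
    applied: list[int] = []
    for weight in steps:
        applied.append(weight)  # e.g. set the traffic split to `weight`% canary
        if error_rate_at(weight) > max_error_rate:
            return "rolled_back", applied  # route all traffic back to the stable version
    return "promoted", applied
```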

11. Stage 5: Continuous Monitoring & Feedback

  • Observability Pillars: Metrics, Logs, Traces, Events, Profiling, User experience data (RUM).
  • Goals: Detect anomalies early, measure SLO compliance, feed improvement loops.
  • Tools:
    • Metrics & Alerts: Prometheus, CloudWatch, Datadog, Dynatrace
    • Logs: ELK / OpenSearch, CloudWatch Logs, Splunk
    • Tracing: OpenTelemetry, Jaeger, AWS X-Ray
    • Visualization: Grafana, Kibana
    • Synthetic & RUM: k6, Checkly, Pingdom, New Relic Browser
  • Practices:
    • Define Service Level Indicators (SLIs) & Service Level Objectives (SLOs) (e.g., availability, p95 latency).
    • Use error budgets to balance release velocity against reliability work.
    • Correlate deploy events with performance changes (annotate dashboards).
    • Centralize structured (JSON) logs and propagate trace IDs across service calls.
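
Error budgets follow directly from the SLO. With an availability target of 99.9% over a 30-day window (43,200 minutes), the budget is 0.1% of that window, about 43.2 minutes. This minimal sketch shows that arithmetic, a request-based variant, and an example release policy. The 99.9% target, the window, and the "ship"/"freeze-features" policy labels are illustrative assumptions.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed 'bad' minutes for a time-based availability SLO over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative = overspent)."""
    if not 0 < slo < 1:
        raise ValueError("slo must be between 0 and 1, e.g. 0.999")
    if total_requests == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1 - slo) * total_requests
    return 1 - failed_requests / allowed_failures


def release_policy(remaining: float) -> str:
    """Example policy (an assumption): ship features while budget remains, else focus on reliability."""
    return "ship" if remaining > 0 else "freeze-features"
```

For example, 250 failures out of 1,000,000 requests against a 99.9% SLO spends a quarter of the budget, leaving 75%.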

Example Alert Policy (Conceptual):

```text
IF (http_request_error_rate_5m > 2%) AND (deployment_in_progress == true)
THEN trigger canary halt & page on-call
```
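
The conceptual rule above combines a windowed error rate with deployment state. This minimal sketch shows one way to evaluate it, assuming hypothetical per-minute request/error counts for the last five minutes and hypothetical action names. In Prometheus the condition would normally live in a PromQL alerting rule, with the canary halt handled by the rollout controller.

```python
from dataclasses import dataclass


@dataclass
class MinuteSample:
    requests: int
    errors: int


def error_rate(window: list[MinuteSample]) -> float:
    """Error rate over the whole window, weighting each minute by its traffic."""
    total = sum(s.requests for s in window)
    return sum(s.errors for s in window) / total if total else 0.0


def evaluate(window: list[MinuteSample], deployment_in_progress: bool,
             threshold: float = 0.02) -> list[str]:
    """Return the actions the conceptual policy would trigger."""
    if deployment_in_progress and error_rate(window) > threshold:
        return ["halt_canary", "page_on_call"]
    return []
```

Summing errors and requests across the window weights busy minutes more heavily. Averaging per-minute rates instead would let one low-traffic minute with a single error dominate the result.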

12. Continuous Security/DevSecOps

| Phase | Security Practices |
|---|---|
| Plan / Code | Threat modeling, secrets scanning, dependency governance |
| Build | Static analysis (SAST), license compliance, artifact signing |
| Test | Dynamic analysis (DAST), fuzzing, IaC security scanning (Checkov, tfsec) |
| Deploy | Policy as code (OPA/Gatekeeper), least-privilege IAM roles, supply chain attestations (SLSA) |
| Operate | Runtime security (Falco, Amazon GuardDuty), anomaly detection, automated key rotation |
| Monitor | Central SIEM correlation, continuous compliance (Cloud Custodian, AWS Config) |
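
Secrets scanning in the Plan / Code phase is normally done with dedicated tools (for example gitleaks, trufflehog, or GitHub secret scanning), but the core idea is pattern matching over changed files. This minimal sketch flags two cases. The first is strings shaped like AWS access key IDs: `AKIA` for long-term keys or `ASIA` for temporary ones, followed by 16 uppercase letters or digits. The second is PEM private-key headers. These two rules are a small illustrative subset, not a complete ruleset.

```python
import re

PATTERNS = {
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str, path: str = "<memory>") -> list[tuple[str, int, str]]:
    """Return (path, line_number, rule) for every line matching a secret pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((path, lineno, rule))
    return findings
```

In CI, the scan would run over the files changed in a pull request and fail the job whenever the findings list is non-empty.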

13. Glossary

| Term | Definition |
|---|---|
| CI (Continuous Integration) | Automatically building and testing every change, with changes merged to the mainline frequently |
| CD (Continuous Delivery/Deployment) | Keeping software always deployable (delivery) / automatically deploying each passing change (deployment) |
| GitOps | Managing infrastructure & application state declaratively in Git, with automated reconciliation |
| IaC | Infrastructure defined in code, enabling versioning & automation |
| SLI / SLO | SLI: a measured indicator of service behavior (e.g., p95 latency); SLO: the target for that indicator (e.g., p95 latency < 250 ms) |
| Error Budget | Allowed amount of unreliability (1 − SLO) before the release pace slows |
| Canary Release | Deploying to a small subset of users/traffic to validate health before full rollout |
| Blue/Green | Two production environments; traffic switches between them with minimal downtime |
| Observability | Ability to infer internal state from external outputs (metrics/logs/traces) |
| Shift-Left Security | Moving security feedback to earlier lifecycle stages |

14. Reference Example (Minimal GitHub Actions -> AWS Deployment Skeleton)

```yaml
name: build-and-deploy
on:
  push:
    branches: [ main ]

env:
  AWS_REGION: us-east-1                       # example region (assumption)
  ACCOUNT_ID: ${{ secrets.AWS_ACCOUNT_ID }}   # hypothetical secret name
  IMAGE_REPO: app

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      id-token: write    # For OIDC federation
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: Set up Node
        uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci && npm test
      - name: Configure AWS credentials (OIDC, no static keys)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}   # hypothetical secret name
          aws-region: ${{ env.AWS_REGION }}
      - name: Build image
        run: docker build -t "$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO:${{ github.sha }}" .
      - name: Login to ECR
        run: |
          aws ecr get-login-password --region "$AWS_REGION" \
            | docker login --username AWS --password-stdin "$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com"
      - name: Push image
        run: docker push "$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO:${{ github.sha }}"
      - name: Update manifest (example)
        run: |
          # Assumes k8s/deployment.yaml contains the literal placeholder IMAGE_TAG
          sed -i "s|IMAGE_TAG|${{ github.sha }}|" k8s/deployment.yaml
      - name: Commit manifest (optional PR)
        run: |
          # In GitOps workflows, push changes to the manifest repo rather than applying directly
          echo "Create PR to Argo-monitored repo"
```