ItSupport Crew

This project implements an automated IT support system, developed as a final course project (TCC). It combines ITSM/ITIL practices with intelligent agents based on Large Language Models (LLMs), simulating a three-tier team of support analysts (L1, L2, and L3) plus a documentation agent.

Project Overview

The increasing complexity of IT environments and the heavy dependence of business operations on IT systems have made efficient incident management essential. This project proposes an LLM-based multi-agent architecture that automates the alert-handling cycle:

  • Triage (L1): Detects and classifies the affected resource (CPU, memory, or both)
  • Investigation (L2): Identifies resource-intensive processes and evaluates their criticality
  • Intervention (L3): Executes safe remediations, terminating non-critical processes
  • Documentation: Records all actions and results for audit purposes
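The four stages form a sequential hand-off in which each stage consumes the previous stage's output. The sketch below models that data flow in plain Python under loudly stated assumptions: deterministic functions stand in for the LLM agents, and the `Process` and `Incident` types, the 50% usage threshold, and the `CRITICAL_PROCESSES` allowlist are all hypothetical rather than taken from the project.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist: processes the L3 stage must never terminate.
CRITICAL_PROCESSES = {"postgres", "prometheus", "sshd"}


@dataclass
class Process:
    pid: int
    name: str
    cpu_percent: float
    mem_percent: float


@dataclass
class Incident:
    alert_name: str
    resource: str = "unknown"
    suspects: list = field(default_factory=list)  # (Process, is_critical) pairs
    actions: list = field(default_factory=list)
    log: list = field(default_factory=list)


def triage(incident: Incident, cpu_high: bool, mem_high: bool) -> None:
    """L1: classify the affected resource as cpu, memory, or both."""
    if cpu_high and mem_high:
        incident.resource = "both"
    elif cpu_high:
        incident.resource = "cpu"
    elif mem_high:
        incident.resource = "memory"
    incident.log.append(f"L1: resource={incident.resource}")


def investigate(incident: Incident, processes: list, threshold: float = 50.0) -> None:
    """L2: flag processes at or above the (hypothetical) threshold and note criticality."""
    def usage(p: Process) -> float:
        if incident.resource == "cpu":
            return p.cpu_percent
        if incident.resource == "memory":
            return p.mem_percent
        return max(p.cpu_percent, p.mem_percent)

    for p in processes:
        if usage(p) >= threshold:
            incident.suspects.append((p, p.name in CRITICAL_PROCESSES))
    incident.log.append(f"L2: {len(incident.suspects)} suspect(s)")


def intervene(incident: Incident, kill) -> None:
    """L3: terminate non-critical suspects only; `kill(pid)` is injected by the caller."""
    for p, critical in incident.suspects:
        if critical:
            incident.actions.append(f"skipped critical {p.name} ({p.pid})")
        else:
            kill(p.pid)
            incident.actions.append(f"terminated {p.name} ({p.pid})")
    incident.log.append(f"L3: {len(incident.actions)} action(s)")


def document(incident: Incident) -> str:
    """Documentation: render a plain-text audit record of the incident."""
    lines = [f"Incident: {incident.alert_name}", f"Resource: {incident.resource}"]
    lines += [f"Action: {a}" for a in incident.actions]
    lines += [f"Trace: {entry}" for entry in incident.log]
    return "\n".join(lines)


if __name__ == "__main__":
    procs = [
        Process(101, "stress", 97.0, 5.0),
        Process(202, "postgres", 60.0, 20.0),
        Process(303, "bash", 1.0, 0.5),
    ]
    killed = []
    inc = Incident("HighCPUUsage")
    triage(inc, cpu_high=True, mem_high=False)
    investigate(inc, procs)
    intervene(inc, killed.append)  # record PIDs instead of killing anything
    print(document(inc))
```

Injecting `kill` keeps the L3 step side-effect free and testable; a real deployment might pass something like `lambda pid: os.kill(pid, signal.SIGTERM)` instead. The allowlist check is what makes the remediation "safe": a critical process is reported, never terminated.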

The solution integrates with Prometheus to receive alerts and triggers response workflows via the CrewAI framework.
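In a standard setup, Prometheus evaluates alert rules and forwards firing alerts to Alertmanager, which can POST them as JSON to a webhook receiver. Assuming alerts reach the crew that way, the L1 triage step could start from the payload's `alerts` list, as in this sketch. The alert names `HighCPUUsage` and `HighMemoryUsage` and the `ALERT_RESOURCES` mapping are hypothetical; the project's actual rules may use different names or labels.

```python
import json

# Hypothetical alert names; the project's real Prometheus rules may differ.
ALERT_RESOURCES = {
    "HighCPUUsage": "cpu",
    "HighMemoryUsage": "memory",
}


def firing_resources(payload: dict) -> set:
    """Collect resources named by firing alerts in an Alertmanager webhook payload."""
    resources = set()
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # skip "resolved" notifications
        name = alert.get("labels", {}).get("alertname", "")
        resource = ALERT_RESOURCES.get(name)
        if resource:
            resources.add(resource)
    return resources


def classify(payload: dict) -> str:
    """L1-style triage: map firing alerts to 'cpu', 'memory', 'both', or 'none'."""
    resources = firing_resources(payload)
    if resources == {"cpu", "memory"}:
        return "both"
    if len(resources) == 1:
        return resources.pop()
    return "none"


if __name__ == "__main__":
    body = json.loads("""
    {"version": "4", "status": "firing", "alerts": [
      {"status": "firing", "labels": {"alertname": "HighCPUUsage"}},
      {"status": "resolved", "labels": {"alertname": "HighMemoryUsage"}}
    ]}
    """)
    print(classify(body))  # cpu
```

Filtering on `status` matters because Alertmanager can also send resolved notifications (when `send_resolved` is enabled), and a resolved alert should not trigger a new remediation.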

Architecture

All services are orchestrated through Docker Compose:

  • user-service: Main Flask microservice (port 5001)
  • db: PostgreSQL database
  • prometheus: Collects metrics and triggers alerts (port 9090)
  • grafana: Visualization dashboards (port 3000)
  • chaos_monkey: Generates random CPU failures
  • failure-controller: Orchestrates failure simulations
  • it-support-crew: Flask service that exposes endpoints to initiate the multi-agent workflow
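The it-support-crew service's job, accepting an alert over HTTP and starting the multi-agent workflow, can be sketched without third-party dependencies. The project uses Flask; this sketch instead uses a bare WSGI callable from the Python standard library so it runs anywhere. The `/alerts` route, the `start_workflow` hook, and the `202 Accepted` response are assumptions for illustration, not the project's actual endpoints.

```python
import io
import json
from wsgiref.util import setup_testing_defaults

received = []  # stands in for a queue feeding the crew


def start_workflow(alert: dict) -> None:
    """Hypothetical hook: the real service would hand the alert to CrewAI here."""
    received.append(alert)


def app(environ, start_response):
    """Minimal WSGI endpoint: POST /alerts accepts an Alertmanager-style JSON body."""
    json_header = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/alerts":
        start_response("404 Not Found", json_header)
        return [b'{"error": "not found"}']
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", json_header + [("Allow", "POST")])
        return [b'{"error": "method not allowed"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    try:
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    except json.JSONDecodeError:
        start_response("400 Bad Request", json_header)
        return [b'{"error": "invalid JSON"}']
    alerts = payload.get("alerts", []) if isinstance(payload, dict) else []
    for alert in alerts:
        start_workflow(alert)
    start_response("202 Accepted", json_header)
    return [json.dumps({"accepted": len(alerts)}).encode()]


def call(method: str, path: str, body: bytes = b""):
    """Invoke the app in-process and return (status line, parsed JSON body)."""
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))
```

Returning `202 Accepted` signals that the workflow was queued rather than finished, which suits LLM-driven runs that can take a while. Because Flask applications are themselves WSGI applications, the same logic maps directly onto a Flask route; for a quick local server, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would serve this callable.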

The multi-agent system manages the complete incident workflow, from detecting alerts to executing corrective actions and documenting the entire process.

Key Results

  • Significant reduction in incident response time
  • Precise and automated execution of corrective actions
  • Improved quality and agility in technical documentation
  • Validation by industry experts, supporting the potential of AI and LLMs to automate repetitive IT tasks

For detailed implementation, configuration options, and execution instructions, please visit the GitHub repository.

Technologies and Concepts

  • Python
  • CrewAI
  • Large Language Models (LLMs)
  • Natural Language Processing (NLP)
  • Docker and Docker Compose
  • Prometheus and Grafana
  • ITSM/ITIL frameworks