
Agents and Environments

Reinforcement learning is built on the interaction between agents and environments. This page introduces the key definitions.

🤖 Agent

An agent is the entity that learns and makes decisions in an environment. In MLVisual, our agents are the ants that learn to collect food.

Characteristics of an Agent

  • Learns through experience
  • Makes decisions based on current state
  • Receives rewards for its actions
  • Improves performance over time

Example in Ants Saga

In our project, the ant is the agent that must:

  • Decide which direction to move
  • Learn to avoid obstacles
  • Maximize food collection
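The responsibilities above can be sketched as a minimal agent interface. This is an illustrative stand-in, not the actual MLVisual code; the class and method names are assumptions.

```python
import random

class Agent:
    """Minimal agent sketch: decides, then learns from feedback."""

    def __init__(self, actions):
        self.actions = actions  # the agent's action space

    def choose_action(self, state):
        # Placeholder policy: pick a random action regardless of state.
        # A trained agent would use what it has learned here.
        return random.choice(self.actions)

    def learn(self, state, action, reward, next_state):
        # A real agent updates its policy here (e.g. a Q-learning update).
        pass
```

A learning algorithm would replace the random policy in `choose_action` and fill in `learn`; the interface itself stays the same.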

🌍 Environment

The environment is the world with which the agent interacts. It includes everything the agent can observe and act upon.

Characteristics of an Environment

  • Provides states - Information about the current situation
  • Receives actions - From the agent
  • Returns rewards - Feedback on action quality
  • Transitions - Changes state based on actions

Example in Ants Saga

Our environment includes:

  • Map layout - Walls, food, obstacles
  • Physics - Movement, collisions
  • Rules - How the world behaves
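A toy grid world shows how an environment ties these pieces together: it holds the map layout, applies movement rules, and returns a new state and a reward. The class name, reward values, and `step()` method here are illustrative assumptions, not the MLVisual implementation.

```python
class GridWorld:
    """Toy environment sketch: state transitions plus rewards."""

    def __init__(self, width, height, food):
        self.width, self.height = width, height
        self.food = set(food)  # map layout: food cell coordinates
        self.ant = (0, 0)      # the agent starts in a corner

    def step(self, move):
        # Transition: apply the move, clamped to the map boundaries
        # (a simple stand-in for walls and physics).
        x = min(max(self.ant[0] + move[0], 0), self.width - 1)
        y = min(max(self.ant[1] + move[1], 0), self.height - 1)
        self.ant = (x, y)
        # Reward: +1 for reaching food, a small step penalty otherwise
        # (example values only).
        if self.ant in self.food:
            self.food.remove(self.ant)
            return self.ant, 1.0
        return self.ant, -0.01
```

Calling `step((1, 0))` moves the ant one cell right and returns the new state together with the reward for that move.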

📊 States

A state is a snapshot of the environment at a specific moment. It captures the information the agent uses to make decisions.

State Representation

  • Discrete - Finite set of possible states
  • Continuous - Infinite set of possible states
  • Partial - The agent observes only part of the state
  • Full - The agent observes the complete state

Example in Ants Saga

```python
state = {
    'ant_position': (x, y),
    'food_positions': [(x1, y1), (x2, y2), ...],
    'obstacle_positions': [(x1, y1), (x2, y2), ...],
    'ant_health': 100,
    'food_collected': 5,
}
```

🎮 Actions

An action is a decision the agent can make. The set of all possible actions is called the action space.

Action Types

  • Discrete - Finite set of actions (left, right, up, down)
  • Continuous - Infinite set of actions (move 0.5 units left)
  • Multi-dimensional - Multiple actions at once

Example in Ants Saga

```python
actions = {
    'move_up': (0, 1),
    'move_down': (0, -1),
    'move_left': (-1, 0),
    'move_right': (1, 0),
    'stay': (0, 0),
}
```

🔄 Agent-Environment Loop

The learning process follows this cycle:

  1. Agent observes the current state
  2. Agent chooses an action based on its policy
  3. Environment receives the action
  4. Environment transitions to a new state
  5. Environment provides a reward
  6. Agent updates its policy based on the reward
  7. Repeat until learning is complete
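The seven steps above can be written as a short loop. The `Env` and `Agent` classes here are minimal stand-ins so the cycle is visible end to end; the comments map each line back to the numbered steps.

```python
import random

class Env:
    """Stand-in environment with an integer state."""
    def __init__(self):
        self.state = 0

    def step(self, action):               # 3. environment receives the action
        self.state += action              # 4. transition to a new state
        reward = 1.0 if self.state == 3 else 0.0  # 5. provide a reward
        return self.state, reward

class Agent:
    """Stand-in agent with a random policy."""
    def act(self, state):
        return random.choice([0, 1])      # 2. choose an action

    def update(self, state, action, reward, next_state):
        pass                              # 6. update the policy

env, agent = Env(), Agent()
state = env.state                         # 1. observe the current state
for _ in range(10):                       # 7. repeat
    action = agent.act(state)
    next_state, reward = env.step(action)
    agent.update(state, action, reward, next_state)
    state = next_state
```

Real RL libraries follow this same shape; only the policy, the update rule, and the environment dynamics become more sophisticated.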

🎯 Learning Objectives

Goal

The agent learns to maximize cumulative reward over time.
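"Cumulative reward" usually means the discounted return G = r₀ + γ·r₁ + γ²·r₂ + …, where γ (the discount factor) weights future rewards less than immediate ones. This page does not fix a value for γ, so the 0.9 below is only an example.

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G = r0 + gamma*r1 + gamma^2*r2 + ... for a reward sequence."""
    g = 0.0
    for r in reversed(rewards):  # fold from the last reward backwards
        g = r + gamma * g
    return g
```

For example, `discounted_return([1.0, 0.0, 1.0])` is 1 + 0.9·0 + 0.9²·1 ≈ 1.81: the later reward counts, but less than the immediate one.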

Methods

  • Value-based - Learn value functions and act greedily on them (e.g. Q-learning)
  • Policy-based - Learn the policy directly (e.g. REINFORCE)
  • Actor-Critic - Combine both approaches
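To give the value-based approach some flavor, here is the standard tabular Q-learning update applied to a dictionary-backed Q-table. The algorithm is standard; the states and action names below are toy values chosen for illustration.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q

Q = {}
q_update(Q, s=0, a='move_right', r=1.0, s_next=1,
         actions=['move_right', 'stay'])
```

Starting from an empty table, this single update sets Q[(0, 'move_right')] to 0.1, since alpha scales the reward signal of 1.0 and all next-state values are still zero.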

📚 Further Reading