ansible-playbooks/roles/valkey/README.md
2025-12-23 09:08:43 +01:00

17 KiB

Redis/Valkey Role

This Ansible role installs and configures Valkey (a Redis fork) for local use only. It provides a shared Redis-compatible instance that multiple services can use as a cache or message broker.

The role also performs required kernel tuning for optimal Valkey performance.

About Valkey

Valkey is a high-performance key/value datastore and a drop-in replacement for Redis. It was created as a community-driven fork after Redis changed its license from BSD to proprietary licenses (RSALv2 and SSPLv1) in March 2024.

Key points:

  • Valkey is 100% API-compatible with Redis
  • Backed by the Linux Foundation
  • Uses permissive open-source license (BSD 3-Clause)
  • No code changes needed in your applications
  • Same commands, same protocol, same performance

Distribution support:

  • Arch Linux: Installs Valkey (redis package replaced in April 2024)
  • Debian/Ubuntu: Installs Valkey from official repositories

Features

  • Installs Redis/Valkey
  • Local-only access (localhost)
  • Configurable memory limits and eviction policies
  • Persistence enabled
  • Systemd integration
  • Automatic kernel tuning (memory overcommit, THP)
  • ACL-based user authentication
  • Firewall configuration (UFW)

Requirements

  • Systemd-based Linux distribution
  • Root/sudo access
  • ansible.posix collection (for sysctl module)
  • community.general collection (for ufw module)

Role Variables

See defaults/main.yml for all available variables and their default values.

Key Configuration Requirements

Required Password

The valkey_admin_password variable must be set in your inventory (min 12 characters). The role will fail if not set.

ACL Users

Service users must be registered via the valkey_acl_users list. See the ACL Configuration Guide section below for details.

Container Access

Valkey binds to 127.0.0.1 by default (secure, localhost-only).

Containers can reach Valkey via Pasta's --map-host-loopback feature, which routes container's 127.0.0.1 to the host's 127.0.0.1.

In docker-compose files, use:

extra_hosts:
    - "host.containers.internal:127.0.0.1"

No additional bind addresses needed!

System Requirements: This role automatically configures kernel parameters (vm.overcommit_memory=1) and transparent hugepage settings

Dependencies

None.

Example Playbook

---
- hosts: servers
  become: true
  roles:
      - role: valkey
      - role: immich # Will connect to system Valkey

Custom Configuration with ACL Users

---
- hosts: servers
  become: true
  roles:
      - role: valkey
        vars:
            valkey_admin_password: "{{ vault_valkey_password }}"
            valkey_maxmemory: 512mb
            valkey_maxmemory_policy: volatile-lru
            valkey_acl_users:
                - username: immich
                  password: "{{ immich_valkey_password }}"
                  keypattern: "immich_bull* immich_channel*"
                  commands: "&* -@dangerous +@read +@write +@pubsub +select +auth +ping +info +eval +evalsha"
                - username: nextcloud
                  password: "{{ nextcloud_valkey_password }}"
                  keypattern: "nextcloud*"
                  commands: "+@read +@write -@dangerous +auth +ping +info"

How Services Connect

Services running on the same host can connect to Valkey at:

  • Host: localhost or 127.0.0.1
  • Port: 6379 (default)

From Containers

Containers need special handling to reach the host's Valkey:

Use host.containers.internal:

REDIS_HOSTNAME: host.containers.internal
REDIS_PORT: 6379

This special DNS name resolves to the host machine from inside containers.

Note: Environment variables often still use REDIS_* naming for compatibility, since Valkey is API-compatible with Redis.

Security

  • Local-only: Valkey binds to 127.0.0.1 only (configurable for container access)
  • Protected mode: Enabled
  • ACL authentication: Each service gets its own user with restricted permissions
  • No remote access: Cannot be reached from network by default

ACL-Based Authentication

This role uses Valkey's ACL (Access Control List) system for fine-grained security. Each service gets:

  • Dedicated credentials: Unique username and password
  • Key pattern restrictions: Can only access specific key patterns
  • Command restrictions: Limited to required commands only
  • Defense-in-depth: Multiple layers of isolation

Configuring ACL Users

Define ACL users in your inventory or host_vars:

# inventory/host_vars/yourserver.yml
valkey_admin_password: "your-strong-admin-password"

valkey_acl_users:
    - username: immich
      password: "{{ immich_valkey_password }}"
      keypattern: "immich_bull* immich_channel*"
      commands: "&* -@dangerous +@read +@write +@pubsub +select +auth +ping +info +eval +evalsha"

    - username: nextcloud
      password: "{{ nextcloud_valkey_password }}"
      keypattern: "nextcloud*"
      commands: "+@read +@write -@dangerous +auth +ping +info"

    - username: gitea
      password: "{{ gitea_valkey_password }}"
      keypattern: "gitea*"
      commands: "+@read +@write -@dangerous +auth +ping +info +select"

ACL Configuration Guide

Key Pattern (keypattern):

  • Single pattern: "myservice*" - matches keys starting with myservice
  • Multiple patterns: "pattern1* pattern2*" - space-separated (automatically converted to ~pattern1* ~pattern2* in ACL file)
  • All keys: "*" - not recommended for security

Note: In the inventory, specify patterns as space-separated strings. The Ansible template automatically adds the ~ prefix to each pattern when generating the ACL file.

Kernel Tuning

The role automatically configures kernel parameters required by Valkey (see tasks/kernel-tuning.yml):

1. Memory Overcommit:

vm.overcommit_memory = 1
  • Required for background saves and replication
  • Configured via /etc/sysctl.conf
  • Applied immediately and persists across reboots

2. Transparent Huge Pages (THP):

transparent_hugepage=madvise
  • Reduces latency and memory usage issues
  • Safely appended to existing GRUB kernel parameters (does not overwrite)
  • Only adds parameter if transparent_hugepage= is not already present
  • Applied at runtime immediately via /sys/kernel/mm/transparent_hugepage/enabled
  • Persists across reboots via /etc/default/grub
  • Automatically detects and uses update-grub (Debian) or grub-mkconfig (Arch)

These settings are required to eliminate Valkey startup warnings and ensure optimal performance.

Note: The role preserves existing GRUB parameters. If you have GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet", it will become GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet transparent_hugepage=madvise".

Commands (commands):

  • &* - Allow all pub/sub channels (required for job queues like BullMQ)
  • +allchannels - Alternative to &*
  • +@read - Allow all read commands (GET, MGET, etc.)
  • +@write - Allow all write commands (SET, DEL, etc.)
  • +@pubsub - Allow pub/sub commands (SUBSCRIBE, PUBLISH, etc.)
  • -@dangerous - Deny dangerous commands (FLUSHDB, FLUSHALL, KEYS, CONFIG, etc.)
  • +commandname - Allow specific command (e.g., +select, +auth, +ping)
  • -commandname - Deny specific command

Common Command Sets:

Service Type Recommended Commands
Simple cache +@read +@write -@dangerous +auth +ping +info
Session store +@read +@write -@dangerous +auth +ping +info +select
Job queue (BullMQ) &* -@dangerous +@read +@write +@pubsub +select +auth +ping +info +eval +evalsha
Pub/sub +@pubsub +@read +@write -@dangerous +auth +ping +info

Security Best Practices:

  • Always include -@dangerous to prevent accidental data loss
  • Use specific key patterns to isolate services
  • Only grant +eval and +evalsha if required (job queues)
  • Only grant &* or +allchannels if using pub/sub
  • Use unique passwords for each service

Setting Secure Passwords

Use Ansible Vault to encrypt all passwords:

# Admin password
ansible-vault encrypt_string 'your-strong-admin-password' --name 'valkey_admin_password'

# Service passwords
ansible-vault encrypt_string 'immich-password' --name 'immich_valkey_password'
ansible-vault encrypt_string 'nextcloud-password' --name 'nextcloud_valkey_password'

Add encrypted values to your inventory:

valkey_admin_password: !vault |
    $ANSIBLE_VAULT;1.1;AES256
    ...

immich_valkey_password: !vault |
    $ANSIBLE_VAULT;1.1;AES256
    ...

Service Management

# Check status
systemctl status redis        # Debian/Ubuntu
systemctl status valkey       # Arch Linux

# Restart service
systemctl restart redis       # Debian/Ubuntu
systemctl restart valkey      # Arch Linux

# View logs
journalctl -u redis -f        # Debian/Ubuntu
journalctl -u valkey -f       # Arch Linux

# Connect to CLI (both systems have redis-cli compatibility)
redis-cli                     # Works on both
valkey-cli                    # Also available on Arch Linux

Persistence

Valkey is configured with RDB persistence:

  • Save after 900 seconds if at least 1 key changed
  • Save after 300 seconds if at least 10 keys changed
  • Save after 60 seconds if at least 10000 keys changed

Data is stored in {{ valkey_dir }} (default: /var/lib/valkey)

Memory Management

When valkey_maxmemory is reached, Valkey will behave based on valkey_maxmemory_policy:

  • noeviction: Return errors when memory limit is reached (default, recommended for BullMQ/job queues)
  • allkeys-lru: Evict least recently used keys (good for pure caching)
  • volatile-lru: Evict LRU keys with TTL set
  • allkeys-random: Evict random keys
  • volatile-random: Evict random keys with TTL

Important for Immich and BullMQ: Services using BullMQ for job queues (like Immich) require noeviction policy. Evicting job queue data can cause:

  • Lost background tasks
  • Failed job processing
  • Data corruption

Only use eviction policies (allkeys-lru, etc.) for pure caching use cases where data loss is acceptable.

Monitoring

Check Valkey info (authenticate as admin):

redis-cli
AUTH default <valkey_admin_password>
INFO
INFO memory
INFO stats

Check connected clients:

redis-cli
AUTH default <valkey_admin_password>
CLIENT LIST

View ACL configuration:

redis-cli
AUTH default <valkey_admin_password>
ACL LIST                    # List all users
ACL GETUSER immich          # View specific user permissions
ACL GETUSER default         # View admin user
ACL CAT                     # List all command categories

Check generated ACL file:

cat /etc/valkey/users.acl
# Example output:
# user default on >password ~* &* +@all
# user immich on >password ~immich_bull* ~immich_channel* &* -@dangerous +@read ...
# Note: Multiple patterns appear as separate ~pattern entries

Troubleshooting

Check if Valkey is running

systemctl status valkey         # Arch Linux
systemctl status valkey-server  # Debian/Ubuntu

Test admin connection

# With authentication (default user)
redis-cli
AUTH default <valkey_admin_password>
PING
# Should return: PONG

Test service user connection

# Test Immich user
redis-cli
AUTH immich <immich_valkey_password>
SELECT 0
PING
# Should return: PONG

# Try restricted command (should fail)
FLUSHDB
# Should return: (error) NOPERM This user has no permissions to run the 'flushdb' command

View ACL configuration

# Check ACL file
cat /etc/valkey/users.acl

# Check runtime ACL
redis-cli
AUTH default <valkey_admin_password>
ACL LIST
ACL GETUSER immich

Debug permission issues

# Monitor all commands (useful for debugging)
redis-cli
AUTH default <valkey_admin_password>
MONITOR
# In another terminal, run your application
# You'll see all commands being executed

View configuration

redis-cli
AUTH default <valkey_admin_password>
CONFIG GET "*"

Check memory usage

redis-cli
AUTH default <valkey_admin_password>
INFO memory

Common ACL Errors

"NOAUTH Authentication required"

  • Client didn't authenticate
  • Service needs to set REDIS_USERNAME and REDIS_PASSWORD

"WRONGPASS invalid username-password pair"

  • Incorrect username or password
  • Verify ACL user exists: ACL GETUSER username
  • Check password in inventory matches service configuration

"NOPERM No permissions to run the 'command' command"

  • Command not allowed in ACL
  • Check ACL: ACL GETUSER username
  • Add required command to commands: in inventory

"NOPERM No permissions to access a key"

  • Key doesn't match allowed patterns
  • Check key pattern: ACL GETUSER username
  • Verify service is using correct key prefix

"NOPERM No permissions to access a channel"

  • Pub/sub channel not allowed
  • Add &* or +allchannels to ACL commands
  • Required for BullMQ and other job queues

Performance Tuning

For high-traffic services, consider:

valkey_maxmemory: 1gb # Increase memory limit
valkey_maxmemory_policy: noeviction # No eviction (for job queues)
# Or for pure caching:
# valkey_maxmemory_policy: allkeys-lru  # LRU eviction

Kernel Tuning (automatically configured): The role automatically sets optimal kernel parameters:

  • Memory overcommit enabled (vm.overcommit_memory=1)
  • Transparent Huge Pages set to madvise

To verify kernel settings:

# Check memory overcommit
sysctl vm.overcommit_memory
# Should show: vm.overcommit_memory = 1

# Check THP status
cat /sys/kernel/mm/transparent_hugepage/enabled
# Should show: always [madvise] never

License

MIT

Author Information

Created for managing shared Valkey instances in NAS/homelab environments.

Multi-Layer Isolation Strategy

This role implements defense-in-depth with three isolation layers:

1. ACL Users (Primary Isolation)

Each service gets its own user with restricted permissions:

  • Unique credentials
  • Key pattern restrictions
  • Command restrictions

2. Database Numbers (Secondary Isolation)

Valkey provides 16 logical databases (0-15) for additional isolation:

Service Database Key Pattern ACL User
Immich 0 immich_bull* immich_channel* immich
Nextcloud 1 nextcloud* nextcloud
Gitea 2 gitea* gitea
Grafana 3 grafana* grafana
Custom 4-15 Custom Custom

3. Key Prefixes (Tertiary Isolation)

Services use unique key prefixes enforced by ACL patterns.

Testing Isolation

# Test as Immich user (database 0)
redis-cli
AUTH immich <immich_valkey_password>
SELECT 0
SET immich_bull_test "data"
# Success

# Try to access other service's keys (should fail)
GET nextcloud_test
# Success (key doesn't exist, not a permission error)
# But ACL prevents SET on non-matching patterns:
SET nextcloud_test "data"
# Error: NOPERM No permissions to access a key

# Try dangerous command (should fail)
FLUSHDB
# Error: NOPERM This user has no permissions to run the 'flushdb' command

Complete Example Configuration

# inventory/host_vars/myserver.yml
valkey_admin_password: "{{ vault_valkey_admin_password }}"

valkey_acl_users:
    # Immich - Photo management (needs BullMQ job queue)
    - username: immich
      password: "{{ vault_immich_valkey_password }}"
      keypattern: "immich_bull* immich_channel*"
      commands: "&* -@dangerous +@read +@write +@pubsub +select +auth +ping +info +eval +evalsha"

    # Nextcloud - Simple caching
    - username: nextcloud
      password: "{{ vault_nextcloud_valkey_password }}"
      keypattern: "nextcloud*"
      commands: "+@read +@write -@dangerous +auth +ping +info +select"

    # Gitea - Session store
    - username: gitea
      password: "{{ vault_gitea_valkey_password }}"
      keypattern: "gitea*"
      commands: "+@read +@write -@dangerous +auth +ping +info +select"

# Service variables
immich_valkey_db: 0
nextcloud_valkey_db: 1
gitea_valkey_db: 2

Best Practices

  • ACL first: Always use ACL users with key pattern restrictions
  • Database numbers: Use for additional logical separation
  • Key prefixes: Enforce via ACL patterns, not trust
  • Document: Keep a table of service assignments
  • Testing: Reserve database 15 for testing/debugging
  • Monitor: Use MONITOR to verify services stay within their patterns