For the past few weeks, devices on my home network had been intermittently failing to get IP addresses. My Steam Deck wouldn’t connect. New devices would spin for ages before giving up. Sometimes things would work, sometimes they wouldn’t, with no obvious pattern.
Today I finally sat down to figure out what was going on.
The symptoms
The DHCP server (a UniFi Dream Machine Pro) was handing out addresses, but clients kept rejecting them. The Steam Deck was the most visible casualty - it would request an IP, get one, then immediately decline it. Over and over.
The frustrating part was that the UniFi console showed everything looking healthy - 68 out of 249 IPs leased, with 178 available. On paper, there was plenty of address space. But devices still couldn’t connect.
Capturing the traffic
I SSH’d into the UDM Pro and grabbed a packet capture on the bridge interface:
tcpdump -npi br0 -w /tmp/dhcp-issue.pcap
Then pulled it down to my Mac:
scp root@192.168.1.1:/tmp/dhcp-issue.pcap .
Since .pcap files are binary, you can’t just grep them. I used tcpdump to read the capture, filtering for DHCP traffic (UDP ports 67 and 68):
tcpdump -r dhcp-issue.pcap -vvv -n 'udp port 67 or udp port 68'
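The verbose output runs to thousands of lines, so a quick tally of DHCP message types makes a Decline loop easier to spot. This is a minimal sketch, not part of the original investigation: `count_dhcp_types` is a hypothetical helper, and it assumes tcpdump prints option 53 on a line containing `DHCP-Message` with the message type as the last word, which is what recent versions do at `-v` or higher.

```shell
# Tally DHCP message types from tcpdump's verbose text output (read on stdin).
# Assumption: option 53 appears on a line containing "DHCP-Message" and the
# message type (Discover, Offer, Request, ACK, Decline, ...) is the last word.
count_dhcp_types() {
  awk '/DHCP-Message/ { n[$NF]++ }
       END { for (t in n) print t, n[t] }' | sort
}

# Usage (capture file name taken from the commands above):
# tcpdump -r dhcp-issue.pcap -vvv -n 'udp port 67 or udp port 68' | count_dhcp_types
```

A healthy network should show few or no Decline messages, so any sizeable count there is worth chasing.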
What the packets showed
Two devices were stuck in DHCP failure loops.
The Proxmox Backup Server (bc:24:11:xx:xx:xx) was cycling through the full DHCP handshake - Discover, Offer, Request, ACK - and then immediately sending a Decline. The router would offer .39, the server would decline. Then .43, declined. Then .44, declined. Every single address it was given, it rejected.
The Steam Deck was doing the same thing with .235, declining with the message "acd failed" - Address Conflict Detection was finding that someone else already owned every IP it was offered.
The ARP clue
I pulled the ARP traffic from the capture and found the smoking gun. One MAC address - bc:24:11:xx:xx:xx, the Proxmox Backup Server - was responding to ARP requests for dozens of IPs across the subnet. The router would ask “who has 192.168.1.95?” and the Proxmox box would reply “that’s me.” Same for .96, .97, .104, .137, .188, and scores of others.
A single device claiming to own that many addresses on a /24 network is not normal.
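One way to surface this pattern without reading every ARP line is to count the distinct IPs each MAC answers for. The sketch below is illustrative rather than the exact commands used here: `arp_claims_per_mac` is a hypothetical helper, and it assumes tcpdump's usual reply format, `Reply <ip> is-at <mac>, length N`, as printed with `-n`.

```shell
# Count how many distinct IPs each MAC claims in ARP replies.
# Reads tcpdump text output on stdin. Assumption: reply lines contain
# "Reply <ip> is-at <mac>," as tcpdump prints them with -n.
arp_claims_per_mac() {
  awk '{
    for (i = 1; i <= NF - 3; i++) {
      if ($i == "Reply" && $(i + 2) == "is-at") {
        ip = $(i + 1)
        mac = $(i + 3)
        sub(/,$/, "", mac)                 # drop the trailing comma
        if (!((mac, ip) in seen)) {        # count each MAC/IP pair once
          seen[mac, ip] = 1
          count[mac]++
        }
      }
    }
  }
  END { for (m in count) print count[m], m }' | sort -rn
}

# Usage (capture file name taken from the commands above):
# tcpdump -n -r dhcp-issue.pcap arp | arp_claims_per_mac
```

On a typical home network most MACs should answer for a single IP, so a MAC with dozens of claims stands out at the top of the list.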
Finding the root cause
My first thought was proxy ARP - a setting where a device answers ARP on behalf of other hosts. I checked on the Proxmox container:
cat /proc/sys/net/ipv4/conf/*/proxy_arp
All zeros. Proxy ARP was already off.
So I checked what IPs the container actually had assigned:
ip -4 addr show
The output was enormous. The container’s eth0 interface had over 130 IP addresses assigned to it, all dynamic DHCP leases stacked as secondary addresses. It had .7, .8, .9, .11 through .19, .22, .24, .26… practically the entire subnet.
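When the address list is that long, a per-interface count is quicker to read than the raw output. A minimal sketch, assuming the one-line-per-address format that `ip -o` produces (`<index>: <ifname> inet <addr>/<prefix> ...`); `count_addrs_per_iface` is a hypothetical helper name.

```shell
# Count IPv4 addresses per interface from `ip -4 -o addr show` output (stdin).
# With -o, each address is on one line: field 2 is the interface name and
# field 3 is the address family ("inet" for IPv4).
count_addrs_per_iface() {
  awk '$3 == "inet" { n[$2]++ }
       END { for (i in n) print n[i], i }' | sort -rn
}

# Usage, inside the container:
# ip -4 -o addr show | count_addrs_per_iface
```

Anything beyond one or two addresses on an interface that is supposed to hold a single DHCP lease deserves a closer look.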
What happened
The Proxmox Backup Server is an LXC container (Container 105) set up via a community script. Its network interface was configured to use DHCP. But the DHCP client inside the container was acquiring new leases without ever releasing old ones. Each lease got added as a secondary address on eth0.
This created a feedback loop:
- Container requests a DHCP lease
- Router offers an IP
- Container runs Address Conflict Detection (ARP probe) before accepting
- The container’s own interface already holds that IP (or a neighbouring device’s ACD check gets answered by the container), so ACD fails
- Container declines the address
- Router offers a different IP
- Container eventually accepts one, adding it as yet another secondary address
- Repeat forever, accumulating more and more addresses
Over weeks, the container had silently consumed nearly the entire DHCP pool. Every other device on the network was competing for the handful of addresses the container hadn’t already claimed.
The fix
Since the container’s filesystem was read-only from inside (as LXC containers often are), I couldn’t fix it from within. Instead, I made the change from the Proxmox VE host:
# Stop the container
pct stop 105
# Switch from DHCP to a static IP
pct set 105 -net0 name=eth0,bridge=vmbr0,ip=192.168.1.39/24,gw=192.168.1.1
# Start it back up
pct start 105
That’s it. The container came back with a single static IP, the DHCP client never started, and the 130-plus stacked addresses were gone from eth0.
The leases on the UDM Pro had a 2-hour lease time, so within a couple of hours the stale leases expired and the pool freed up on its own. The Steam Deck connected on its next attempt.
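To confirm the change stuck, or to check other containers for the same setup, the `ip=` value on the `net0` line of `pct config` output shows whether a container uses DHCP. This is a sketch under the assumption that the line looks like `net0: name=eth0,bridge=vmbr0,...,ip=dhcp,...`; `net0_ip_mode` is a hypothetical helper, not a Proxmox command.

```shell
# Print the ip= value from the net0 line of `pct config <vmid>` output (stdin).
# Prints "dhcp" for a DHCP-configured container, or the static address/prefix.
net0_ip_mode() {
  awk -F': ' '$1 == "net0" {
    n = split($2, kv, ",")                 # net0 options are comma-separated
    for (i = 1; i <= n; i++) {
      if (kv[i] ~ /^ip=/) {                # matches ip=, not ip6=
        sub(/^ip=/, "", kv[i])
        print kv[i]
      }
    }
  }'
}

# Usage on the Proxmox VE host:
# pct config 105 | net0_ip_mode
```

After the fix above, this should print the static address rather than `dhcp`.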
Lessons
Backup servers should have static IPs. Any infrastructure service that other devices depend on (or that runs 24/7) should not be relying on DHCP. If I’d set a static IP from the start, this never would have happened.
tcpdump is still the best debugging tool for network issues. The symptoms were vague - “devices sometimes can’t connect” - but a packet capture told the whole story in seconds. If you have a network problem you can’t explain, capture the traffic first.
ARP doesn’t lie. When I saw one MAC address answering for half the subnet, the mystery was over. Everything else was just confirming the mechanism.