Why leave Cubox

In an age of information overload, a good web clipper is a must for digital hoarders. I used Cubox for a long time, but several issues pushed me to self‑host Hoarder instead.

  • Privacy: The mainland edition of Cubox applies content controls. Some clipped pages could not be shared due to “force majeure”. Keeping a local copy on my NAS is more trustworthy.

    (screenshot: Cubox censoring a clipped page)

  • Pricing: The free tier is limited to 200 items; VIP costs ¥198/year. If you already have a NAS, self‑hosting is cheaper over time.

    (screenshot: Cubox pricing)

  • Feature bloat: Cubox keeps adding features I don’t need (AI summaries and the like). I do my reading in Readwise; what I want from a clipper is reliable “cold storage” for web pages.

Self-Hosting Installation Guide

Hoarder is an open-source project available on GitHub. For detailed installation instructions on Linux machines or Synology systems, consult the official documentation or follow the comprehensive tutorial from NasDaddy.

I highly recommend using Docker for installation. Here’s a reference Docker Compose YAML configuration:

version: "3.8" # optional; modern Docker Compose ignores the top-level version key
services:
  web:
    image: ghcr.io/hoarder-app/hoarder-web:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - data:/data
    ports:
      - 3000:3000 # change to any port you prefer
    env_file:
      - .env
    environment:
      REDIS_HOST: redis
      MEILI_ADDR: http://meilisearch:7700
      DATA_DIR: /data
  redis:
    image: redis:7.2-alpine
    restart: unless-stopped
    volumes:
      - redis:/data
  chrome:
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --disable-dev-shm-usage
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
      - --hide-scrollbars
  meilisearch:
    image: getmeili/meilisearch:v1.6
    restart: unless-stopped
    env_file:
      - .env
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - meilisearch:/meili_data
  workers:
    image: ghcr.io/hoarder-app/hoarder-workers:${HOARDER_VERSION:-release}
    restart: unless-stopped
    volumes:
      - data:/data
    env_file:
      - .env
    environment:
      REDIS_HOST: redis
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      DATA_DIR: /data
      # OPENAI_API_KEY: ...
    depends_on:
      web:
        condition: service_started

volumes:
  redis:
  meilisearch:
  data:

Create a .env file yourself (compose won’t generate it) and restart the stack whenever you change values:

HOARDER_VERSION=release
NEXTAUTH_SECRET=xxxx # random string
MEILI_MASTER_KEY=xxxx # random string
NEXTAUTH_URL=http://localhost:3000 # local URL of Hoarder or your reverse-proxy URL

## Optional below
OPENAI_BASE_URL=https://xxx.com/v1 # OpenAI official endpoint or third-party compatible endpoint
OPENAI_API_KEY=sk-xxxxx # OpenAI API key
INFERENCE_LANG=chinese
INFERENCE_TEXT_MODEL=qwen2-72b-instruct # model used for auto-tagging; qwen2-72b-instruct works great in my setup
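The two secrets are arbitrary random strings. `openssl rand -base64 36` is one common way to produce them; Python's standard library works just as well:

```python
import secrets

# Print two random strings for NEXTAUTH_SECRET and MEILI_MASTER_KEY.
# Any sufficiently long random value is fine; these are not derived keys.
print(secrets.token_urlsafe(36))
print(secrets.token_urlsafe(36))
```

Once `.env` is filled in, `docker compose up -d` starts the stack, and re-running it with `--force-recreate` picks up later `.env` edits.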

What I like

AI‑assisted tagging

Great for hands‑off clipping — I don’t need to decide tags each time, but still get useful categorization for later search.

(screenshot: AI-generated tags in Hoarder)
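Conceptually, the worker sends each page's text to the OpenAI-compatible endpoint configured above and stores whatever tags the model returns. The sketch below is my own illustration of that flow, not Hoarder's actual code; the prompt wording, function names, and JSON reply shape are all assumptions:

```python
import json
import urllib.request

def build_tag_prompt(title: str, excerpt: str, lang: str = "chinese") -> str:
    """Assemble a tagging prompt; INFERENCE_LANG plays the role of `lang` here."""
    return (
        f"Suggest 3-5 short tags in {lang} for this page.\n"
        f"Title: {title}\nExcerpt: {excerpt}\n"
        'Reply as JSON: {"tags": ["..."]}'
    )

def parse_tags(raw: str) -> list[str]:
    """Extract a clean tag list from the model's JSON reply."""
    return [t.strip() for t in json.loads(raw)["tags"] if t.strip()]

def suggest_tags(base_url: str, api_key: str, model: str, prompt: str) -> list[str]:
    """Call an OpenAI-compatible /chat/completions endpoint (hypothetical wiring)."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return parse_tags(body["choices"][0]["message"]["content"])
```

Keeping `parse_tags` separate from the network call makes the interesting part easy to test offline.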

What could be better

Proprietary snapshot format

Snapshots are stored in .db files. I’d love an option to keep an .html snapshot or a .png image for maximal portability.

(screenshot: Hoarder’s .db snapshot files)
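The upside is that the .db files are SQLite, so nothing is truly locked in. A minimal sketch for peeking inside one (the example path is an assumption; `docker volume inspect` will show where your `data` volume actually lives):

```python
import sqlite3

def list_tables(db_path: str) -> list[str]:
    """Return the table names inside a SQLite file such as Hoarder's database."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]

# Hypothetical path -- adjust to your Docker volume location:
# print(list_tables("/var/lib/docker/volumes/hoarder_data/_data/db.db"))
```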

Some sites are hard to capture

Hoarder drives a headless Chrome to capture snapshots. Some blogs behind Cloudflare’s bot checks may block it (e.g. Sukka’s post):

(screenshot: Cloudflare blocking Hoarder’s capture of Sukka’s blog)
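One way to spot these failures when triaging broken snapshots: Cloudflare's challenge pages usually return 403 or 503 and contain a few telltale strings. A rough heuristic (the marker strings are from my own observation, not a documented contract):

```python
def looks_like_cf_challenge(status: int, body: str) -> bool:
    """Guess whether an HTTP response is a Cloudflare bot-check page."""
    markers = ("Just a moment", "cf-chl", "challenge-platform")
    return status in (403, 503) and any(m in body for m in markers)
```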

References

  1. Hoarder on GitHub
  2. NasDaddy — how to run Hoarder on your NAS