Elasticsearch in a Node.js web application with Docker and Docker Compose

This article walks through a complete, pragmatic, and reproducible way to integrate Elasticsearch into a Node.js web application using Docker and Docker Compose. The goal is a solid local development environment, a clean code architecture, and a base that is ready to move to staging and production.

Prerequisites

Docker and Docker Compose installed
Node.js (LTS recommended; the dev script uses node --watch, available from Node.js 18.11) and npm

Example scenario

We will build a small web API (Express) that indexes and searches documents (for example products or articles), with endpoints to:

create an index and its mapping
index documents
search with filters, sorting, and pagination
provide autocomplete (a suggest endpoint backed by edge n-grams)

Project structure

One possible layout:

.
├─ docker-compose.yml
├─ .env
├─ app
│  ├─ package.json
│  ├─ package-lock.json
│  ├─ Dockerfile
│  └─ src
│     ├─ server.js
│     ├─ elastic.js
│     ├─ indices.js
│     └─ routes.js
└─ README.md

Docker Compose: Elasticsearch + Node.js

For local development it is convenient to run at least two services: elasticsearch and app. In Elasticsearch 8.x security is enabled by default; for a development environment we can choose between two approaches:

Simple dev (no security): quick and convenient, not suitable for production.
Realistic dev (with a password): one extra step, but closer to staging/prod.

Here we use the “realistic dev” approach with a password (without TLS, to keep the local setup simple). For exposed or shared environments, enable TLS and manage the certificates.

The `.env` file

Put the variables in a .env file in the project root (do not commit real credentials).

# .env
ELASTIC_PASSWORD=changeme
ELASTIC_USERNAME=elastic
ELASTICSEARCH_URL=http://elasticsearch:9200
APP_PORT=3000

The `docker-compose.yml` file

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.4
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - xpack.security.http.ssl.enabled=false
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data
    healthcheck:
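      # With security enabled, an unauthenticated request gets a 401 and `curl -f` fails,
      # so the check must send credentials. `$$` defers expansion to the container shell.
      test: ["CMD-SHELL", "curl -fsS -u elastic:$${ELASTIC_PASSWORD} http://localhost:9200 >/dev/null || exit 1"]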
      interval: 10s
      timeout: 5s
      retries: 30

  app:
    build:
      context: ./app
    environment:
      - NODE_ENV=development
      - APP_PORT=${APP_PORT}
      - ELASTICSEARCH_URL=${ELASTICSEARCH_URL}
      - ELASTIC_USERNAME=${ELASTIC_USERNAME}
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
    ports:
      # The app listens on APP_PORT inside the container, so map the same port on both sides
      - "${APP_PORT}:${APP_PORT}"
    depends_on:
      elasticsearch:
        condition: service_healthy
    volumes:
      - ./app:/usr/src/app
      - /usr/src/app/node_modules
    command: ["npm", "run", "dev"]

volumes:
  esdata:

Important notes:

healthcheck and depends_on.condition help keep the app from starting before Elasticsearch is reachable.
The esdata volume preserves the index across restarts (useful in dev).
JVM memory is set with ES_JAVA_OPTS. If you are short on RAM, reduce it to -Xms512m -Xmx512m.
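
The health check only protects the first startup: if Elasticsearch restarts later, or the app runs outside Compose, the app still has to tolerate a backend that is not ready yet. A minimal sketch of an application-side retry follows. The helper name waitForReady and its options are hypothetical, not part of the project above.

```javascript
// Hypothetical helper: retries an async readiness check a fixed number
// of times, sleeping a fixed delay between failed attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForReady(check, { retries = 30, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      await check();
      return attempt; // number of attempts it took to succeed
    } catch (err) {
      lastError = err;
      if (attempt < retries) await sleep(delayMs);
    }
  }
  throw new Error(
    `Service not ready after ${retries} attempts: ${lastError.message}`
  );
}
```

In this project it could be called as waitForReady(() => ping()) before the index is created at startup.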

Dockerfile for the Node.js app

A simple Dockerfile for a Node.js project. In development, hot reload comes from the bind-mounted source and the node --watch command set in Compose.

# app/Dockerfile
FROM node:20-alpine

WORKDIR /usr/src/app

COPY package*.json ./
RUN npm ci

COPY . .

EXPOSE 3000
CMD ["npm", "start"]

Node.js dependencies

We use Express for the API and the official Elasticsearch client for Node.js.

{
  "name": "node-elasticsearch-docker",
  "version": "1.0.0",
  "type": "commonjs",
  "scripts": {
    "start": "node src/server.js",
    "dev": "node --watch src/server.js"
  },
  "dependencies": {
    "@elastic/elasticsearch": "^8.13.0",
    "express": "^4.19.2"
  }
}

If you prefer TypeScript, add typescript and ts-node and define your types; the Elasticsearch integration logic stays much the same.

Connecting to Elasticsearch: a reusable client

Centralize client creation in a dedicated module. That way:

you avoid instantiating multiple clients needlessly
you have a single place to configure authentication, retries, and logging
you can easily add metrics and tracing

// app/src/elastic.js
const { Client } = require("@elastic/elasticsearch");

const node = process.env.ELASTICSEARCH_URL || "http://localhost:9200";
const username = process.env.ELASTIC_USERNAME || "elastic";
const password = process.env.ELASTIC_PASSWORD || "changeme";

const client = new Client({
  node,
  auth: { username, password },
  // In production consider: sniffOnStart, compression, tls, etc.
  maxRetries: 5,
  requestTimeout: 30_000
});

async function ping() {
  // Note: in 8.x you can use client.ping() or client.info()
  await client.ping();
}

module.exports = { client, ping };
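
The module above falls back to changeme when no password is set, which is convenient locally but risky elsewhere. One possible refinement, sketched below, builds the client options in a pure function that validates the environment first. The function name buildClientOptions and the rule "a password is mandatory when NODE_ENV is production" are assumptions made for illustration.

```javascript
// Hypothetical helper: derives the options object passed to `new Client(...)`
// from environment variables, failing fast on obviously bad configuration.
function buildClientOptions(env = process.env) {
  const node = env.ELASTICSEARCH_URL || "http://localhost:9200";
  new URL(node); // throws a TypeError if the URL is malformed

  const username = env.ELASTIC_USERNAME || "elastic";
  const password = env.ELASTIC_PASSWORD;

  // Assumption: defaults are acceptable in development only
  if (env.NODE_ENV === "production" && !password) {
    throw new Error("ELASTIC_PASSWORD is required in production");
  }

  return {
    node,
    auth: { username, password: password || "changeme" },
    maxRetries: 5,
    requestTimeout: 30_000
  };
}
```

The result can be passed straight to the constructor, for example new Client(buildClientOptions()), and the function can be unit-tested without a running cluster.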

Creating the index and mapping

The mapping defines field types and analyzers. Here is an example products index with:

searchable text fields (name, description)
structured fields (price, category, tags)
an autocomplete field (name_suggest) backed by edge n-grams

// app/src/indices.js
const { client } = require("./elastic");

const INDEX = "products";

async function ensureIndex() {
  const exists = await client.indices.exists({ index: INDEX });

  if (exists) return { index: INDEX, created: false };

  await client.indices.create({
    index: INDEX,
    settings: {
      analysis: {
        filter: {
          autocomplete_filter: {
            type: "edge_ngram",
            min_gram: 2,
            max_gram: 20
          }
        },
        analyzer: {
          autocomplete: {
            type: "custom",
            tokenizer: "standard",
            filter: ["lowercase", "autocomplete_filter"]
          },
          autocomplete_search: {
            type: "custom",
            tokenizer: "standard",
            filter: ["lowercase"]
          }
        }
      }
    },
    mappings: {
      properties: {
        id: { type: "keyword" },
        name: {
          type: "text",
          fields: {
            keyword: { type: "keyword", ignore_above: 256 }
          }
        },
        description: { type: "text" },
        category: { type: "keyword" },
        tags: { type: "keyword" },
        price: { type: "double" },
        createdAt: { type: "date" },

        // Dedicated autocomplete field
        name_suggest: {
          type: "text",
          analyzer: "autocomplete",
          search_analyzer: "autocomplete_search"
        }
      }
    }
  });

  return { index: INDEX, created: true };
}

module.exports = { INDEX, ensureIndex };
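
To see why the autocomplete analyzer is applied at index time and a plain lowercase analyzer at search time, it helps to look at the tokens an edge n-gram filter produces. The sketch below is a rough simulation in plain JavaScript, not Elasticsearch code: the regular expression only approximates the standard tokenizer, and both function names are hypothetical.

```javascript
// Prefixes of a token from minGram to maxGram characters, mirroring the
// min_gram / max_gram settings of the autocomplete_filter above.
function edgeNgrams(token, minGram = 2, maxGram = 20) {
  const grams = [];
  const limit = Math.min(token.length, maxGram);
  for (let len = minGram; len <= limit; len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// Rough stand-in for the "autocomplete" analyzer:
// split on non-alphanumerics, lowercase, then expand into edge n-grams.
function simulateAutocompleteAnalyzer(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter(Boolean)
    .flatMap((token) => edgeNgrams(token));
}

console.log(simulateAutocompleteAnalyzer("Zaino"));
// prints [ 'za', 'zai', 'zain', 'zaino' ]
```

Because the prefixes are stored in the index, the query text zai only needs to be lowercased to match one of them. Applying n-grams to the query as well would also match on za, which is usually too broad.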

Why name.keyword? Sorting and aggregating on a text field is disabled by default, so the keyword sub-field is what you use when you want to:

sort alphabetically in a consistent way
aggregate by exact name
filter by exact match

Web API with Express

We create a minimal, clear set of endpoints. First the server and the route wiring.

// app/src/server.js
const express = require("express");
const { ping } = require("./elastic");
const { router } = require("./routes");
const { ensureIndex } = require("./indices");

const app = express();
app.use(express.json());

app.get("/health", async (req, res) => {
  try {
    await ping();
    res.json({ ok: true });
  } catch (err) {
    res.status(503).json({ ok: false, error: err.message });
  }
});

app.use("/api", router);

const port = Number(process.env.APP_PORT || 3000);

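(async () => {
  // Startup initialization: create the index if it is missing
  await ensureIndex();

  app.listen(port, () => {
    console.log(`API listening on http://localhost:${port}`);
  });
})().catch((err) => {
  // Fail fast so the container exits visibly instead of leaving an unhandled rejection
  console.error("Startup failed:", err);
  process.exit(1);
});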

Routes: indexing and search

// app/src/routes.js
const express = require("express");
const { client } = require("./elastic");
const { INDEX, ensureIndex } = require("./indices");

const router = express.Router();

// Express 4 does not catch rejected promises from async handlers,
// so wrap each one and forward failures to the error middleware below.
const asyncHandler = (fn) => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);

// Parse an integer query parameter, using the fallback for missing or invalid input
const toInt = (value, fallback) => {
  const n = Number.parseInt(String(value ?? ""), 10);
  return Number.isFinite(n) ? n : fallback;
};

// Create the index (or confirm it exists)
router.post("/setup", asyncHandler(async (req, res) => {
  const info = await ensureIndex();
  res.json(info);
}));

// Index a single document
router.post("/products", asyncHandler(async (req, res) => {
  const doc = req.body;

  if (!doc || !doc.id || !doc.name) {
    return res.status(400).json({ error: "Required fields: id, name" });
  }

  const indexed = await client.index({
    index: INDEX,
    id: String(doc.id),
    document: {
      ...doc,
      name_suggest: doc.name,
      createdAt: doc.createdAt || new Date().toISOString()
    },
    refresh: "wait_for"
  });

  res.status(201).json({ result: indexed.result, id: doc.id });
}));

// Full-text search + filters + pagination
router.get("/products/search", asyncHandler(async (req, res) => {
  const q = String(req.query.q || "").trim();
  const category = req.query.category ? String(req.query.category) : null;
  const minPrice = req.query.minPrice ? Number(req.query.minPrice) : null;
  const maxPrice = req.query.maxPrice ? Number(req.query.maxPrice) : null;

  // Non-numeric page/size values fall back to the defaults instead of producing NaN
  const page = Math.max(1, toInt(req.query.page, 1));
  const size = Math.min(50, Math.max(1, toInt(req.query.size, 10)));
  const from = (page - 1) * size;

  const filters = [];
  if (category) filters.push({ term: { category } });

  if (Number.isFinite(minPrice) || Number.isFinite(maxPrice)) {
    const range = {};
    if (Number.isFinite(minPrice)) range.gte = minPrice;
    if (Number.isFinite(maxPrice)) range.lte = maxPrice;
    filters.push({ range: { price: range } });
  }

  const query = q
    ? {
        bool: {
          must: [
            {
              multi_match: {
                // name carries more weight
                query: q,
                fields: ["name^3", "description"]
              }
            }
          ],
          filter: filters
        }
      }
    : { bool: { filter: filters } };

  const resp = await client.search({
    index: INDEX,
    from,
    size,
    query,
    sort: [
      { _score: "desc" },
      { "name.keyword": "asc" }
    ]
  });

  const hits = resp.hits.hits.map((h) => ({
    id: h._id,
    score: h._score,
    ...h._source
  }));

  res.json({
    page,
    size,
    total: resp.hits.total?.value ?? hits.length,
    items: hits
  });
}));

// Autocomplete (edge n-gram)
router.get("/products/suggest", asyncHandler(async (req, res) => {
  const prefix = String(req.query.q || "").trim();
  if (!prefix) return res.json({ items: [] });

  const resp = await client.search({
    index: INDEX,
    size: 8,
    query: {
      match: {
        name_suggest: {
          query: prefix
        }
      }
    },
    _source: ["id", "name", "category", "price"]
  });

  const items = resp.hits.hits.map((h) => ({
    id: h._id,
    name: h._source?.name,
    category: h._source?.category,
    price: h._source?.price
  }));

  res.json({ items });
}));

// Report Elasticsearch failures as JSON instead of leaving the request hanging
router.use((err, req, res, next) => {
  console.error(err);
  res.status(500).json({ error: err.message });
});

module.exports = { router };
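
The pagination arithmetic in the search route is easy to get subtly wrong, so it can be worth isolating in a pure function that is testable without Express or Elasticsearch. The sketch below is one way to do it. The name parsePagination is hypothetical, and the 10,000 cap reflects the default value of the index.max_result_window setting, beyond which Elasticsearch rejects from + size.

```javascript
// Hypothetical helper: turns raw query-string values into safe
// page / size / from numbers for client.search().
function parsePagination(
  query,
  { defaultSize = 10, maxSize = 50, maxWindow = 10000 } = {}
) {
  const toInt = (value, fallback) => {
    const n = Number.parseInt(String(value ?? ""), 10);
    return Number.isFinite(n) ? n : fallback;
  };

  const size = Math.min(maxSize, Math.max(1, toInt(query.size, defaultSize)));

  // Clamp the page so that from + size never exceeds the result window
  const maxPage = Math.max(1, Math.floor(maxWindow / size));
  const page = Math.min(maxPage, Math.max(1, toInt(query.page, 1)));

  return { page, size, from: (page - 1) * size };
}

console.log(parsePagination({ page: "3", size: "10" }));
// prints { page: 3, size: 10, from: 20 }
```

For result sets deeper than the window, search_after is the usual alternative to from/size paging.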

Starting the environment

From the project root:

docker compose up --build

Check:

http://localhost:3000/health for the API status
http://localhost:9200 for Elasticsearch (requires basic auth)

Example call to Elasticsearch via curl with basic authentication:

curl -u elastic:changeme http://localhost:9200

Loading sample data

Index a document:

curl -X POST "http://localhost:3000/api/products"   -H "Content-Type: application/json"   -d '{
    "id": "p1",
    "name": "Zaino impermeabile",
    "description": "Zaino leggero per escursioni, con tasche interne.",
    "category": "outdoor",
    "tags": ["trekking", "viaggio"],
    "price": 79.9
  }'

Search:

curl "http://localhost:3000/api/products/search?q=zaino&page=1&size=10"

Autocomplete:

curl "http://localhost:3000/api/products/suggest?q=zai"

Key concepts: analyzers, mapping, and queries

Text vs keyword

In Elasticsearch:

text is analyzed (tokenized) and suited to full-text search
keyword is not analyzed and is suited to exact-match filters, sorting, and aggregations

Multi match and boosting

With multi_match you can search across several fields with different weights (boosts). In the example, name^3 multiplies the score contribution of the name field by three relative to description.
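
As a rough mental model of how the boost enters the final score: with the default multi_match type (best_fields), each field is scored separately, the per-field boost multiplies that score, and the best field wins, with the other fields contributing only through an optional tie_breaker. The sketch below models just that arithmetic. The input scores are made-up numbers, the function name is hypothetical, and real scores come from BM25 inside Elasticsearch.

```javascript
// Simplified model of best_fields scoring: boosted maximum, plus
// tieBreaker times the sum of the remaining boosted field scores.
function bestFieldsScore(fieldScores, boosts = {}, tieBreaker = 0) {
  const boosted = Object.entries(fieldScores).map(
    ([field, score]) => score * (boosts[field] ?? 1)
  );
  if (boosted.length === 0) return 0;

  const best = Math.max(...boosted);
  const rest = boosted.reduce((sum, s) => sum + s, 0) - best;
  return best + tieBreaker * rest;
}

// Made-up raw scores: description matches better before boosting,
// but name^3 makes the name field the winner.
console.log(bestFieldsScore({ name: 1.5, description: 2 }, { name: 3 }));
// prints 4.5
```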

Bool query: must + filter

Combining must (relevance) with filter (conditions that do not affect the score) is typical for web APIs because:

filters can be cached internally
relevance stays based on the searched text
behavior is more predictable for the end user
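
The must/filter split described above can be captured in a small builder that returns the query object on its own, which makes the shape easy to inspect and unit-test. This is a sketch that mirrors the search route. The function name buildProductQuery is hypothetical.

```javascript
// Hypothetical builder: scoring clauses go in `must`, non-scoring
// conditions go in `filter`.
function buildProductQuery({
  q = "",
  category = null,
  minPrice = null,
  maxPrice = null
} = {}) {
  const filter = [];
  if (category) filter.push({ term: { category } });

  const range = {};
  if (Number.isFinite(minPrice)) range.gte = minPrice;
  if (Number.isFinite(maxPrice)) range.lte = maxPrice;
  if (Object.keys(range).length > 0) filter.push({ range: { price: range } });

  const bool = { filter };
  if (q) {
    // Only the text search contributes to _score
    bool.must = [
      { multi_match: { query: q, fields: ["name^3", "description"] } }
    ];
  }
  return { bool };
}

console.log(
  JSON.stringify(buildProductQuery({ q: "zaino", category: "outdoor", maxPrice: 100 }))
);
```

With an empty q the builder returns a filter-only bool query, so every hit has the same score and the name.keyword sort decides the order.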

Handling updates and refresh

Elasticsearch is near real-time: an indexed document may not be searchable until a refresh happens (by default, roughly every second). In development, to keep things simple, we used refresh: "wait_for" on indexing, which waits for the next refresh before responding.

In production, weigh this carefully: wait_for makes writes slower. Alternatives:

accept a small visibility delay
use explicit refreshes only for special cases
batch writes with the Bulk API for large volumes
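
One way to keep the development convenience without carrying it into production is to derive the refresh option from the environment. The sketch below assumes a hypothetical ES_REFRESH override variable and a hypothetical function name. The three returned values (true, false, "wait_for") are the ones the refresh parameter accepts.

```javascript
// Hypothetical helper: picks the `refresh` option for index requests.
// ES_REFRESH is an assumed override variable, not part of the project above.
function refreshPolicy(env = process.env) {
  switch (env.ES_REFRESH) {
    case "true":
      return true; // force an immediate refresh (expensive)
    case "false":
      return false; // rely on the periodic refresh
    case "wait_for":
      return "wait_for"; // respond only after the next refresh
    default:
      // Assumption: favor fast writes in production, read-your-writes in dev
      return env.NODE_ENV === "production" ? false : "wait_for";
  }
}
```

The indexing route could then pass refresh: refreshPolicy() instead of the hard-coded "wait_for".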

Bulk indexing for performance

For mass imports, use the Bulk API. Example:

const { client } = require("./elastic");
const { INDEX } = require("./indices");

async function bulkIndex(products) {
  const operations = products.flatMap((p) => [
    { index: { _index: INDEX, _id: String(p.id) } },
    { ...p, name_suggest: p.name, createdAt: p.createdAt || new Date().toISOString() }
  ]);

  const resp = await client.bulk({ refresh: false, operations });

  if (resp.errors) {
    const failures = [];
    resp.items.forEach((item, i) => {
      const action = item.index;
      if (action && action.error) {
        failures.push({ i, id: action._id, error: action.error });
      }
    });
    throw new Error(`Bulk errors: ${JSON.stringify(failures.slice(0, 3))}`);
  }
}
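
For very large imports a single bulk request can become too big, so the input is usually split into batches. The sketch below shows one way to do that. The helper names and the default batch size of 500 are assumptions to tune for your document size, and indexBatch stands for any async function that indexes one batch, such as the bulkIndex function above.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size = 500) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Index the batches one after another; returns how many items were sent.
// `indexBatch` is a caller-supplied async function (for example bulkIndex).
async function bulkIndexInBatches(products, indexBatch, size = 500) {
  let sent = 0;
  for (const batch of chunk(products, size)) {
    await indexBatch(batch);
    sent += batch.length;
  }
  return sent;
}

console.log(chunk([1, 2, 3, 4, 5], 2));
// prints [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```

Sending batches sequentially keeps memory use and cluster load predictable. Limited parallelism is a possible next step once the cluster proves it can absorb it.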

Docker operations: data, reset, and troubleshooting

Data persistence

The esdata volume keeps the data. To start from scratch (this removes the volume and deletes all indexed data):

docker compose down -v

Useful logs

docker compose logs -f elasticsearch
docker compose logs -f app

Common errors

401 Unauthorized: wrong username/password, or you are calling Elasticsearch without basic auth.
ECONNREFUSED: Elasticsearch is not ready or the URL is wrong. Check depends_on and the healthcheck.
Out of memory: reduce the heap in ES_JAVA_OPTS or give Docker more memory.
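
On the application side, these failures can be translated into clearer HTTP responses. The sketch below classifies an error by its shape. It assumes the client's errors expose a name such as ConnectionError or TimeoutError and, for HTTP-level failures, a numeric statusCode; verify this against the client version you use. The function name and the chosen status codes are illustrative.

```javascript
// Hypothetical helper: maps an error thrown while calling Elasticsearch
// to an HTTP status for the API response plus a troubleshooting hint.
function describeSearchError(err) {
  const name = err?.name;
  const status = err?.statusCode ?? err?.meta?.statusCode;

  if (name === "ConnectionError" || err?.code === "ECONNREFUSED") {
    return { status: 503, hint: "Elasticsearch unreachable: check the URL and that the service is healthy" };
  }
  if (name === "TimeoutError") {
    return { status: 504, hint: "Elasticsearch did not answer in time" };
  }
  if (status === 401 || status === 403) {
    return { status: 502, hint: "Elasticsearch rejected the credentials: check ELASTIC_USERNAME/ELASTIC_PASSWORD" };
  }
  if (status === 404) {
    return { status: 502, hint: "Index not found: run the setup endpoint" };
  }
  return { status: 500, hint: "Unexpected error" };
}
```

The result could drive the status code in an Express error middleware, so that a stopped Elasticsearch shows up as 503 rather than a generic 500.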

Best practices for production

Security: enable TLS on HTTP, use dedicated users and roles, and keep secrets in a secret store.
Index templates and aliases: use aliases to “promote” new indices without downtime (blue/green).
Controlled mapping: avoid dynamic mapping for unexpected fields (or constrain it with dynamic templates).
Observability: collect metrics (latency, error rate) and structured logs; use tracing for search.
Resilience: set timeouts, deliberate retries, and application-side circuit breakers.
Performance: use Bulk, reduce refreshes, and size shards and replicas for your load and retention.
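
The blue/green alias promotion mentioned above boils down to one atomic request that removes the alias from the old index and adds it to the new one. The sketch below builds that request body. The index and alias names are hypothetical, and it assumes the application queries the alias rather than a concrete index, which is not how the example project is wired, where products is a real index.

```javascript
// Hypothetical helper: builds the body for an atomic alias swap.
// Both actions are applied together, so searches never see a gap.
function buildAliasSwap(alias, oldIndex, newIndex) {
  const actions = [];
  if (oldIndex) actions.push({ remove: { index: oldIndex, alias } });
  actions.push({ add: { index: newIndex, alias } });
  return { actions };
}

console.log(
  JSON.stringify(buildAliasSwap("products_live", "products_v1", "products_v2"))
);
```

With the official client the body can be sent with client.indices.updateAliases(buildAliasSwap(...)), after the new index has been created and populated.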

Appendix: quick calls to inspect the index and mapping

# Mapping
curl -u elastic:changeme "http://localhost:9200/products/_mapping?pretty"

# Settings
curl -u elastic:changeme "http://localhost:9200/products/_settings?pretty"

# Document count
curl -u elastic:changeme "http://localhost:9200/products/_count?pretty"

# Direct search example
curl -u elastic:changeme -X POST "http://localhost:9200/products/_search?pretty"   -H "Content-Type: application/json"   -d '{ "query": { "match": { "name": "zaino" } } }'

With this foundation you have a working integration between Node.js and Elasticsearch, orchestrated with Docker Compose, that can be extended with stronger authentication, index migrations (aliases), and indexing patterns suited to real-world cases (catalogs, internal search, logs, editorial content).