Zero-downtime Elasticsearch data reindexing

Implementing Zero-Downtime Elasticsearch Reindexing

Changing a field mapping is the most common reason for reindexing. You can't change a field's type from text to keyword on a live index, and you can't add a new analyzer to an existing field. The solution: create a new index, transfer the data via the _reindex API, and atomically switch an alias. The application keeps working throughout the process: it reads via the alias, which points to the old index until the switch.

Blue/Green Strategy with Aliases

An alias is an abstraction over one or more indexes. The application works with the alias products and never sees the physical index name.

Initial state:

# Check what the alias points to
curl -u elastic:pw "localhost:9200/_alias/products"

# Response:
# { "products_v1": { "aliases": { "products": { "is_write_index": true } } } }
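Client code usually needs to turn this response into the name of the current physical index and the next version to create. A minimal Python sketch, where the helper and the `_vN` naming convention are assumptions of this article, not an Elasticsearch API:

```python
def current_and_next_index(alias_response: dict, alias: str) -> tuple:
    """Given the JSON body of GET /_alias/<alias>, return the physical
    index the alias points to and the next version name to create."""
    # The response maps physical index names to their alias definitions
    current = next(
        name for name, body in alias_response.items()
        if alias in body.get("aliases", {})
    )
    # Assumes the products_vN naming convention used throughout this article
    base, _, version = current.rpartition("_v")
    return current, f"{base}_v{int(version) + 1}"

# The response shown above
resp = {"products_v1": {"aliases": {"products": {"is_write_index": True}}}}
```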

Step 1 — create a new index with changed mapping:

PUT /products_v2
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0,
    "refresh_interval": "-1",
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "russian_stemmer"]
        }
      },
      "filter": {
        "russian_stemmer": { "type": "stemmer", "language": "russian" }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "title": {
        "type": "text",
        "analyzer": "product_analyzer",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "price": { "type": "scaled_float", "scaling_factor": 100 },
      "new_field": { "type": "keyword" }
    }
  }
}

Setting number_of_replicas: 0 and refresh_interval: -1 for the duration of reindexing speeds up data loading: documents are neither replicated nor made searchable until the process finishes.

_reindex API

Step 2 — start reindexing:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "products_v1",
    "size": 1000
  },
  "dest": {
    "index": "products_v2",
    "op_type": "create"
  },
  "conflicts": "proceed"
}

wait_for_completion=false: the task runs in the background and the call immediately returns a task_id. For large indexes (over ~1M documents) this is effectively mandatory, since a synchronous request would likely time out before reindexing finishes.

op_type: create: skip a document if it already exists in the destination (important for incremental reindexing).

conflicts: proceed: continue on version conflicts instead of aborting the whole task.

Monitor progress:

# By task_id from response
curl -u elastic:pw "localhost:9200/_tasks/oTUltX4IQMOUUVeiohTt8A:12345?pretty"

# All active reindex tasks
curl -u elastic:pw "localhost:9200/_tasks?actions=*reindex&detailed=true&pretty"

Response contains status.created, status.total — can calculate percentage.
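The percentage can be computed client-side from those counters. A Python sketch; the field names follow the task API response shape, and the sample numbers are illustrative:

```python
def reindex_progress(task_info: dict) -> float:
    """Completion percentage from the GET /_tasks/<task_id> response.
    Counts created + updated + deleted documents against the total."""
    status = task_info["task"]["status"]
    done = status["created"] + status["updated"] + status["deleted"]
    return 100.0 * done / status["total"] if status["total"] else 100.0

# Illustrative task API response
sample = {"task": {"status": {"total": 1000000, "created": 250000,
                              "updated": 0, "deleted": 0}}}
```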

Parallel Reindexing via Slices

For large indexes, parallel slices give a near-linear speedup:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "products_v1",
    "size": 500
  },
  "dest": {
    "index": "products_v2"
  },
  "slices": "auto"
}

slices: auto lets Elasticsearch pick the number of slices automatically (one per source shard). Each slice runs in parallel as a separate task: reindexing 100M documents from a 5-shard index with auto runs in 5 parallel threads.

Problem: New Documents During Reindexing

While the reindex runs, the application continues writing to products_v1 (via the alias), so new and updated documents won't make it into products_v2.

The solution is an incremental sync after the main reindexing:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "products_v1",
    "query": {
      "range": {
        "updated_at": {
          "gte": "2024-01-15T00:00:00",
          "lte": "now"
        }
      }
    }
  },
  "dest": {
    "index": "products_v2",
    "op_type": "index",
    "version_type": "external"
  }
}

version_type: external: carry over the source document's _version and use it for conflict resolution, so an older copy of a document never overwrites a newer one.

For this to work, the mapping must include an updated_at field with the last-modification timestamp. Without it, an incremental reindex becomes much harder.
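The range query can be generated from the timestamp recorded when the main reindex was launched. A Python sketch; the helper is hypothetical, and the request body mirrors the one above:

```python
from datetime import datetime, timezone

def incremental_reindex_body(source: str, dest: str, since: datetime) -> dict:
    """Build a _reindex request for documents changed since the main
    reindex started; requires an updated_at field in the mapping."""
    return {
        "source": {
            "index": source,
            "query": {"range": {"updated_at": {"gte": since.isoformat(),
                                               "lte": "now"}}},
        },
        "dest": {"index": dest, "op_type": "index",
                 "version_type": "external"},
    }

# Main reindex was launched on 2024-01-15 in this article's example
body = incremental_reindex_body("products_v1", "products_v2",
                                datetime(2024, 1, 15, tzinfo=timezone.utc))
```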

Atomic Alias Switch

After reindexing and incremental sync complete:

# 1. Restore production settings in new index
PUT /products_v2/_settings
{
  "index.number_of_replicas": 1,
  "index.refresh_interval": "1s"
}

# 2. Wait for replica recovery
curl -u elastic:pw "localhost:9200/_cluster/health/products_v2?wait_for_status=green&timeout=30s"

# 3. Atomically switch alias
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "products_v2",
        "alias": "products",
        "is_write_index": true
      }
    },
    {
      "remove": {
        "index": "products_v1",
        "alias": "products"
      }
    }
  ]
}

The operation is atomic: at no moment during the switch does the alias point to nothing, so requests in flight are not lost.
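Since the same two-action payload serves both the switch and the rollback, it is worth generating rather than hand-writing. A Python sketch; the helper name is an assumption:

```python
def switch_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Atomic _aliases payload: add the alias to the new index and
    remove it from the old one in a single request."""
    return {
        "actions": [
            {"add": {"index": new_index, "alias": alias,
                     "is_write_index": True}},
            {"remove": {"index": old_index, "alias": alias}},
        ]
    }

# Switch: products moves from v1 to v2
payload = switch_alias_actions("products", "products_v1", "products_v2")
```

Rollback is the same call with the index arguments swapped.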

Data Transformation During Reindexing

The _reindex API supports Painless scripts for transforming documents in flight:

POST _reindex
{
  "source": {
    "index": "products_v1"
  },
  "dest": {
    "index": "products_v2"
  },
  "script": {
    "source": """
      // Split 'full_name' into 'first_name' and 'last_name'
      if (ctx._source.full_name != null) {
        def parts = ctx._source.full_name.splitOnToken(' ');
        ctx._source.first_name = parts[0];
        ctx._source.last_name = parts.length > 1 ? parts[1] : '';
        ctx._source.remove('full_name');
      }

      // Normalize price from string to number
      if (ctx._source.price instanceof String) {
        ctx._source.price = Float.parseFloat(ctx._source.price.replace(',', '.'));
      }
    """,
    "lang": "painless"
  }
}
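Painless scripts are easiest to get right when the transformation is first unit-tested on sample documents. A Python model of the script above (equivalent logic only; Elasticsearch executes the Painless version):

```python
def transform(doc: dict) -> dict:
    """Python equivalent of the Painless script: split full_name into
    first/last name, normalize a string price with a comma separator."""
    doc = dict(doc)  # work on a copy, as ctx._source is mutated in place
    full_name = doc.pop("full_name", None)
    if full_name is not None:
        parts = full_name.split(" ")
        doc["first_name"] = parts[0]
        doc["last_name"] = parts[1] if len(parts) > 1 else ""
    if isinstance(doc.get("price"), str):
        doc["price"] = float(doc["price"].replace(",", "."))
    return doc

# Sample document for a quick sanity check
sample = transform({"full_name": "Ivan Petrov", "price": "19,99"})
```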

Rollback Plan

If problems are found after the switch, rollback takes seconds:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "products_v1",
        "alias": "products",
        "is_write_index": true
      }
    },
    {
      "remove": {
        "index": "products_v2",
        "alias": "products"
      }
    }
  ]
}

Don't delete products_v1 immediately; keep it for 24–48 hours in case a rollback is needed, then delete it to free disk space.

Pipeline Reindexing via Ingest

To enrich data during reindexing, use an ingest pipeline:

PUT _ingest/pipeline/enrich-products
{
  "processors": [
    {
      "set": {
        "field": "reindexed_at",
        "value": "{{_ingest.timestamp}}"
      }
    },
    {
      "uppercase": {
        "field": "sku",
        "ignore_missing": true
      }
    }
  ]
}

POST _reindex
{
  "source": { "index": "products_v1" },
  "dest": {
    "index": "products_v2",
    "pipeline": "enrich-products"
  }
}
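As with Painless scripts, pipeline behavior can be modeled locally before running the reindex. A Python sketch of the two processors above, the set processor plus uppercase with ignore_missing:

```python
from datetime import datetime, timezone

def apply_enrich_products(doc: dict) -> dict:
    """Model of the enrich-products pipeline: stamp reindexed_at and
    uppercase sku, silently skipping sku when the field is missing."""
    doc = dict(doc)
    doc["reindexed_at"] = datetime.now(timezone.utc).isoformat()
    if "sku" in doc:  # ignore_missing: true
        doc["sku"] = doc["sku"].upper()
    return doc

enriched = apply_enrich_products({"sku": "ab-12"})
```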

Timeline

A reindex with a simple mapping change takes about one business day (planning, launch, monitoring, switch). A complex scenario with data transformation, incremental sync, and testing takes 2–3 days. For indexes over 100M documents, add time for the reindex run itself: 6–24 hours depending on hardware.