Elasticsearch Security Troubleshooting
Diagnose and resolve common Elasticsearch security issues. This skill provides a structured triage workflow for authentication failures, authorization errors, TLS problems, API key issues, role mapping mismatches, Kibana login failures, and license-expiry lockouts.
For authentication methods and API key management, see the elasticsearch-authn skill. For roles, users, and role mappings, see the elasticsearch-authz skill. For license management, see the elasticsearch-license skill.
For diagnostic API endpoints, see references/api-reference.md.
Deployment note: Diagnostic API availability differs between self-managed, ECH, and Serverless. See Deployment Compatibility for details.
Jobs to Be Done
- Diagnose HTTP 401 authentication failures
- Diagnose HTTP 403 permission denied errors
- Troubleshoot TLS/SSL handshake or certificate errors
- Investigate expired or invalid API keys
- Debug role mappings that do not grant expected roles
- Fix Kibana login failures, redirect loops, or CORS errors
- Recover from a license-expiry lockout
- Determine why a user lacks access to a specific index
Prerequisites
| Item | Description |
|---|---|
| Elasticsearch URL | Cluster endpoint (for example, a Cloud deployment URL) |
| Authentication | Any valid credentials — even minimal — to reach the cluster |
| Cluster privileges | for read-only diagnostics; for fixes |
Prompt the user for any missing values. If the user cannot authenticate at all, start with TLS and Certificate Errors or License Expiry Recovery.
Diagnostic Workflow
Route the symptom to the correct section:
| Symptom | Section |
|---|---|
| HTTP 401 | Authentication Failures |
| HTTP 403, access denied | Authorization Failures |
| SSL/TLS handshake error, certificate rejected | TLS and Certificate Errors |
| API key rejected, expired, or ineffective | API Key Issues |
| Role mapping not granting expected roles | Role Mapping Issues |
| Kibana login broken, redirect loop, CORS error | Kibana Authentication Issues |
| All users locked out, paid features disabled | License Expiry Recovery |
Each section follows a Gather - Diagnose - Resolve pattern.
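The routing table can be sketched as a small shell helper that maps an HTTP status code to the section to read first. This is only an illustration: `route_status` and `probe_cluster` are hypothetical names, and treating a missing HTTP response as a TLS problem is a first guess, not a diagnosis.

```shell
# route_status CODE: print the section of this guide that matches an HTTP status.
# Hypothetical helper, not part of any Elastic tooling.
route_status() {
  case "$1" in
    401) echo "Authentication Failures" ;;
    403) echo "Authorization Failures" ;;
    # curl reports 000 when it received no HTTP response at all
    # (connection refused, TLS handshake failure, DNS error).
    000) echo "TLS and Certificate Errors" ;;
    *)   echo "Diagnostic Toolkit" ;;
  esac
}

# probe_cluster [curl auth flags...]: call _authenticate and route on the status.
# Requires network access and ELASTICSEARCH_URL to be set.
probe_cluster() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$@" \
    "${ELASTICSEARCH_URL}/_security/_authenticate")
  route_status "$code"
}
```

For example, `probe_cluster -u "joe:${PASSWORD}"` prints the section name for whatever status the cluster returns.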
Diagnostic Toolkit
Use these APIs at the start of any security investigation:
curl <auth_flags> "${ELASTICSEARCH_URL}/_security/_authenticate"
Confirms identity, realm, and roles. If this fails with 401, the problem is authentication.
curl <auth_flags> "${ELASTICSEARCH_URL}/_xpack"
Confirms whether security is enabled (`features.security.enabled`). If security is disabled, all security APIs return errors.
curl -X POST "${ELASTICSEARCH_URL}/_security/user/_has_privileges" \
  <auth_flags> \
  -H "Content-Type: application/json" \
  -d '{
    "index": [
      { "names": ["'"${INDEX_PATTERN}"'"], "privileges": ["read"] }
    ]
  }'
Tests whether the authenticated user holds specific privileges without requiring `manage_security`.
curl <auth_flags> "${ELASTICSEARCH_URL}/_license"
Check license type and status. An expired paid license disables paid realms and features.
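The `<auth_flags>` placeholder in these commands stands for whichever credentials are available. One possible way to fill it in is a wrapper that picks an `Authorization` header from the environment. The variable names `ES_API_KEY`, `ES_USER`, and `ES_PASSWORD` and the function names are assumptions made for this sketch.

```shell
# auth_header: print an Authorization header built from the environment.
# ES_API_KEY is assumed to hold the base64-encoded "id:api_key" value.
# ES_USER / ES_PASSWORD are assumed to hold basic-auth credentials.
auth_header() {
  if [ -n "${ES_API_KEY:-}" ]; then
    printf 'Authorization: ApiKey %s' "$ES_API_KEY"
  elif [ -n "${ES_USER:-}" ]; then
    # Basic auth is base64("user:password"); strip any line wrapping.
    printf 'Authorization: Basic %s' \
      "$(printf '%s:%s' "$ES_USER" "${ES_PASSWORD:-}" | base64 | tr -d '\n')"
  fi
}

# es_curl ARGS...: run curl with the header above, or unauthenticated if none is set.
es_curl() {
  local header
  header=$(auth_header)
  if [ -n "$header" ]; then
    curl -s -H "$header" "$@"
  else
    curl -s "$@"
  fi
}
```

With this in place, `es_curl "${ELASTICSEARCH_URL}/_security/_authenticate"` replaces `curl <auth_flags> ...` in the commands above.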
Authentication Failures (401)
A 401 response means Elasticsearch could not verify the caller's identity.
Gather
curl -v <auth_flags> "${ELASTICSEARCH_URL}/_security/_authenticate" 2>&1
The `-v` flag shows headers and the response body. Look for:
- `WWW-Authenticate` header: indicates which auth schemes the cluster accepts.
- `authentication_exception` in the response body: the `reason` field describes what failed.
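As an alternative to reading `-v` output by eye, the headers can be saved to a file and the advertised schemes extracted from it. This is a sketch: the function names and the `/tmp` file paths are arbitrary choices, and the header value in the usage note is only a sample.

```shell
# www_authenticate FILE: print the value of each WWW-Authenticate header
# found in a header dump written by `curl -D`.
www_authenticate() {
  grep -i '^www-authenticate:' "$1" | sed 's/^[^:]*:[[:space:]]*//' | tr -d '\r'
}

# capture_401 [curl auth flags...]: save headers and body separately and print
# the HTTP status code. Requires network access and ELASTICSEARCH_URL.
capture_401() {
  curl -s -D /tmp/es_headers.txt -o /tmp/es_body.json -w '%{http_code}\n' "$@" \
    "${ELASTICSEARCH_URL}/_security/_authenticate"
}
```

After `capture_401 -u "joe:${PASSWORD}"`, run `www_authenticate /tmp/es_headers.txt` to list the schemes and inspect `/tmp/es_body.json` for the `reason` field.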
Diagnose
| Symptom | Likely cause |
|---|---|
| | Wrong username or password |
| | Credentials do not match any realm in the chain |
| | The native user account is disabled |
| | API key or bearer token has expired |
| No `WWW-Authenticate` header | Security may be disabled; check `features.security.enabled` via `GET /_xpack` |
If the user authenticates via an external realm (LDAP, AD, SAML, OIDC), the realm chain order matters. Elasticsearch tries realms in configured order and stops at the first match. If a higher-priority realm rejects the credentials before the intended realm is reached, authentication fails.
Resolve
| Cause | Action |
|---|---|
| Wrong credentials | Verify username/password or API key value. See elasticsearch-authn. |
| Disabled user | Re-enable the user account. See elasticsearch-authz. |
| Expired API key | Create a new API key. See API Key Issues. |
| Realm chain order | Check realm order (self-managed only). |
| Security disabled | Set `xpack.security.enabled: true` in `elasticsearch.yml` and restart. |
| Paid realm after expiry | License expired — see License Expiry Recovery. |
Authorization Failures (403)
A 403 response means the user is authenticated but lacks the required privileges.
Gather
Test the specific privileges the operation requires:
curl -X POST "${ELASTICSEARCH_URL}/_security/user/_has_privileges" \
  <auth_flags> \
  -H "Content-Type: application/json" \
  -d '{
    "index": [
      { "names": ["logs-*"], "privileges": ["read", "view_index_metadata"] }
    ],
    "cluster": ["monitor"]
  }'
The response contains a `has_all_requested` boolean and per-resource breakdowns.
Also check the user's effective roles:
curl <auth_flags> "${ELASTICSEARCH_URL}/_security/_authenticate"
Inspect the `roles` array and `authentication_realm` to confirm the user is who you expect.
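For scripted checks, the `has_all_requested` boolean can be tested without a JSON parser. The helper below is a minimal sketch with a hypothetical name. It only looks at the top-level flag, so use the per-resource breakdown to see which privilege is missing.

```shell
# has_all_requested: read a _has_privileges JSON response on stdin and
# succeed (exit 0) only if "has_all_requested" is true.
has_all_requested() {
  # Remove whitespace so both pretty-printed and compact JSON match.
  tr -d '[:space:]' | grep -q '"has_all_requested":true'
}
```

A typical use is to pipe the `_has_privileges` response into it: `curl -s ... | has_all_requested || echo "missing privileges"`.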
Diagnose
| Symptom | Likely cause |
|---|---|
| 403 for an index | Role is missing the required index privilege |
| 403 for a cluster operation | Role is missing the required cluster privilege |
| User has fewer roles than expected | Roles array was replaced (not merged) on last update |
| API key returns 403 on a previously allowed operation | API key privileges are a snapshot; role changes after creation do not propagate to existing keys |
Resolve
| Cause | Action |
|---|---|
| Missing index privilege | Add the privilege to the role or create a new role. See elasticsearch-authz. |
| Missing cluster privilege | Add the cluster privilege. See elasticsearch-authz. |
| Roles replaced on update | Fetch current roles first, then update with the full array. See elasticsearch-authz. |
| Stale API key privileges | Create a new API key with updated privileges. See elasticsearch-authn. |
TLS and Certificate Errors
TLS errors prevent the client from establishing a connection at all.
Gather
curl -v --cacert "${CA_CERT}" "https://${ELASTICSEARCH_HOST}:9200/" 2>&1 | head -30
Look for:
- `SSL certificate problem: unable to get local issuer certificate`: CA not trusted.
- `SSL certificate problem: certificate has expired`: certificate past its validity date.
- `SSL: no alternative certificate subject name matches target host name`: hostname mismatch.
For deeper inspection (self-managed only):
openssl s_client -connect "${ELASTICSEARCH_HOST}:9200" -showcerts </dev/null 2>&1
This displays the full certificate chain, expiry dates, and subject alternative names.
Diagnose
| Error message | Likely cause |
|---|---|
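| `unable to get local issuer certificate` | Missing or wrong CA certificate |
| `certificate has expired` | Server or CA certificate past expiry |
| `no alternative certificate subject name matches target host name` | Certificate SAN does not include the hostname |
| | Self-signed cert not in the trust store |
| (Java client) | Truststore missing the CA or wrong password |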
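The curl error strings listed in the Gather step can be matched in a script to suggest a likely cause. This is a sketch: `classify_tls_error` and `tls_probe` are hypothetical names, only the three messages quoted in this section are recognized, and the exact wording may vary between curl versions.

```shell
# classify_tls_error MESSAGE: map a curl error message to a likely cause.
classify_tls_error() {
  case "$1" in
    *"unable to get local issuer certificate"*)
      echo "Missing or wrong CA certificate" ;;
    *"certificate has expired"*)
      echo "Server or CA certificate past expiry" ;;
    *"no alternative certificate subject name"*)
      echo "Certificate SAN does not include the hostname" ;;
    *)
      echo "Unrecognized error; inspect the chain with openssl s_client" ;;
  esac
}

# tls_probe: capture curl's error text for the cluster endpoint and classify it.
# Requires network access, CA_CERT, and ELASTICSEARCH_HOST.
tls_probe() {
  local msg
  msg=$(curl -sS -o /dev/null --cacert "${CA_CERT}" \
    "https://${ELASTICSEARCH_HOST}:9200/" 2>&1)
  classify_tls_error "$msg"
}
```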
Resolve
| Cause | Action |
|---|---|
| Wrong CA cert | Pass the correct CA with `--cacert` or add it to the system trust store. |
| Expired certificate | Regenerate the certificates (self-managed). |
| Hostname mismatch | Regenerate the certificate with the correct SAN entries. |
| Self-signed cert | Distribute the CA cert to all clients or use a publicly trusted CA. |
| Quick workaround | Use `-k` / `--insecure` to skip verification. Not for production. |
On ECH, TLS is managed by Elastic — certificate errors usually indicate the client is not using the correct Cloud endpoint URL. On Serverless, TLS is fully managed and transparent.
API Key Issues
Gather
Retrieve the key's metadata:
curl "${ELASTICSEARCH_URL}/_security/api_key?name=${KEY_NAME}" <auth_flags>
Check `expiration`, `invalidated`, and `role_descriptors` in the response.
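The `expiration` field is an epoch timestamp in milliseconds, so it can be compared against the current time directly. The helper below is a sketch with a hypothetical name. A key created without an expiration has no such field, so call it only when the field is present.

```shell
# key_expired EXPIRATION_MS [NOW_MS]: succeed (exit 0) if the expiration
# timestamp is in the past. NOW_MS defaults to the current time.
key_expired() {
  local expiration_ms=$1
  local now_ms=${2:-$(( $(date +%s) * 1000 ))}
  [ "$expiration_ms" -le "$now_ms" ]
}
```

For example, `key_expired 1709251200000 && echo "key has expired"`.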
Diagnose
| Symptom | Likely cause |
|---|---|
| 401 when using the key | Key expired or invalidated |
| 403 on operations that should be allowed | Key was created with insufficient privileges |
| Derived key has no access | API key created another API key; derived keys have no privileges |
| Key works for some indices but not others | The key's index scope is too narrow |
Resolve
| Cause | Action |
|---|---|
| Expired key | Create a new key with an appropriate expiration. See elasticsearch-authn. |
| Invalidated key | Create a new key. Invalidated keys cannot be reinstated. |
| Wrong scope | Create a new key with the correct scope. See elasticsearch-authn. |
| Derived key problem | Create the key with user credentials instead. See elasticsearch-authn. |
Role Mapping Issues
Role mappings grant roles to users from external realms. When they fail silently, users authenticate but get no roles.
Gather
curl <auth_flags> "${ELASTICSEARCH_URL}/_security/_authenticate"
Note the `username`, `authentication_realm.name`, and `roles` array.
curl <auth_flags> "${ELASTICSEARCH_URL}/_security/role_mapping"
List all mappings and inspect their `rules` and `enabled` fields.
Diagnose
| Symptom | Likely cause |
|---|---|
| User has empty `roles` array | No mapping matches the user's attributes |
| User gets wrong roles | A different mapping matched first or the rule is too broad |
| Mapping exists but does not apply | `enabled` is `false` |
| Mustache template produces wrong role name | Template syntax error or unexpected attribute value |
Compare the user's `authentication_realm.name` and groups (from `_authenticate`) against each mapping's `rules` to find the mismatch.
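The "authenticated but no roles" case can be detected mechanically before comparing rules by hand. The helper below is a sketch with a hypothetical name. It assumes the `_authenticate` response has a single `roles` array and only checks whether that array is empty.

```shell
# has_no_roles: read an _authenticate JSON response on stdin and
# succeed (exit 0) if its "roles" array is empty.
has_no_roles() {
  # Remove whitespace so "roles" : [ ] and "roles":[] both match.
  tr -d '[:space:]' | grep -q '"roles":\[\]'
}
```

If it succeeds for a user from an external realm, go on to inspect the role mappings for a rule that should have matched.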
Resolve
| Cause | Action |
|---|---|
| No matching rule | Update the mapping rules to match the user's realm and attributes. |
| Mapping disabled | Set `enabled` to `true` on the mapping. |
| Template error | Test the Mustache template with known attribute values. See elasticsearch-authz. |
| Rule too broad | Add conditions to narrow the match. See elasticsearch-authz. |
Kibana Authentication Issues
Missing kbn-xsrf header
All mutating Kibana API requests require the `kbn-xsrf` header:
curl -X PUT "${KIBANA_URL}/api/security/role/my-role" \
  <auth_flags> \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -d '{ ... }'
Without it, Kibana returns `400 Bad Request` with "Request must contain a kbn-xsrf header".
SAML/OIDC redirect loop
Common causes:
- Incorrect `xpack.security.authc.realms.saml.*.sp.acs` or `idp.metadata.path` in `elasticsearch.yml`.
- Clock skew between the IdP and Elasticsearch nodes (SAML assertions have a validity window).
- Kibana `server.publicBaseUrl` does not match the SAML ACS URL.
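Clock skew is simple to rule out once a timestamp has been collected from each host. The helper below is a sketch with a hypothetical name. The default tolerance of 180 seconds is an assumption made for illustration, so substitute the skew your SAML realm actually allows.

```shell
# clock_skew_ok A B [TOLERANCE]: succeed (exit 0) if two epoch-second
# timestamps differ by at most TOLERANCE seconds (assumed default: 180).
clock_skew_ok() {
  local a=$1 b=$2 tolerance=${3:-180}
  local diff=$(( a - b ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  [ "$diff" -le "$tolerance" ]
}
```

One way to gather the inputs is to run `date +%s` on an Elasticsearch node and on a host with a trusted clock at roughly the same moment, then pass both values to `clock_skew_ok`.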
Verify the SAML realm configuration:
curl <auth_flags> "${ELASTICSEARCH_URL}/_security/_authenticate"
If this returns a valid user via a non-SAML realm, the SAML realm itself is not being reached. Check realm chain order.
Kibana cannot reach Elasticsearch
Kibana logs `Unable to retrieve version information from Elasticsearch nodes`. Verify that the `elasticsearch.hosts` setting in `kibana.yml` points to a reachable endpoint and that the credentials (`elasticsearch.username` / `elasticsearch.password` or `elasticsearch.serviceAccountToken`) are valid.
License Expiry Recovery
When a paid license expires, the cluster enters a security-closed state: paid realms (SAML, LDAP, AD, PKI) stop working and users authenticating through them are locked out. Native and file realms remain functional.
Quick triage
curl <auth_flags> "${ELASTICSEARCH_URL}/_license"
If `license.status` is `"expired"`, proceed with recovery.
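The status check can be scripted without a JSON parser. The helper below is a sketch with a hypothetical name. It assumes the `GET /_license` response contains a single `status` field and prints its value.

```shell
# license_status: read a GET /_license JSON response on stdin and print
# the value of its "status" field (for example: active or expired).
license_status() {
  # Collapse the document to one line, then capture the quoted value.
  tr -d '[:space:]' | sed -n 's/.*"status":"\([^"]*\)".*/\1/p'
}
```

For example, `[ "$(curl -s ... "${ELASTICSEARCH_URL}/_license" | license_status)" = "expired" ]` succeeds only when recovery is needed.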
Recovery steps
Follow the detailed recovery workflow in the elasticsearch-license skill. The critical first step depends on deployment type:
| Deployment | First step |
|---|---|
| Self-managed | Log in with a file-based user (`elasticsearch-users` CLI) or native user. |
| ECH | Contact Elastic support or renew via the Cloud console. |
| Serverless | Not applicable — licensing is fully managed by Elastic. |
Examples
User gets 403 when querying logs
Symptom: "I get a 403 when searching `logs-*`."
- Verify identity:
curl -u "joe:${PASSWORD}" "${ELASTICSEARCH_URL}/_security/_authenticate"
Response shows `"roles": ["viewer"]`.
- Test privileges:
curl -X POST "${ELASTICSEARCH_URL}/_security/user/_has_privileges" \
  -u "joe:${PASSWORD}" \
  -H "Content-Type: application/json" \
  -d '{"index": [{"names": ["logs-*"], "privileges": ["read"]}]}'
Response: `"has_all_requested": false`, meaning the `viewer` role does not include `read` on `logs-*`.
- Fix: create a `logs-reader` role and assign it to Joe. See elasticsearch-authz.
API key stopped working
Symptom: "My API key returns 401 since yesterday."
- Check the key:
curl -u "admin:${PASSWORD}" "${ELASTICSEARCH_URL}/_security/api_key?name=my-key"
Response shows `"expiration": 1709251200000`, so the key has expired.
- Fix: create a new API key with a suitable `expiration`. See elasticsearch-authn.
SAML login redirects to error
Symptom: "Clicking the SSO button in Kibana redirects to an error page."
- Check if the SAML realm is reachable by authenticating with a non-SAML method:
curl -u "elastic:${PASSWORD}" "${ELASTICSEARCH_URL}/_security/_authenticate"
- Verify the IdP metadata URL is accessible from the Elasticsearch nodes (self-managed):
curl -s "${IDP_METADATA_URL}" | head -5
- Check for clock skew — SAML assertions are time-sensitive. Ensure NTP is configured on all nodes.
- Verify `server.publicBaseUrl` in `kibana.yml` matches the SAML ACS URL configured in the IdP.
Users locked out after license expired
Symptom: "Nobody can log in to Kibana. We use SAML."
- Check license:
curl -u "admin:${PASSWORD}" "${ELASTICSEARCH_URL}/_license"
Response shows `"status": "expired", "type": "platinum"`.
- The SAML realm is disabled because the paid license expired. Follow the recovery steps in elasticsearch-license: log in with a file-based or native user, then upload a renewed license or revert to basic.
Guidelines
Always start with _authenticate
Run `GET /_security/_authenticate` as the first diagnostic step. It reveals the user's identity, realm, roles, and authentication type in a single call. Most issues become apparent from this response alone.
Check the license early
Before investigating realm or privilege issues, verify the license is active with `GET /_license`. An expired paid license disables realms and features, producing symptoms that mimic misconfiguration.
Use _has_privileges before manual inspection
Instead of reading role definitions and mentally computing effective access, use `POST /_security/user/_has_privileges` to test specific privileges directly. This is faster and accounts for role composition, DLS, and FLS.
Avoid superuser credentials
Never use the built-in `elastic` superuser for day-to-day troubleshooting. Create a dedicated admin user or API key with `manage_security` privileges. Reserve the `elastic` user for initial setup and emergency recovery only.
Do not bypass TLS in production
Using `curl -k` or `--insecure` skips certificate verification and masks real TLS issues. Use it only for initial diagnosis, then fix the underlying certificate problem.
Deployment Compatibility
Diagnostic tool and API availability differs across deployment types.
| Tool / API | Self-managed | ECH | Serverless |
|---|---|---|---|
| | Yes | Yes | Yes |
| | Yes | Yes | Yes |
| | Yes | Yes | Limited |
| | Yes | Yes (read) | Not available |
| (GET) | Yes | Yes | Yes |
| | Yes | Yes | Yes |
| `elasticsearch-users` CLI | Yes | Not available | Not available |
| on nodes | Yes | Not available | Not available |
| Elasticsearch logs | Yes | Via Cloud UI | Via Cloud UI |
ECH notes:
- No node-level access, so the `elasticsearch-users` CLI and direct log/certificate inspection are not available.
- TLS is managed by Elastic; certificate errors typically indicate an incorrect endpoint URL.
- Use the Cloud console for log inspection and deployment configuration.
Serverless notes:
- Licensing APIs are not exposed. License-related lockouts do not occur.
- Native users do not exist — authentication issues are handled at the organization level.
- TLS is fully managed and transparent.