Degraded Performance
If you observe delays in data collection, missing data points, or timeouts, enable the self-monitoring feature as described in the Monitoring Configuration[1] page. This feature provides detailed metrics about job execution times, helping you identify inefficiencies such as misconfigurations, bottlenecks, or performance issues in specific components.
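As a starting point, the sketch below shows how self-monitoring could be switched on in metricshub.yaml. The enableSelfMonitoring key name and its agent-level placement are assumptions here; confirm the exact key and scope on the Monitoring Configuration[1] page.

# metricshub.yaml - minimal sketch, key name and placement assumed
enableSelfMonitoring: true   # emit self-monitoring metrics such as metricshub.job.duration

resourceGroups:
  # ... existing resource groups and resources remain unchanged ...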
When self-monitoring is enabled, the metricshub.job.duration metric provides insights into task execution times. Key attributes include:
- job.type: The operation performed by MetricsHub. Possible values are:
  - discovery: Identifies and registers components.
  - collect: Gathers telemetry data from monitored components.
  - simple: Executes a straightforward task.
  - beforeAll or afterAll: Performs preparatory or cleanup operations.
- monitor.type: The component being monitored, such as:
  - Hardware metrics: cpu, memory, physical_disk, or disk_controller.
  - Environmental metrics: temperature or battery.
  - Logical entities: connector.
- connector_id: The unique identifier for the connector, such as HPEGen10IloREST for the HPE Gen10 iLO REST connector.
These metrics can be viewed in Prometheus/Grafana or in the metricshub-agent-$resourceId-$timestamp.log file. Refer to the MetricsHub Log Files[2] page for details on locating and interpreting log files.
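In Prometheus, a PromQL query along these lines lists the slowest jobs at a glance. The metric name shown is an assumption based on the usual OpenTelemetry-to-Prometheus conversion (dots replaced by underscores, possibly with a unit suffix); adjust it to whatever name your Prometheus instance actually exposes.

# Five slowest MetricsHub jobs reported by self-monitoring
topk(5, metricshub_job_duration_seconds)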
Example
Example of metrics emitted for the HPEGen10IloREST connector:
metricshub.job.duration{job.type="discovery", monitor.type="enclosure", connector_id="HPEGen10IloREST"} 0.020
metricshub.job.duration{job.type="discovery", monitor.type="cpu", connector_id="HPEGen10IloREST"} 0.030
metricshub.job.duration{job.type="discovery", monitor.type="temperature", connector_id="HPEGen10IloREST"} 0.025
metricshub.job.duration{job.type="discovery", monitor.type="connector", connector_id="HPEGen10IloREST"} 0.015
metricshub.job.duration{job.type="collect", monitor.type="cpu", connector_id="HPEGen10IloREST"} 0.015
In this example:
- During discovery:
  - The enclosure monitor takes 0.020 seconds.
  - The cpu monitor takes 0.030 seconds.
  - The temperature monitor takes 0.025 seconds.
  - The connector monitor takes 0.015 seconds.
- During collect, the cpu metrics collection takes 0.015 seconds.
These metrics indicate that MetricsHub is functioning as expected, with task durations well within acceptable ranges. Jobs exceeding 5 seconds may require further investigation.
For example, if a job takes more than 5 seconds, as shown below:
metricshub.job.duration{job.type="collect", monitor.type="network", connector_id="WbemGenNetwork"} 5.8
- Identify the job.type, monitor.type, and connector_id. In this example, collecting network metrics with the WbemGenNetwork connector is the bottleneck.
- Check the metricshub-agent-$resourceId-$timestamp.log file for the start and end timestamps of each job step to identify where the performance degradation occurs.
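If these metrics are scraped by Prometheus, an alerting rule along the lines of the sketch below can flag jobs crossing the 5-second threshold automatically. The metric and label names assume the default OpenTelemetry-to-Prometheus name conversion and may differ in your environment.

groups:
  - name: metricshub-self-monitoring
    rules:
      - alert: MetricsHubSlowJob
        expr: metricshub_job_duration_seconds > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slow {{ $labels.job_type }} job for {{ $labels.monitor_type }} via {{ $labels.connector_id }}"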
You can also:
- Verify resource availability: Ensure the monitored system has sufficient CPU, memory, and storage resources to handle monitoring tasks.
- Check MetricsHub configuration: Review your configuration to ensure MetricsHub is set up correctly.
- Restart services: If configurations appear correct, try restarting relevant services.
- Inspect network configurations: Check for network latency or connectivity issues between MetricsHub and the monitored resources, and ensure network settings (e.g., firewalls or proxies) are not causing delays.
- Examine logs: Look for warnings or errors in the MetricsHub logs[2] or the monitored system's logs to identify potential problems.
- Review timeouts: Ensure timeout settings are appropriate for the environment to prevent unnecessary delays or retries, as illustrated in the sketch after this list.
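To illustrate the last point, the snippet below sketches where a protocol-level timeout is typically raised in metricshub.yaml. The group, resource, host type, and credential values are placeholders, and the 120-second value is only an example; check your own configuration for the exact layout.

resourceGroups:
  sample-group:
    resources:
      sample-server:
        attributes:
          host.name: sample-server
          host.type: linux      # placeholder host type
        protocols:
          http:
            https: true
            username: sample-user
            password: sample-password
            timeout: 120        # timeout in seconds (placeholder value)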
