Monitoring guide for SPs
Revision as of 12:23, 2 June 2020
The ARGO Monitoring service provides a flexible and scalable framework for monitoring the status, availability and reliability of a wide range of services provided by infrastructures of medium to high complexity. ARGO generates reports using customer-defined profiles (e.g. for SLA management, operations, etc.). During report generation, ARGO takes into account custom factors such as the importance of a specific service endpoint and scheduled or unscheduled downtimes.
Topology
Topology tool (GOCDB) URL: https://gocdb.ni4os.eu
The topology holds information about the monitored service(s):
- the service types they are running (e.g. wiki)
- the service endpoints of each service (e.g. endpoint)
- the way they are organized (e.g. in groups of sites, in groups of services), which makes it possible to model different types of infrastructure architectures
- the service actors (owners, admins)
Service endpoint mandatory fields:
- Is this service in production? Y = monitored on prod & devel; N = monitored on devel
- Is this service monitored? Must be Y
- Do you wish to receive notifications about this service? Y (if you wish to receive alerts)
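The effect of the mandatory fields can be sketched in a few lines of Python. This is only an illustration of the rules listed above: the `Endpoint` class, its field names and the instance labels `prod` and `devel` are assumptions made for the example, not part of GOCDB or ARGO.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    # Hypothetical field names mirroring the three mandatory questions.
    hostname: str
    production: bool      # "Is this service in production?"
    monitored: bool       # "Is this service monitored?"
    notifications: bool   # "Do you wish to receive notifications?"


def monitoring_instances(ep: Endpoint) -> List[str]:
    """Return the monitoring instances that would check this endpoint."""
    if not ep.monitored:
        # The "monitored" flag must be Y, otherwise nothing checks the endpoint.
        return []
    # Y = monitored on prod & devel, N = monitored on devel only.
    return ["prod", "devel"] if ep.production else ["devel"]


def receives_alerts(ep: Endpoint) -> bool:
    """Alerts only make sense for endpoints that are actually monitored."""
    return ep.monitored and ep.notifications
```

For instance, `monitoring_instances(Endpoint("wiki.example.org", False, True, True))` yields only the devel instance, which matches the N case above.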
Extra GOCDB attributes
ARGO can use extra GOCDB attributes to monitor service endpoints properly. The table below will contain the attributes that must be defined for each service type.
For example: Service URL: https://www.mywebapp.com:9999
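As an illustration of why such an attribute is useful, the sketch below splits a Service URL value into the pieces a check typically needs. The function name and the returned dictionary layout are assumptions for this example; how ARGO actually passes the attribute to a probe is not described here.

```python
from urllib.parse import urlsplit


def endpoint_from_service_url(service_url):
    """Split a Service URL attribute into scheme, host, port and path."""
    parts = urlsplit(service_url)
    # Fall back to the scheme's default port when none is given explicitly.
    default_port = 443 if parts.scheme == "https" else 80
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or default_port,
        "path": parts.path or "/",
    }
```

With the example value above, the host is `www.mywebapp.com` and the port is `9999`, so the check targets the non-default port rather than 443.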
Metrics
A metric is a simple chunk of code that checks a specific functionality of a given service. For example:
- Portal-WebCheck: checks whether the service responds over HTTP
- CertValidity: checks the validity of a certificate
To start monitoring your service you will need some metrics. We will start with some basic checks such as the web check and the certificate validity check.
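To make the idea of a metric concrete, here is a minimal sketch of the decision logic behind a certificate validity check. It is not the actual CertValidity probe: the function name and the 30-day and 7-day thresholds are assumptions chosen for illustration. The return values follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import ssl

# Conventional Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def cert_validity_status(not_after, now, warn_days=30, crit_days=7):
    """Classify a certificate by the days left until its notAfter date.

    not_after: expiry in the format used by ssl.getpeercert(),
               e.g. "Jan  5 09:34:43 2018 GMT".
    now:       current time in seconds since the epoch.
    The thresholds are illustrative defaults, not ARGO settings.
    """
    try:
        expires = ssl.cert_time_to_seconds(not_after)
    except ValueError:
        return UNKNOWN  # the date string could not be parsed
    days_left = (expires - now) / 86400
    if days_left < crit_days:
        return CRITICAL  # expired or about to expire
    if days_left < warn_days:
        return WARNING
    return OK
```

In a real check, `not_after` would come from the certificate presented by the service endpoint and `now` from `time.time()`.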
Service probe
Apart from the basic checks, each service should have a list of specific metrics defined from the user perspective. Monitoring services from the user point of view means that all services have to be monitored in the same way, regardless of who the service providers are and where they are located. The owners of the service are the ones who know exactly how the service works. The service development team, with the support of the monitoring team, is responsible for implementing the probe, which checks the service and at the same time mimics actual end-user behaviour without requiring special privileges or special configurations.
Probe development guidelines: http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/
ARGO Monitoring has a list of probes that are used for the services already monitored, so the SP should first search this library for probes that can be reused:
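A probe of this kind can be sketched as follows. This is a minimal, hypothetical example and not an official ARGO probe: the option names `-u` and `-t`, the function names and the output messages are assumptions, and the guidelines linked above remain the authoritative reference. The sketch performs an anonymous HTTP GET, as an end user would, and maps the result to the conventional Nagios return codes.

```python
import argparse
import urllib.error
import urllib.request

# Conventional Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def http_fetch(url, timeout):
    """Return the HTTP status code of an anonymous GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code worth reporting.
        return err.code


def run_check(url, timeout, fetch=http_fetch):
    """Run the check and return (return_code, one-line message)."""
    try:
        code = fetch(url, timeout)
    except OSError as err:
        # Covers connection errors, DNS failures and timeouts.
        return CRITICAL, f"CRITICAL - {url} unreachable: {err}"
    if 200 <= code < 400:
        return OK, f"OK - {url} returned HTTP {code}"
    return CRITICAL, f"CRITICAL - {url} returned HTTP {code}"


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical user-level web check")
    parser.add_argument("-u", "--url", required=True, help="service URL to check")
    parser.add_argument("-t", "--timeout", type=float, default=30.0)
    args = parser.parse_args(argv)
    status, message = run_check(args.url, args.timeout)
    print(message)
    return status
```

To use it as a command-line probe, the script would end with `sys.exit(main())` so that the return code reaches the monitoring engine. The `fetch` parameter exists only so the logic can be tested without network access.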
- Probes: https://poem.ni4os.eu/ui/public_probes
- Metrics (currently available): https://poem.ni4os.eu/ui/public_metrics
- Nagios Exchange: https://exchange.nagios.org/
Probe Development Process
- Discuss (what to check): discussion with the representatives and developers of each service in order to agree on a set of monitored metrics.
- Develop (how to check): development and testing of the probe(s). The development lifecycle includes coding of the probe, documentation, testing and packaging.
- Monitor: the deployment lifecycle of the service probe is based on the following repetitive steps:
  a) Guidelines from the service owners are created, and the monitoring team makes the necessary configurations.
  b) The probe is tested and verified.
  c) If it passes the tests, the report changes and now includes your service metrics. Monitoring starts and you can get the status and A/R reports for your service.
Checklist
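References
- Probe development guidelines: http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/
- List of available probes: https://poem.ni4os.eu/ui/public_probes
- List of available metrics: https://poem.ni4os.eu/ui/public_metrics
- Topology tool: https://gocdb.ni4os.eu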