Difference between revisions of "Monitoring guide for SPs"

 
(29 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
The ARGO Monitoring service provides a flexible and scalable framework for monitoring status, availability and reliability of a wide range of services provided by infrastructures with medium to high complexity. ARGO generates reports using customer defined profiles (e.g. for SLA management, operations, etc.). During the report generation, ARGO takes into account custom factors such as the importance of a specific service endpoint and scheduled or unscheduled downtimes.  
 
The ARGO Monitoring service provides a flexible and scalable framework for monitoring status, availability and reliability of a wide range of services provided by infrastructures with medium to high complexity. ARGO generates reports using customer defined profiles (e.g. for SLA management, operations, etc.). During the report generation, ARGO takes into account custom factors such as the importance of a specific service endpoint and scheduled or unscheduled downtimes.  
 +
 +
ARGO Monitoring Service for NI4OS consists of production and development infrastructure. Production infrastructure is deployed in a redundant manner and is used for generating reports and raising alarms for production-grade on-boarded services. Development infrastructure is used for testing and integration of new services and probes. Web UI can be found:
 +
* Production: https://argo.ni4os.eu
 +
* Development: https://argo-devel.ni4os.eu
  
 
= Topology =
 
= Topology =
Mpla mpla
 
URL: https://gocdb.ni4os.eu
 
  
Information about  
+
The topology tool used in NI4OS is GOCDB and contains general information about the sites participating in the project. It is actually a central registry for e-Infrastructure topology. GOCDB enables detailed describing of service endpoints with custom attributes, tagging and additional sub-endpoints. Services are assigned to resources centres, which are grouped in operations centres. Besides service endpoints, GOCDB enables definition of contact points and declaration of downtimes for individual services endpoints or resource centres.  
the monitored service(s)
 
the service types they are running (ex. wiki)
 
the service endpoints of the service (ex. endpoint)
 
the way they are organized  (ex. in groups of sites, in groups of services). Model different types of infrastructure architectures
 
the service actors (owners, admins)
 
  
Service endpoint mandatory fields:
+
'''URL''': [https://gocdb.ni4os.eu NI4OS GOCDB]
Is this service in production? (Y = monitored on prod & devel) (N = monitored on devel)
+
 
Is this service monitored? Must be Y
+
== Topology Information ==
Do you wish to receive notifications about this service? Y (if you wish to receive alerts)
+
 
 +
Monitoring service relies on topology database to provide the following information:
 +
* the '''monitored service(s)'''
 +
* the '''service types''' they are running (ex. wiki)
 +
* the '''service endpoints''' of the service (ex. endpoint)
 +
* the '''way they are organized'''  (ex. in groups of sites, in groups of services)
 +
* the '''service actors''' (owners, admins, contact points).
 +
 
 +
When adding service endpoint following fields are mandatory for monitoring service:
 +
* Production Level (Is this service in production?):
 +
** Y = monitored on production & development infrastructure
 +
** N = monitored only on development infrastructure
 +
* Monitored (Is this service monitored?)
 +
** must be set to Y
 +
* Notifications (Do you wish to receive notifications about this service?)
 +
** set to Y if you wish to receive alerts.
  
 
== Extra GOCDB attributes ==
 
== Extra GOCDB attributes ==
  
 
ARGO can use extra GOCDB attributes to properly monitor service endpoints. Table below will contain attributes that must be defined for each service type.
 
ARGO can use extra GOCDB attributes to properly monitor service endpoints. Table below will contain attributes that must be defined for each service type.
 
+
Until now there are no extra GOCDB attributes.
For example:
 
Service URL: https://www.mywebapp.com:9999
 
  
 
= Metrics =
 
= Metrics =
  
 
A metric is a simple chunk of code that checks specific functionality of a given service. For example:  
 
A metric is a simple chunk of code that checks specific functionality of a given service. For example:  
Portal-WebCheck:  checks the http if it responds
+
 
CertValidity: checks the validity of a certificate
+
* Portal-WebCheck:  checks the http if it responds
 +
* CertValidity: checks the validity of a certificate
  
 
For your service you will need some metrics so as to start monitoring it. We will start monitoring with some basic checks like webcheck and cert validity.  
 
For your service you will need some metrics so as to start monitoring it. We will start monitoring with some basic checks like webcheck and cert validity.  
Line 37: Line 48:
  
 
The owners of the service are the ones that '''know exactly how the service is working'''. The service development team with the support of the monitoring team  is responsible to implement the probe that checks and at the same time mimics the actual end user behaviour without requiring special privileges or special configurations.  
 
The owners of the service are the ones that '''know exactly how the service is working'''. The service development team with the support of the monitoring team  is responsible to implement the probe that checks and at the same time mimics the actual end user behaviour without requiring special privileges or special configurations.  
Probe development guidelines: http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/
 
  
The ARGO Monitoring has a list of probes that are used for the services already monitored. So the SP should search in the library if he want(s) to use them:
+
Before you start implementing your own probe please check in the library if appropriate probe is already used for monitoring sevices:
 
* Probes: https://poem.ni4os.eu/ui/public_probes
 
* Probes: https://poem.ni4os.eu/ui/public_probes
 
* Metrics (currently available): https://poem.ni4os.eu/ui/public_metrics
 
* Metrics (currently available): https://poem.ni4os.eu/ui/public_metrics
  
Nagios Exchange: https://exchange.nagios.org/
+
If you cannot find a probe for your service then you should follow the development process described in the next chapter.
  
 
== Probe Development Process ==
 
== Probe Development Process ==
Line 49: Line 59:
 
[[File:Probe-developemnt.png|300px|thumb|right|Development Process]]  
 
[[File:Probe-developemnt.png|300px|thumb|right|Development Process]]  
  
- Discuss (what to check): Discussion with representatives - developers of each service in order to agree on a set of monitored metrics.
+
 
- Develop (How to check): Development and testing of probe(s). The development lifecycle includes: coding of the probe, documentation, testing and packaging.  
+
* Discuss (What to check): Discussion with representatives - developers of each service in order to agree on a set of monitored metrics.
- Monitor (): The lifecycle of the deployment of the service probe is based on the following repetitive steps: a)  guidelines from the service owners are created. The monitoring team makes the necessary configurations. b) test, verify. if it passes the tests  c) The report changes and now has your service metrics!!!! Monitoring starts and you can get the status A/R reports for your service.
+
* Develop (How to check): Development and testing of probe(s). The development lifecycle includes: coding of the probe, documentation, testing and packaging.  
 +
* Monitor (Lets start monitoring): The lifecycle of the deployment of the service probe is based on the following repetitive steps:  
 +
# guidelines from the service owners are created. The monitoring team makes the necessary configurations.  
 +
# test, verify. if it passes the tests   
 +
# The report changes and now has your service metrics!!!! Monitoring starts and you can get the status A/R reports for your service.
 +
 
 +
The probe development guidelines: http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/
  
 
= Checklist =
 
= Checklist =
  
 +
Integrating new service into ARGO Monitoring service. See below the two main steps.
 +
 +
{| class="wikitable"
 +
|-
 +
! colspan="2" style=text-align:left | 1. Does Topology database (GOCDB) already have service type for my service?
 +
|-
 +
|<tt>'''YES''' - SP to add new service endpoint in Topology database</tt>
 +
|-
 +
|<tt>'''NO''' - SP to follow procedure to add new service type to Topology database, then add new service endpoint & go to step 2.</tt>
 +
|-
 +
! colspan="2"  style=text-align:left | 2. Does POEM contain metrics & probes that can be used to monitor my service?
 +
|-
 +
|<tt>'''YES''' - ARGO admin to add mapping between new service type and metrics in relevant profiles</tt>
 +
|-
 +
|<tt>'''NO''' - SP to follow Probe development process.</tt>
 +
|-
 +
|}
  
 
= References =
 
= References =
 +
* [https://argo.ni4os.eu/ni4os/documentation ARGO documentation for users ]
 
* [http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/ Probe development guidelines]
 
* [http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/ Probe development guidelines]
 
* [https://poem.ni4os.eu/ui/public_probes List of available probes]
 
* [https://poem.ni4os.eu/ui/public_probes List of available probes]

Latest revision as of 20:46, 2 June 2020

The ARGO Monitoring service provides a flexible and scalable framework for monitoring the status, availability and reliability of a wide range of services provided by infrastructures of medium to high complexity. ARGO generates reports using customer-defined profiles (e.g. for SLA management or operations). During report generation, ARGO takes into account custom factors such as the importance of a specific service endpoint and scheduled or unscheduled downtimes.

The ARGO Monitoring Service for NI4OS consists of production and development infrastructure. The production infrastructure is deployed redundantly and is used to generate reports and raise alarms for production-grade on-boarded services. The development infrastructure is used for testing and integrating new services and probes. The web UI is available at:

  • Production: https://argo.ni4os.eu
  • Development: https://argo-devel.ni4os.eu

Topology

The topology tool used in NI4OS is GOCDB, a central registry for e-Infrastructure topology that holds general information about the sites participating in the project. GOCDB allows service endpoints to be described in detail with custom attributes, tags and additional sub-endpoints. Services are assigned to resource centres, which are grouped into operations centres. Besides service endpoints, GOCDB supports the definition of contact points and the declaration of downtimes for individual service endpoints or resource centres.

URL: https://gocdb.ni4os.eu (NI4OS GOCDB)
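
The hierarchy described above can be pictured with a small data model. This is only an illustrative sketch: the class names, field names and example values are assumptions made for this guide and do not reflect the actual GOCDB schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEndpoint:
    # Hypothetical fields; GOCDB stores many more details per endpoint.
    hostname: str
    service_type: str
    extra_attributes: dict = field(default_factory=dict)


@dataclass
class ResourceCentre:
    name: str
    endpoints: list = field(default_factory=list)


@dataclass
class OperationsCentre:
    name: str
    resource_centres: list = field(default_factory=list)

    def all_endpoints(self):
        """Flatten the hierarchy into a single list of service endpoints."""
        return [ep for rc in self.resource_centres for ep in rc.endpoints]


# Example values are placeholders, not real NI4OS entries.
wiki = ServiceEndpoint("wiki.example.org", "wiki")
centre = OperationsCentre("EXAMPLE-OPS", [ResourceCentre("EXAMPLE-RC", [wiki])])
print([ep.hostname for ep in centre.all_endpoints()])
```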

Topology Information

The monitoring service relies on the topology database for the following information:

  • the monitored service(s)
  • the service types they run (e.g. wiki)
  • the service endpoints of each service (e.g. endpoint)
  • how they are organized (e.g. in groups of sites or in groups of services)
  • the service actors (owners, admins, contact points).

When adding a service endpoint, the following fields are mandatory for the monitoring service:

  • Production Level (Is this service in production?):
    • Y = monitored on production & development infrastructure
    • N = monitored only on development infrastructure
  • Monitored (Is this service monitored?)
    • must be set to Y
  • Notifications (Do you wish to receive notifications about this service?)
    • set to Y if you wish to receive alerts.
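
The effect of the Production Level and Monitored flags can be sketched in a few lines of Python. The function name and the returned labels are hypothetical and exist only for this illustration; they are not part of GOCDB or ARGO. The sketch also assumes that an endpoint whose Monitored flag is not Y is not monitored anywhere.

```python
def monitoring_targets(production: str, monitored: str) -> list[str]:
    """Return the ARGO infrastructures expected to monitor an endpoint.

    Both arguments are the "Y"/"N" values entered in GOCDB. The function
    name and the returned labels are hypothetical, used only to illustrate
    the rules listed above.
    """
    if monitored.upper() != "Y":
        # Assumption: without Monitored = Y the endpoint is not picked up.
        return []
    if production.upper() == "Y":
        return ["production", "development"]
    return ["development"]


print(monitoring_targets("Y", "Y"))
print(monitoring_targets("N", "Y"))
```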

Extra GOCDB attributes

ARGO can use extra GOCDB attributes to monitor service endpoints properly. Attributes that must be defined for a given service type will be listed here; at present no extra GOCDB attributes are required.

Metrics

A metric is a small piece of code that checks a specific function of a given service. For example:

  • Portal-WebCheck: checks whether the service responds over HTTP
  • CertValidity: checks the validity of a certificate
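
To make the idea of a metric concrete, here is a minimal sketch of the decision a certificate validity check might make. It is not the code of the CertValidity metric; the function name, the 30-day warning threshold and the use of Nagios-style exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Nagios-style plugin exit codes (assumed convention for this sketch).
OK, WARNING, CRITICAL = 0, 1, 2


def cert_validity_status(not_after, now, warn_days=30):
    """Classify a certificate by the time left until it expires.

    not_after and now are timezone-aware datetimes; warn_days is an
    illustrative threshold, not a value taken from ARGO.
    """
    remaining = not_after - now
    if remaining <= timedelta(0):
        return CRITICAL, "certificate has expired"
    if remaining < timedelta(days=warn_days):
        return WARNING, f"certificate expires in {remaining.days} day(s)"
    return OK, f"certificate valid for {remaining.days} more day(s)"


now = datetime(2020, 6, 2, tzinfo=timezone.utc)
print(cert_validity_status(datetime(2020, 6, 12, tzinfo=timezone.utc), now))
```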

Your service needs at least a few metrics before monitoring can start. Monitoring begins with basic checks such as the web check and the certificate validity check.

Service probe

Apart from the basic checks, each service should have a list of specific metrics defined from the user's perspective. Monitoring services from the user's point of view means that all services are monitored in the same way, regardless of who the service providers are and where they are located.

The owners of the service are the ones who know exactly how the service works. The service development team, with the support of the monitoring team, is responsible for implementing the probe, which checks the service while mimicking actual end-user behaviour, without requiring special privileges or special configuration.
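
As a rough idea of what such a probe can look like, the sketch below fetches a URL the way an anonymous user would and reports the result. It is a hypothetical example, not an existing ARGO probe: the option names, the mapping of HTTP status codes to states and the Nagios-style exit codes are assumptions, and the probe development guidelines remain the authoritative source.

```python
import argparse
import urllib.error
import urllib.request

# Nagios-style plugin exit codes (assumed convention for this sketch).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_http_status(code):
    """Map an HTTP status code to a plugin state and a one-line message."""
    if 200 <= code < 400:
        return OK, f"OK - HTTP {code}"
    if 400 <= code < 500:
        return WARNING, f"WARNING - HTTP {code}"
    return CRITICAL, f"CRITICAL - HTTP {code}"


def check_url(url, timeout):
    """Fetch url without credentials and classify the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_http_status(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return classify_http_status(err.code)
    except (urllib.error.URLError, OSError) as err:
        # No usable answer (DNS failure, refused connection, timeout).
        return CRITICAL, f"CRITICAL - {err}"
    except ValueError as err:
        # The URL itself is malformed, so the service was never contacted.
        return UNKNOWN, f"UNKNOWN - {err}"


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical web check probe")
    parser.add_argument("-u", "--url", required=True)
    parser.add_argument("-t", "--timeout", type=float, default=30.0)
    args = parser.parse_args(argv)
    state, message = check_url(args.url, args.timeout)
    print(message)
    return state
```

A script built on this sketch would end with `sys.exit(main())` so that the returned state becomes the process exit code, and could then be run with, for example, `--url https://www.mywebapp.com:9999`.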

Before you start implementing your own probe, please check the library to see whether a suitable probe is already in use for monitoring services:

  • Probes: https://poem.ni4os.eu/ui/public_probes
  • Metrics (currently available): https://poem.ni4os.eu/ui/public_metrics

If you cannot find a probe for your service, follow the development process described in the next section.

Probe Development Process

Figure: Development Process


  • Discuss (What to check): Discussion with the representatives and developers of each service to agree on a set of monitored metrics.
  • Develop (How to check): Development and testing of the probe(s). The development lifecycle includes coding the probe, documentation, testing and packaging.
  • Monitor (Let's start monitoring): Deployment of the service probe follows these repeated steps:
  1. The service owners provide guidelines, and the monitoring team makes the necessary configuration changes.
  2. The probe is tested and verified.
  3. Once the tests pass, the report is updated to include your service metrics. Monitoring starts and you can get status and A/R (availability and reliability) reports for your service.

The probe development guidelines: http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/

Checklist

Integrating a new service into the ARGO Monitoring service involves the two main steps below. SP stands for service provider.

1. Does the topology database (GOCDB) already have a service type for my service?
YES - The SP adds a new service endpoint in the topology database, then goes to step 2.
NO - The SP follows the procedure to add a new service type to the topology database, then adds a new service endpoint and goes to step 2.
2. Does POEM contain metrics and probes that can be used to monitor my service?
YES - The ARGO admin adds a mapping between the new service type and the metrics in the relevant profiles.
NO - The SP follows the probe development process.
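
The checklist can also be read as a small decision procedure. The sketch below is a hypothetical helper written for this guide, not part of ARGO, GOCDB or POEM; it assumes that step 2 is reached after either answer to step 1.

```python
def onboarding_steps(has_service_type, has_suitable_metrics):
    """List the checklist actions for a new service, in order.

    The two arguments are the answers to questions 1 and 2 of the
    checklist. The function and its wording are illustrative only.
    """
    steps = []
    if not has_service_type:
        steps.append("SP: add a new service type to the topology database (GOCDB)")
    steps.append("SP: add a new service endpoint in the topology database (GOCDB)")
    if has_suitable_metrics:
        steps.append("ARGO admin: map the service type to metrics in the relevant profiles")
    else:
        steps.append("SP: follow the probe development process")
    return steps


for step in onboarding_steps(has_service_type=True, has_suitable_metrics=False):
    print(step)
```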

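References

  • ARGO documentation for users: https://argo.ni4os.eu/ni4os/documentation
  • Probe development guidelines: http://argoeu.github.io/monitoring-probes/v1/guidelines_for_monitoring_probes/
  • List of available probes: https://poem.ni4os.eu/ui/public_probes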