Accounting

From NI4OS wiki

Latest revision as of 12:39, 1 February 2023

Accounting Service Integration

1. Log in to the accounting system (create an account if you don’t have one yet): https://accounting.ni4os.eu/

2. Contact the accounting admin to be given access to the resource key of the service you wish to post data for (your resources and service will also be added if not yet present). Admin contact information: svetlozar@parallel.bas.bg

3. Once approved by the admin, you can find your resource key in the Resources tab of the accounting service by clicking the config button next to your resource name.

4. The main way to POST data is through the API; details can be found in the API tab of the accounting site. Cloud, Storage, Repository, and WebScience data can also be added manually through the site. Example scripts that help with posting data are available under the API Clients tab.

NI4OS Login

[Figure: Accounting Login]

You can log in with your NI4OS credentials using the NI4OS Login button. To use the service, you also need to be a member of a virtual organization (VO). If you attempt to use the service without such membership, you will get an error message with instructions on how to join a VO. Once logged in, you will be redirected to the accounting dashboard, where you can start browsing the accounting data.

Getting Resource Key

[Figure: Getting Resource Key]

Once you have an account and have been approved by the accounting admin, you can get your resource key from the RESOURCES tab: choose the appropriate resource type (for instance, HPC), then click the config button in the row of your system to display its resource key. This key is later used to upload data to the accounting platform.

Submitting HPC Data

There are two ways to submit your data to the accounting service: using the REST API, or deploying the provided accounting clients, which submit data to the accounting service periodically and automatically.

Using REST API

[Figure: Adding Storage Data]

You can submit data by making a POST request using the parameters documented at https://accounting.ni4os.eu/apidoc. You will need your resource key, and the request body must be a JSON array of objects in the format given on the API page; the body must always be an array, even for a single record. Dates use the 'yyyy-mm-dd' format. The API responds with True (HTTP 200) on success or False (HTTP 400) on error.
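A minimal sketch of such a request, using only the Python standard library, is shown below. It reuses the WebScience endpoint and the resourcekey header that appear in the log-parsing script on this page; the helper names build_request, validate_record_date, and send are hypothetical, and the record fields must match the API page for your resource type.

```python
import json
import re
import urllib.error
import urllib.request
from datetime import datetime

# WebScience endpoint, as used by the log-parsing script on this page;
# other resource types use other endpoints (see the API page)
API_URL = 'https://accounting.ni4os.eu/api/accounting/webscience'


def validate_record_date(value):
    """Raise ValueError unless value is a valid date written as 'yyyy-mm-dd'."""
    if not re.fullmatch(r'\d{4}-\d{2}-\d{2}', value):
        raise ValueError(f'record_date must be yyyy-mm-dd, got {value!r}')
    datetime.strptime(value, '%Y-%m-%d')  # also rejects impossible dates
    return value


def build_request(records, resource_key, url=API_URL):
    """Build (but do not send) a POST request whose body is a JSON array."""
    if isinstance(records, dict):
        records = [records]  # the API always expects an array
    for record in records:
        validate_record_date(record['record_date'])
    body = json.dumps(records).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'resourcekey': resource_key, 'Content-Type': 'application/json'},
        method='POST',
    )


def send(request, timeout=30):
    """Send the request; return True on HTTP 200 and False on an HTTP error."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # The API answers with HTTP 400 when the submission is rejected
        return False
```

For example, a record shaped like APP1_STATS in the script further down could be submitted with send(build_request(record, 'MY_RESOURCE_KEY')). Building the request separately from sending it makes it easy to inspect the exact body before anything reaches the API.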

Using The Clients

You can also use the accounting clients at https://accounting.ni4os.eu/apiclients. The installation steps for each of those are shown on the same page.

Submitting Cloud and Storage Data

You can submit Cloud or Storage data manually by going to the RESOURCES tab, choosing Cloud or Storage, and clicking the add data button in the row of your resource.

Using the Accounting Service

Accounting Dashboard

[Figure: Accounting Data]

Accounting data can be browsed from the accounting dashboard at https://accounting.ni4os.eu/dashboard. In the dashboard, the user chooses the kind of data to search, the period, and the table layout (for example, countries as rows and dates as columns); after pressing the Show button, the data is displayed. All information can be grouped by country, date, year, resource name, research community, and application. The data on the site is shown at monthly granularity, and the tables can be filtered and sorted.

Resources and Application Tabs

In the Resources tab, the user can see each resource along with its partner and country. In the Applications tab, the user can choose an application group and see each application's short name and long name.

Accounting automation

Introduction

The NI4OS accounting system collects, analyzes, and provides reports about the usage of services listed on the NI4OS marketplace.

The NI4OS accounting system integration includes a login procedure, resource key generation, and choosing a method for data submission. More information about these procedures can be found on the official NI4OS accounting wiki page (https://wiki.ni4os.eu/index.php/Accounting).

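The NI4OS Accounting system currently supports two methods for data submission:

  * Using the official API clients tailored for popular HPC/Grid systems (https://accounting.ni4os.eu/apiclients).
  * Using the REST API for reporting usage information in a service-agnostic manner (https://accounting.ni4os.eu/apidoc).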

This section discusses the second option: analyzing the accounting data of a number of services and submitting it, without requiring changes to their source code.

Scenario

To obtain accounting data uniformly, regardless of the nature of the underlying service, one can analyze the HTTP request/response logs produced by the reverse proxy that sits in front of the server running the service. Once the necessary data is extracted, it can be submitted to the NI4OS Accounting API with a POST request using the parameters documented at https://accounting.ni4os.eu/apidoc.

A sample scenario that analyzes the access logs generated by the popular NGINX web server, configured as a reverse proxy, is described in the subsections below. The main benefit of this approach is that accounting information for multiple services can be acquired and reported at once, as long as they are all fronted by the same reverse proxy. This scenario has been validated and is in active use by the Schrödinger API (https://catalogue.ni4os.eu/?_=/resources/c28834c3-0402-421a-8cd3-be8f862f7b10) and the Gaussian API (https://catalogue.ni4os.eu/?_=/resources/d23307b1-832b-435f-b86e-54673045d4bf), two fully on-boarded thematic services in the NI4OS catalog.

Setup

NGINX Configuration

The NGINX logging format is flexible and can be configured directly in the main configuration file (nginx.conf). When a single NGINX installation serves multiple virtual hosts, a minor change is required so that each log message contains the name of the virtual host handling the request. This makes it possible to distinguish requests made to different services fronted by the same reverse proxy. The NGINX configuration steps are presented below.

  1. Edit the main NGINX configuration file, located by default at /etc/nginx/nginx.conf, and add or alter the following lines within the existing http section:

    http {
        log_format vhosts '$host $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';
        access_log /var/log/nginx/access.log vhosts;
        error_log /var/log/nginx/error.log;
    }

  2. Alter the configuration files of all virtual hosts so that, if the access_log and error_log directives are present (within the server section), they read:

        access_log /var/log/nginx/access.log vhosts;
        error_log /var/log/nginx/error.log;

  3. Restart the NGINX service so that the changes take effect:

    systemctl restart nginx

Sample Log Output

With the configuration above, the log entries have the following format:

app1.example.com 127.0.0.1 - - [16/Apr/2022:12:26:08 +0200] "GET /swagger-ui/index.html?configUrl=/api-docs/swagger-config HTTP/1.1" 200 1456 "-" "check_http/v2.3.3 (nagios-plugins 2.3.3)"
app2.example.com 127.0.0.1 - - [16/Apr/2022:12:26:10 +0200] "GET / HTTP/1.1" 302 0 "-" "check_http/v2.3.3 (nagios-plugins 2.3.3)"

The consistent structure of the logs makes it easy to parse them programmatically and extract the parameters required by the NI4OS accounting API. For example, the Schrödinger API and Gaussian API services fall into the WebScience accounting category, so the request parameters and POST body should be built according to the WebScience section of the API documentation (https://accounting.ni4os.eu/apidoc).
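As a quick illustration, the sketch below turns the NGINX log format into a regular expression with one named group per variable (the same technique used by the script in the next section, with "$request" split into verb, path, and HTTP version) and applies it to the first sample line above. The names LOG_FORMAT and parse_line are hypothetical.

```python
import re
from urllib.parse import urlparse

# The "vhosts" log format, with "$request" split into verb, path and version
LOG_FORMAT = ('$vhost $remote_addr - $remote_user [$time_local] '
              '"$verb $request $http_version" $status $body_bytes_sent '
              '"$http_referer" "$http_user_agent"')

# Each $name becomes a lazy named group; every other character is matched literally
PATTERN = re.compile(''.join(
    '(?P<' + name + '>.*?)' if name else re.escape(char)
    for name, char in re.findall(r'\$(\w+)|(.)', LOG_FORMAT)))


def parse_line(line):
    """Return a dict of the logged fields, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Drop the query string so that only the endpoint path remains
    fields['path'] = urlparse(fields['request']).path
    return fields


sample = ('app1.example.com 127.0.0.1 - - [16/Apr/2022:12:26:08 +0200] '
          '"GET /swagger-ui/index.html?configUrl=/api-docs/swagger-config HTTP/1.1" '
          '200 1456 "-" "check_http/v2.3.3 (nagios-plugins 2.3.3)"')
fields = parse_line(sample)
print(fields['vhost'], fields['verb'], fields['path'], fields['status'])
# prints: app1.example.com GET /swagger-ui/index.html 200
```

The vhost field is what lets a single access log be split per service, while path, verb, and status are the values the accounting logic filters on.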

Parsing the Logs

An example Python script for parsing the generated logs is available below (depends on the requests Python library):

import re
import logging
from datetime import date, datetime
from urllib.parse import urlparse

import requests

# Make logging.info() messages visible (the default level is WARNING)
logging.basicConfig(level=logging.INFO)


def publish_data(data):
    # The Accounting API always expects a JSON array, even for a single record
    try:
        response = requests.post('https://accounting.ni4os.eu/api/accounting/webscience', json=[data],
                                 headers={'resourcekey': 'MY_RESOURCE_KEY'}, timeout=30)
    except requests.RequestException as err:
        logging.error(f'Failed sending usage data: {err}')
        return
    if response.status_code == 200:
        logging.info('Successfully published data.')
    else:
        logging.error(f'Failed sending usage data {response.status_code}')

# Endpoints for which we would like to obtain accounting data
ENDPOINTS = [
    '/endpoint1',
    '/endpoint2',
    '/endpoint3',
]

# Relevant HTTP verbs for the invocation of the above endpoints
VERBS = ['GET', 'POST']

# Relevant hostnames of virtual hosts for which we would like to obtain accounting data
VHOSTS = ['app1.example.com',
          'app2.example.com']

# Accounting data is reported for the current day; isoformat() gives 'yyyy-mm-dd'
TODAY = date.today()

# The expected accounting data format by the Accounting API
APP1_STATS = {
    'application_name': 'APP1',
    'workflows': 0,
    'downloads': 0,
    'uploads': 0,
    'users': 0,
    'record_date': TODAY.isoformat()
}

# The expected accounting data format by the Accounting API
APP2_STATS = {
    'application_name': 'APP2',
    'workflows': 0,
    'downloads': 0,
    'uploads': 0,
    'users': 0,
    'record_date': TODAY.isoformat()
}

# Sets for counting the number of unique users (client IPs) of each service in the given time frame
app1_unique_users = set()
app2_unique_users = set()

# Setup the expected log format, as generated by NGINX ("$request" is split into verb, path and HTTP version)
conf = '$vhost $remote_addr - $remote_user [$time_local] "$verb $request $http_version" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
# Turn each $variable into a named group and escape every other character
regex = re.compile(''.join(
    '(?P<' + g + '>.*?)' if g else re.escape(c)
    for g, c in re.findall(r'\$(\w+)|(.)', conf)))

# Open the NGINX access log for reading
with open('/var/log/nginx/access.log', 'r', errors='replace') as f:
    for line in f:
        request_data = regex.match(line)
        if request_data:
            request_status = request_data['status']
            request_file = urlparse(request_data['request']).path
            request_vhost = request_data['vhost']
            request_time = datetime.strptime(request_data['time_local'], '%d/%b/%Y:%H:%M:%S %z')
            request_verb = request_data['verb']
            request_ip = request_data['remote_addr']
            # Keep only successful (HTTP 200) requests made today to a relevant endpoint
            # and virtual host, using an accepted HTTP verb
            if (request_file in ENDPOINTS and request_time.date() == TODAY and request_status == '200'
                    and request_vhost in VHOSTS and request_verb in VERBS):
                # Determine the host of the request. The accounting logic is specific to the given service (e.g., whether data uploading/downloading is supported at all)
                if request_vhost == 'app1.example.com':
                    APP1_STATS['workflows'] += 1
                    app1_unique_users.add(request_ip)
                elif request_vhost == 'app2.example.com':
                    APP2_STATS['workflows'] += 1
                    APP2_STATS['downloads'] += 1
                    APP2_STATS['uploads'] += 1
                    app2_unique_users.add(request_ip)

APP1_STATS['users'] = len(app1_unique_users)
APP2_STATS['users'] = len(app2_unique_users)

publish_data(APP1_STATS)
publish_data(APP2_STATS)

The script above iterates through each line of the NGINX access log, keeps only successful (HTTP 200) requests made today to the listed endpoints and virtual hosts with an accepted HTTP verb, and fills in the accounting information according to the behavior of each service. If the NGINX logging format is different, the content of the conf variable can simply be rearranged to match it.

Important: The obtained resource key from the NI4OS Accounting portal should be put in place of the MY_RESOURCE_KEY string, in the HTTP request headers configuration.

Creating a Python Virtual Environment

Since the Python script depends on the requests library, it is recommended to use a dedicated Python virtual environment. This can be set up using the following commands:

apt update
apt install virtualenv
virtualenv -p python3 /opt/accounting/venv
source /opt/accounting/venv/bin/activate
pip install requests

Scheduling

The script can be scheduled to run every day at 23:59 to report that day's accounting data. To avoid permission issues when reading the web server logs, run the script either as the web server user or as a dedicated user added to the web server group. For example, a dedicated service account for running the script can be created as follows:

useradd -r -M -s /usr/sbin/nologin log-parser
usermod -a -G adm log-parser
usermod -a -G www-data log-parser

Ownership of the /opt/accounting directory, which holds the previously created Python virtual environment, should then be assigned to this user:

chown -R log-parser:log-parser /opt/accounting

Scheduling the script can be done in the following manner:

sudo -u log-parser crontab -e # edit the crontab file for the log-parser user

Within the crontab file for the log-parser user, the following line can be added:

59 23 * * * /opt/accounting/venv/bin/python3 /opt/accounting/log-parser.py

The /opt/accounting/log-parser.py path denotes the full path where the accounting script is stored on the file system.
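The script only reads /var/log/nginx/access.log. If logrotate rotates the NGINX logs after midnight (check /etc/logrotate.d/nginx and the logrotate schedule on your system), part of the day's requests may already sit in access.log.1 when the script runs at 23:59. A minimal sketch of a hypothetical helper, todays_lines, that also reads the most recent rotated file and keeps only entries logged on the current date, assuming that file is not yet compressed:

```python
import os
from datetime import date, datetime


def todays_lines(log_dir='/var/log/nginx', names=('access.log.1', 'access.log'), today=None):
    """Yield the lines logged today, reading the rotated log first.

    Assumes the rotated file is uncompressed (as with logrotate's
    delaycompress option); files that do not exist are skipped.
    """
    if today is None:
        today = date.today()
    for name in names:
        path = os.path.join(log_dir, name)
        if not os.path.exists(path):
            continue
        with open(path, 'r', errors='replace') as f:
            for line in f:
                # $time_local is the text between the first '[' and the next ']'
                start = line.find('[')
                end = line.find(']', start + 1)
                if start == -1 or end == -1:
                    continue
                try:
                    stamp = datetime.strptime(line[start + 1:end], '%d/%b/%Y:%H:%M:%S %z')
                except ValueError:
                    continue
                if stamp.date() == today:
                    yield line
```

In the script, the loop over the opened access.log could then iterate over todays_lines() instead. Note that with a 23:59 schedule, requests logged during the last minute of the day are still not counted.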