RePol

From NI4OS wiki
Revision as of 23:13, 2 December 2020 by Branko Marovic (talk | contribs)
Jump to navigation Jump to search

Template:Colored box

Introduction

A trustworthy repository should have a transparent policy, informing users about the roles, responsibilities, rights and procedures aimed at ensuring that their deposited data are preserved and disseminated in line with the FAIR principles. Having a repository policy is required for the onboarding of scientific repositories and data into NI4OS-Europe, OpenAIRE and EOSC. As this is directly linked to the policies governing repositories, the University of Belgrade Computer Centre (RCUB) has developed repository policy generator RePol as a tool to facilitate the making of key choices and drafting of policy documents. Specification and exchange of metadata in repositories have been largely defined in the context the related OpenAIRE, EOSC Enhance, EOSpilot and FAIRsFAIR efforts. At the same time, it was found that along with the lack of awareness that a repository should have a clearly defined policy, deciding on the rules of repositories’ operation and expressing these decisions through comprehensive repository policies is a major problem for their owners and administrators. The developed tool will hopefully help repository owners in the creation of policies that they must publish and in the alignment of provided services with requirements for participation and onboarding in the above-mentioned infrastructures.

Location and support

The baseline implementation of RePol is fully operational and is available at https://repol.ni4os.eu. It is available for evaluation, use and comments by users, which can be sent to repol@rcub.bg.ac.rs.

Addressed requirements

The requirements addressed by RePol include the following:

  • The platform is flexible and it and can customise the output. The tool asks for a few mandatory customisation inputs, several choices, and a few options related to primary choices. Based on these inputs, it creates a corresponding draft repository policy document by tailoring a carefully crafted and redacted template. Questions, inputs, offered choices and values, and provided explanations are customizable and are configured through the management of application configuration files and without any additional programming. The used data model reflects the evolving user-facing content, options and choices. The repository policy documents are generated after the user provides a few mandatory inputs (such as repository and owner name) and makes several explained and selectable choices and options. Some options are nested, i.e., some choices are shown only if choices or options they depend on are selected.
  • Choices and inputs offered to users are clear and well-explained, with reasonable defaults, well-handled mutual dependencies and, where possible, validity checks. Some of the offered values are complemented with an open-ended entry field, e.g., when specifying the thematic areas, types and languages of the repository content.
  • The policy template text is concise, clear and aligned with the current best practice, but also adaptable to specific user needs.
  • The collected data and the key elements of the generated policy are provided in a machine-readable format. This allows for an automated interpretation of created policies and extraction of repository-level metadata for inclusion in registries, catalogues and various operational and data discovery tools. Repository policies will be updated as needed to keep them in line with changes in internal and external rules, requirements, norms and conventions. Machine readability permits to easily save user choices and update or modify a previously created registry policy.
  • Users may additionally edit, customise or reformat the generated policy document using their regular document editing tools, as policy documents are produced in an accessible and editable text format. It is assumed that such edits are minor and that they do not affect the validity of associated machine-readable data.

Key features and design decisions

RePol uses a lightweight online form to guide the user through the process of defining a repository policy. By choosing options in the form, the user chooses sets of predefined policy clauses formulated in line with the current best practice. The resulting policy document may be downloaded, additionally customized, and integrated into your repository.

While the policy template must be able to accommodate occasional extensions and new options, the used software platform has to be flexible enough to handle both current requirements and future adaptations of policies. This is achieved by a modular design, which makes it easy to extend the data entry forms and document templates with new sections, options, values and rules.

The development of the underlying policy template was started by collecting a set of representative repository policies, comparing them against the frequently used OpenDOAR “Minimum” and “Optimum” templates, and combining their most essential elements into an integral form with options and alternatives. The resulting composite document was then updated to accommodate the terminology and conventions adopted in the context of recent infrastructure development in Europe. The entire alternative sub-policies needed to be developed to accommodate Creative Commons licenses or policy-related requirements of the Core Trust Seal (CTS) certification framework. The template has been tailored to produce a lightly formatted and easy to edit and publish HTML format.

The data that needs to be provided to generate policy documents, which are supposed to be publicly available anyway, is not particularly sensitive. However, the RePol tool at this point applies a radical approach to sensitive data protection and privacy. The tool does not persist any user-provided data beyond the users’ web browser session and all data is returned to the user within the generated draft policy document, without any local data saving or logging. Although this slightly reduces the capabilities related to the analysis and profiling of RePol, we hope that the resulting simplicity of access to the tool will overweight the drawbacks, particularly during the initial popularisation of the tool. The user-provided data is handed back to the user by being embedded as hidden metadata within the HTML mark-up of the produced policy document or exported in a standalone XML document, both of which can be imported back into the tool. This allows the user to upload a previously saved policy and update or change it.

Usage

The main purpose of this web application is to generate policies for repositories, but it can be used to generate any other type of document, due to the versatile nature of its configurable forms and FreeMarker templates.

By using triggers and conditions, changes of values in form input elements can make panels (groups of input elements) or input elements themselves appear, disappear or change values, making a form act more like a wizard, assisting the user in selecting appropriate values.

One instance of RePol can have multiple forms and corresponding templates. Data is shared among forms to avoid having to enter it more than once.

Access, data entry and document creation

User can access RePol through a web browser and without any authentication. Once a session is established, it is used to store data from form input elements. The user then selects a form based on what document needs to be generated. Each form shows a progress-bar based on the percentage of mandatory fields filled. Once all of the mandatory fields of a form are filled, a document can be generated.

Editing of generated documents

The generated HTML document is a draft that must be read and edited by the user before it is considered final. Te sections that must be revised are clearly marked in the produced document.

If a user wishes to re-upload and re-use the document in RePol, they should refrain from deleting or altering the portion between tags because that is where the machine-readable data is stored.

Saving all forms

At any time, user can download an XML (standalone) file containing all entered data – the latest values of the input elements, from all of the forms at once. Each generated document also contains its data in a machine-readable format.

Later changes of data and documents

Both standalone XML exports and generated documents can be re-uploaded to the RePol and their data parsed to fill the appropriate input elements in all forms that have them, allowing to update the data or the same data to be re-used in other templates or newer versions of the existing ones. All user-made customisations of the earlier created text will have to be repeated in the new document.

Plans and potential uses

At this point, it is not foreseen that RePol will be localised to any of NI4OS-Europe languages, but the NI4OS-Europe team will evaluate the feasibility of this endeavour, especially as part of the National Open Science Cloud priorities and the activities performed by the networks of EOSC Promoters and NI4OS-Europe Translation Officers.

Updates may be also performed to the Repository Policy Generator depending on feedback and evaluation during current use. Special-purpose policy templates could be developed for some specific needs. Immediately useful would be the extension of the tool to the generation of other policy and repository configuration-related and potentially machine-readable artifacts, including those associated with various registries and validators. RePol will be aligned as needed with requirements and emergent demands to create artifacts for EOSC services in development, such as EOSC OS Monitor and OS Policy Registry. The policy template will also need to be readjusted to evolving norms in expectations for repository and service policies and the service certification context. The data used to create repository policies could be extended and exchanged with the Agora-SP service portfolio management tool used by NI4OS-Europe and EOSC platform. Similar policy templates and data entry forms, based on the same approach, rationale and modular structure, could be developed for generic services. Furthermore, the developed software platform could be applied to other similar applications, such as the creation of Privacy Notice, Terms of Use or Intellectual Property Policy documents, but special care should be taken to avoid duplication.

The technical implementation of RePol is general and quite independent from its current application related to repository policies, it may be adopted by other groups in need of similar specific or generic functionality, which would facilitate its further development. Although RePol has already been designed to be highly configurable, its potential application for other purposes would require a further separation of core code and functionality from the application-specific configuration, templates and branding.

Implementation details

When the user fills in a form and requests the corresponding document, all related user-provided data is validated and conditions evaluated, the compiled data model passed on to the FreeMarker library along with the template, which generates the customised document and passes it for submission to the user’s web browser.

Configuration

RePol is a Java 7 web application (EE Web API 7.0) built upon Java Server Faces 2.2 Framework, with PrimeFaces 7 components. It generates documents using FreeMarker 2.3 library.

Configuration files and their parameters are:

settings.cfg (properties file)
This file contains the following parematers:
  • templatePath – path to the directory containing .ftlh template files
  • authenticationPin – pin used to access the hidden panel for reloading configuration (/faces/dashboard.xhtml)
  • version – version of the RePol instance, accessible from the FreeMarker template (${repol_version})
selection-lists.xml
XML file containing all of the lists of pre-defined values used by some input elements. Each list must have a unique identifier by which it is referenced by the list-id attribute of an input element. Each list element has a human-readable label and an actual value that is being used in the template. Actual values should be selected so that they can be used in sentences.
template-forms.xml
This XML file contains forms, their identifiers, labels and descriptions (HOW TO PUT EACH FORM INTO A SEPARATE FILE?). Each form is comprised of panels. The panel is used to group input elements. Input elements are used to receive user-provided data of the appropriate types. Each form input element must have an id unique within the form. Data is shared between forms by having input elements with the same id. If an input element has a list of pre-defined values, it must also have a list-id attribute.

To set up a working instance of RePol, it is necessary to make an arbitrarily placed directory on the server, place FreeMarker templates in it (*.ftlh) and make them readable for the operating system account used by the application. Before deploying the application, it is also necessary to configure repol.TemplatePath parameter in settings.cfg to point to the template directory and to configure in template-forms.xml the associations between forms and corresponding templates. By accessing a hidden panel, an administrator can reload templates and forms without having to restart the server or re-deploy the entire application.

Forms are interactive. Selecting or entering values can trigger changes in other input elements, make them appear, disappear or change values when pre-configured conditions are met. These conditions are formulated as logical expressions of input element values being equal to, containing a given value or being empty or any of the things above negated. They are configured as trees of XML elements, with the root being assigned a condition identifier (BY WHOM?). Each input element can contain triggers that alter values in other input elements when a specified condition is met. If an input element has a condition specified it is visible only if that condition is met. Conditions are also accessible from the FreeMarker template as Boolean constants, evaluated at the time the document generation is initiated.

Given the complex nature of the form configuration, the application validates it when it is initialised by its container. If it fails to start, the administrator should look at the log file of the servlet container server to figure out what went wrong.