To edit, reuse, maintain, and share validation rules, it is convenient to define them in text files. In this vignette we demonstrate how rules can be imported from or exported to text files.
The easiest way to define rules is by storing them in a free-form text file. For example, create a file called ex-1.txt with the following content.
# content of ex-1.txt
# Some checks on the 'women' dataset.
height > 0
weight > 0
mean(height/weight) < 0.5
We can read the rules using
v <- validator(.file="ex-1.txt")
Note that # is used to start comment lines, just like in regular R syntax. Validation rules may span several lines. The only restriction is that rules are stated in valid R syntax and can be recognized by validate as validating (basically, a rule should result in a logical).
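As a minimal sketch of the full round trip, the snippet below writes the free-form rules to a temporary file, reads them back, and confronts them with the built-in women dataset. The temporary path and the variable names are illustrative assumptions, not part of the original example.

```r
library(validate)

# Write the free-form rules to a temporary file (illustrative location).
rule_file <- file.path(tempdir(), "ex-1.txt")
writeLines(c(
  "# Some checks on the 'women' dataset.",
  "height > 0",
  "weight > 0",
  "mean(height/weight) < 0.5"
), rule_file)

# Read the rules and confront them with the data.
v  <- validator(.file = rule_file)
cf <- confront(women, v)

# One row per rule, with counts of passes, fails and missing results.
summary(cf)
```

Because the comment line is skipped, the resulting validator holds one rule per expression in the file.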
To set options and metadata for the rules, the well-known yaml (yaml ain’t markup language) format is used. Yaml is a human-readable way to define (nested) structures. Here is an example of a yaml-based rule definition file.
# content of ex-2.yaml
rules:
- expr: height > 0
  name: height
  label: height positivity
  description: |
    According to the latest research, the average
    height of American women must be positive.
- expr: weight > 0
  name: weight
  label: weight positivity
  description: |
    By definition, weight must be positive.
This file can be read in the same way.
v <- validator(.file="ex-2.yaml")
There are a few things to note here:
- yaml indentation must be spaces and not tabs. In the example, there are two spaces of indentation before a key name (name, and so on).
- For long rules or pieces of text, you can add a > (or a |, as in the example) after the colon and add a multiline string. You need to start on a new line and add an extra level of indentation.
- Expressions that start with an exclamation mark (!) must be enquoted, since the exclamation mark has special meaning in YAML.
The latter remark means that this is wrong:
# READING THIS FILE FAILS
rules:
- expr: !is.na(height)
and this is ok:
# READING THIS FILE SUCCEEDS
rules:
- expr: '!is.na(height)'
Rules exported with export_yaml are enquoted by default.
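To illustrate how the metadata ends up in the validator object, here is a small sketch that writes a structured rule file, reads it, and inspects the names, labels and descriptions. The second rule (weight_present) and the temporary file location are hypothetical additions for illustration; the accessors names(), label() and description() are assumed to be available from validate.

```r
library(validate)

# Write a yaml rule file with metadata (illustrative content and location).
yaml_file <- file.path(tempdir(), "ex-2.yaml")
writeLines(c(
  "rules:",
  "- expr: height > 0",
  "  name: height",
  "  label: height positivity",
  "  description: |",
  "    By definition, height must be positive.",
  "- expr: '!is.na(weight)'",   # quoted because it starts with '!'
  "  name: weight_present",
  "  label: weight availability",
  "  description: |",
  "    Weight may not be missing."
), yaml_file)

v <- validator(.file = yaml_file)

# Inspect the metadata that was read from the file.
names(v)
label(v)
description(v)
```

Note that the indentation inside the strings uses spaces only, and that the expression starting with an exclamation mark is wrapped in single quotes.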
Free-form and structured (yaml) rule definitions can be combined in a single file. Just separate free-form sections from structured sections with three dashes on a single line.
# content of ex-3.yaml
rules:
- expr: height > 0
  name: height
  label: height positivity
  description: |
    According to the latest research, the average
    height of American women must be positive.
- expr: weight > 0
  name: weight
  label: weight positivity
  description: |
    By definition, weight must be positive.
---
# free form starts here

# we expect the following mean ratio
mean(height/weight) < 0.5

# we expect a high correlation
cor(height,weight) > 0.99
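A shortened version of such a mixed file can be tried out as follows. This is a sketch under the assumption that the file is read exactly like the earlier examples; the file location is illustrative and the descriptions are left out for brevity.

```r
library(validate)

# A structured section, a '---' separator, then free-form rules.
mixed_file <- file.path(tempdir(), "ex-3.yaml")
writeLines(c(
  "rules:",
  "- expr: height > 0",
  "  name: height",
  "  label: height positivity",
  "---",
  "# free form starts here",
  "mean(height/weight) < 0.5",
  "cor(height, weight) > 0.99"
), mixed_file)

v <- validator(.file = mixed_file)

# Rules from both sections end up in the same validator object.
length(v)
summary(confront(women, v))
```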
Options can be set at the beginning of your file. Enclose the options section between two lines of three dashes; the (free-form or structured) rule section starts after the second one.
---
options:
  raise: errors
---

height > 0
The options you set here will be part of the validator object that is created once you read in the file. The options are valid for every confrontation you use this validator for, unless they are overwritten during the call to confront().
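The interplay between file-level options and per-call options can be sketched as follows. The rule deliberately refers to a misspelled, nonexistent variable (hight) so that evaluating it raises an error; the file name, the typo and the result variable names are all illustrative assumptions.

```r
library(validate)

opt_file <- file.path(tempdir(), "ex-options.yaml")
writeLines(c(
  "---",
  "options:",
  "  raise: errors",
  "---",
  "",
  "# 'hight' is a deliberate typo, so evaluating this rule gives an error",
  "hight > 0"
), opt_file)

v <- validator(.file = opt_file)

# With 'raise: errors' taken from the file, confront() is expected to stop;
# tryCatch() keeps the script running either way.
res_file_opts <- tryCatch(confront(women, v), error = function(e) "stopped")

# Passing the option in the call overrides the file setting for this
# confrontation only: the error is recorded instead of raised.
res_override <- confront(women, v, raise = "none")
errors(res_override)
```

Setting the option in the file is convenient when a rule set should always fail loudly, while the per-call override is handy for interactive debugging.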
A rule file can include other rule files. This is useful, for example, when you have a general rule set that applies to all your data files, and some rules that apply only in specific cases. The file with the specific cases can then include one or more other rule files. Files should be included in the same section as the options.
---
include:
  - petes_rules.yaml
  - nancys_rules.yaml
options:
  raise: errors
---
# start rule definitions here
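A small sketch of file inclusion is given below. It creates a general rule file and a specific rule file that includes it. The file names are hypothetical, and an absolute path is used in the include section only to keep the sketch independent of how relative paths are resolved.

```r
library(validate)

general  <- file.path(tempdir(), "general_rules.yaml")   # hypothetical name
specific <- file.path(tempdir(), "specific_rules.yaml")  # hypothetical name

# General rules, in free form.
writeLines(c("height > 0", "weight > 0"), general)

# Specific rules; the header includes the general rule file.
general_path <- normalizePath(general, winslash = "/")
writeLines(c(
  "---",
  "include:",
  paste0("  - '", general_path, "'"),
  "---",
  "mean(height/weight) < 0.5"
), specific)

# Reading the specific file should also pull in the included rules.
v <- validator(.file = specific)
length(v)
```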
There are two ways to export the rules of a validator object to yaml. You can either write to a yaml file immediately as follows
v <- validator(height > 0, weight > 0)
export_yaml(v, file="my_rules.yaml")
or you can get the yaml text string using
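For instance, both export routes can be sketched as below, assuming the as_yaml() function that validate documents alongside export_yaml(). The output location and the read-back step are illustrative additions.

```r
library(validate)

v <- validator(height > 0, weight > 0)

# Route 1: write directly to a file (temporary location, for illustration).
out_file <- file.path(tempdir(), "my_rules.yaml")
export_yaml(v, file = out_file)

# Route 2: obtain the yaml as a character string and print it.
txt <- as_yaml(v)
cat(txt)

# Reading the exported file back gives a validator with the same rules.
v2 <- validator(.file = out_file)
length(v2)
```

The string route is useful when the yaml needs to be stored somewhere other than a file, for example in a database field.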