Regex security model

The Regex security model implements text data validation based on statically defined regular expressions.

A PSL file containing a description of the Regex security model is located in the KasperskyOS SDK at the following path:

toolchain/include/nk/regex.psl

Regex security model object

The regex.psl file contains a declaration that creates a Regex security model object named re. Including the regex.psl file in the solution security policy description therefore creates a Regex security model object by default.

A Regex security model object does not have any parameters.

A Regex security model object can be covered by a security audit. In this case, you also need to define the audit conditions specific to the Regex security model. To do so, use the following constructs in the audit configuration description:

It is necessary to create additional objects of the Regex security model in the following cases:

Regex security model methods

The Regex security model contains the following expressions:

Syntax of regular expressions of the Regex security model

A regular expression for the match method of the Regex security model can be written in two ways: within the multi-line regex block or as a text literal.

When writing a regular expression as a text literal, all backslash instances must be doubled.

For example, the following two regular expressions are identical:

// Regular expression within the multi-line regex block
{ pattern:
```regex
Hello\ world\!
```
, text: "Hello world!"
}

// Regular expression as a text literal (doubled backslash)
{ pattern: "Hello\\ world\\!"
, text: "Hello world!"
}
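The doubling rule is the usual consequence of string-literal escaping. The sketch below illustrates it with Python string literals and Python's re module, which is only a stand-in here: it is not the Regex security model engine, and the variable names are hypothetical.

```python
import re

# Raw string: each backslash is written once, as in the multi-line regex block.
block_form = r"Hello\ world\!"

# Ordinary string literal: each backslash is doubled, as in a text literal.
literal_form = "Hello\\ world\\!"

# Both spellings produce the same pattern text.
same_pattern = block_form == literal_form

# The pattern matches the sample text from the example above.
matches = re.fullmatch(literal_form, "Hello world!") is not None

print(same_pattern, matches)
```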

Regular expressions for the select method of the Regex security model are written as text literals, so every backslash in them must be doubled.

A regular expression is defined as a template string and may contain the following:

Regular expressions are case sensitive.

Literals and metacharacters in regular expressions

White-space characters in regular expressions

Definition of a character based on its octal or hexadecimal code in regular expressions

Sets of characters in regular expressions

A character set is defined within square brackets [] as a list or range of characters. A character set tells the regular expression interpreter that exactly one of the characters listed in the set or range can appear at this location in a sequence of characters. A character set cannot be empty.

The contents of a character set (BracketSpec) can be listed explicitly or defined as a range of characters. In a range, the first and last characters must be separated by a hyphen.

The ASCII code for the upper boundary character of a range must be higher than the ASCII code for the lower boundary character of the range.

For example, the regular expressions [5-2] and [z-a] are invalid.
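The boundary rule can be sketched as a simple comparison of character codes. The helper names below are hypothetical, and the check against Python's re module is only a cross-reference: Python rejects reversed ranges as well, but it is not the Regex security model engine.

```python
import re

def is_valid_range(lower: str, upper: str) -> bool:
    """Apply the rule as worded above: the upper boundary code must be higher."""
    return ord(upper) > ord(lower)

def python_accepts(pattern: str) -> bool:
    """Report whether Python's re module compiles the pattern (stand-in only)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

for lower, upper in [("2", "5"), ("5", "2"), ("z", "a")]:
    pattern = f"[{lower}-{upper}]"
    print(pattern, is_valid_range(lower, upper), python_accepts(pattern))
```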

The hyphen (minus) character - is interpreted as a special character only within a character set. Outside of a character set, a hyphen is a literal, so it does not have to be preceded by the \ metacharacter there. To use a hyphen as a literal within a character set, place it first or last in the set.

Examples:

The regular expressions [-az] and [az-] correspond to the characters a, z and -.

The regular expression [a-z] corresponds to any of the 26 lowercase English letters from a to z.

The regular expression [-a-z] corresponds to any of the 26 lowercase English letters from a to z, or to -.
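The hyphen examples above can be checked with a small sketch. It uses Python's re module, which treats hyphens inside a character set the same way for these patterns but is not the Regex security model engine. The helper name and the sample alphabet are hypothetical.

```python
import re

def matched_chars(pattern: str, alphabet: str) -> set:
    """Return the characters of alphabet that the one-character pattern accepts."""
    return {ch for ch in alphabet if re.fullmatch(pattern, ch)}

alphabet = "-abmz09"

leading = matched_chars("[-az]", alphabet)      # hyphen first: literal
trailing = matched_chars("[az-]", alphabet)     # hyphen last: literal
letters = matched_chars("[a-z]", alphabet)      # hyphen between: range
combined = matched_chars("[-a-z]", alphabet)    # literal hyphen plus range

print(sorted(leading), sorted(trailing), sorted(letters), sorted(combined))
```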

The circumflex (caret character) ^ is interpreted as a special character only within a character set, and only when it is located directly after the opening square bracket. Outside of a character set, a circumflex is a literal, so it does not have to be preceded by the \ metacharacter there. To use a circumflex as a literal within a character set, place it anywhere except the first position in the set.

Examples:

The regular expression [0^9] corresponds to the characters 0, 9 and ^.

The regular expression [^09] corresponds to any character except 0 and 9.
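A minimal sketch of the two circumflex examples, again using Python's re module as a stand-in that behaves the same way for these patterns (it is not the Regex security model engine). The sample string is hypothetical.

```python
import re

sample = "0^95a"

# Circumflex in a non-leading position: an ordinary member of the set.
listed = re.findall(r"[0^9]", sample)

# Circumflex directly after the opening bracket: the set is negated.
negated = re.findall(r"[^09]", sample)

print(listed, negated)
```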

Within a character set, the metacharacters *.&|!?+ lose their special meaning and are instead interpreted as literals. Therefore, they do not have to be preceded by the \ metacharacter. The backslash \ retains its special meaning within a character set.

For example, the regular expressions [a.] and [a\.] are identical: each corresponds to the character a or a dot interpreted as a literal.
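The equivalence of [a.] and [a\.] can be illustrated by comparing which characters each set accepts. This sketch assumes Python's re module as a stand-in (not the Regex security model engine), and the helper name is hypothetical.

```python
import re
import string

def accepted(pattern: str) -> set:
    """Return every printable ASCII character accepted by the one-character pattern."""
    return {ch for ch in string.printable if re.fullmatch(pattern, ch)}

unescaped = accepted("[a.]")     # dot is already a literal inside the set
escaped = accepted("[a\\.]")     # text-literal form of [a\.] with a doubled backslash

print(sorted(unescaped), unescaped == escaped)
```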

Groups of characters and operators in regular expressions

A character group uses parentheses () to delimit a portion (subexpression) of a regular expression. Groups are normally used to mark subexpressions as operands. Groups can be nested within each other.

Operators are applied to more than one character in a regular expression only if they are immediately before or after the definition of a set or group of characters. If this is the case, the operator is applied to the entire group or set of characters.
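The difference between applying an operator to a single character, a group, and a set can be sketched as follows. The example assumes that + means "one or more repetitions", as it does in Python's re module. That meaning is an assumption here, since this section names + as a metacharacter without defining it, and Python's re is not the Regex security model engine. The helper name is hypothetical.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Operator after a single character: applies to b only.
single_ok = full("ab+", "abbb")
single_repeat = full("ab+", "abab")

# Operator after a group: applies to the whole subexpression ab.
group_ok = full("(ab)+", "abab")
group_tail = full("(ab)+", "abbb")

# Operator after a set: applies to the set, so any mix of a and b repeats.
set_ok = full("[ab]+", "abba")

print(single_ok, single_repeat, group_ok, group_tail, set_ok)
```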

The syntax contains definitions of the following operators (listed in descending order of their priority):
