.. py:currentmodule:: rule_engine

Getting Started
===============
The Rule Engine is meant to allow developers to filter arbitrary Python objects with a "rule" specified either by them
or by an end user. The "rules" that the Rule Engine uses are Python strings containing expressions in a custom
language. The syntax that Rule Engine uses is similar to Python but borrows some features from Ruby. Because the rules
are written in a custom language, no Python ``exec`` or ``eval`` operations are used, allowing developers to safely and
securely evaluate rule expressions provided by potentially untrusted sources.

Basic Usage
-----------
#. The developer needs to identify data that they would like to be filtered. This would be some kind of object with a
   set of variable attributes. The rest of the usage example will assume that these objects are comic books.

   * Comic books have various attributes that could be useful for filtering including:

     +-----------+-----------------------+-----------------------------------+
     | Attribute | Python Type           | Rule Engine Type                  |
     +-----------+-----------------------+-----------------------------------+
     | title     | ``str``               | :py:attr:`~.DataType.STRING`      |
     +-----------+-----------------------+-----------------------------------+
     | publisher | ``str``               | :py:attr:`~.DataType.STRING`      |
     +-----------+-----------------------+-----------------------------------+
     | issue     | ``int``               | :py:attr:`~.DataType.FLOAT`       |
     +-----------+-----------------------+-----------------------------------+
     | released  | ``datetime.date``     | :py:attr:`~.DataType.DATETIME`    |
     +-----------+-----------------------+-----------------------------------+

   * An example comic book collection might look like:

     .. code-block:: python

        import datetime

        comics = [
            {
                'title': 'Batman',
                'publisher': 'DC',
                'issue': 89,
                'released': datetime.date(2020, 4, 28)
            },
            {
                'title': 'Flash',
                'publisher': 'DC',
                'issue': 753,
                'released': datetime.date(2020, 5, 5)
            },
            {
                'title': 'Captain Marvel',
                'publisher': 'Marvel',
                'issue': 18,
                'released': datetime.date(2020, 5, 6)
            }
        ]

#. Now the developer needs to create a rule object to match the target objects. The attributes of the objects will
   automatically become valid symbols for the rule expression. Creating a rule object is done by initializing an
   instance of the :py:class:`~engine.Rule` class, which requires one argument: the string expression (in Rule Engine
   syntax) of the rule.

   * In the case of the comic book collection, these symbols would be: ``title``, ``publisher``, ``issue``, and
     ``released``. Notice that these attribute names are also valid symbol names, i.e. they start with a letter and
     contain no whitespace or punctuation. Just like in Python, Rule Engine symbols must follow these rules. For
     example, ``released`` is a valid symbol while ``Released Date`` is not (because of the space).

   * A simple rule for the comic book collection which matches the ``publisher`` symbol to the string ``"DC"`` might
     look like:

     .. code-block:: python

        import rule_engine

        rule = rule_engine.Rule(
            # match books published by DC
            'publisher == "DC"'
        )

   * Rules can contain more complex expressions such as datetime literals and conditionals.

     .. code-block:: python

        rule = rule_engine.Rule(
            # match DC books released in May 2020
            'released >= d"2020-05-01" and released < d"2020-06-01" and publisher == "DC"'
        )

     Notice that the datetime expression is a string, prefixed with ``d`` in ``YYYY-MM-DD HH:mm:SS`` format. If the
     time portion is omitted, it will be normalized to ``00:00:00`` (midnight, zero minutes, zero seconds). See the
     :ref:`Literal Values` section for more information.

   * Certain datatypes also have :ref:`attributes` that can be accessed with the dot (``.``) operator.

     .. code-block:: python

        rule = rule_engine.Rule(
            # normalize potential variations in the publisher case such as 'Dc'
            'publisher.as_upper == "DC"'
        )

   * Rules can also match strings using regular expressions. When using this type of comparison, the string on the
     right hand side of the operator is the regular expression, while the left is the string to compare it with.

     .. code-block:: python

        rule = rule_engine.Rule(
            # match books with a title starting with 'Captain '
            # (a raw Python string keeps the regex backslashes intact)
            r'title =~ "Captain\s\S+"'
        )

#. Once the rule object has been defined, it can be applied to target object(s). Two primary methods are available for
   applying the rule to the target objects. Those methods are:

   * :py:meth:`~engine.Rule.matches` -- This method will determine whether the rule matches a single target object,
     returning ``True`` or ``False``.

   * :py:meth:`~engine.Rule.filter` -- This method will filter an iterable of target objects, yielding ones for which
     the rule matches.

   * Applying the rule to the comic book collection using each of the two methods might look like:

     .. code-block:: python

        # check if the first object matches
        rule.matches(comics[0]) # => True

        # filter the iterable "comics" and return matching objects
        rule.filter(comics) # => a generator yielding the matching objects

Attribute-Backed Objects
^^^^^^^^^^^^^^^^^^^^^^^^
In the previous example, the target objects were Python dictionaries. The keys in the dictionary were used as symbols
and while this is the default behavior it can be modified to use object attributes instead. This would be necessary if
the target objects had variable attributes (like a Python class object) instead of variable items (like a Python
dictionary object).

* An example comic book collection using an object-based attribute-backed data structure might look like:

  .. code-block:: python

     import datetime

     class Comic(object):
         def __init__(self, title, publisher, issue, released):
             self.title = title
             self.publisher = publisher
             self.issue = issue
             self.released = released

     comics = [
         Comic('Batman', 'DC', 89, datetime.date(2020, 4, 28)),
         Comic('Flash', 'DC', 753, datetime.date(2020, 4, 28)),
         Comic('Captain Marvel', 'Marvel', 18, datetime.date(2020, 5, 6))
     ]

To resolve symbols from attributes, a custom :py:class:`~engine.Context` object needs to be defined. This object is
used for configuration of Rule behavior, one setting of which is the resolver to use. The resolver defines how a rule
looks up the values of symbols for comparison given a target object. The following resolver functions are included in
Rule Engine:

* :py:func:`~engine.resolve_attribute` -- Resolve symbols by looking them up as attributes on an object.
* :py:func:`~engine.resolve_item` -- **(Default)** Resolve symbols by looking them up as keys on a dictionary (or
  dictionary-like) object.

To change the resolver, create a :py:class:`~engine.Context` object, and specify the *resolver* function as a keyword
argument.

.. code-block:: python

   # define the custom context to set the resolver
   context = rule_engine.Context(resolver=rule_engine.resolve_attribute)
   # then define a rule using the custom context
   rule = rule_engine.Rule('publisher == "DC"', context=context)

Once the rule has been defined with the custom context, it can be used in the same way as a rule with a default
context. The context object can be shared with other rule objects that are to be applied on the same objects. The
context object should not be shared with rule objects that are applied to other objects which do not have the same
attributes (like artists).

Advanced Usage
--------------
The Rule Engine has a number of advanced features that contribute to its flexibility. In most use cases they are
unnecessary.

Setting A Default Value
^^^^^^^^^^^^^^^^^^^^^^^
By default, :py:class:`~engine.Rule` will raise a :py:class:`~errors.SymbolResolutionError` for invalid symbols. In
some cases, it may be desirable to change the way in which the language behaves to instead treat unknown symbols with a
default value (most often ``None`` / :py:attr:`~.DataType.NULL` is used for this purpose, but any value of a supported
type can be used). To change this behavior, set the *default_value* parameter when initializing the
:py:class:`~engine.Context` instance.

.. code-block:: python

   # this fails because title is not defined and there is no default_value
   rule_engine.Rule('title').matches({})
   # => SymbolResolutionError: title

   context = rule_engine.Context(default_value=None)
   # this evaluates successfully to False because title is null (from the default value)
   rule_engine.Rule('title', context=context).matches({})
   # => False

   # this evaluates successfully to True because title is a non-empty string
   rule_engine.Rule('title', context=context).matches({'title': 'Batman'})
   # => True

Custom Resolvers
^^^^^^^^^^^^^^^^
Rule Engine includes one resolver for accessing symbols :py:func:`as keys<engine.resolve_item>` on objects (such as
dictionaries) and one for resolving symbols :py:func:`as attributes<engine.resolve_attribute>` on objects. If neither
of those is suitable for the target object, then a custom one can be defined and used. The custom resolver should use
the signature ``resolver(thing, name)`` where *thing* is the arbitrary object that the rule is being applied to and
*name* is the symbol name, as a Python string, of the attribute that is to be accessed.

If the resolver function fails for any reason, it should raise a :py:class:`~errors.SymbolResolutionError`, forwarding
*thing* in a keyword argument. This ensures consistency in how exceptions are raised and handled by the engine.

Suggestions
"""""""""""
When raising a :py:class:`~errors.SymbolResolutionError`, a custom resolver can optionally make a suggestion for a
valid symbol name. In this case, the resolver may use the :py:func:`~suggestions.suggest_symbol` function, passing it
the invalid name and a list of valid names. The result may then be passed as the *suggestion* keyword. This suggestion
may then assist rule authors in correcting mistakes.

Type Hinting
^^^^^^^^^^^^
Symbol type information can be provided to the :py:class:`~engine.Rule` through the :py:class:`~engine.Context`
instance and will be used for compatibility testing. With type information, the engine will raise an
:py:class:`~errors.EvaluationError` when an incompatible operation is detected such as a regex match (``=~``) using an
integer on either side. This makes it possible to detect errors in a rule's syntax prior to it being applied to an
object. When symbol type information is specified, the value resolved from a symbol and object must either match the
specified type or be :py:attr:`~.DataType.NULL`, otherwise a :py:class:`~errors.SymbolTypeError` will be raised when
the symbol is resolved.

To define type information, a *type_resolver* function must be passed to the :py:class:`~engine.Context` class. The
type resolver function is expected to take a single argument: the name of the symbol (as a Python string) whose type
needs to be resolved. The return value should be a member of the :py:class:`~ast.DataType` enumeration.

.. code-block:: python

   # define a basic type resolver, that knows about the four attributes of a
   # comic book
   def type_resolver(name):
       if name == 'title':
           return rule_engine.DataType.STRING
       elif name == 'publisher':
           return rule_engine.DataType.STRING
       elif name == 'issue':
           return rule_engine.DataType.FLOAT
       elif name == 'released':
           return rule_engine.DataType.DATETIME
       # if the name is none of those, raise a SymbolResolutionError
       raise rule_engine.errors.SymbolResolutionError(name)

   context = rule_engine.Context(type_resolver=type_resolver)

:py:attr:`~.DataType.UNDEFINED` can be defined as the data type for a valid symbol without specifying explicit type
information. In this case, the rule object will know that it is a valid symbol, but will not validate any operations
that reference it. In all cases, when a *type_resolver* is defined, the :py:class:`~engine.Rule` object will raise a
:py:class:`~errors.SymbolResolutionError` if a symbol is referenced in the rule that is not known to the
*type_resolver*.

.. code-block:: python

   # this is valid: issue is defined as a valid symbol
   rule = rule_engine.Rule('issue == 1', context=context)
   # => a Rule object

   # this is invalid: author is not defined as a valid symbol
   rule = rule_engine.Rule('author == "Stan Lee"', context=context)
   # => SymbolResolutionError: author

   # this is valid: no type information is defined (context is omitted)
   rule = rule_engine.Rule('author == "Stan Lee"')
   # => a Rule object

.. _getting-started-compound-data-types:

Compound Data Types
"""""""""""""""""""
Compound data types such as the :py:attr:`~.DataType.ARRAY` and :py:attr:`~.DataType.MAPPING` types can optionally
specify member type information by calling their respective type. For example, an array of strings would be defined as
``DataType.ARRAY(DataType.STRING)`` while a mapping with string keys and float values would be defined as
``DataType.MAPPING(DataType.STRING, DataType.FLOAT)``.
For more information, see the documentation for the :py:attr:`~.DataType.ARRAY` and :py:attr:`~.DataType.MAPPING`
functions.

Compound member types can only be a single data type. In some cases the data type can optionally be nullable which
means that the member value can be either the specified type or :py:attr:`~.DataType.NULL`. For example, a
:py:attr:`~.DataType.MAPPING` type whose values are all nullable strings may be defined, while a
:py:attr:`~.DataType.MAPPING` type with one value type of a :py:attr:`~.DataType.STRING` and another of a
:py:attr:`~.DataType.BOOLEAN` may not be defined. In this case, the key type may be defined while the value type is set
to :py:attr:`~.DataType.UNDEFINED` which is the default value.

Function Data Types
"""""""""""""""""""
Like compound types, functions can include type information by calling the respective type, in this case
:py:attr:`~.DataType.FUNCTION`. Functions only support positional arguments and not keyword arguments, but positional
arguments can be defined as optional through the *minimum_arguments* option.

For example, the :ref:`builtin split` can be called with as few as 1 argument and as many as 3 arguments. The first
argument is always required, so *minimum_arguments* is set to 1. This means the remaining 2 arguments are optional,
however for the third argument to be defined in a function call, the second must also be defined. For the split
function, the first argument is the string to split, followed by the separator string to split on and finally the
maximum number of times to split the string.

.. code-block:: python

   rule_engine.DataType.FUNCTION(
       # the name of the function is provided for error messages
       'split',
       # the return data type, in this case an array of strings
       return_type=rule_engine.DataType.ARRAY(rule_engine.DataType.STRING),
       # the data type of each of the three arguments
       argument_types=(
           rule_engine.DataType.STRING,  # argument 1, the string to split
           rule_engine.DataType.STRING,  # argument 2, the separator to split on
           rule_engine.DataType.FLOAT    # argument 3, the maximum times to split the string
       ),
       # the minimum number of arguments, in this case the last two arguments are optional
       minimum_arguments=1
   )

If the return type or the argument types are not specified, then no type checking is performed.

Defining Types From A Dictionary
""""""""""""""""""""""""""""""""
For convenience, the :py:func:`~engine.type_resolver_from_dict` function can be used to generate a *type_resolver*
function from a dictionary mapping symbol names to their respective :py:class:`~ast.DataType`. Starting with version
:release:`2.1.0` if a :py:class:`dict` is passed as the *type_resolver*, the
:py:func:`~engine.type_resolver_from_dict` function will be used automatically.

.. code-block:: python

   context = rule_engine.Context(
       type_resolver=rule_engine.type_resolver_from_dict({
           # map symbol names to their data types
           'title': rule_engine.DataType.STRING,
           'publisher': rule_engine.DataType.STRING,
           'issue': rule_engine.DataType.FLOAT,
           'released': rule_engine.DataType.DATETIME
       })
   )

Changing Builtin Symbols
^^^^^^^^^^^^^^^^^^^^^^^^
To remove the default :ref:`builtin symbols` that are provided, simply initialize a
:py:class:`~rule_engine.builtins.Builtins` instance with an empty dictionary as *values*. This will remove all builtin
values, and the dictionary can optionally be populated with alternative values.

To add additional values, use the :py:meth:`~rule_engine.builtins.Builtins.from_defaults` constructor, with a *values*
dictionary.
In this case, *values* will optionally override any of the default settings, and keys which do not overlap will be
added in addition to the default builtin symbols.

.. code-block:: python

   class CustomBuiltinsContext(rule_engine.Context):
       def __init__(self, *args, **kwargs):
           # call the parent class's __init__ method first to set the
           # default_timezone attribute
           super(CustomBuiltinsContext, self).__init__(*args, **kwargs)
           self.builtins = rule_engine.builtins.Builtins.from_defaults(
               # expose the $version symbol
               {'version': rule_engine.__version__},
               # use the specified default timezone
               timezone=self.default_timezone
           )

Rule Inspection
---------------
There are a few techniques that can be used to inspect a rule object.

* :py:meth:`~engine.Rule.is_valid` -- This class method can be used to determine if a rule expression is valid. It
  will return ``False`` if for example there are any syntax errors.
* :py:attr:`~engine.Context.symbols` -- Rule objects have a :py:attr:`~engine.Rule.context` attribute, which contains
  the ``symbols`` attribute. This contains the symbol names which were identified within the rule expression.
* :py:meth:`~engine.Rule.to_graphviz` -- This method will create a Graphviz directed-graph of the Rule Engine Abstract
  Syntax Tree (AST) created by the rule expression. This can be helpful when debugging complex rules. This requires the
  Python ``graphviz`` package to be available.