A new Cerebrum audit log

Design suggestion of a new Cerebrum audit log.

Background

The current changelog (Cerebrum.modules.ChangeLog) is not an adequate audit log.

  • It deletes records
  • It cannot store records for deleted objects
  • It cannot store records for certain events (e.g. entity join)

Requirements

We need an audit log that:

  1. Does not delete records of actual changes
  2. Keeps altered values, for context. E.g. a change in a conceptual object {'id': 1, 'foo': 'bar'} -> {'id': 1, 'foo='baz'} should log that a change was done to the 'foo' attribute of an object with 'id=1', and that the old value was 'bar'. The new value can be fetched from the current value, or from subsequent log records.
  3. Can clear old values. When an object is deleted, we may want to purge parameters that may contain personal info, but keep the log record itself.

Design

Schema suggestion

Column Type Modifiers Comment
tstamp timestamp with time zone not null default now() Add timezone and use datetime + pytz?
change_id numeric(12,0) not null  
change_type_id numeric(6,0) not null Or replace with regular string constants?
entity_id numeric(12,0) not null Was subject_entity, no constraint
change_by_id numeric(12,0) not null Was change_by, no constraint
target_id numeric(12,0)   Was dest_entity, no constraint TODO: Or is this meta?
meta text (or json?) not null A new serialized JSON object, with persistent metadata.
params text (or json?)   Was change_params, contains the changed values, and may be deleted after some time
_id fields

All the entity ID fields works as before, but without database constraints. These IDs are primarily used for lookups, and should remain in indexed columns for that reason.

This allows us to keep changelog records, event after the relevant entities are deleted.

change_by_id
The change_by_id column is now mandatory. When the database belongs to a script, service, etc..., the change_by_id may be set to the entity_id of cereconf.INITIAL_ACCOUNTNAME
meta column

The meta column contains a JSON-serialized dict, with names and other identificators that should be used e.g. when formatting a change entry. The dict should always contain:

{
    "change": str(change_type),
    "change_category": str(change_type.category),
    "change_type": str(change_type.type),
    "change_by_name": "<username>",
    "entity_name": "<username|person ?|ou sko?|group name>",
    "entity_type": str(entity.entity_type),
    "change_program": __file__,
}

And would contain similar data for target_id.

params column

The change_params should always contain the changed values (i.e. the old values):

  • On create/add, nothing is needed in change_params
  • On update, all previous values of modified columns should be contained in change_params
  • On delete, all relevant columns should be contained in change_params
json fields
The JSON fields should be TEXT, or possibly

DBAL

When writing data rows to or fetching data rows from the change-log, each row should be turned into a Change object. This should look somewhat like Cerebrum.modules.event_publisher.event and Cerebrum.modules.event_publisher.eventdb.

Entity merge

One of the requirements of this new changelog is the ability to merge entities. This mainly applies to person entities, but other entities could also be affected.

Given two entities, A and B, where we want to merge B into A:

A merge could be performed as follows:

  1. Entity data (names, contact info) from B is added to A, causing all the relevant events to be issued.
  2. Owned entities by B gets A as a new owner - this probably requires a new set of change_type constants (e.g. category move?)
  3. Entity data is deleted from B, causing all the relevant events to be logged.
  4. A new change_type, (e.g. category join) is implemented, with A as its entity_id, and B as its target_id. Alternatively, this change could be logged on both entities.
  5. The B entity is deleted, causing all the relevant changes to be logged.

There are some more challenges to solve, e.g. what should happen when the objects have a different value for the same attribute? This is probably solved somehow in join_persons.py, but probably deserves some thought and reconsideration.

Formatters

Formatting of change log records are implemented multiple times in Cerebrum, all with a differing number of implementations (per bofhd_commands class).

We would need to re-implement formatters for the new change log, which could be imported and used by the relevant bofhd commands.

Compatibility

A lot of code relies on the change log. This would need to be adapted to the new change log.

  • Changes may exist for entities that no longer exists -- these would need to be filtered out.
  • Columns have changed names, change_params contains different data.

Another option would be to keep the old change_log and have them work in parallell until the above changes have been done everywhere.

TBD

  • Use postgresql json type?
  • Discontinue use of the CLConstants in audit log?
  • Use timezone data in time stamps?
  • Model -> should we keep a shadow copy of entity_name and _code tables, rather than including it in every record?
Av fhl
Publisert 29. juni 2018 10:58