Background
The current changelog (Cerebrum.modules.ChangeLog) is not an adequate audit log.
- It deletes records
- It cannot store records for deleted objects
- It cannot store records for certain events (e.g. entity join)
Requirements
We need an audit log that:
- Does not delete records of actual changes
- Keeps altered values, for context. E.g. a change in a conceptual object {'id': 1, 'foo': 'bar'} -> {'id': 1, 'foo='baz'} should log that a change was done to the 'foo' attribute of an object with 'id=1', and that the old value was 'bar'. The new value can be fetched from the current value, or from subsequent log records.
- Can clear old values. When an object is deleted, we may want to purge parameters that may contain personal info, but keep the log record itself.
Design
Schema suggestion
Column | Type | Modifiers | Comment |
---|---|---|---|
tstamp | timestamp with time zone | not null default now() | Add timezone and use datetime + pytz? |
change_id | numeric(12,0) | not null | |
change_type_id | numeric(6,0) | not null | Or replace with regular string constants? |
entity_id | numeric(12,0) | not null | Was subject_entity, no constraint |
change_by_id | numeric(12,0) | not null | Was change_by, no constraint |
target_id | numeric(12,0) | Was dest_entity, no constraint TODO: Or is this meta? | |
meta | text (or json?) | not null | A new serialized JSON object, with persistent metadata. |
params | text (or json?) | Was change_params, contains the changed values, and may be deleted after some time |
- _id fields
All the entity ID fields works as before, but without database constraints. These IDs are primarily used for lookups, and should remain in indexed columns for that reason.
This allows us to keep changelog records, event after the relevant entities are deleted.
- change_by_id
- The change_by_id column is now mandatory. When the database belongs to a script, service, etc..., the change_by_id may be set to the entity_id of cereconf.INITIAL_ACCOUNTNAME
- meta column
The meta column contains a JSON-serialized dict, with names and other identificators that should be used e.g. when formatting a change entry. The dict should always contain:
{ "change": str(change_type), "change_category": str(change_type.category), "change_type": str(change_type.type), "change_by_name": "<username>", "entity_name": "<username|person ?|ou sko?|group name>", "entity_type": str(entity.entity_type), "change_program": __file__, }
And would contain similar data for target_id.
- params column
The change_params should always contain the changed values (i.e. the old values):
- On create/add, nothing is needed in change_params
- On update, all previous values of modified columns should be contained in change_params
- On delete, all relevant columns should be contained in change_params
- json fields
- The JSON fields should be TEXT, or possibly
DBAL
When writing data rows to or fetching data rows from the change-log, each row should be turned into a Change object. This should look somewhat like Cerebrum.modules.event_publisher.event and Cerebrum.modules.event_publisher.eventdb.
Entity merge
One of the requirements of this new changelog is the ability to merge entities. This mainly applies to person entities, but other entities could also be affected.
Given two entities, A and B, where we want to merge B into A:
A merge could be performed as follows:
- Entity data (names, contact info) from B is added to A, causing all the relevant events to be issued.
- Owned entities by B gets A as a new owner - this probably requires a new set of change_type constants (e.g. category move?)
- Entity data is deleted from B, causing all the relevant events to be logged.
- A new change_type, (e.g. category join) is implemented, with A as its entity_id, and B as its target_id. Alternatively, this change could be logged on both entities.
- The B entity is deleted, causing all the relevant changes to be logged.
There are some more challenges to solve, e.g. what should happen when the objects have a different value for the same attribute? This is probably solved somehow in join_persons.py, but probably deserves some thought and reconsideration.
Formatters
Formatting of change log records are implemented multiple times in Cerebrum, all with a differing number of implementations (per bofhd_commands class).
We would need to re-implement formatters for the new change log, which could be imported and used by the relevant bofhd commands.
Compatibility
A lot of code relies on the change log. This would need to be adapted to the new change log.
- Changes may exist for entities that no longer exists -- these would need to be filtered out.
- Columns have changed names, change_params contains different data.
Another option would be to keep the old change_log and have them work in parallell until the above changes have been done everywhere.
TBD
- Use postgresql json type?
- Discontinue use of the CLConstants in audit log?
- Use timezone data in time stamps?
- Model -> should we keep a shadow copy of entity_name and _code tables, rather than including it in every record?