Cerebrum changelog

Requirements and design of a new Cerebrum change log.

1   Background

Cerebrum uses a changelog to keep track of internal changes. The change list is used to present a partial audit log to system administrators through commands in the bofhd XMLRPC server.

In addition, the changelog has been expanded to work as:

  • An internal, persistent event queue, through CLHandler
  • An internal event broker, through Cerebrum.modules.EventLog
  • An internal event queue, through Cerebrum.modules.event

1.1   Uses

The term ChangeLog covers a range of functionality, all based on an abstract concept in Cerebrum.ChangeLog.

Implementations of this API exist for a number of uses:

  • Audit log
  • Quicksync, or a "sync queue" for Active Directory, ePhorte, etc.
  • Reports over new persons, new accounts, other changes
  • Temporary storage (passwords for AD, Virthome one-time keys/invitations)
  • Event-driven sync ("event brokering", "event queue") for Exchange, CIM
  • "Event queue" for publishing messages to RabbitMQ

More info about the current use of RabbitMQ can be found in current-implementations.html.

1.2   Issues

There are a number of issues with the current changelog implementation:

Incomplete logging
Not all changes in the Cerebrum database are logged, which means that the audit log is not complete.
False positives
In certain cases, the code does not check whether a change actually happened, so a record is logged even if an update doesn't cause any database rows to change.
Deleted records

When an object (entity) is deleted, all records referring to that object must be deleted as well, because the records use foreign keys as references.

To make matters worse, a record may refer to two distinct objects. If one of the objects is deleted, then the remaining object will lose changelog records as well.

Meaningless internal IDs
In many cases, additional parameters are included in changelog records (often constants). In those cases, the internal ID is used, not the actual value or meaning. This makes parameters useless if the internal ID changes or is removed at any point.
Missing context

Changelog records often lack crucial information:

  • add entity_cinfo for <username> -- there are multiple contact types, which was added?
  • mod entity_cinfo for <username> -- which contact type was changed? What was its old value?
  • refresh quarantine for <username> -- which quarantine?
Changelog = Eventlog
All implementations of event queues are tied to the changelog. Anything logged as a change is also queued as an event. There is no practical way of filtering out uninteresting events, or joining events that alter the same objects and attributes.
Multiple event systems
  • CLHandler implements a tagging system that tags a change as processed by some system.
  • Eventlog implements a brokering system with multiple queues.
  • events implements a single event queue that is consumed in near real-time.
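
Several of the issues above (false positives, meaningless internal IDs, missing context, and records lost when an object is deleted) concern what a single record contains. The following is a minimal sketch of a self-contained record built from a before/after comparison. The names ChangeRecord and make_record, the "entity_cinfo:mod" label and the example values are hypothetical and not part of Cerebrum.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical self-contained change record (not a Cerebrum class)."""
    change_type: str          # symbolic name instead of an internal ID
    subject: str              # stored as text, not as a foreign key
    context: Dict[str, Any]   # identifies *which* item changed
    old: Dict[str, Any]       # previous values of the changed attributes
    new: Dict[str, Any]       # new values of the changed attributes


def make_record(change_type: str, subject: str, context: Dict[str, Any],
                before: Dict[str, Any],
                after: Dict[str, Any]) -> Optional[ChangeRecord]:
    """Return a record describing the difference, or None for a no-op."""
    changed = sorted(k for k in set(before) | set(after)
                     if before.get(k) != after.get(k))
    if not changed:
        # Nothing actually changed: do not log a false positive.
        return None
    return ChangeRecord(
        change_type=change_type,
        subject=subject,
        context=dict(context),
        old={k: before.get(k) for k in changed},
        new={k: after.get(k) for k in changed},
    )


# A real change produces a record that names the contact type and old value.
record = make_record("entity_cinfo:mod", "bob", {"contact_type": "EMAIL"},
                     before={"value": "bob@old.example"},
                     after={"value": "bob@new.example"})

# An update that writes the same value again produces no record.
noop = make_record("entity_cinfo:mod", "bob", {"contact_type": "EMAIL"},
                   before={"value": "bob@old.example"},
                   after={"value": "bob@old.example"})
```

Because the subject and the change type are stored as plain text in this sketch, a record would stay readable after the object or the constant it describes has been removed. The cost is that such records are no longer kept consistent by the database.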

3   Future work

  • Replace log_change with pub-sub signals, where each object and mixin implements its own actors. Send all relevant data (the altered object, db-connection, altered values, ...). Actors could be a change_log, an event queue/event filter, or business logic from other classes. This would e.g. solve the issue with entities that need to duplicate each other's spreads. Using pub-sub would let that happen while remaining loosely coupled.
  • Implement better filtering of events in the event_publisher. When a change happens, its primary function should be filtering and possibly merging events.
  • Implement event support for all changes.
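
The pub-sub idea and the filtering/merging of events can be illustrated together. This is a minimal sketch under stated assumptions, not a proposed implementation: Signal, AuditLog, MergingEventQueue and the topic names "person:name" and "person:note" are all hypothetical and do not exist in Cerebrum.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class Signal:
    """Hypothetical minimal pub-sub hub; publishers do not know who listens."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, **payload: Any) -> None:
        # Each actor gets its own copy of the payload, plus the topic.
        for handler in list(self._handlers.get(topic, ())):
            handler(dict(payload, topic=topic))


class AuditLog:
    """Actor that records every change it is subscribed to."""

    def __init__(self) -> None:
        self.records: List[Event] = []

    def __call__(self, event: Event) -> None:
        self.records.append(event)


class MergingEventQueue:
    """Actor that filters by topic and merges changes to the same attribute."""

    def __init__(self, wanted: Iterable[str]) -> None:
        self.wanted = set(wanted)
        self.pending: Dict[Tuple[str, str, str], Event] = {}

    def __call__(self, event: Event) -> None:
        if event["topic"] not in self.wanted:
            return  # filter out uninteresting events
        key = (event["topic"], event["entity"], event["attribute"])
        if key in self.pending:
            # Merge: keep the first old value, take the latest new value.
            event = dict(event, old=self.pending[key]["old"])
        if event["old"] == event["new"]:
            self.pending.pop(key, None)  # the changes cancelled out
        else:
            self.pending[key] = event

    def flush(self) -> List[Event]:
        events, self.pending = list(self.pending.values()), {}
        return events


signals = Signal()
audit = AuditLog()
queue = MergingEventQueue(wanted={"person:name"})
signals.subscribe("person:name", audit)
signals.subscribe("person:name", queue)
signals.subscribe("person:note", audit)

signals.publish("person:name", entity="bob", attribute="first",
                old="Bob", new="Rob")
signals.publish("person:name", entity="bob", attribute="first",
                old="Rob", new="Robert")
signals.publish("person:note", entity="bob", attribute="note",
                old=None, new="hi")

merged = queue.flush()  # one event for "first": old "Bob", new "Robert"
```

In this sketch the audit actor sees all three changes, while the event queue only emits a single merged event, which decouples the audit log from the event queue. A business-logic actor, such as one that copies spreads between entities, would subscribe in the same way.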
By fhl
Published June 29, 2018 10:58 - Last modified May 11, 2019 20:26