MySQL Audit Data Consolidation – Made Simple

In this blog, I am going to demonstrate how to create your own consolidated audit log archive across many mysql instances. In a followup I’ll show how to extend this example by creating a simple hash chain on that archive – so you can prove whether or not its been modified or tainted in any way and if so where.

In this blog’s example code I’ll be using a new extension to the mysql audit_log_read function as well as show why the mysql x api for sql can make certain tasks much simpler. This new audit reading feature was released in MySQL 8.0.22 Enterprise Edition. This blog relies the MySQL Shell running in both sql and python modes.

A few additional techniques I will show include

  1. Extracting rows from JSON audit data – transforming JSON data to tabular using the JSON_TABLE function.
  2. Inserting those rows from an audited database to an audit data archive MySQL database. As you’ll see the mysqlx api makes the far simpler.

A few factoids. As many DBAs can tell you, whether regulations prevent  or for other security reasons, they don’t have and often time don’t want access to the underlying os server where mysql is running. No SSH for DBAs! In general, the fewer things that go onto that OS running a database service the better, especially from a security stand point.

For many reasons – security, analysis, etc – its a best practice to get get audit data off of the mysql server frequently and onto some central data store where you can access activity across all your mysql servers.  Why?

  1. Easier to run analysis
  2. Prevents the data from being corrupted
  3. Regulatory Requirements
  4. Storage Mangement

Certainly there are various ways to perform moving the audit data task using a variety of products. This is just 1 possible design pattern you might use and it could easily be adapted for third party integration or changed to write data to an object store or some other audit data repo.

Terminology wise, I’ll call the server that consolidates the audit data the “archiving server”. That server is going to have a user account I will call the “auditarchiver” that can only insert and select on the audit_data table. (It can’t change data).

For each server where audit data will be extracted there will be an account that reads the audit data via SQL connection to read the JSON data from the audit files.

So first lets login as an admin on the archiving mysql server instance – I will use root. This entire example requires using the mysql shell (which rocks!). It includes python for the scheduled batch archiving from targeted servers which to pull the audit data from.

Step 1 – Audit Archive database Setup.
Create the schema and table on the archive server
On the audit data archiving server

> mysqlsh

Step 2 – on the archiving server
First create the accounts archive server. Grant select and insert to the auditarchiver account. (no more)

Step 3 – for each server to read (pull) audit data – create an auditreader account

> mysqlsh

/* If you aren’t running mysql enterprise version 8.0.22 or higher – stop and upgrade */

Step 4 – install enterprise audit if not already installed.

Install mysql enterprise audit
In a nutshell, execute the following SQL located in the mysql share dir – audit_log_filter_linux_install.sql

Step 5 –
Edit /etc/my.cnf – to minimally include first 3 lines are required for this example.

Restart the server

Step 6 – add audit filters and bind to user
If you haven’t added audit filters previously – the following will log everything for everyone connecting. This will log a great deal, but for the purpose of seeing this work in a test environment this makes sense. In production you’ll likely want to be more selective.

Repeat for each audited instance. Probably best to start with one and go from there.

Step 7 – generate some audit data activity
Run some adhoc SQL on the servers as various users where you installed mysql enterprise audit.

Step 8 – pick a server where you can schedule mysqlsh in batch mode

What follows is how this batch python script will work (The consolidated code is repeated at the end to copy edit and run.
Please change my passwords and use specific server names etc.

So first I will connect using the mysqlx api to the read server with its own session and the archive server with its own session.

Change “localhost” to the ip/hostname of your archiving server.

Change “localhost” to the ip/hostname of your audited server.

Ok now I need to see if I have prior archive data – so I can point to where in the audit data I need to start reading newer data. If the archive contains no data for this instance – I am going to start at the beginning of my log data.
I create a json string with the “start” in the json if the archive table contains no data for this instance (identified by its server_uuid). The “start” tells the function to perform a general date time search.

However if there already prior data loaded, then I get the last timestamp and event id that was inserted and use that as a pointer into the audit data – in this case there is not a “start” in the JSON search string.

Ok we’ve now set the session variable for my search criteria – @nextts.

If you want to look at the json search string
view_nextts = read_session.run_sql("select @nextts")

Now – in the next step you’ll see in the SQL the call to the audit_log_read component
AUDIT_LOG_READ(@nextts)

And you will see that I wanted to store the data in rows in my archive – so I am using the JSON_TABLE function to convert from JSON to rows.

Now you could go all JSON and store each event in a JSON data type. That would be simple as well. But in this case I’m storing in a table. It’s up to you.

Ok – now as the auditarchiver – I am going to save the data I just extracted.
Here’s where sqlx api is very handy. I can loop on results and save to the table with very little code.

As you may have noticed – I’m not trying to extract to much as one time from the audit log. I’ve maxed it at 100 events. You’re also limited by the buffer size you’ve set for –audit-log-read-buffer-size.

See https://dev.mysql.com/doc/refman/8.0/en/audit-log-reference.html#sysvar_audit_log_read_buffer_size

So save the this script to a dir.
cd to the dir

Now you just run mysqlsh in batch mode.
First run –
frank@Mike % mysqlsh --py < archiveauditbatch

The archive is empty – get oldest audit event
Archive was empty – load all

The next runs
frank@Mike % mysqlsh --py < archiveauditbatch
Data in archive getting last event ts and id
Archive was not empty – skip first duplicate event

Ok – now that you’ve run a few tests, create a scheduled batch with cron or you favorite scheduler. Have it loop till empty etc etc.

Step 9 – login to the archive server and have a look at the data
select * from audit_archive.audit_data order by server_uuid, id, ts;

Run some stats

Finally – this is far from production quality code
I inlined the passwords, I didn’t provide a parameter for the audit servers to loop and collect data etc.  I didn’t check errors etc. The point was to demonstrate some techniques to help jump start folks.

In a followup blog –
I’ll show you how to perform hash chaining etc – so you can prove your audit data is immutable and untainted.

Thanks for using MySQL.  Here’s the complete batch script 

Leave a Reply