Write Yourself a Query Rewrite Plugin: Part 2

In my last post I covered how to use the query rewrite framework to write your own pre-parse plugin. The interface is simplistic: a string goes in, a string comes out, and the rest is up to your plugin’s internal workings. It doesn’t interact that much with the server. Today I am going to show you the other type of plugins that rewrite queries, post-parse query rewrite plugins. This type is, out of necessity, more tightly coupled with the server; it operates on the internal data structures that make up the query’s parse tree.

Creating the Plugin

Declaring the plugin is similar to declaring a pre-parse plugin: you declare the plugin in the usual way but with the addition of a specific plugin descriptor for the post-parse query rewrite plugin type. This is a struct, as usual:

As you can see there’s no function to free up memory, as we had with the pre-parse query rewrite plugin type. This is not likely needed in a post-parse scenario because you will be working much more on the data structures that the server offers and roll your own to a much smaller degree. Another thing to note is that you can ask the server to give you the statement digest along with the parse tree. Statement digests are essentially hashed values for queries which you can read all about in the manual. They are calculated from normalized statements (see the link) during parsing and you are likely to want them if you write a post-parse plugin. The provided Rewriter plugin (read about it in this post) makes heavy use of them.

The plugin’s entry point (I’ll call it the ‘rewrite function’ from here on) is called when a statement has been parsed. This happens either when a statement is sent over the wire the normal way, or when a prepared statement is prepared. This makes it possible for your plugin to operate so that prepared statements are only rewritten when they are prepared and after that the rewritten version is always run.

The rewrite function gets called with a Mysql_rewrite_post_parse_param struct, which gives you a flag that you can raise to tell the server that you rewrote the query, and a THD object. The THD object is key to accessing the internals of the server, and it allows you to examine the parse tree.

At Your Service

A plugin working on the parse tree is not that useful without the necessary primitives to operate on that parse tree. That’s where the parser service comes in. The MySQL 5.7 server offers certain services to plugins—read all about them here in the manual—and long with the query rewrite framework, the server now offers a parser service. This service lets you parse strings, manipulating the parse tree and the related functionality. The full documentation will end up in the manual in due course, but I will give you a quick walk-through today.

First Things First

MySQL has a habit of resolving the “current database” when parsing a string. This means that once you’ve parsed a query, there is no more dependency on the “current database” context. The “current database” is a session-dependent setting, so you may or may not want to use the “current database” from the session that triggered the plugin. If you wish to modify it, then you can use the mysql_parser_set_current_database(MYSQL_THD,</code> const MYSQL_LEX_STRING) interface to set the “current database” to whatever you want.

Let’s Parse

Alright, let’s parse something. The function to use for that is: int mysql_parser_parse(MYSQL_THD thd, const MYSQL_LEX_STRING query, unsigned char is_prepared, sql_condition_handler_function handle_condition, void *condition_handler_state)

This function lets you send in a query string for parsing. If you set is_prepared to a non-zero value then the string will be parsed as a prepared statement, i.e. you’re allowed to use the ? placeholder in lieu of literal values. There are interfaces that let you work with these placeholders (count, locate, etc.) later on.

A word of caution here: when you parse a string it will be the session’s new “current” parse tree, replacing the old one. There are ways around it which I may cover in a future post. But for now, let’s assume that parsing a string means replacing the “current” parse tree.

You may also want to handle parse errors, in which case you can send in a function pointer to a callback function handle_condition. If you want a state to be passed along to it, for instance to count the number of non-fatal warnings, then you would pass a pointer to it as condition_handler_state.

Take a Walk on the Server Side

The parse tree is walked using callbacks, so you will need to declare a parse_node_visit_function to handle them. This will be called for each literal in the parse tree, and it also lets you pass a state object around. The signature for this is: int (*parse_node_visit_function)(MYSQL_ITEM item, unsigned char* arg)

The MYSQL_ITEM represents a literal which you examine by printing it with: MYSQL_LEX_STRING mysql_parser_item_string(MYSQL_ITEM item)

The string is allocated on the server side so it has to be freed by the server if you are to have any chance of your plugin running on OS’es that have separate heaps for dynamic libraries (yes, I’m looking at you, Windows). You do this by calling: void mysql_parser_free_string(MYSQL_LEX_STRING string)

Once you have a function to process literals in place—printing them out is a good start—then you’re ready to walk the parse tree. It is available from the MYSQL_THD, and we walk it by calling: int mysql_parser_visit_tree(MYSQL_THD thd, parse_node_visit_function processor, unsigned char* arg)

If this function returns a non-zero value then the server has probably knocked on your door already by calling your handle_condition() function, if you bothered to declare it. In other words, a non-zero return value means something went wrong.

Rewriting Something

Typically the first thing your plugin will want to do is look at the current query to see whether it should be rewritten. There are some different ways that you can do that. You can look at the digest—the normalized representation of the query—the query string as it looked when it came in, and the literals in the query. The first three have an increasing level of precision: the digest will accurately identify any normalized query, but there’s a slim chance of collisions, especially if the queries strings are very long, as the digest is only calculated on the first 1024 tokens (something we’d like to make configurable in future work). The normalized query format will group highly similar queries together by replacing constant expressions with ‘?’ symbols, while the exact query string is the way it looked prior to parsing. If you don’t want to use something like that, then you’re probably better off writing a pre-parse plugin.

The framework doesn’t let you make destructive changes to the parse tree, so you have to create a new one. How do we create a new parse tree, then? Anyone?

If you said “parse a string” out loud right now, you’re right. That’s what we do, and we use mysql_parser_parse() to do it. If you paid attention earlier, you know that the tree we parse will now be the new “current” parse tree.

Once you’re done you can raise the FLAG_REWRITE_PLUGIN_QUERY_REWRITTEN note in your rewrite function and return 0. Then you can look for the “Query rewritten” SQL Notes in your client applications.

I hope that this was helpful. As always, THANK YOU for using MySQL. Happy rewriting!

3 thoughts on “Write Yourself a Query Rewrite Plugin: Part 2

  1. The new functionality you mention in this post looks quite useful.

    Another related and common requirement I see is the need to change from one database name to another often when migrating applications. For various reasons the application may be talking to database db1, but then need to talk to database db2 (but the application’s configuration is hard-coded to talk to db1. Here the mysql_real_connect() call is used and the database db1 is given during the call. This migration may be quite a short period of time after which configuration changes may allow the new database name to be used.

    What would be nice is to be able to configure a database alias so that if I try to connect to db1 (which doesn’t exist) then the connection actually connects to db2 (assuming I have such rights). I don’t need to hide the fact I’m in db2 as the application is unlikely to care, so SELECT DATABASE() would give me db2 and any access permissions would apply to db2 as normal for that user.

    I’d guess the query rewrite behaviour described in these 2 posts is intended once you are connected and “in a database”, rather than at the slightly earlier stage I’m talking about.

    If such a feature were possible it would be most interesting to have too.

  2. So when are VIEWs and PARTITIONing being rewritten to use query rewriting (like they should have been done originally?

    (This is the reason why we ripped both of them out of Drizzle so early on – architecturally they didn’t make sense where they were, VIEWs and PARTITIONs are merely replacing one part of the parsed SQL with one you prepared earlier)

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter * Time limit is exhausted. Please reload CAPTCHA.