What is LOAD DATA and how does it work

I would like to highlight a less known MySQL SQL command, analyze how it works and understand some of the decisions that need to be made when using it.

Let’s start with the basics: The Reference Manual.

It does have a lot to say about LOAD DATA. But we will focus on LOAD DATA INFILE. IN a nutshell it has this syntax:

LOAD DATA
  [LOW_PRIORITY | CONCURRENT] [LOCAL]
  INFILE 'file_name'
  [REPLACE | IGNORE]
  INTO TABLE tbl_name
  [PARTITION (partition_name [, partition_name] ...)]
  [CHARACTER SET charset_name]
  [{FIELDS | COLUMNS}
    [TERMINATED BY 'string']
    [[OPTIONALLY] ENCLOSED BY 'char']
     [ESCAPED BY 'char']
    ]
  [LINES
    [STARTING BY 'string']
    [TERMINATED BY 'string']
  ]
  [IGNORE number {LINES | ROWS}]
  [(col_name_or_user_var
  [, col_name_or_user_var] ...)]
  [SET col_name={expr | DEFAULT},
  [, col_name={expr | DEFAULT}] ...]

OK, so this is a command to load the data in a text file into a table.

Let’s give it a try.

Let’s use the following table definition:
CREATE TABLE t1 (a INT PRIMARY KEY, b INT, c INT);

And the following data file (data.txt):

1,1
2,2
3,3
4,4

But … Hmm.

So this is an SQL command executed on a server computer by a client computer. I know how to create the table. But where do I put my data file ?

Back to the reference manual:

 

If LOCAL is not specified, the file must be located on the server host and is read directly by the server.

OK, so I need to put data.txt on my server. Let’s give it a try:

mysql> create table t1 (a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected (0.05 sec)

mysql> select @@datadir;
+--------------------------------------------+
| @@datadir |
+--------------------------------------------+
| /home/gkodinov/work/mysql-8.0/bld/datadir/ |
+--------------------------------------------+
1 row in set (0.00 sec)

mysql> exit;
Bye 

[gkodinov@loki01]~/work/mysql-8.0/bld/mysql-test: mkdir /home/gkodinov/work/mysql-8.0/bld/datadir/textdata/

[gkodinov@loki01]~/work/mysql-8.0/bld/mysql-test: cat > /home/gkodinov/work/mysql-8.0/bld/datadir/textdata/data.txt
1,1
2,2
3,3
4,4
[gkodinov@loki01]~/work/mysql-8.0/bld/mysql-test: mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 8.0.16-debug Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> load data infile 'data/data.txt' into table t1 fields terminated by ',';
ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement

Hmm !

mysql> select @@secure_file_priv;
+--------------------+
| @@secure_file_priv |
+--------------------+
| NULL |
+--------------------+
1 row in set (0.00 sec)

This NULL value (meaning “do not allow any file access”) prevents the server from accessing any local file in the file system. OK. That seems a reasonable default.

Now, for the sake of continued action let’s enable it to access files in the data sub-directory of the data directory by adding the following:

--secure-file-priv=/home/gkodinov/work/mysql-8.0/bld/datadir/textdata

to mysqld.

Second try:

mysql> select @@datadir,@@secure_file_priv\G
*************************** 1. row ***************************
@@datadir: /home/gkodinov/work/mysql-8.0/bld/datadir/
@@secure_file_priv: /export/home/tmp/gkodinov/work/mysql-8.0/bld/datadir/textdata/
1 row in set (0.00 sec)

mysql> load data infile 'textdata/data.txt' into table t1 fields terminated by ',';
ERROR 1261 (01000): Row 1 doesn't contain data for all columns

Hmm, looks like we’re past the file access fence. But, as the server correctly noticed, we have 3 columns in the table and just 2 columns for each row in data.txt. Let’s fix that too:

mysql> load data infile 'textdata/data.txt' into table t1 fields terminated by ',' (a,b);
Query OK, 4 rows affected (0.01 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0

mysql> select * from t1;
+---+------+------+
| a | b | c |
+---+------+------+
| 1 | 1 | NULL |
| 2 | 2 | NULL |
| 3 | 3 | NULL |
| 4 | 4 | NULL |
+---+------+------+
4 rows in set (0.00 sec)

Perfect !

As you can see LOAD DATA INFILE works pretty similarly to importing text data into an MS Excel spreadsheet. And it has pretty reasonable default configuration to prevent people reading random data across the server’s file system. You just need to have your data at an appropriate directory of the server computer and you can import them.

The benefits of going into this extra trouble are mostly that you only have to parse a single INSERT statement and then just execute it as many times as there’re rows.

So far so good. But … hmmm … have your data at the server computer ? So that you can import them and store them again into the data directory into the same server computer ?

This might work for small files. But if you have lots and lots of data then having two different copies of them on the same computer may not even be possible due to space limitations.

But, as you recall from earlier, there’s two computers involved into executing an SQL command: the server computer and the client computer. What if we can use the latter’s file storage space and somehow stream the data to the former over the client/server connection (together with the SQL command itself) ?

This is exactly what the optional LOCAL modifier does !

But hey, this would require a network protocol extension: you have a lot of extra data to send. Let’s see what does this look like on the wire. We could of course use wireshark or any other network data analysis tool to do that. But fortunately we do not have to.

The mysql client/server protocol wire formats are documented in the server’s Doxygen documentation. It has the following diagram:

So basically it works like this:

  1. The client sends the COM_QUERY to execute the SQL command as it usually does.
    • In our case this is “LOAD DATA LOCAL INFILE datafile/data.txt”
  2. The server (instead of sending a resultset or a status) sends a special request (0xFB) together with the file name of the file the client is supposed to read and send
    • In our case this is “datafile/data.txt”
  3. The client starts streaming the contents of the file requested by the server followed by an empty (zero length) packet to signify the end of the file
  4. The server executes the command and uses the data sent and continues as it normally would with any other DML command and sends an OK (or an ERR) packet to signify the outcome of the execution.

Good ! But … wait …

Why does the server need to send a file name to the client to read from ? Didn’t it just provide an SQL command with that same file name it in ?

Yes, it did. But unfortunately (or not) SQL is not the easiest language to parse. To know the file name the client needs to locally have a full-fledged parser for the MySQL SQL dialect that is *in sync* with the relevant parser on the server. Tricky !

This is why the client tries to take advantage of the parsing done anyway by the server and expects it to parse the statement received and provide the bit the client needs: the file name to read.

OK, let’s move our data file to the client computer and give this whole thing a try:

mysql> load data local infile 'textdata/data.txt' into table t1 fields terminated by ',' (a,b);
ERROR 1148 (42000): The used command is not allowed with this MySQL version

What ?

This not very friendly error message says LOAD DATA LOCAL INFILE is disabled. But why ?

RefMan to the rescue again (about –local-infile):

This variable controls server-side LOCAL capability for LOAD DATA statements. Depending on the local_infile setting, the server refuses or permits local data loading by clients that have LOCAL enabled on the client side.

And the default of this variable has been OFF since 8.0.2.

OK, let’s enable this:

mysql> select @@local_infile;

+----------------+
| @@local_infile |
+----------------+
| 1 |
+----------------+
1 row in set (0.00 sec)

mysql> load data local infile 'textdata/data.txt' into table t1 fields terminated by ',' (a,b);
ERROR 1148 (42000): The used command is not allowed with this MySQL version

Hmm, still not working ?! What’s wrong ?!

Remember the above network exchange explanation ? The client receives from the server a name of a file to read from. Thus the client needs to make its own decision if it wants to trust the server and the file name it sends to it.

This situation is explained very well in this Reference manual page. It also mentions that the client needs to signal that it trusts the server and the connection to it by adding one extra option (or mysql_options() option): –local-infile.

Let’s give that a try:

[gkodinov@loki01]~/work/mysql-8.0/bld/datadir: ../runtime_output_directory/mysql --local-infile=1 <your options here>
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 8.0.16-debug Source distribution

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> load data local infile 'textdata/data.txt' into table t1 fields terminated by ',' (a,b);
Query OK, 4 rows affected (0.00 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0

Nice ! Finally !

Ok, so let’s recap what does it take to get LOAD DATA LOCAL INFILE working:

  1. You need to tell the server to enable it for its clients to use by specifying –local-infile on the server.
  2. You need to instruct your mysql client to enable it for all SQL statements. Or manage it on a more fine-grained basis (e.g. per connection) if you use the C API.
  3. And of course you need to have the right file and specify the right import options.

I know it’s a tall order and a severe usability limitation. But it’s definitely worth it considering the kind of magic it does to transfer a bunch of text data over the wire and import them into a server relational table. All with a single command.

Thank you for using MySQL !

One thought on “What is LOAD DATA and how does it work

Leave a Reply

Your email address will not be published. Required fields are marked *