Wednesday, October 31, 2007


Hello again ;)
I ran into a tricky situation over the last couple of days. I have some big tables with varchar fields that I have to JOIN on. The problem is not limited to the JOIN itself; it is a general problem with varchar / text fields that get searched one way or another.

So - the situation: on one hand we have a table with about 1M rows (a relatively small amount of data per row, ~90 bytes). On the other hand we have a table with the records to be matched - about 100k rows (a medium amount of data per row, ~500 bytes). We do a SELECT which JOINs the two tables on VARCHAR columns. Do not ask "why?" - this is the business logic; it cannot be otherwise (at least in the beginning). If we index the columns, we get the results fast! What is fast? Less than 50 ms to get around 200 joined rows. Well, OK, you got me - it depends strongly on the hardware configuration and the MySQL settings. Let's assume this is our limit. I was very happy to discover that it works that fast. The speed is critical, because we will need to join around 100k rows.
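To make the setup concrete, here is a sketch with made-up table and column names (the real schema is not shown in the post); the important part is the index on the varchar column being joined on:

```sql
-- Hypothetical schema for illustration only; the real tables differ.
CREATE TABLE big_table (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    match_key VARCHAR(64) NOT NULL,
    INDEX idx_match_key (match_key)
) ENGINE = MyISAM;

CREATE TABLE records_to_match (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    match_key VARCHAR(64) NOT NULL,
    payload TEXT,
    INDEX idx_match_key (match_key)
) ENGINE = MyISAM;

-- The JOIN on the indexed varchar columns:
SELECT b.id, r.payload
FROM big_table b
JOIN records_to_match r ON r.match_key = b.match_key;
```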

Where is the problem then? Well, it happened that the first table was populated with another several hundred thousand records and grew to about 1.9M rows. And? And it turned out that MySQL now had to copy the temporary table to the hard drive. As you can guess, the result is: 204 joined rows (not 204k, just two hundred and four) for ..... 417 seconds. What about 100k rows? Killing me.

What can we do about that? Well, I know (the business logic predicts) that the first table - the one with about 2M rows now - will never have more than 4 million rows. I also noticed that partitioning is only available in MySQL 5.1.x, which is still a release candidate, while the version recommended for production environments is 5.0.x, which does not support partitioning. So, I discovered the great table options MAX_ROWS and AVG_ROW_LENGTH. What do they do? They tell the MySQL engine that this table will never have more than MAX_ROWS rows, and that the average row length is AVG_ROW_LENGTH bytes. You can find the average row length by executing the command SHOW TABLE STATUS FROM [DB_NAME] LIKE 'TBL_NAME'. So, if you know - if your business logic predicts - that you will have a large table, BUT one with a relatively predictable size (say, no more than 4M rows), and you know the average row length, you can set MAX_ROWS and AVG_ROW_LENGTH for that table. AVG_ROW_LENGTH is required if you have blob/text/varchar fields.
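For instance, to read the current average row length (the database and table names here are placeholders):

```sql
SHOW TABLE STATUS FROM my_db LIKE 'big_table';
-- In the output, the Avg_row_length column gives the value (in bytes)
-- to use for the AVG_ROW_LENGTH table option, and Max_data_length
-- shows the size limit implied by the current row pointer width.
```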

That causes the MySQL engine to optimize the space required for that table, including the internal row pointer size - the thing that myisam_data_pointer_size sets as a server-wide default, i.e. the number of bytes used for internal row pointers. Yes - it works only with MyISAM tables. But you will be surprised. After I set MAX_ROWS and AVG_ROW_LENGTH on the table with the 2M records, I got surprising results: 90k joined rows for ..... 47 seconds !!

I think I will use these options often in the future, what about you?


Anonymous said...

Great tip!

I heard before about these options but never thought they can have such a big influence on joining tables!

Unknown said...

Well, I am searching 'AVG_ROW_LENGTH', so Google leads me to your blog.

Thank you for the tips. Here is my confusion: I have read the MySQL manual and I cannot find the answer to how the value of AVG_ROW_LENGTH is measured. If I set it to 50, does that mean 50 characters or 50 bytes?

Thank you again.

Anton Staykov said...

Hello WalkinRaven,
50 is measured in bytes, since it is AVG_ROW_LENGTH - that is, the total length of a ROW. Now, a row can contain multiple columns, which can be of different data types: numbers, chars, varchars, dates etc., so the row length is measured in BYTES. For how MySQL stores the different data types and how many bytes/bits each data type occupies, please refer to the MySQL manual, section "Data Type Storage Requirements".

As for setting this value - the per-table AVG_ROW_LENGTH is given in the CREATE TABLE / ALTER TABLE statement, while the server-wide default pointer size (myisam_data_pointer_size) can be set via my.cnf / my.ini in the [mysqld] section.
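A minimal my.cnf fragment, assuming you want to change the server-wide default (the value is in bytes, the allowed range is 2 to 7, and it applies only to MyISAM tables created after the change):

```ini
[mysqld]
# Default row pointer size in bytes for newly created MyISAM tables.
myisam_data_pointer_size = 6
```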

Unknown said...

Thank you!

chandu said...

can u provide the syntax for assigning values for a particular table?? thank u....

Anton Staykov said...

you can do it either in the CREATE TABLE or in the ALTER TABLE commands:

For example:
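(the table name and the numbers below are illustrative, not from the post)

```sql
-- Take the average row length for your own table
-- from the output of SHOW TABLE STATUS.
ALTER TABLE big_table MAX_ROWS = 4000000 AVG_ROW_LENGTH = 90;

-- Or at creation time:
CREATE TABLE big_table (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    match_key VARCHAR(64) NOT NULL
) ENGINE = MyISAM MAX_ROWS = 4000000 AVG_ROW_LENGTH = 90;
```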