The internet is full of configuration files, you can easily google them and grab one that looks like a fit for your server. There lies a problem with them, your environment is unique and with 99.9999999% certainty does not match any other on planet; simply just having different data which results in unique finger print for your server. Only server it could match is MySQL replica, but even then there will be minor subtleties that do not match with the primary server. In that perspective, this article is already doomed before even started. But am going to continue any way. For those who are seeking for InnoDB configuration. That will be sequel for this piece.
The base of configuration file to be populated during article
[mysql] ## Client port = 3306 # The default port for MySQL socket = /var/lib/mysql/mysql.sock # You can choose other socket location or leave this out all together and let OS use it's default path [mysqld] ## General user = mysql # the default user for MySQL default_storage_engine = InnoDB # or MyISAM socket = /var/lib/mysql/mysql.sock # You can choose other socket location # or leave this out all together and let OS use it's default path pid_file = /var/lib/mysql/mysql.pid # You can choose other pid location # or leave this out all together and let OS use it's default path ## MyISAM key_buffer_size = [value] myisam_recover = FORCE,BACKUP myisam_use_mmap myisam_block_size = 1024 ## InnoDB innodb_flush_method = O_DIRECT innodb_log_files_in_group = 2 innodb_log_file_size = 32M innodb_flush_log_at_trx_commit = 1 innodb_file_per_table = 1 innodb_buffer_pool_size = 8M ## storage datadir = /var/lib/mysql/ # You can choose other data directory location # or leave this out all together and let OS use it's default path ## Binary logging log_bin = /var/lib/mysql/mysql_bin # You can choose other binary log directory location # or leave this out all together and let OS use it's default path expire_logs_days = 14 sync_binlog = 1 ## Other tmp_table_size = 32M max_heap_table_size = 32M query_cache_type = 0 #disable query_cache_size = 0 #disable max_connections = [value] thread_cache_size = [value] table_open_cache = [value] table_definition_cache = [value] open_files_limit = 65535 # ## safety max_allowed_packet = 16M max_connect_errors = 1000000 skip_name_resolve innodb = FORCE innodb_strict_mode = 1 ## logs log_error = /var/lib/mysql/mysql_error.log # You can choose other socket location # or leave this out all together and let OS use it's default path log_queries_not_using_indexes = 1 slow_query_log = 1 # Set to zero to track all queries slow_query_log_file = /var/lib/mysql/mysql_slow.log # You can choose other socket location # or leave this out all together and let OS use it's default path
There are some values set and shall not go deeper with them, as explaining all of what is set would extend this article to quite long piece for no good reason. There is link to MySQL documentation on each setting so you can check if the setting is fit for you or not.
Configuration keys with [value]
Even I am concentrating on MODX by topic. This configuration is valid for all other applications that use MyISAM also. If you don't have application yet and want to write one. Do choose InnoDB instead of MyISAM for multitude of reasons that are out of scope for this article. Why MyISAM then? That is simply cause of it having FULLTEXT index available. Yes, MySQL 5.6 has it also for InnoDB but majority of MODX instances still run on pre 5.6 software.
Key buffers also known as key caches is the major player in all configurations where data is stored within MyISAM engine. This value has to be high enough to fit all indexes, but not their data. MyISAM lets operating system handle the caching for actual data. Here is how you can find how much memory you need for buffer.
Using Command Line
You can find how much your indexes take space with next command: du -sch `find /datadir/path -name "*.MYI"`
This will output human readable size for you. For example, on our demo db where there are gig's of data this shows 2.4G there is some rounding done with the command line command as next will show different result
Using MySQL Client
SELECT concat(round(sum(index_length)/(1024*1024*1024),2),'G') idx FROM information_schema.tables WHERE engine='MyISAM';
For example this will output 2.38G on the same dataset as used before. There are clearly bit of throw in direction or another. Which one then to use? Both give good direction. If you have plenty of memory available. Would say that go with even bigger value. You could allocate 2.5G or 3G for MyISAM key buffers, but that is entirely something you have to see. Does your database keep on growing substantially every day/week/month? Will be the major question, if no. Then you can keep the buffers relatively small. If yes, would recommend to have plenty of free memory reserved. Just remember, having 4G of unused buffer space is same as 1M unused as long all indexes fir into memory
You can keep track of how many percentages of key buffer is used with next query:
SELECT 100 - ( (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Key_blocks_unused') * (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES WHERE VARIABLE_NAME = 'key_cache_block_size') * 100 / (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES WHERE VARIABLE_NAME = 'key_buffer_size'));
If the value does not increase by time, you can consider shrinking your key buffer size and utilize the memory somewhere else where it could be used to gain performance out of the system.
This variable controls how MyISAM tries to recover from corruption when opening table. As recommendation, would set this to FORCE, BACKUP like in the example file. This way when possible corruption appears, MySQL tries to repair the table, but if it cannot recover some row. It will continue and the row will be lost. But no fear, MySQL will write an .BAK file in format: tbl_name-datetime.BAK for you to scramble in later.
By enabling memory-mapped access to data files can increase your MyISAM performance by enabling access to .MYD files directly through OS's page cache, thus avoiding system call. How much performance can be gained? I run few tests with sysbench with dataset of 1,000,000 rows on our server. The two test sets
First Run Config
The first one was run four times per configuration for mmap with using threads from one to four incrementally. Between each thread change, the database was cleaned and re-prepared.
Second run config
The second run was done with same configuration and plan as above, but so that the database was not cleaned and all benchmarks were run after each other.
The Test Bench
Running MySQL 5.5.35 on Ubuntu server. 2TB WD in raid 0 with 32Gb memory and 4G key buffer size
The sysbench test
sysbench --num-threads=1 \ --max-requests=100000 \ --test=oltp \ --oltp-table-size=1000000 \ --oltp-skip-trx \ --oltp-range-size=1000 \ --oltp-dist-type=uniform \ --mysql-table-engine=myisam \ run >> bench.log
The Average results per benchmark run
|Cleanup / Prepare||Transactions per second||R/W Operations per second||Min. Request Time||Avg. Request Time||Max. Request Time||Average 95% Percentile|
The differences are not huge really. Enabling mmap seems to give bit of leverage when the database is not purged all the time, which obviously is closer to real world use than having DB flushed once in a while. So if you plan to clear your database all the time. Do not enable the variable. Otherwise will recommend to enable the setting. Just by looking at maximum query times and how many read/write operations can be performed in second it is clear indication that you should enable the variable.
Originally was about to write about how to set myisam_block_size to same with OS page size. But shall say that you can leave it to defaults. And why? I run exhaustive list of benchmarks with sysbench to prove that changing this value will speed up your mysql inserts. But at least on this server this was counter intuitive and resulted in lot worse performance. Below table of results, running sysbench with 4 threads and also changed the --oltp-range-size to 100 to speed up reads.
|Transactions per second||R/W Operations per second||Min. Request Time||Avg. Request Time||Max. Request Time||Average 95% Percentile|
|myisam_block_size = 1024||2470.710||46943.467||1.343ms||1.617ms||137.523ms||1.660ms|
|myisam_block_size = 4096||2198.727||41775.800||1.350ms||1.820ms||1068.493ms||1.880ms|
The results are pretty clear, larger block size causes on this system quite drastic dive in performance. So what to choose? I would recommend to test out the setting yourself. Instrument your queries if possible and try comparing results from that perspective. It is completely plausible that on different kind of load and on different server infrastructure this option can be advantageous.
Max_connections variable controls how many concurrent connections your database can serve. This is by default 100 connections which can serve small applications nicely, for example this blog has maxed out value of four concurrent connections to this day. You can set this value up to 500 without worrying unless you know how many connections is used at peak times. To know how many connections has been used at peak/maximum. You can query it with :
show status where variable_name = 'max_used_connections';
If max_used_connections is same with max_connections variable, it is likely that some visitor has been served with 'too many connections' error. So keep close eye on this if the status variable starts to bloat to direction of the set variable.
Variable controls how many connections can be set to cache after user disconnects from databases, that is when page has finished loading for php applications as an example. Obviously the connection does not get cached if there is no space in the cache for it. The performance improvement from thread cache is not tremendous but small things count. You can follow 'Threads_connected' status variable to see how many threads are connected at a time to see min/max values. Another status variable to follow is 'Threads_created`, if this variable increases all the time it is possible that your thread_cache_size is too small. To see how many threads are already cached check the status variable 'Threads_cached' and if the value equals to thread_cache_size then it is obviously full and you should consider if to increase cache size.
There is no magic number what to set really, if you have 'Threads_connected' peaking between 200-225, you could set thread cache to 256 or can use smaller number like 192 as creating new thread is not too expensive operation and for most times you can go with smaller thread cache. But there is no harm in bigger values as long it is not too big as there is no point having loads of cached threads hanging around for most of their time.
This section only applies for MySQL servers with version newer than 5.1.2 where table cache was only one configuration variable. If you are running pre 5.1.2 version. Please refer to MySQL documentation for configuration tips.
Table cache is used for reusing table file descriptors, index file descriptors and other metadata depending on what storage engine you are using. For MyISAM this setting is more crucial against InnoDB where it only holds parsed table definitions.
Table open cache holds the file descriptor and index descriptors for MyISAM storage engine. The file descriptor is opened into cache for each concurrent query to it like subquery or self join. Index descriptor is opened only once per table and is shared between threads. How big the table_open_cache should be is more trickier question. The table cache should hold atleast value of max_connections * N where N is maximum number of table joins. That should add to quite nice number as there ain't too many application without joins around, most likely none. For this site this value right now could be anywhere between 512 - 1024 really. But playing safe is always good and you can lift the value pretty high as it does not consume too much memory.
Table definition cache holds the parsed table definitions from .frm files. The number should be high enough to hold all of your tables. But do not try to hold all tables in if there are tens of thousands. The opening and parsing table definitions is not too expensive operation. For typical MODX instance you can leave the value to it's default for most of MySQL versions. But if want to play safe, you can use throw dice to value like 512 or 1024 if you fancy power of 2 values. The table definition cache is shared between all threads.
Disabled Query Cache
By disabling query cache, you'll avoid bit of overhead caused by checking if there is result set for query in cache. Also it is hard to keep sane data in it really. If you are running small site that has no loads of unique queries and does not have any user specific queries. You can set query_cache_type to 1 and query_cache_size to 8192 (8M) as an example. But if you have multiple users querying your site including manager actions. You better disable this completely as it is likely that there will not be any data to be fetched from cache. An example lets take forum application.
- User does login action and gets unique user page after login. 1st unique query
- Another user is already logged in and goes to own user page. 2nd unique query
- Both start browsing the forums here and there asking loads of unique queries like, does the user has permission to view some category, board etc...?
- Multiply these actions for each user online
The query cache could be useful for second or two. But after a while with multiple users online. The cache just goes in limbo;
- Check if in cache === false
- Run query
- Cache the results
- Repeat and rinse
The InnoDB configurations
I left small defaults for InnoDB just in case there is some other application running on the host or some extra uses InnoDB tables. As mentioned at the start. There will be sequel about InnoDB configuration and what to take in account.
It is hard to write little but loads about configuring MySQL and MyISAM. There are too much moving parts to say "use this or that". I tried to crush in as much of data without making this article 10 pages long. Which I guess happened any ways. Please leave a comment if found something that is not correct, want to know more, or just want to say hello (Which would be awesome). If you want good solid configuration where to start from. You can use Percona Tools for MySQL which is wicked little site with good configuration tool. Just remember, configuration is key stone for your site. It cannot fix bad queries but can boost them a bit. In the end, it is all about the application how well it can use your MySQL instance.