Friday, January 21, 2011

Postgresql Configuration

* 1 View currently running queries
* 2 Change lock file location
* 3 PostgreSQL vacuum
* 4 PG Buffer Cache (shared_buffers)
* 5 Working memory (work_memory)
* 6 Check points
* 7 Linux kernel system settings - SHARED MEMORY
* 8 Linux kernel system settings - SEMAPHORES


View currently running queries

In order to view the currently running queries, you can use this line:

SELECT client_addr, usename, datname, procpid, waiting, query_start, current_query FROM pg_stat_activity;

This can be issued from psql - example connection line:

/local/tsalomie/postgresql/bin/psql -h localhost -p 6001 -U tsalomie -d tpcw250_25000

Resources for this topic can be found here: http://chrismiles.info/systemsadmin/databases/articles/viewing-current-postgresql-queries/

Change lock file location

change the property unix_socket_directory to:

unix_socket_directory='/tmp'

in the postgresql.conf file.

or alternatively we can use the '-k /tmp' option when starting the postmaster (or from pg_ctl with -o option)

PostgreSQL vacuum

PostgreSQL uses multi versioning, resulting in unused pages that are not recovered. The vacuum process marks unused pages as available and can also delete them (shrinking the database). During the vacuum process, database statistics can also be obtained. This leads to the following vacuum options.

postgresql vacuumdb documentation

postgresql vacuum documentation

Synopsis:

VACUUM [ FULL | FREEZE ] [ VERBOSE ] [ table ]
VACUUM [ FULL | FREEZE ] [ VERBOSE ] ANALYZE [ table [ (column [, ...] ) ] ]

Only vacuum the [table_name] table, or if missing all tables in database:

VACUUM [table_name];

Vacuum full the [table_name] database, or if missing all the tables in the database. After this, db shrinkage will occur.

VACUUM FULL [table_name];

Vacuum analyze [table_name] database will do normal vacuum and generate db statistics. If [table_name] is not specified, then all tables in the database will be vacuum-analyzed.

VACUUM ANALYZE [table_name];

Vacuum full analyze [table_name] does (vacuum full) and (vacuum analyze). If [table_name] is not specified, all the tables in the database will be vacuum-full-analyzed.

VACUUM FULL ANALYZE [table_name];

To do a vacuum full on all databases on a server (in a cluster) do something like:

# vacuumdb -h localhost -p 5432 -a -f


PG Buffer Cache (shared_buffers)

* PG buffer cache - LRU algorithm + clock sweep algorithm - better performance than basic disk cache algorithm

* Data in PG buffer cache ~ data in OS disk cache => bad idea to give too much memory to PG buffer cache, still as the caching algorithm in PG is better than the OS one, too small is bad again - tricky problem of balance

* PG relies heavily on OS disk cache - gives it OS independence ...? (why not have OS specific implementations as DB performance is priority not platform independence)

* Practice proves PG buffer cache should be 0.25 - 0.33 of RAM

* TOTAL_RAM - (OS_TAKEN_RAM + DATABASE_ALLOTED_RAM + APPLICATIONS) = effective_cache_size

* On Windows OS - don't push shared_buffers over 10k - 50k (aka 80MB - 400MB)


Working memory (work_memory)

* Not too big, this is used for sorting, ordering, grouping operations.

* If there operations are not frequent, then keep value small

* Value around 10-20MB is normal - increase if needed



Check points

* Remember that if frequent write operations the amount of checkpoints performed increases with the size of shared_buffers

* Solution - rely more on OS disk cache



Linux kernel system settings - SHARED MEMORY

There are a couple of linux system settings that can affect Postgresql: $ sysctl -w kernel.shmmax=134217728 $ sysctl -w kernel.shmall=2097152

They represent the Shared Memory Max and All values. The commands above show how to inject these values in sysctl. They can also be manually set in /etc/sysctl.conf with su rights.

Once they are set, one can apply them by: sysctl -p /etc/sysctl.conf

SHMMAX defines the maximum contiguous memory area that can be requested

SHMALL defines the amount of memory all the shared memory areas can take up. The rule is:

MAX TOTAL SHARED MEMORY AREAS = PAGE_SIZE * SHMALL

PAGE_SIZE = getconf PAGE_SIZE

SHMALL = cat /proc/sys/kernel/shmall

Remark: making SHMALL bigger than the amount of available RAM leads to swapping! bad bad bad

Remark from [1]: shmall determines the total amount of shared memory to be allocated using its value multipled by the OS pagesize.



Linux kernel system settings - SEMAPHORES

SEMMNI = at least CEIL(MAX CONNECTIONS / 16), this is Maximum number of semaphore identifiers (i.e., sets)

SEMMNS = at least CEIL(MAX CONNECTIONS / 16) * 17, this is Maximum number of semaphores system-wide

SEMMSL = at least 17, this is Maximum number of semaphores per set

The following two we don't touch.

SEMMAP Number of entries in semaphore map - see text

SEMVMX Maximum value of semaphore - at least 1000 (The default is often 32767, don't change unless forced to)


We can inspect the content for kernel.sem:

* cat /proc/sys/kernel/sem
* /sbin/sysctl -p | grep sem

The output has the structure: SEMMSL, SEMMNS, SEMOPM, SEMMNI

For a Postgresql server to handle 4800 connection we can use the line:

echo 250 32000 32 400 > /proc/sys/kernel/sem

Resources:

Postgresql documentation for kernel resources

Setting semaphore properties in linux

No comments: