10 Minutes to Postgresql: January 2011

Tuesday, January 25, 2011

Regular Expression

my_column has values like Example: AA001

Query to list any character not matching the format of ([A-Za-z][A-Za-z][0-9][0-9][0-9]) will be listed

select my_column from my_table where my_column !~ '([A-Za-z][A-Za-z][0-9][0-9][0-9])';

Matching a list of characters

Unlike the previous examples, these next regular expressions carry out queries using a pattern defined in a "class," a list of characters enclosed by brackets, [ ]. Think of a class as a simplified way of expressing logical ORs within a SQL statement.

Study the results returned by the queries:

SELECT record FROM myrecords WHERE record ~ '[a]';
SELECT record FROM myrecords WHERE record ~ '[A]';
SELECT record FROM myrecords WHERE record ~* '[a]';
SELECT record FROM myrecords WHERE record ~ '[ac]';
SELECT record FROM myrecords WHERE record ~ '[ac7]';
SELECT record FROM myrecords WHERE record ~ '[a7A]';
SELECT record FROM myrecords WHERE record ~* '[ac7]';

What happens when you search for a character, the letter z, that is not present in any string in the table?

SELECT record FROM myrecords WHERE record ~ '[z]';
SELECT record FROM myrecords WHERE record ~ '[z7]';

This statement excludes all strings with the characters 4 OR a:

SELECT record FROM myrecords WHERE record !~ '[4a]';

The use of the dash, -, in the class implies a search range--that is, to return all strings having any digits between 1 and 4:

SELECT record FROM myrecords WHERE record ~ '[1-4]';

Here are a few more range examples:

SELECT record FROM myrecords WHERE record ~ '[a-c5]';
SELECT record FROM myrecords WHERE record ~* '[a-c5]';
SELECT record FROM myrecords WHERE record ~ '[a-cA-C5-7]';

Matching two or more characters in a pattern

The previous examples made pattern searches one character or class at a time. However, you can use many classes in a pattern. These three statements return the same string from the example table:

SELECT record FROM myrecords WHERE record ~ '3[a]';
SELECT record FROM myrecords WHERE record ~ '[3][a]';
SELECT record FROM myrecords WHERE record ~ '[1-3]3[a]';

These two statements also return the same records:

SELECT record FROM myrecords WHERE record ~ '[23][a]';
SELECT record FROM myrecords WHERE record ~ '[2-3][a]';

Always keep in mind the potential for confusion. This next returns nothing because there is no such character string, ac, in any of the records in the example table:

SELECT record FROM myrecords WHERE record ~ '[a][c]';

However, introducing a range for each class returns the records:

SELECT record FROM myrecords WHERE record ~ '[a-b][b-c]';

Excluding strings in a class

To create a class of characters to exclude, insert the caret, ^, immediately after the left bracket of a class. Inserting at any other point other than as the first character in the class means you're looking for the caret in a string rather than excluding the class' character pattern.

Notice that queries can still return strings containing those excluded characters. Class exclusions prevent explicit searches using those characters.

This next statement excludes all digits from 0 to 9 from the target search. In other words, this expression returns strings that don't include digits.

SELECT record FROM myrecords WHERE record ~ '[^0-9]';

Making choices

The pipe | denotes alternation. In other words, it's a logical OR for pattern searches. Suppose you want to return all records with strings that begin with a or end with c:

SELECT record FROM myrecords WHERE record ~ '^a|c$';

This statement returns strings beginning either with a or 5, or ending with c:

SELECT record FROM myrecords WHERE record ~ '^a|c$|^5';

This next example performs a targeted search excluding digits and all lowercase letters. Control characters, spaces, and uppercase letters can all appear in the search pattern:

SELECT record FROM myrecords WHERE record ~ '[^0-9|^a-z]';

Repeating characters (quantifiers)

Sometimes you need to find strings that have repeated sets of the same character(s). You can find them with quantifiers--represented by the characters *, ?, and + as well as by digits enclosed within braces { }.

To find a sequence of 0 or more matches, use *:

SELECT record FROM myrecords WHERE record ~ 'a*'';

To find a sequence of one or more matches, use +:

SELECT record FROM myrecords WHERE record ~ 'b+';

To find a sequences of zero or one occurrence, use ?:

SELECT record FROM myrecords WHERE record ~ 'a?';

To find a sequences of exactly # matches, use {#}:

SELECT record FROM myrecords WHERE record ~ '[0-9]{3}';

To find a sequences of exactly # or more matches, use {#,}:

SELECT record FROM myrecords WHERE record ~ '[0-9]{4,}';

To find a sequences of # through ## (inclusive) matches, where # does not exceed ##, use {#, ##}:

SELECT record FROM myrecords WHERE record ~ '[a-c0-9]{2,3}';

Monday, January 24, 2011

Kernel Parameters

Modifying kernel parameters (Linux)

Before installing a DB2^(R) database system, you should update your Linux^(TM) kernel parameters. The default values for particular kernel parameters on Linux are not sufficient when running a DB2 database system. DB2 automatically raises the IPC limits where necessary based on the needs of the DB2 database system. However, it may be more practical to set the limits permanently on your Linux system if you have products or applications other than a DB2 database system.

Prerequisites

You must have root authority to modify kernel parameters.

Procedure

To update kernel parameters:

Red Hat and SUSE Linux

The following information can be used to determine if there are any necessary changes required for your system.

The following is the output from the ipcs -l command. Comments have been added following the // to show what the parameter names are.

   # ipcs -l

  ------ Shared Memory Limits --------
  max number of segments = 4096               // SHMMNI 
  max seg size (kbytes) = 32768               // SHMMAX
  max total shared memory (kbytes) = 8388608  // SHMALL
  min seg size (bytes) = 1

  ------ Semaphore Limits --------
  max number of arrays = 1024                 // SEMMNI
  max semaphores per array = 250              // SEMMSL
  max semaphores system wide = 256000         // SEMMNS
  max ops per semop call = 32                 // SEMOPM
  semaphore max value = 32767

  ------ Messages: Limits --------
  max queues system wide = 1024               // MSGMNI
  max size of message (bytes) = 65536         // MSGMAX
  default max size of queue (bytes) = 65536    // MSGMNB

Beginning with the first section on Shared Memory Limits, SHMMAX and SHMALL are the parameters that need to be looked at. SHMMAX is the maximum size of a shared memory segment on a Linux system whereas SHMALL is the maximum allocation of shared memory pages on a system.

For SHMMAX, the minimum required on x86 systems would be 268435456 (256 MB) and for 64-bit systems, it would be 1073741824 (1 GB).

SHMALL is set to 8 GB by default (8388608 KB = 8 GB). If you have more physical memory than this, and it is to be used for DB2, then this parameter should be increased to approximately 90% of the physical memory as specified for your computer. For instance, if you have a computer system with 16 GB of memory to be used primarily for DB2, then 90% of 16 GB is 14.4 GB divided by 4 KB (the base page size) is 3774873. The ipcs output has converted SHMALL into kilobytes. The kernel requires this value as a number of pages.

The next section covers the amount of semaphores available to the operating system. The kernel parameter sem consists of 4 tokens, SEMMSL, SEMMNS, SEMOPM and SEMMNI. SEMMNS is the result of SEMMSL multiplied by SEMMNI. The database manager requires that the number of arrays (SEMMNI) be increased as necessary. Typically, SEMMNI should be twice the maximum number of connections allowed (MAXAGENTS) multiplied by the number of logical partitions on the database server computer plus the number of local application connections on the database server computer.

The third section covers messages on the system.

MSGMNI affects the number of agents that can be started, MSGMAX affects the size of the message that can be sent in a queue, and MSGMNB affects the size of the queue.

MSGMAX should be change to 64 KB (that is, 65535 bytes), and MSGMNB should be increased to 65535 on Server systems.

To modify these kernel parameters, we need to edit the /etc/sysctl.conf file. If this file does not exist, it should be created. The following lines are examples of what should be placed into the file:

kernel.sem=250 256000 32 1024
#Example shmmax for a 64-bit system
kernel.shmmax=1073741824 
#Example shmall for 90 percent of 16 GB memory
kernel.shmall=3774873  
kernel.msgmax=65535
kernel.msgmnb=65535

Run sysctl with -p parameter to load in sysctl settings from the default file /etc/sysctl.conf.

   sysctl -p

To make the changes effective after every reboot, boot.sysctl needs to be active on SUSE Linux. On Red Hat, the rc.sysinit initialization script will read the /etc/sysctl.conf file automatically.

Friday, January 21, 2011

Postgresql Configuration

* 1 View currently running queries
* 2 Change lock file location
* 3 PostgreSQL vacuum
* 4 PG Buffer Cache (shared_buffers)
* 5 Working memory (work_memory)
* 6 Check points
* 7 Linux kernel system settings - SHARED MEMORY
* 8 Linux kernel system settings - SEMAPHORES

View currently running queries

In order to view the currently running queries, you can use this line:

SELECT client_addr, usename, datname, procpid, waiting, query_start, current_query FROM pg_stat_activity;

This can be issued from psql - example connection line:

/local/tsalomie/postgresql/bin/psql -h localhost -p 6001 -U tsalomie -d tpcw250_25000

Resources for this topic can be found here: http://chrismiles.info/systemsadmin/databases/articles/viewing-current-postgresql-queries/

Change lock file location

change the property unix_socket_directory to:

unix_socket_directory='/tmp'

in the postgresql.conf file.

or alternatively we can use the '-k /tmp' option when starting the postmaster (or from pg_ctl with -o option)

PostgreSQL vacuum

PostgreSQL uses multi versioning, resulting in unused pages that are not recovered. The vacuum process marks unused pages as available and can also delete them (shrinking the database). During the vacuum process, database statistics can also be obtained. This leads to the following vacuum options.

postgresql vacuumdb documentation

postgresql vacuum documentation

Synopsis:

VACUUM [ FULL | FREEZE ] [ VERBOSE ] [ table ]
VACUUM [ FULL | FREEZE ] [ VERBOSE ] ANALYZE [ table [ (column [, ...] ) ] ]

Only vacuum the [table_name] table, or if missing all tables in database:

VACUUM [table_name];

Vacuum full the [table_name] database, or if missing all the tables in the database. After this, db shrinkage will occur.

VACUUM FULL [table_name];

Vacuum analyze [table_name] database will do normal vacuum and generate db statistics. If [table_name] is not specified, then all tables in the database will be vacuum-analyzed.

VACUUM ANALYZE [table_name];

Vacuum full analyze [table_name] does (vacuum full) and (vacuum analyze). If [table_name] is not specified, all the tables in the database will be vacuum-full-analyzed.

VACUUM FULL ANALYZE [table_name];

To do a vacuum full on all databases on a server (in a cluster) do something like:

# vacuumdb -h localhost -p 5432 -a -f

PG Buffer Cache (shared_buffers)

* PG buffer cache - LRU algorithm + clock sweep algorithm - better performance than basic disk cache algorithm

* Data in PG buffer cache ~ data in OS disk cache => bad idea to give too much memory to PG buffer cache, still as the caching algorithm in PG is better than the OS one, too small is bad again - tricky problem of balance

* PG relies heavily on OS disk cache - gives it OS independence ...? (why not have OS specific implementations as DB performance is priority not platform independence)

* Practice proves PG buffer cache should be 0.25 - 0.33 of RAM

* TOTAL_RAM - (OS_TAKEN_RAM + DATABASE_ALLOTED_RAM + APPLICATIONS) = effective_cache_size

* On Windows OS - don't push shared_buffers over 10k - 50k (aka 80MB - 400MB)

Working memory (work_memory)

* Not too big, this is used for sorting, ordering, grouping operations.

* If there operations are not frequent, then keep value small

* Value around 10-20MB is normal - increase if needed

Check points

* Remember that if frequent write operations the amount of checkpoints performed increases with the size of shared_buffers

* Solution - rely more on OS disk cache

Linux kernel system settings - SHARED MEMORY

There are a couple of linux system settings that can affect Postgresql: $ sysctl -w kernel.shmmax=134217728 $ sysctl -w kernel.shmall=2097152

They represent the Shared Memory Max and All values. The commands above show how to inject these values in sysctl. They can also be manually set in /etc/sysctl.conf with su rights.

Once they are set, one can apply them by: sysctl -p /etc/sysctl.conf

SHMMAX defines the maximum contiguous memory area that can be requested

SHMALL defines the amount of memory all the shared memory areas can take up. The rule is:

MAX TOTAL SHARED MEMORY AREAS = PAGE_SIZE * SHMALL

PAGE_SIZE = getconf PAGE_SIZE

SHMALL = cat /proc/sys/kernel/shmall

Remark: making SHMALL bigger than the amount of available RAM leads to swapping! bad bad bad

Remark from [1]: shmall determines the total amount of shared memory to be allocated using its value multipled by the OS pagesize.

Linux kernel system settings - SEMAPHORES

SEMMNI = at least CEIL(MAX CONNECTIONS / 16), this is Maximum number of semaphore identifiers (i.e., sets)

SEMMNS = at least CEIL(MAX CONNECTIONS / 16) * 17, this is Maximum number of semaphores system-wide

SEMMSL = at least 17, this is Maximum number of semaphores per set

The following two we don't touch.

SEMMAP Number of entries in semaphore map - see text

SEMVMX Maximum value of semaphore - at least 1000 (The default is often 32767, don't change unless forced to)

We can inspect the content for kernel.sem:

* cat /proc/sys/kernel/sem
* /sbin/sysctl -p | grep sem

The output has the structure: SEMMSL, SEMMNS, SEMOPM, SEMMNI

For a Postgresql server to handle 4800 connection we can use the line:

echo 250 32000 32 400 > /proc/sys/kernel/sem

Resources:

Postgresql documentation for kernel resources

Setting semaphore properties in linux

Wednesday, January 19, 2011

Delete Duplicate Records

create table temp_dup_mytable as select sid,eid,snum,count(sid) from mytable group by 1,2,3 having count(sid) > 1;

CREATE SEQUENCE temp_seq INCREMENT 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1;

ALTER TABLE mytable ADD COLUMN temp_id integer;

UPDATE mytable SET temp_id = nextval('temp_seq') from temp_dup_mytable tpu where mytable.eid = tpu.eid and mytable.sid=tpu.sid and mytable.snum=tpu.snum;

-- debug

select d.* from mytable d where d.temp_id>0;

delete from mytable where temp_id not in (select min(temp_id) from mytable where temp_id > 0 group by sid,eid,snum having count(*) > 0);

-- debug

select d.* from mytable d where d.temp_id>0;

ALTER TABLE mytable drop COLUMN temp_id;

alter table mytable add constraint sid_eid_snum_key unique(sid,eid,snum);

PostgreSQL: Granting permission on any schema and its tables

$ more grant_schema_table.sh

#Name grant_schmea_table.sh

#Author Deepak Murthy

#Created 20100113

#Version 1.0

#*************************************************************************

#Description

#This script is for granting permission on any schema and its tables.

#####

#example – ./grant_schmea_table.sh sandbox_dba role_dba_ro fimdwh select 5432

#example – ./grant_schmea_table.sh sandbox_dba role_dba_rw fimdwh select,insert,delete,update 5432

#####

echo “grant usage on schema $1 to $2; set search_path=$1; \d” |psql -U gpadmin -p $5 -t -A -d $3| grep $1|cut -d”|” -f2 | nawk -v S=$1 ‘BEGIN{RS=”"} { for(i=1;i<=NF;i++) printf S “.” $(i) “,”;print

“” }’|sed -e ‘s/.$//’|nawk -v U=$2 -v P=$4 ‘{print ”grant “P” on “$1″ to “U”;”}’|psql -U gpadmin -p $5 -t -Aq -d $3

$ more grant_schema_table.sh#Name grant_schmea_table.sh#Author Deepak Murthy#Created 20100113#Version 1.0#*************************************************************************#Description#This script is for granting permission on any schema and its tables.######example – ./grant_schmea_table.sh sandbox_dba role_dba_ro fimdwh select 5432#example – ./grant_schmea_table.sh sandbox_dba role_dba_rw fimdwh select,insert,delete,update 5432#####echo “grant usage on schema $1 to $2; set search_path=$1; \d” |psql -U gpadmin -p $5 -t -A -d $3| grep $1|cut -d”|” -f2 | nawk -v S=$1 ‘BEGIN{RS=”"} { for(i=1;i<=NF;i++) printf S “.” $(i) “,”;print”" }’|sed -e ‘s/.$//’|nawk -v U=$2 -v P=$4 ‘{print ”grant “P” on “$1″ to “U”;”}’|psql -U gpadmin -p $5 -t -Aq -d $3