Document

Programming articles and books.

Sphinx 1.10-beta 测试版 新功能介绍


2010年7月19日,Sphinx 1.10-beta版本发布,增加了很多新功能,主要亮点如下:

  • 实时索引支持
  • 字符串字段值存储
  • MPM模式支持
  • 外部文本文件索引
  • searchd直接支持MySQL协议
  • SphinxQL接口查询完善

新增功能列表:

  • 实时索引:added RT indexes support (Chapter 4, Real-time indexes)
  • MPM模式支持:fork、prefork、threads;added prefork and threads support (workers directives)
  • 本地分布式搜索支持多线程;added multi-threaded local searches in distributed indexes (dist_threads directive)
  • AddQuery()子查询缓存added common subquery cache (subtree_docs_cache,
    subtree_hits_cache directives)
  • 文本字段信息存储及返回;added string attributes support (sql_attr_string,
    sql_field_string,
    xml_attr_string,
    xml_field_string directives)
  • 索引时词汇数统计;added indexing-time word counter (sql_attr_str2wordcount,
    sql_field_str2wordcount directives)
  • SphinxQL模式支持高亮调用和分词结果调用;added CALL SNIPPETS(),
    CALL KEYWORDS() SphinxQL statements
  • SphinxQL模式支持字段权重和索引权重设置;added field_weights, index_weights options to
    SphinxQL SELECT statement
  • SphinxQL模式支持SphinxSE(仅insert数据);added insert-only SphinxQL-talking tables to SphinxSE (connection='sphinxql://host[:port]/index')
  • SphinxSE查询中支持select计算;added select option to SphinxSE queries
  • 增加searchd崩溃coredump文件设置;added backtrace on crash to searchd
  • 支持从SQL字段属性值调入外部文本文件索引;added SQL+FS indexing, aka loading files by names fetched from SQL
    (sql_file_field directive)
  • 多线程模式时支持watchdog守护;added a watchdog in threads mode to searchd
  • 索引合并时自动消除脏数据;added automatic row phantoms elimination to index merge
  • 不记录词汇在文档的位置(可减小索引大小和提高性能);added hitless indexing support (hitless_words directive)
  • indextool增加参数:added --check, --strip-path, --htmlstrip, --dumphitlist ... --wordid switches to indextool
  • searchd增加参数--stopwait提升退出时保存信息的能力,--logdebug设定调试信息输出;added --stopwait, --logdebug switches to searchd
  • indexer增加参数--dump-rows用以输出所有读取到的数据,--verbose用以输出详细的问题信息;added --dump-rows, --verbose switches to indexer
  • 混合字符设置(AT&T可被所分解为AT&T、AT、T);added "blended" characters indexing support (blend_chars directive)
  • 增加sql_joined_field,用于支持类似SQL的GROUP_CONCAT处理,以及对SPH_RANK_PROXIMITY_BM25下评分值的控制;added joined/payload field indexing (sql_joined_field directive)
  • 增加FlushAttributes()接口用于将内存信息保存到磁盘;added FlushAttributes() API call
  • BuildExcerpts增加多项处理参数;added query_mode, force_all_words, limit_passages, limit_words, start_passage_id, load_files, html_strip_mode,
    allow_empty options, and %PASSAGE_ID% macro in before_match, after_match options
    to BuildExcerpts() API call
  • SELECT支持获取@groupby/@count/@distinct信息;added @groupby/@count/@distinct columns support to SELECT (but not to expressions)
  • 查询时自动展开词汇(可能会降低性能);added query-time keyword expansion support (expand_keywords directive,
    SPH_RANK_SPH04 ranker)
  • 批量查询数目限制;added query batch size limit option (max_batch_queries directive; was hardcoded)
  • 增加SINT()函数用于正确解析32位整数到64位;added SINT() function to expressions

新功能说明:

  • 因为目前新增功能较多,相关功能还在监测和完善中,部分功能可能存在不可预知的问题,暂时请不要使用到正式场合!
  • 实时索引:目前限制较多,尚不太稳定,已知问题请查看问题列表
  • 字符串值存储:单字符串4MB大小限制,不要存储大量的数据,尚未优化更好的存储方式
  • 详细变更列表:查看

测试版文档:

测试版下载:

    Sphinx 1.10-beta 新! (r2420; 2010-07-19)
    Source tarball (tar.gz) 1.10-beta 1.4M 下载
    Win32 binaries w/MySQL support 1.10-beta 3.1M 下载
    Win32 binaries w/MySQL+PostgreSQL support 1.10-beta 4.6M 下载
    Win32 binaries w/MySQL+PgSQL+libstemmer+id64 support 1.10-beta 4.9M 下载
    RHEL 5.x x86_64 RPM 1.10-beta 3.1M 下载
    RHEL 5.x i386 RPM 1.10-beta 3.6M 下载
    RHEL 4.x i386 RPM 1.10-beta 3.5M 下载
    RHEL 5.x x86_64 binaries 1.10-beta 3.1M 下载
    RHEL 5.x i386 binaries 1.10-beta 3.6M 下载
    RHEL 4.x i386 binaries 1.10-beta 3.5M 下载

firepass ssl vpn linux 客户端

firepass ssl vpn 的windows版本使用IE的插件,非常方便,也没有什么不兼容的问题.
在linux平台,只能使用firefox的插件,但官方对firefox的支持没有那么快,到目录支持firefox 2.x -- firefox 3.0的版本.再新的firefox 3.5以上都没有相关的插件更新.

另外,这个linux 上的firefox插件只能使用firefox 32位版本,64位不行.所以,如果是64位系统,则还需要下载32位版本的firefox,并安装.

参考:
http://andyleonard.com/2010/05/20/installing-the-f5-firepass-vpn-client-...

全文检索引擎sphinx 中文手册

最近全文检索引擎 sphinx推出了功能全面的1.10-beta版本.
在此发布以前工作成果,sphinx 0.9.9中文手册,为以后新版本文档跟进做准备:
sphinx 0.9.9中文手册

翻译草稿,不完善,请谅解.

sphinx 1.10 新功能.
* 实时全文检索引擎,可以像update数据库一样更新索引
* 支持字条串原文存储功能.

karia2 will not have version 0.1.0

最初打算在0.1的时候,程序可能基本稳定,功能比较全面.
但是现在svn版本号到了0.0.98,感觉程序总体还有问题,
不完善,这可能是最初设计功能复杂了些,现在svn中跳过0.1.0版本,直接到了0.1.60版本,希望能在0.2上程序完善起来,并发布真正的alpha preview版本.
在0.2版本到来之前,只会有几个snapshot版本.
希望这些snapshot版本能给部分用户使用,使用过程中遇到问题能反馈回来,以便改进karia2.

一种基于skype的下载资源搜索方法

传统的下载工具均使用自身的下载资源搜索方法,但是这种方法一般都需要直接连接某一具有外网IP的服务器,或者资源在这个服务器上,或者这服务器是一个资源服务器群的对外proxy,这种方法无论怎么加密,保密,很容易被某些不希望支持p2sp方式下载的组织或者公司屏蔽,失去了通过资源加速下载的功能.

在这希望引入一个skype P2P网络层,在这个层次上传递下载资源相关数据.众所周知,目前封堵skype还有难度,也就是说,只要用户愿意使用基本skype的下载资源搜索,谁都封堵不了这一功能.

这种方法的优点,
* 服务器端不需要有固定IP,不需要资源服务器直接接入外网,甚至运营前期数据不大的情况下不需要部署服务器到商业机房,只要内网有长时间运行的一台电脑,几个skype账号即可.
* 不容易被封.原因上面已经说明.
* 扩容容易,多个skype账号即可.

也有缺点,
* 对不经常使用skype的用户,需要多运行一个IM软件.
* 传递数据量有所限制
* 由于skype传递信息的方式的关系,客户端可能接到响应不及时.

在没有其他更好的搜索资源方法之前,重点实现这一方法.尽快让用户能用上.

Asterisk Dialplan命令中文翻译

Asterisk的拨号计划命令,翻译自
http://www.voip-info.org/wiki/view/Asterisk+-+documentation+of+applicati...

网上有个《asterisk app命令中文翻译》,版本比较老,内容更像是软件翻译的。我参考了下,并根据voip-info.org现有新版本做了些修正。由于内容很多名词比较专业,翻译的不够完整,英文实在不好的可以参考一下。

Asterisk Dialplan Commands
常规命令

  • Authenticate:鉴别用户
  • VMAuthenticate:根据“voicemail.conf”鉴别用户
  • Bridge:连接两个专用来话 (1.6+)
  • ChannelRedirect:重定向现有的通道到拨号计划
  • Curl:允许接受外部的URLs。并且支持POSTing,不支持CURL。
  • DUNDiLookup:用DUNDi查寻号码
  • Log:向日志文件中添加文件 (1.6+)
  • JabberSend:经过Jabber发送文件消息(Jabber是一个即时通信软件)
  • Page:Page a mobile device (new in Asterisk v1.2)
  • SendDTMF:发送专用的DTMF数据
  • SendImage:发送图像文件
  • SendText:发送给客户文本消息
  • SendURL:发送给客户URL用以显示
  • System:执行系统命令
  • Transfer:将呼叫方转移到另一个分机
  • TrySystem:执行系统命令并且总是返回0
  • Wait:稍等
  • WaitExte:等待呼叫方拨打一个新的分机
  • WaitForRing:等待呼叫申请
  • WaitMusicOnHold:等待,在线拨放音乐(1.6)


话单与计费

  • ForkCDR:将CDR分成两个独立的实体
  • NoCDR:确保 asterisk没有把CRD保存为特定的呼叫
  • ResetCDR:重置CDR数据
  • SetAccount:设置帐簿号码
  • SetAMAflags:开通AMA标志开户的渠道
  • SetCDRUserField:建立CDR用户领域
  • AppendCDRUserField:给CDR用户附加数据


呼叫管理(hangup, answer, dial, etc)

  • AMD:侦测自动应答机
  • Answer:应答一个振铃的通道
  • Busy:示忙,等待挂断
  • ChanIsAvail:检测通道可用性
  • Congestion:显示拥塞等待挂断
  • Dial:设置一个呼叫连接当前的通道
  • DISA:直接访问内部系统
  • Hangup:无条件挂断
  • RetryDial:一个电话,重试失败,允许可选的分机
  • Ringing:显示呼叫


呼叫方描述  (ID, Name etc)

  • CallingPres:更改呼叫方的身份介绍
  • LookupBlacklist:从黑名单数据库上查询访客的身份 姓名和号码
  • LookupCIDName:从当地的数据库查询访客的身份和姓名
  • PrivacyManager:如果没有CallerID过来,要求输入电话号码
  • SetCallerID:建立CallerID,发生矛盾服从CallerID
  • SetCallerPres:建立访客资料的独立信道
  • SetCIDName:建立呼叫者身份 姓名,发生矛盾服从CallerID
  • SetCIDNum:只建立访客的身份号码(没有姓名),发生矛盾服从CallerID
  • SoftHangup:请求挂断转到另一个信道
  • Zapateller:拥SIT阻止远程商人


ADSI

  • ADSIProg:将Asterisk ADSI脚本加载到电话
  • GetCPEID:获得ADSI CPE ID


数据库操作

  • DBdel:从数据库中删除一个键
  • DBdeltree:从数据库中删除一批数据
  • DBget:从数据库中取出数值. 发生矛盾服从DB.
  • DBput:在数据库中储存一个数值. 发生矛盾服从DB.
  • MYSQL:执行不同的 mySQL 数据库行为
  • RealTime:使用实时数据库资料填充变量
  • RealTimeUpdate:使用实时数据库更新字段


应用程序整合

  • AGI:执行一套AGI应用程序
  • DeadAGI:在中断的信道上执行AGI
  • EAGI:在音频通道上执行一套 AGI 适应程序
  • EnumLookup:在 ENUM 中查找号码
  • ExternalIVR:执行扩展IVR引擎
  • Jack:Interface to JACK audio for audio manipulation (1.6+)
  • Macro:执行宏
  • MacroExclusive:同一时间只有一个通道可呼叫这个宏,所有其他人只能等待(1.4)
  • MacroExit:退出的宏,就像它已全部完成 (1.4)
  • NoOp:不进行操作,可将数值打印到控制台进行调试
  • Perl:res perl 是Apache 的 mod_perl, 只适用于 Asterisk
  • PHP:res_php 不需要就可将PHP与进行一体化处理
  • Read:读有效的DTMF变量
  • TXTCIDName:从 TXT 记录中查找访客的姓名
  • UserEvent:发送专用事件给管理接口


 
控制流与超时设定

  • AbsoluteTimeout:设置最长呼叫时间
  • DigitTimeout:设置数字间的最长时间间隔
  • Gosub:跳到一个子程序并且返回 (new in v1.2)
  • GosubIf:有条件的跳到一个子程序并且返回 (new in v1.2)
  • Goto:跳转到特定的priority, extension, 或context(专用名词)
  • GotoIf:带条件判断的Goto
  • GotoIfTime:带时间条件判断的Goto
  • Random:在拨号栏中随机跳动
  • ResponseTimeout:设置等候回应的最长时间
  • Return:从Gosub或GosubIf返回(new in v1.2)
  • StackPop:删除返回地址,但没有返回(new in v1.2)
  • While:开始一个当循环- *1.2beta
  • EndWhile:结束一个当循环 - *1.2beta
  • ExecIf:条件执行 - *1.2beta
  • ExitWhile:退出循环

 
字符串与变化处理

  • ImportVar:变量赋值
  • Math:进行简单的计算
  • SetGlobalVar:赋数值变量
  • Set:设置通道变量和功能值
  • SetVar:赋数值变量(如果不允许,使用 Set)
  • DBRewrite:执行的Perl兼容正则表达式和替代了一个MySQL数据库。
  • DumpChan:清除有关这个通道的呼叫信息


语音:放音

  • Background:在等待分机的时候播放声音文件
  • BackgroundDetect:通话检测背景音文件
  • ControlPlayback:用快进,重绕和退出当前命令控制器播放声音文件
  • DateTime:报出日期/和时间(已经废了)
  • Echo:重复声音回复到用户端
  • Festival:使用Festival语音合成器(Festival是一个开源的TTS)朗诵课文
  • Milliwatt:在0分贝产生持续的1000赫兹的音调
  • MP3Player:播放MP3声音文件
  • MusicOnHold:无限期地播放音乐
  • Playback:播放一个声音文件
  • Playtones:当执行其它命令的时候播放声音列表
  • SayUnixTime:报出日期/和时间
  • SayAlpha:报出字母
  • SayDigits:报出阿拉伯数字
  • SayNumber:报出号码
  • SayPhonetic:报出语音
  • SetMusicOnHold:在控制系统上设置默认的音乐
  • SetLanguage:为录音重放设置语言转换
  • StopPlaytones:停止播放音调列表


语音:录音和监测

  • ALSAMonitor:监测ALSA控制台
  • ChangeMonitor:转换监测的路线文件名
  • ChanSpy:通用的通道插话
  • Dictate:录音和回放
  • MixMonitor:记录和混合呼叫语音(不同于监视器)v1.2.x
  • Monitor:录音电话会话到声音文件
  • Record:通话录音保存到语音文件
  • StopMonitor:停止监测路线
  • StopMixMonitor:停止监视通道监测MixMonitor


 
SIP 命令

  • SIPdtmfMode:在SIP呼叫过程中转换DTMF模式
  • SIP_HEADER():挑选SIP invite消息的任何header(取代了SIPGetHeader )
  • SIPAddHeader:填标题到外部的SIP邀请

 
DAHDI(过去的ZAP)命令

  • Flash:刷新到Zap中继
  • DAHDIBarge:Barge in DAHDI 通道(原来的ZapBarge)
  • ZapCD:ISDN call deflection (bristuff)
  • BristuffZapEC:Echo cancellation on/off (bristuff)
  • DAHDISendKeypadFacility:Send digits out of band over a PRI
  • DAHDIRAS:提供ISDN数据服务 (原来的ZapRAS)
  • DAHDIScan:扫描DAHDI通道到监控呼叫(原来的ZapScan)


语音邮件和会议

  • Directory:规定语音邮件扩展目录
  • HasNewVoicemail:有条件的分支到优先权+101
  • MailboxExists:检查邮箱是否存在
  • MeetMe:简单的 MeetMe 会议桥
  • MeetMeAdmin:MeetMe 会议管理
  • MeetMeCount:MeetMe参与者统计
  • MiniVM:Mini-Voicemail (new in v1.6)
  • VoiceMail:语音留言
  • VoiceMailMain:进入语音系统
  • VMAuthenticate:基于语音邮箱鉴别用户


呼叫队列和ACD操纵

  • AddQueueMember:动态添加排队成员
  • AgentCallbackLogin:呼叫座席回叫注册
  • AgentLogin:呼叫座席注册
  • AgentMonitorOutgoing:记录座席的对外呼叫
  • ParkAndAnnounce:驻留通告
  • PauseQueueMemeber:暂停一个座席
  • ParkedCall:回复被停放的呼叫
  • Queue:排队等代呼叫
  • RemoveQueueMember:动态移除列队成员
  • UnpauseQueueMemeber:恢复一个座席


警报监测/中心站

  • AlarmReceiver:仿效一个Ademco 连接ID的警报接受站


业余广播/转发连接

  • Rpt:支持业余无线电广播和商业化的两种转发器连接


外部应用(非Digium的CVS里)

  • Asterisk app_dbodc:用unixODBC进行拨号方案修正
  • DynExtenDB:数据库中的存储扩充
  • Iconv:字符转换设置
  • app Prepaid:为Postgres 所设计的
  • PPPD:邮件后台程序连接器


Bristuff & zaphfc applications

  • PickUp:大多数路线是独立的;是bristuff片的一部分
  • PickUpChan:接收指定路线
  • BristuffPickupSIPuri
  • PickDown:挂断长途电话
  • Steal:接受中转电话
  • Devstate:产生一种安装设备驱动程序状态来改变事件(忙,打电话…)
  • Segfault:用 segfault 撞击Asterisk
  • ZapEC:对于Zap能或者不能的回波消除
  • Autoanswer:在指定范围内自动回话
  • AutoanswerLogin:登陆到自动回话应用程序


ISDN应用

  • ISDN重叠拨号


Sirrix通道的应用

  • SrxEchoCan:不能/能回波消除
  • SrxDeflect:偏转进入的电话
  • SrxMWI:在Sirrix组上 设置/重新设置信息等待指示(MWI)

ostgreSQL Performance and Maintenance (postgres 优化与维护)

转: http://ruimemo.wordpress.com/2010/03/31/postgresql-performance-and-maint...

PostgreSQL Performance and Maintenance (postgres 优化与维护)

每次看postgresql的设定,好像每次都忘记。过一段时间又要看。今天终于决定,开始blog吧。

postgresql 优化与维护

1. 硬件

数据库最重要的就是 I/O了。所以一切从I/O开始。

RAID: 这个基本不用说,数据库放RAID10上面,只读的备份数据库可以放RAID0,反正挂了没关系。谨记:数据库是Random Read

RAID卡的选择:

RAID卡一定要带电池的才可以(BBU)有电源的才能做到东西写进 CACHE,RAID就返回硬盘写成功(不用等)

1. Areca

2. LSI (真正的LSI,re-brand不要)

3. HP P400 以上系列

硬盘选择:

首选是SAS: 15K RPM 每个SAS大约能提供25MB/s的Random Write。也就是说在RAID10的设定下,如果需要50MB/s的Random Write就需要4个硬盘

节俭选择是: SATA 可以多用几个硬盘(SAS一倍数量)达到在RAID10中接近SAS的速度。就算SATA买SAS一倍的数量,价格仍然比SAS便宜。

也可以买 产品: 例如 Compaq的 MSA 70 (P800 Battery backed RAID control)

CPU:64位

Cache:越大越好 (现在个人电脑都3M的cache了)

CORE:越多 越好 (postgresql毕竟是跑cpu的)建议最少4个core

RAM: 最少4G。通常根据具体需求,用16-64G的RAM

2. OS (系统)

可用系统:

1. Debian Stable

2. CentOS

3. Ubuntu LTS

4. Red Hat

5. SUSE Enterprise

如果准备付费(服务),那么就是 Canonical, Novell 跟 Redhat这三家选择而已

如果准备不买任何服务,可以用Debian, CentOS, Ubuntu LTS

这里还是觉得系统用Red Hat (不付费就CentOS)毕竟人家是企业级的老大哥,错不了。

* 现在CentOS也可以买到服务了。

不可用系统: 例如 fedora (redhat QA) ubuntu (non-LTS)

Scheduler:

Grub 增加: elevator=deadline

redhat 的图标可以看出,deadline是数据库的最佳选择

文件系统 (Filesystem)

这里的选择是:ext2,ext3 跟 ext4。 为什么只考虑这几个呢?因为数据库还是稳定第一,内核开发人员所做的文件系统,理论上说出问题的情况会少点。



WAL: 放ext2
因为WAL本身自己有Journal了,不需要用ext3 (ext2快很多)

data: ext3

Block Size: postgres自己是8k的block size。所以文件系统也用8k的 block size。这样才能最佳的提高系统的效能。

ext4:出来时间还 不够长,不考虑。

分区 (Partitioning)

Postgres 跟系统 OS 应该在不同分区

系统(OS):系统应该放独立的RAID1

数据库 (Postgres Data):数据库应该放独立的RAID10上。 如果RAID是带电池的,mount 的时候给 data=writeback的选项

独立的数据库分区,就不许要记录文件时间了(都是放数据的)所以mount的时候要给noatime的 选项,这样可以节约更新时间(timestamp)的I/O了。

WAL日志(xlogs): 独立的RAID1上 (EXT2 系统)日志是 Sequential write,所以普通的硬盘(SATA)速度就足够了,没有必要浪费SAS在log上

Postgresql 日志(logs):直接丢给syslog就可以。最好在syslog.conf中设定单独的文件名. 这里 例如用local2来做postgresql

local2.* -/var/log/postgres/postgres.log

记得log要给Async,这样才不会等卡在log的I/O上, 同时记得设定logrotate以及创建路径(path)

ext2 VS ext3 性能测试:

HP DL585

4 Dual Core 8222 processors

64GB RAM

(2) MSA70 direct attached storage arrays.

25 spindles in each array (RAID 10)

HP P800 Controller

6 Disk in RAID 10 on embedded controller

xlog with ext3: avg = 87418.44 KB/sec

xlog with ext2: avg = 115375.34 KB/sec

3. Postgres 内存 (Memory Usage)

Shared Buffer Cache

Working Memory

Maintenance Memory

Shared Buffers

Postgres 启动时要到的固定内存。每个allocation是8k。 Postgres不直接做硬盘读写,而是把硬盘中的东西放入Shared Buffers,然后更改Shared Buffers,在flush 到硬盘去。

通常 Shared Buffers设定为内存(available memory)的25%-40%左右。

在系统(OS)中,记得设置 kernel.shmmax的值(/etc/sysctl.conf)

kernel.shmmax决定了进程可调用的最大共享内存数量。简单的计 算方法是

kernel.shmmax=postgres shared_buffers + 32 MB

要保留足够的空间(不然会out of memory)postgresql除了shared buffer还会用到一些其他的内存,例如max_connections, max_locks_pre_transaction

Working Memory

这个是postgres运行作业中 (task)需要的内存,例如内存内的hashed (aggregates, hash joins)sort (order by, distinct 等等)合理的设定,可以保证postgres在做这些东西的时候可以完全在内存内完成,而不需要把数据吐回到硬盘上去作swap。但是设定太大的话,会造 成postgres使用的内存大于实际机器的内存,这个时候就会去硬盘swap了。(效能下降)

working memory是per connection and per sort的设定。所以设定一定要非常小心。举例来说,如果设定working memory为32MB,那么以下例子:

select * from lines, lineitems

where lines.lineid = lineitems.lineid

and lineid=6

order by baz;

这里就可 能用到64MB的内存。

hashjoin between lines and lineitems (32MB)

order by baz (32MB)

要注意自己有多少query是用到了order by或者join

如果同时有100个链接,那么就是 100 connection X 64MB = 6400MB (6G) 内存

通常来说,working mem不要给太大,2-4MB足够

在postgres 8.3之后的版本,working mem可以在query中设定

Query:

begin;

set work_mem to ’128MB’;

select * from foo order by bar;

insert into foo values (‘bar’);

reset work_mem;

commit;

Function:

create function return_foo() returns setof text as

$ select * from foo order by bar; $

SET work_mem to ’128MB’

LANGUAGE ‘sql’

postgres官方不建议(但是支持)在 postgresql.conf文件中更改work_mem然后HUP (数据库应该没有任何中断)

利用 explain analyze可以检查是否有足够的work_mem

sort (cost=0.02..0.03 rows=1 width=0) (actual time=2270.744..22588.341 rows=1000000 loops=1)

Sort Key: (generate_series(1, 1000000))

Sort Method: external merge Disk:13696kb


-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.006..144.720 rows=1000000 loops=1)

Total runtime: 3009.218 ms

(5 rows)

以上的 query分析显示,这里需要从硬盘走13MB的东西。所以这个query应给set work_mem到16MB才能确保性能。

Maintenance Memory (维护内存)

maintenance_work_mem 决定系统作维护时可以调用的内存大小。

这个也是同样可以在query中随时设定。

这个内存只有在VACUUM, CREATE INDEX 以及 REINDEX 等等系统维护指令的时候才会用到。系统维护是,调用硬盘swap会大大降低系统效能。通常maintenance_work_mem超过1G的时候并没有 什么实际的效能增加(如果内存够, 设定在1G足以)

Background Writer (bgwriter)

功能:

负责定时写 shared buffer cache 中的 dirty shared buffers

好处:

a. 减少系统flush shared buffers到硬盘(已经被bgwriter做了)

b. 在checkpoint中,不会看到I/O的突然性暴增,因为dirty buffers在背景中已经被flush进硬盘

坏处:

因为一直定时在背后flush disk,会看到平均硬盘I/O怎加(好过checkpoint时I/O暴增)

设定:

bgwriter_delay:

sleep between rounds。 default 200(根据机器,数据而调整)

bgwriter_lru_maxpages:

决 定每次bgwriter写多少数据。如果实际数据大于这里的设定,那么剩余数据将会被postgres的进程(server process)来完成。server porcess自己写的数据会造成一定的性能下降。如果想确定所有的数据都由bgwriter来写,可以设定这里的值为-1

bgwriter_lru_multiplier:

采 用计算的方式来决定多少数据应该被bgwriter来写。这里保持内置的2.0就可以。

计算bgwriter的I/O:

1000 / bgwriter_delay * bgwriter_lru_maxpages * 8192 = 实际I/O

(8192是 postgres的8k block)

例如:

1000/200 * 100 * 8192 = 4096000 = 4000 kb

bgwrater 可以用 pg_stat_bgwriter 来监测。如果想要观察bgwrater 的运行状况,记得首先清理旧的stat信息。

bgwriter如果设定的太大(做太多事情)那么就会影响到前台的效能 (server)但是如果由系统(server)来做buffer flush同样会影响效能。所以这里的最好设定就是通过观察 pg_stat_bgwriter 来找到一个最佳的平衡点。

WAL (write ahead log)

postgres中的所有写动作都是首先写入WAL,然后才执行的。这样可以确保数据的准确跟 完整。当中途数据库崩溃的时候,postgres可以通过WAL恢复到崩溃前的状况而不会出现数据错误等等问题。

WAL 会在两种情况下被回写硬盘。

1. commit。 当commit数据的时候,WAL会被强制写回硬盘(flush)并且所有这个commit之前的东西如果在WAL中,也会一同被flush。

2. WAL writer进程自己会定时回写。

FSYNC vs ASYNC

postgres 的 default 是做 fsync,也就是说postgres会等待数据被写入硬盘,才会给query返回成功的信号。如果设定sync=no关闭fsync的话,postgres不会等待WAL会写硬盘,就直接返回query成功。通常这个会带来15-25%的性能提升。

但是缺点就是,如果系统崩溃 (断电,postgres挂掉)的时候,你将有可能丢失最后那个transcation. 不过这个并不会造成你系统的数据结构问题。(no data corrupt)如果说在系统出问 题的时候丢失1-2笔数据是可以接受的,那么25%的性能提升是很可观的。

WAL设定:

fsync 可以选择on或者off

wal_sync_method:

linux中是使用fdatasync。其他的。。。不知道,应该是看系统的文 件参数了

full_page_writes:

开启的时候,在checkpoint之后的第一次对page的更改,postgres会将每 个disk page写入WAL。这样可以防止系统当机(断电)的时候,page刚好只有被写一半。打开这个选项可以保证page image的完整性。

关 闭的时候会有一定的性能增加。尤其使用带电池的RAID卡的时候,危险更低。这个选项属于底风险换取性能的选项,可以关闭



wal_buffers:

WAL 的储存大小。default 是 64 kb。 实验证明, 设定这个值在 256 kb 到 1 MB 之间会提升效能

wal_writer_delay

WAL 检查WAL数据(回写)的间隔时间。值是毫秒(milliseconds)

Checkpoints

确保数据回写硬盘。dirty data page会被 flushed回硬盘。

checkpoint 由以下3中条件激发 (bgwriter如果设定,会帮忙在后台写入,所以就不会有checkpoint时候的短期高I/O出现)

1. 到达设定的WAL segments

2. 到达设定的timeout

3. 用户下达checkpoint指令

如果 checkpoint运行频率高于checkpint_warning值。postgres会在日志(log)中记录出来,通过观察log,可以来决定 checkpoint_segments的设定。

增加cehckpoint_segments或者checkpoint_timeout可以有一 定的效能提升。而唯一的坏处就是如果系统挂了,在重启的时需要多一点时间来回复(系统启动回复期间数据库是不能用的)鉴于postgres很少挂掉,这个 其实可以设定的很长(1天都可以)

设定:

checkpoint_segments 最多的wal log数量,到达后会激发checkpoint,通常设定在30就好

checkpoint_timeout 一般设置15-20分钟,常的可以设定1天也没关系

checkpoint_completion_target 这个保持不动就好。内建是0.5,意思就是每个checkpoint预计在下个checkpoint完成前的一半时间内完成(听起来有点绕嘴,呵呵)

checkpoint_warning 如果checkpint速度快于这个时间,在log中记录。内建是30秒

理论中的完美设定,就是你的backend从来不用回写硬盘。 东西都是由background来写入的。这个就要靠调整bgwriter, checkpoints跟wal到一个最佳平衡状态。当然这个是理想中的完美,想真的做到。。。继续想吧。呵呵

4. 维护 – 保持postgres的笑容

维护数据库是必 须的。基本维护

vacuum

delete数据的 时候,数据库只是记录这笔数据是‘不要的‘并不是真的删除数据。所以这个时候就要vacuum了,vacuum会把标记为‘不要‘的数据清除掉。这里要注 意的是,vacuum不会清理index。当数据更改超过75%的时候,需要重新建立index。postgres 8.4 index可以用cluster重建速度快很多。在postgres 9.x中,vacuum=cluster,没有任何区别了(保留cluster只是为了兼容旧版指令)

Full Vacuum

这个会做exclusive lock。vacuum跟full vacuum的区别是vacuum会把标志为‘不要‘的空间标志成可以再次使用(回收)而 full vacuum则会把这个空间删除(返还给系统OS)所以vacuum之后你的postgres在硬盘上看到的占用空间不会减少,但是full vacuum会减小硬盘占用空间。不建议使用full vacuum,第一没必要,第二exclusive lock不好玩。

ANALYZE

Analyze 会更新统计信息(statistics)所有的query的最佳方案,以及sql prepared statement都是靠这统计信息而决定的。所以当数据库中的一定量数据变动后(例如超过10%),要作analyze,严格的说,这个是应该常做的东 西,属于数据库正常维护的一部分。另外一个很重要的就是,如果是 upload数据(restore那种)做完之后要记得作analyze(restore自动不给你作的)

当 建立新的table的时候,或者给table增加index,或者对table作reindex,或者restore数据进数据库,需要手动跑 analyze才可以。analyze直接影响default_statistics_target数据。

Autovacuum

根 据postgres的官方资料,autovacuum在8.3之后才变得比较真的实用(8.1推出的)因为在8.3之前,autovacuum一次只能同 时做一个数据库中的一个table。 8.3之后的版本,可以作多数据库多table。

设定

log_autovacuum_min_duration:

-1 为关闭。0是log全部。>0就是说超过这个时间的就log下来。例如设定为30,那么所有超过30ms的都会被日志记录。

autovacuum_max_workers:

同 时启用的autovacuum进程。通常不要设定太高,3个就可以。

autovacuum_naptime:

检查数据库的时 间,default是1分钟,不用改动

autovacuum_vacuum_threshold:

最低n行记录才会引发 autovacuum。也就是数据改变说低于这个值,autovacuum不会运行。default是50

autovacuum_analyze_threshold:

运 行analyze的最低值,跟上面的一样

autovacuum_vacuum_scale_factor:

table中的百分比的计算方 式(超过一定百分比作vacuum)内建是20% (0.2)

autovacuum_analyze_scale_factor:

同上, 不过是analyze的设定

autovacuum_freeze_max_age:

最大XID出发autovacuum

autovacuum_vacuum_cost_delay:

延 迟。。如果系统负荷其他东西,可以让vacuum慢点,保证其他东西的运行.这里是通过延迟来限制

autovacuum_vacuum_cost_limit:

同 上,也是作限制的,这里是通过cost限制limit

Cluster

Cluster 类似于vacuum full。建议使用cluster而不是vacuum full。cluster跟vacuum full一样会重写table,移除所有的dead row。同样也是要做exclusive lock。

Truncate

Turncat 会删除一个table中的所有数据, 并且不会造成任何的dead row(delete则会造成dead row)同样的,turncate也可以用来重建table

begin;

lock foo in access exclusive mode;

create table bar as select * from foo;

turncate foo;

insert into foo (select * from bar);

commit;

这样就重新清理了 foo这个table了。

REINDEX

重 新建立index

5. 其他

planner:

statistics直接决定planner的结 果。使用planner,那么要记得确保statistics的准确(analyze)

default_statistics_target:

设定analyze分析的值。这个可以在 query中随时设定更改

set default_statistics_target to 100;

analyze verbose mytable;

INFO: analyzing “aweber_shoggoth.mytable”

INFO: “mytable”: scanned 30000 of 1448084 pages, containing 1355449 live rows and 0 dead rows; 30000 rows in sample, 65426800 estimated total rows

ANALYZE

set default_statistics_target to 300;

analyze verbose mytable;

INFO: analyzing “aweber_shoggoth.mytable”

INFO: “mytable”: scanned 90000 of 1448084 pages, containing 4066431 live rows and 137 dead rows; 90000 rows in sample, 65428152 estimated total rows

ANALYZE

Set statistics per column 给不同的column设定不同的 statistics

alter table foo alter column bar set statistics 120

查找何时需要增加statistics

跑 个query作expain analyze

这个就会看到例如:

-> Seq Scan on bar (cost=0.00-52.00 rows=52 width=2 (actual time=0.007..1.894 rows=3600 loops=1)

这里的rows应该跟真正的rows数量差不多才 是正确的。

seq_page_cost

planner 作sequential scan时候的cost。default是1,如果内存,cache,shared buffer设定正确。那么这个default的值太低了,可以增加

random_page_cost

planner 作random page fetch的值。default是4.0 如果内存,cache,shared buffer设定正确,那么这个值太高了,可以降低

seq_page_cost跟random_page_cost的值可以设定成一样 的。然后测试效能,可以适当降低random_page_cost的值

cpu_operator_cost

default 是0.0025,测试为,通常设定在0.5比较好

set cpu_operator_cost to 0.5;

explain analyze select ….

cpu_tuple_cost

default 是0.01 测试为,通常设定在0.5比较好

set cpu_tuple_cost to 0.5;

explain analyze select …

effective_cache

应 该跟尽可能的给到系统free能接受的大小(越大越好)

total used free shared buffer cached

mem: xxxx yyyyy zzz aaaa bbbb cccc

设定的计算方法为:

effective_cache=cached X 50% + shared

这里的50%可以根据服务器的繁忙程度 在40%-70%之间调整。

监测方法:

explain analyze ;

set effective_cache_size=新的值;

explain analyze ;

reset effective_cache_size;

尝试出一个最适合的值,就可以改postgresql.conf文件设定成固定了。

Natural vs Primary Key

Primary Key 基本因为要做join,跟Natural相比多消耗20%左右的效能。所以尽力primary做在Natural key上。

Btree vs hash

btree 比 hash 快,不管什么情况,所以不要用hash

gin vs gist

full text的时候,用gin不要用 gist

<!--adcode-->


<!--/adcode-->

hack了一下pssh,现在更好用了

原来的pssh,只在使用无密码证书登陆的时候非常方便,而对只支持密码登陆的ssh主机,则会弹出输入密码的交互提示,导致自动化脚本运行不自动了.

由于服务器比较多,并且登陆方式不一样,有的用证书,有的使用密码,所以对pssh作一点hack,以方便提高工作效率.

现在对pssh稍微做了点修改,在定义hosts.txt文件的时候,可选地加入密码项,如果检测到密码则使用sshpass这个命令来处理密码的自动输入功能.这样就能全自动进行了.

原来的hosts.txt 格式:
1. host[:port] [login]
2. [user@]host[:port]

hack后的文件格式:
1. host[:port] [login[:passwd]]
2. [user[:passwd]@]host[:port]

虽然这样有些安全问题,但即使使用证书也存在同样级别的安全问题,因为如果保存hosts.txt的主机不安装,那么这主机上的登陆证书一样不安全. 把hosts.txt的权限设置与证书权限一样高,安全性是一样的.

pssh-2.1.1的补丁:

Index: psshlib/psshutil.py
===================================================================
--- psshlib/psshutil.py (版本 496)
+++ psshlib/psshutil.py (工作副本)
@@ -42,7 +42,12 @@
         else:
             sys.stderr.write("Bad line. Must be host[:port] [login]\n")
             sys.exit(3)
-        hosts.append((host, port, user))
+        # add by <a href="mailto:liuguangzhao@users.sf.net">liuguangzhao@users.sf.net</a>   host[:port] [login[:passwd]]
+        passwd = None
+        user_fields = user.split(':')
+        if len(user_fields) == 2:
+            user, passwd = user_fields
+        hosts.append((host, port, user, passwd))
     return hosts
 
 def parse_host(host, default_user=None, default_port=None):
@@ -54,4 +59,8 @@
         user, host = host.split('@', 1)
     if ':' in host:
         host, port = host.rsplit(':', 1)
-    return (host, port, user)
+    # add by <a href="mailto:liuguangzhao@users.sf.net">liuguangzhao@users.sf.net</a>   [user[:passwd]@]host[:port]
+    passwd = None
+    if ':' in user:
+        user, passwd = user.split(':')
+    return (host, port, user, passwd)
Index: bin/pnuke
===================================================================
--- bin/pnuke   (版本 496)
+++ bin/pnuke   (工作副本)
@@ -51,8 +51,11 @@
     if opts.errdir and not os.path.exists(opts.errdir):
         os.makedirs(opts.errdir)
     manager = Manager(opts)
-    for host, port, user in hosts:
+    for host, port, user, passwd in hosts:
         cmd = ['ssh', host, '-o', 'NumberOfPasswordPrompts=1']
+        if passwd != None:
+            cmd = ['sshpass', '-p', passwd, 'ssh', host, '-o', 'NumberOfPasswordPrompts=1']
+        
         if not opts.verbose:
             cmd.append('-q')
         if opts.options:
Index: bin/pscp
===================================================================
--- bin/pscp    (版本 496)
+++ bin/pscp    (工作副本)
@@ -61,8 +61,14 @@
     if opts.errdir and not os.path.exists(opts.errdir):
         os.makedirs(opts.errdir)
     manager = Manager(opts)
-    for host, port, user in hosts:
-        cmd = ['scp', '-qC']
+    for host, port, user, passwd in hosts:
+        if passwd == None:
+            cmd = ['scp', '-qC']
+            # sys.stderr.write('using standard scp method\n')
+        else:
+            cmd = ['sshpass', '-p', passwd, 'scp', '-qC']
+            # sys.stderr.write('using sshpass scp method\n')
+
         if opts.options:
             cmd += ['-o', opts.options]
         if port:
Index: bin/pssh
===================================================================
--- bin/pssh    (版本 496)
+++ bin/pssh    (工作副本)
@@ -90,9 +90,16 @@
             sys.stderr.write('Automatic reading from stdin is deprecated.  '
                     'Please use the -I option.\n')
     manager = Manager(opts)
-    for host, port, user in hosts:
-        cmd = ['ssh', host, '-o', 'NumberOfPasswordPrompts=1',
-                '-o', 'SendEnv=PSSH_NODENUM']
+    for host, port, user, passwd in hosts:
+        if passwd == None:
+            cmd = ['ssh', host, '-o', 'NumberOfPasswordPrompts=1',
+                   '-o', 'SendEnv=PSSH_NODENUM']
+            # sys.stderr.write('using standard ssh method\n')
+        else:
+            cmd = ['sshpass', '-p', passwd, 'ssh', host, '-o', 'NumberOfPasswordPrompts=1',
+                   '-o', 'SendEnv=PSSH_NODENUM']
+            # sys.stderr.write('using sshpass method\n')
+
         if not opts.verbose:
             cmd.append('-q')
         if opts.options:
Index: bin/prsync
===================================================================
--- bin/prsync  (版本 496)
+++ bin/prsync  (工作副本)
@@ -66,8 +66,15 @@
     if opts.errdir and not os.path.exists(opts.errdir):
         os.makedirs(opts.errdir)
     manager = Manager(opts)
-    for host, port, user in hosts:
-        ssh = ['ssh']
+    for host, port, user, passwd in hosts:
+        # ssh = ['ssh']
+        if passwd == None:
+            ssh = ['ssh']
+            # sys.stderr.write('using standard ssh method\n')
+        else:
+            cmd = ['sshpass', '-p', passwd, 'ssh']
+            # sys.stderr.write('using sshpass method\n')
+
         if opts.options:
             ssh += ['-o', opts.options]
         if port:

PostgreSQL 9.0新功能示例

PostgreSQL 9.0 release notes

This document showcases many of the latest developments in PostgreSQL 9.0, compared to the last major release – PostgreSQL 8.4. There are more than 200 improvements in this release. While all the important advances are explained and demonstrated, some of the more esoteric changes are not covered, but these are unlikely to be known issues for most users.

Hot Standby and Streaming Replication are the two new features that mark Version 9.0 as a landmark in PostgreSQL's development.

While there are several third party replication solutions available for PostgreSQL that meet a range of specific needs, this new release brings a simple, sturdy and integrated version that will probably be used as a default in most High Availability installations using PostgreSQL.

Changes are arranged in four chapters:

  • The two new features you can't ignore
  • Other new features
  • Potential issues when upgrading existing systems
  • Other improvements


Contents

[hide]


The two features you can't ignore

The significance of these two new features is the motivation for allocating a full version number to this release – 9.0 (not 8.5).


Hot Standby

This feature allows users to create a 'Standby' database – that is, a database replaying the primary's binary log, while making it available for read-only queries. It is substantially similar to enterprise standby database features of top proprietary databases, such as Oracle's DataGuard.

Implementation of this feature took over two years and is quite complex. While read-only queries are running, the standby database has to be able to replay the binary modifications coming from the production database, decide if these modifications are conflicting with the read-only queries and which action should be taken as a consequence: pause the replay or kill some read-only queries and move forward. Hot Standby also adds some data in WAL logs for the standby database and a conflict resolution mechanism.

To setup Hot standby, you just have to set up the production database as follows:

postgresql.conf, Primary:

 wal_level = 'hot standby' # Adds the required data in the WAL logs
 # vacuum_defer_cleanup_age # You may want to set this up, but it could be complicated (see documentation)

Then, create a standby database (the same way you did with previous standby databases: pg_start_backup on primary, copy the files (on the , pg_end_backup on primary).

Then, you just have to copy logs and data files on the secondary server and add this in its postgresql.conf:

postgresql.conf, Secondary:

 hot_standby=on
 max_standby_delay=30s # -1= always wait, 0= never wait, else wait for this

and use a program such as pg_standby on the secondary to help replay the logs (this parameter goes in recovery.conf, but there are new options there too, see next feature).

max_standby_delay determines the behaviour of the standby database when conflicts between replay and read-only queries occur. In this situation, the standby database will wait at most until it's lagging max_standby_delay behind the production database before killing the conflicting read-only queries.

Of course, you're strongly encouraged to read the documentation before putting this feature in place… if only to understand the consequences of tuning max_standby_delay and vacuum_defer_cleanup_age, which are not easy to grasp.


Streaming Replication

Complimenting Hot Standby, Streaming Replication is the second half of the "great leap forward" for PostgreSQL. This time, the goal is improving the archiving mechanism to make it as continuous as possible and to not rely on file shipping. Standby databases can now connect to the master and get sent, whenever they want, what they are missing from the Write Ahead Log, not in terms of complete files ('wal segments'), but in terms of records in the WAL (you can think of them as fragments of these files).

Streaming Replication is an asynchronous mechanism; the standby database lags behind the master. But unlike other replication methods, this lag is very short, and can be as little as a single transaction, depending on network speed, database activity, and Hot Standby settings. Also, the load on the master for each slave is minimal, allowing a single master to support dozens of slaves.

Production and standby databases are identical at the binary level (well, almost, we won't get into details, but don't worry if your datafiles don't have the same checksum).

For Streaming Replication, wal_level should be 'archive' (or 'hot standby') to do continuous archiving.

postgresql.conf, Primary:

 max_wal_senders = x # Maximum 'wal_senders', processes responsible for managing a connection with a standby server
 wal_keep_segments # How many WAL segments(files) should be kept on the primary, whatever may happen (you won't have to copy them manually on the standby if the standby gets too far behind)

On the secondary:

recovery.conf, Secondary:

 standby_mode = on
 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' # connection string to reach the primary database

postgresql.conf, Secondary:

 wal_level # same value as on the primary (you'll need this after a failover, to build a new standby)
 hot_standby=on/off # Do you want to use Hot Standby at the same time ?

pg_hba.conf file:

There must be an entry here for the replication connections. The fake database is 'replication', the designated user should be superuser. Be careful not to give broad access to this account: a lot of privileged data can be extracted from WAL records.

pg_hba.conf, Primary:

 host    replication     foo             192.168.1.100/32        md5

As for Hot Standby, this feature is rich and complex. It's advised to read the documentation. And to perform failovers and switchovers tests when everything is in place.

One thing should be stressed about these two features: you can use them together. This means you can have a near-realtime standby database, and run read-only queries on it, such as reporting queries. You can also use them independently; a standby database can be Hot Standby with file shipping only, and a Streaming Replication database can stream without accepting queries.


New features


Exclusion constraints

Exclusion constraints are very similar to unique constraints. They could be seen as unique constraints using other operators than '=': A unique constraint defines a set of columns for which two records in the table cannot be identical.

To illustrate this, we will use the example provided by this feature's author, using the temporal data type, that he also developed. This datatype stores time ranges, that is 'the time range from 10:15 to 11:15'.

First, we need to retrieve the temporal module here: http://pgfoundry.org/projects/temporal/ , then compile and install it as a contrib (run the provided sql script).

 CREATE TABLE reservation
 (
   room      TEXT,
   professor TEXT,
   during    PERIOD);
 ALTER TABLE reservation ADD CONSTRAINT test_exclude EXCLUDE   USING gist (room WITH =,during WITH &&);

Doing this, we declare that a record should be rejected (exclusion constraint) if there already is one verifying the two conditions 'the same room' and 'be in intersection for the time range' (the && operator).

 marc=# INSERT INTO reservation (professor,room,during) VALUES ( 'mark', 'tech room', period('2010-06-16 09:00:00', '2010-06-16 10:00:00'));
 INSERT 0 1
 marc=# INSERT INTO reservation (professor,room,during) VALUES ( 'john', 'chemistry room', period('2010-06-16 09:00:00', '2010-06-16 11:00:00'));
 INSERT 0 1
 marc=# INSERT INTO reservation (professor,room,during) VALUES ( 'mark', 'chemistry room', period('2010-06-16 10:00:00', '2010-06-16 11:00:00'));
 ERROR:  conflicting key value violates exclusion constraint "test_exclude"
 DETAIL:  Key (room, during)=(chemistry room, [2010-06-16 10:00:00+02, 2010-06-16 11:00:00+02)) conflicts with existing key (room, during)=(chemistry room, [2010-06-16 09:00:00+02, 2010-06-16 11:00:00+02)).

The insert is forbidden, as the chemistry room is already reserved from 9 to 11.

Exclusion constraints may also be used with arrays, geographic data, or other non-scalar data in order to implement advanced scientific and calendaring applications. No other database system has this feature.


Column triggers

Column triggers fire only when a specific column is explicitly UPDATED. They allow you to avoid adding lots of conditional logic and value comparisons in your trigger code.

Example:

 CREATE TRIGGER foo BEFORE UPDATE OF a ON t1 FOR EACH ROW EXECUTE PROCEDURE my_trigger();

This trigger fires only when t1's 'a' column of the t1 table has been updated.

Column triggers are not executed if columns are set to DEFAULT.


WHEN Triggers

Completing PostgreSQL's effort to limit IF ... THEN code in triggers, conditional triggers define simple conditions under which the trigger will be executed. This can dramatically decrease the number of trigger executions and reduce CPU load on the database server.

For example, this trigger would check that an account was correctly balanced only when the balance changes:

 CREATE TRIGGER check_update
 BEFORE UPDATE ON accounts
 FOR EACH ROW
 WHEN (OLD.balance IS DISTINCT FROM NEW.balance)
 EXECUTE PROCEDURE check_account_update();

And this trigger will only log a row update when the row actually changes. It's very helpful with framework or ORM applications, which may attempt to save unchanged rows:

 CREATE TRIGGER log_update
 AFTER UPDATE ON accounts
 FOR EACH ROW
 WHEN (OLD.* IS DISTINCT FROM NEW.*)
 EXECUTE PROCEDURE log_account_update();

You could even further than this and decide not to save a row at all if it hasn't changed:

 CREATE TRIGGER log_update
 AFTER UPDATE ON accounts
 FOR EACH ROW
 WHEN (OLD.* IS NOT DISTINCT FROM NEW.*)
 EXECUTE PROCEDURE no_op();


DEFERRABLE UNIQUE CONSTRAINTS

This feature will also be very useful. Here is an example, using a primary key instead of a simple unique key:

 marc=# CREATE TABLE test (a int primary key);
 marc=# INSERT INTO test values (1), (2);
 marc=# UPDATE test set a = a+1;
 ERROR:  duplicate key value violates unique constraint "test_pkey"
 DETAIL:  Key (a)=(2) already exists.

That's normal, but a pity nevertheless: at the end of the transaction, my data would have been consistent, as far as this constraint is concerned. Even worse, if the table had been physically sorted by descending order, the query would have worked! With 8.4, there was no easy way out, we had to find a trick to update the records in the right order.

We can now do this:

 marc=# CREATE TABLE test (a int primary key deferrable);
 marc=# INSERT INTO test values (2),(1);
 marc=# UPDATE test set a = a+1;
 ERROR:  duplicate key value violates unique constraint "test_pkey"
 DETAIL:  Key (a)=(2) already exists.

Oops, it doesn't work.

As a matter of fact, this is on purpose: It's a little reminder on deferrable/deferred constraints: a deferrable constraint CAN be checked at the end of a transaction. You still have to ask PostgreSQL to defer it.

One can, for the current session, require that all the constraints be deferred:

 marc=# SET CONSTRAINTS ALL DEFERRED;
 SET CONSTRAINTS
 marc=# UPDATE test set a = a+1;
 UPDATE 2

If one doesn't want to perform a SET CONSTRAINTS each time, the constraint can also be declared as INITIALLY DEFERRED:

 CREATE TABLE test (a int PRIMARY KEY DEFERRABLE INITIALLY DEFERRED);

Another reminder is necessary here: DEFERRED constraints are slower than IMMEDIATE constraints. Furthermore, the list of records to be checked at the end of the transaction has to be stored somewhere. So be careful of not doing this for millions of records at once. This is one of the reasons that DEFERRABLE constraints aren't INITIALLY DEFERRED by default.


Anonymous Functions

This new feature is for creating run-once functions. Effectively, this allows you to run stored procedure code on the command line or dynamically as you can on SQL Server and Oracle. Unlike those, however, PostgreSQL allows you to run an anonymous function in any procedural language which is installed of the more than a dozen which PostgreSQL supports.

This feature will be very useful for schema upgrade scripts for instance. Here is a slightly different version of the 'GRANT SELECT ON ALL TABLES' that will be seen later in this document, giving SELECT rights to a bunch of tables, depending on the table owner, and not taking into account two schemas.:

 DO language plpgsql $$
 DECLARE
 vr record;
 
 BEGIN
 
 FOR vr IN SELECT tablename FROM pg_tables WHERE tableowner = 'marc' AND schemaname NOT IN ('pg_catalog','information_schema')
 LOOP
   EXECUTE 'GRANT SELECT ON ' || vr.tablename || ' TO toto';
 END LOOP;
 END
 $$;

As of 8.4, this would have required creating a function (with CREATE FUNCTION), running it, then removing it (with DROP FUNCTION). All of this requiring having rights to do this. 9.0 simplifies performing this kind of procedures.

Anonymous functions are also called "anonymous code blocks" in the software industry.


Named parameters

Combined with the Default Parameters introduced in version 8.4, named parameters allow for dynamic calling of functions with variable numbers of arguments, much as they would be inside a programming language. Named parameters are familiar to users of SQL Server or Sybase, but PostgreSQL does one better by supporting both named parameter calls and function overloading.

The chosen syntax to name parameters is the following:

 CREATE FUNCTION test (a int, b text) RETURNS text AS $$
 DECLARE
   value text;
 BEGIN
   value := 'a is ' || a::text || ' and b is ' || b;
   RETURN value;
 END;
 $$ LANGUAGE plpgsql;

Until now, we wrote:

 SELECT test(1,'foo');
 test          
 -------------------------
 a is 1 and b is foo
 (1 row)

Now this explicit syntax can be used:

 SELECT test( b:='foo', a:=1);
 test          
 -------------------------
 a is 1 and b is foo
 (1 row)

Named parameters should eliminate the need to write many overloaded "wrapper" functions. Note that this does add a backwards compatibility issue; you are no longer able to rename function parameters using a REPLACE command, but must now drop and recreate the function.


GRANT/REVOKE IN SCHEMA

One annoying limitation in PostgreSQL has been the lack of global GRANT/REVOKE capabilities. With 9.0 it's now possible to set privileges on all tables, sequences and functions within a schema using without having to write a script or a stored procedure:

 GRANT SELECT ON ALL TABLES IN SCHEMA public TO toto;

And reverting this:

 REVOKE SELECT ON ALL TABLES IN SCHEMA public FROM toto;

See the GRANT documentation page for further details.

Note that the above only works for existing objects. However, it's now also possible to define default permissions for new objects:


ALTER DEFAULT PRIVILEGES

This feature also makes permission management more efficient.

 ALTER DEFAULT PRIVILEGES FOR ROLE marc GRANT SELECT ON TABLES TO PUBLIC;
 CREATE TABLE test_priv (a int);
 \z test_priv
                             Access privileges
 Schema |    Name    | Type  | Access privileges | Column access privileges
--------+------------+-------+-------------------+--------------------------
 public | test_priv  | table | =r/marc          +|
        |            |       | marc=arwdDxt/marc |

These new informations are stored in the pg_default_acl system table.


PL/pgSQL changes which may cause regressions

There are two changes in PL/pgSQL which may break code which works in 8.4 or earlier, meaning PL/pgSQL functions should be audited before before migrating to 9.0 to prevent possible runtime errors.


Removal of column/variable name ambiguity

In 8.4 and earlier, PL/PgSQL variables will take preference over a table or view column with the same name. While this behaviour is consistent, it is a potential source of coding errors. 9.0 will throw a runtime error if this situation occurs:

 marc=# DO LANGUAGE plpgsql
 $$
 DECLARE
   a int;
 BEGIN
   SELECT a FROM test;
 END
 $$
 ;
 ERROR:  column reference "a" is ambiguous
 LINE 1: select a from test
 DETAIL:  It could refer to either a PL/pgSQL variable or a table column.
 QUERY:  select a from test
 CONTEXT:  PL/pgSQL function "inline_code_block" line 4 at SQL statement

This behaviour can be altered globally in postgresql.conf, or on a per function basis by inserting one of these three options in the function declaration:

 #variable_conflict error        (default)
 #variable_conflict use_variable (variable name name takes precedence - pre-9.0 behaviour)
 #variable_conflict use_column   (column name takes precedence)

The manual contains more details.


Reserved words

From 9.0, use of unquoted reserved words as PL/PgSQL variable names is no longer permitted:

 marc=# DO LANGUAGE plpgsql
 $$
 DECLARE
 table int;
 BEGIN
 table :=table+1;
 END
 $$
 ;
 ERROR:  syntax error at or near "table"
 LINE 6: table :=table+1;

The correct syntax is:

 marc=# DO LANGUAGE plpgsql
 $$
 DECLARE
 "table" int;
 BEGIN
 "table" :="table"+1;
 END
 $$
 ;
 DO

Best practice is of course to avoid reserved words completely.


Improvements

The query planner got a lot of improvements in this version. So we'll start by talking about these:


Join Removal

 marc=# CREATE TABLE t1 (a int);
 CREATE TABLE
 marc=# CREATE TABLE t2 (b int);
 CREATE TABLE
 marc=# CREATE TABLE t3 (c int);
 CREATE TABLE

We put a little bit of data with a generate_series…

 marc=# EXPLAIN SELECT t1.a,t2.b from t1 join t2 on (t1.a=t2.b) left join t3 on (t1.a=t3.c);
 QUERY PLAN                                 
 ------------------------------------------------------------------------------
 Merge Right Join  (cost=506.24..6146.24 rows=345600 width=8)
     Merge Cond: (t3.c = t1.a)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: t3.c
         ->  Seq Scan on t3  (cost=0.00..34.00 rows=2400 width=4)
   ->  Materialize  (cost=337.49..853.49 rows=28800 width=8)
         ->  Merge Join  (cost=337.49..781.49 rows=28800 width=8)
               Merge Cond: (t1.a = t2.b)
               ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
                     Sort Key: t1.a
                     ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
               ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
                     Sort Key: t2.b
                     ->  Seq Scan on t2  (cost=0.00..34.00 rows=2400 width=4)

For now, everything is normal, and we have the same behavior in 8.4. But let's imagine that on t3, there is a UNIQUE constraint on the 'c' column. In this case, the join on t3 doesn't serve any purpose, theoretically speaking: the number of rows returned won't change, neither will their content. It's because the column is UNIQUE, the join is a LEFT JOIN, and no column of t3 is retrieved. If the column wasn't UNIQUE, the join could bring more rows. If that wasn't a LEFT JOIN, the join could ignore some rows.

With 9.0:

 marc=# ALTER TABLE t3 ADD UNIQUE (c);
 NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index "t3_c_key" for table "t3"
 ALTER TABLE
 marc=# EXPLAIN SELECT t1.a,t2.b from t1 join t2 on (t1.a=t2.b) left join t3 on (t1.a=t3.c);
                             QUERY PLAN                           
 ------------------------------------------------------------------
 Merge Join  (cost=337.49..781.49 rows=28800 width=8)
   Merge Cond: (t1.a = t2.b)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: t1.a
         ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: t2.b
         ->  Seq Scan on t2  (cost=0.00..34.00 rows=2400 width=4)
 (8 rows)

This enhancement will sometimes be very nice, among other things when using an ORM (Object Relation Mapping). These tools have a tendency to write unnecessary joins. In this simple case, the estimated cost has been divided by 10.

This could also help a lot for programs using a lot of joins and nested views.

This feature is another reason to declare the constraints in the database: without these constraints, there is no way for the engine to be sure these rewrites can be done.


IS NOT NULL can now use indexes

For this demonstration, we will compare the 8.4 and 9.0 versions (the table I created contains mostly nulls):

With 8.4:

 marc=# EXPLAIN ANALYZE SELECT max(a) from test;
 QUERY PLAN                                                                  
 ------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.03..0.04 rows=1 width=0) (actual time=281.320..281.321 rows=1 loops=1)
 InitPlan 1 (returns $0)
   ->  Limit  (cost=0.00..0.03 rows=1 width=4) (actual time=281.311..281.313 rows=1 loops=1)
   ->  Index Scan Backward using idxa on test  (cost=0.00..29447.36 rows=1001000 width=4) (actual time=281.307..281.307 rows=1 loops=1)
 Filter: (a IS NOT NULL)
 Total runtime: 281.360 ms
 (6 rows)

With 9.0:

 marc=# EXPLAIN ANALYZE SELECT max(a) from test;
 QUERY PLAN                                                                
 --------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.08..0.09 rows=1 width=0) (actual time=0.100..0.102 rows=1 loops=1)
 InitPlan 1 (returns $0)
   ->  Limit  (cost=0.00..0.08 rows=1 width=4) (actual time=0.092..0.093 rows=1 loops=1)
   ->  Index Scan Backward using idxa on test  (cost=0.00..84148.06 rows=1001164 width=4) (actual time=0.089..0.089 rows=1 loops=1)
 Index Cond: (a IS NOT NULL)
 Total runtime: 0.139 ms
 (6 rows)

The difference is that 9.0 only scans the not-null keys in the index. 8.4 has to go check in the table (Filter step, when 9.0 uses an index condition). In this precise use case, the gain is really big.


Use of index to get better statistics on the fly

Before starting to explain this new feature, let's talk about histograms: PostgreSQL, like some other databases, uses a statistical optimizer. This means that when planning a query it has (or should have) an approximately correct idea of how many records each step of the query will bring back. In order to do this, it uses statistics, such as the approximate number of records in a table, its size, most common values, and histograms. PostgreSQL use these to get estimates about the number of records brought back by a WHERE clause on a column, depending on the value or range asked in this WHERE clause.

In some cases, these histograms are rapidly out of date, and become a problem, for certain SQL queries. For instance, a log table in which timestamped records would be inserted, and from which we would most of the time want to get the records from the last 5 minutes.

In this specific case, it was impossible before 9.0 to get correct statistics. Now, when PostgreSQL detects while planning that a query asks for a 'range scan' on a value larger than the largest of the histogram (or smaller than the smallest), that is, the largest detected value during the last statistics calculation, and this column has an index, it gets the max (or min) value for this column using the index BEFORE really executing the query, in order to get more realistic statistics. As PostgreSQL uses an index for this, there HAS to be an index, of course.

Here comes an example. The a column of the test table has already been filled with a lot of dates, all in the past. It's statistics are up to date.

It's 13:37, and I haven't inserted anything after 13:37 yet.

 marc=# EXPLAIN ANALYZE select * from test where a > '2010-06-03 13:37:00';
 QUERY PLAN                                                 
 --------------------------------------------------------------------------------------------------------------
 Index Scan using idxtsta on test  (cost=0.00..8.30 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1)
 Index Cond: (a > '2010-06-03 13:37:00'::timestamp without time zone)
 Total runtime: 0.027 ms
 (3 rows)

Everything's normal. The upper boundary of the histogram is '2010-06-03 13:36:16.830007' (this information comes from pg_stats). There is no way of guessing how many records are larger than 13:37, and with 8.4, PostgreSQL would have continued estimating '1' until the next analyze.

 marc=# DO LANGUAGE plpgsql
 $$
 DECLARE
   i int;
 BEGIN
   FOR i IN 1..10000 LOOP
   INSERT INTO test VALUES (clock_timestamp());
   END LOOP;
 END
 $$
 ;
 DO

(I must say I really like 'DO').
We just inserted 10000 records with a date larger than 13:37.

 marc=# EXPLAIN ANALYZE SELECT * FROM test WHERE a > '2010-06-03 13:37:00';
 QUERY PLAN                                                      
 -----------------------------------------------------------------------------------------------------------------------
 Index Scan using idxtsta on test  (cost=0.00..43.98 rows=1125 width=8) (actual time=0.012..13.590 rows=10000 loops=1)
 Index Cond: (a > '2010-06-03 13:37:00'::timestamp without time zone)
 Total runtime: 23.567 ms
 (3 rows)

The estimated rows isn't 0 or 1 anymore. The statistics haven't been updated, though:

 marc=# SELECT last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'test';
 last_autoanalyze       
 -------------------------------
 2010-06-03 13:36:21.553477+02
 (1 row)

We still have a one magnitude error in the evaluation (10 times). But it's not that bad: without this enhancement, it would be of four magnitudes (10,000). Anyway, a much smaller error makes it more likely we'll get a good plan out of this kind of queries.


Per tablespace seq_page_cost/random_page_cost

 marc=# ALTER TABLESPACE pg_default SET ( random_page_cost = 10, seq_page_cost=5);
 ALTER TABLESPACE

We just changed random_page_cost and seq_page_cost for all the objects contained in pg_default. What for ?

The use case is when different tablespaces have different performance: for instance, you have some critical data on a SSD drive, or historical data on an older disk array, slower than the brand new array you use for active data. This makes it possible to tell PostgreSQL that all your tablespaces don't always behave the same way, from a performance point of view. This is only useful, of course, for quite big databases.


Force distinct statistics on a column

This makes it possible to set the number of different values for a column. This mustn't be used lightly, but only when ANALYZE on this column can't get a good value.

Here's how to do this:

 marc=# ALTER TABLE test ALTER COLUMN a SET (n_distinct = 2);
 ALTER TABLE

ANALYZE has to be run again for this to be taken into account:

 marc=# ANALYZE test;
 ANALYZE

Let's try now:

 marc=# EXPLAIN SELECT distinct * from test;
 QUERY PLAN                           
 ------------------------------------------------------------------
 HashAggregate  (cost=6263.00..6263.02 rows=2 width=8)
 ->  Seq Scan on test  (cost=0.00..5338.00 rows=370000 width=8)
 (2 rows)

This is an example of what SHOULDN'T be done : there REALLY is 370 000 distinct values in my table. Now my execution plans may be very bad.

If n_distinct is positive, it's the number of distinct values.

If it's negative (between 0 and -1), it's the multiplying factor regarding the number of estimated records in the table: for instance, -0.2 means that there is a distinct value for each 5 records of the table.

0 brings the behavior back to normal (ANALYZE estimates distinct by itself).

Don't change this parameter, unless you are completely sure you have correctly diagnosed you problem. Else, be assured performance will be degraded.


Many other aspects of the database have been improved. Here are a few amongst them:


Better VACUUM FULL

Until now, VACUUM FULL was very slow. This statement can recover free space from a table to reduce its size, mostly when VACUUM itself hasn't been run frequently enough.

It was slow because of the way it operated: records were read and moved one by one from their source bloc to a bloc closer to the beginning of the table. Once the end of the table was emptied, this empty part was removed.

This strategy was very inefficient: moving records one by one creates a lot of random IO. Moreover, during this reorganization, indexes had to be maintained, making everything even more costly, and fragmenting indexes. It was therefore advised to reindex a table just after a VACUUM FULL.

The VACUUM FULL statement, as of version 9.0, creates a new table from the current one, copying all the records sequentially. Once all records are copied, index are created back, and the old table is destroyed and replaced.

This has the advantage of being much faster. VACUUM FULL still needs an AccessExclusiveLock while running though. The only drawback of this method compared to the old one, is that VACUUM FULL can use as much as two times the size of the table on disk, as it is creating a new version of it.

Let's now compare the runtimes of the two methods. In both cases, we prepare the test data as follows (for 8.4 and 9.0)

 marc=# CREATE TABLE test (a int);
 CREATE TABLE
 marc=# CREATE INDEX idxtsta on test (a);
 CREATE INDEX
 marc=# INSERT INTO test SELECT generate_series(1,1000000);
 INSERT 0 1000000
 marc=# DELETE FROM test where a%3=0; -- making holes everywhere
 DELETE 333333
 marc=# VACUUM test;
 VACUUM

With 8.4:

 marc=# \timing
 Timing is on.
 marc=# VACUUM FULL test;
 VACUUM
 Time: 6306,603 ms
 marc=# REINDEX TABLE test;
 REINDEX
 Time: 1799,998 ms

So around 8 seconds.
With 9.0:

 marc=# \timing
 Timing is on.
 marc=# VACUUM FULL test;
 VACUUM
 Time: 2563,467 ms

That still doesn't mean that VACUUM FULL is a good idea in production. If you need it, it's probably because your VACUUM policy isn't appropriate.


64 bits for windows.

There is now a 64 bits version of PostgreSQL for Windows. There is no evidence for now the 500MB shared_buffers size limit before performance degrades seen on the 32 bits version for Windows is solved with this 64 bit version, though. But there are many other reasons you may want to use this 64 bit version, such as larger work_mem.


PL/pgSQL by default

You won't have to add PL/pgSQL in databases, as it will be installed by default.


Many improvements on PL languages.

Many languages have been vastly improved, PLPerl for instance. Read the release notes if you want more details, there are too many to detail here.


ALIAS keyword

ALIAS can now be used. As its name suggests, it can be used to alias variable names to other names.

The syntax is new_name ALIAS FOR old_name. This is put in the DECLARE section of PL/pgSQL code.

It has two main use cases:

  • to give names to PL functions variables:
 myparam ALIAS FOR $0
  • to rename potentially conflicting variables. In a trigger for instance:
 new_value ALIAS FOR new
(without this, we might have conflicted with the NEW variable in the trigger function).


Message passing in NOTIFY/pg_notify

Messages can now be passed using NOTIFY. Here is how:

  • Subscribe in session 1 to the 'instant_messenging' queue.
Session 1:
 marc=# LISTEN instant_messenging;
 LISTEN
  • Send a notification through 'instant_messenging', from another session
Session 2:
 marc=# NOTIFY instant messenging, 'You just received a message';
 NOTIFY
  • Check the content of the queue in the first session
Session 1:
 marc=# LISTEN instant_messenging;
 LISTEN
 Asynchronous notification "instant_messenging" with payload "You just received a message" received from server process with PID 5943.

So we can now associate messages (payloads) with notifications, making NOTIFY even more useful.

Let's also mention the new pg_notify function. With it, the second session's code can also be:

 SELECT pg_notify('instant_messenging','You just received a message');

This can simplify some code, in the case of a program managing a lot of different queues.


get_bit and set_bit for bit strings

Here is a very simple example. This tool can manipulate bits in a bit() independently.

 marc=# SELECT set_bit('1111'::bit(4),2,0);
 set_bit
 ---------
 1101
 (1 row)


 marc=# SELECT get_bit('1101'::bit(4),2);
 get_bit
 ---------
       0
 (1 row)


application_name in pg_stat_activity

In a monitoring session:

 marc=# SELECT * from pg_stat_activity where procpid= 5991;
 datid | datname | procpid | usesysid | usename | application_name | client_addr | client_port |         backend_start         | xact_start | query_start | waiting | current_query
 ------+---------+---------+----------+---------+------------------+-------------+-------------+-------------------------------+------------+-------------+---------+----------------
 16384 | marc    |    5991 |       10 | marc    | psql             |             |          -1 | 2010-05-16 13:48:10.154113+02 |            |             | f       | <IDLE>
 (1 row)

In the '5991' session:

 marc=# SET application_name TO 'my_app';
 SET

Back to the monitoring session:

 >marc=# SELECT * from pg_stat_activity where procpid= 5991;
 datid | datname | procpid | usesysid | usename | application_name | client_addr | client_port |         backend_start         | xact_start |          query_start          | waiting | current_query
 ------+---------+---------+----------+---------+------------------+-------------+-------------+-------------------------------+------------+-------------+---------+-----------------+----------------
 16384 | marc    |    5991 |       10 | marc    | my_app           |             |          -1 | 2010-05-16 13:48:10.154113+02 |            | 2010-05-16 13:49:13.107413+02 | f       | <IDLE>
 (1 row)

It's your job to set this up correctly in your program or your sessions. Your DBA will thank you for this, at last knowing who runs what on the database easily.


Per database+role configuration

Instead of being able to set up configuration variables per database or per user, one can now set them up for a certain user in a certain database:

 marc=# ALTER ROLE marc IN database marc set log_statement to 'all';
 ALTER ROLE

To know who has which variables set-up in which user+database, there is a new psql command:

 marc=# \drds
         List of settings
 role | database |     settings
 -----+----------+-----------------
 marc | marc     | log_statement=all
 (1 row)

There was a catalog change to store this:

 Table "pg_catalog.pg_db_role_setting"
   Column    |  Type  | Modifier
 ------------+--------+----------
 setdatabase | oid    | not null
 setrole     | oid    | not null
 setconfig   | text   |


Log all changed parameters on a postgresql.conf reload

Here is an example, the log_line_prefix parameter has been changed:

 LOG:  received SIGHUP, reloading configuration files
 <%> LOG:  parameter "log_line_prefix" changed to "<%u%%%d> "


New frame options for window functions

If you don't know window functions yet, you'd better learn about them. You can start here : <a href='http://www.depesz.com/index.php/2009/01/21/waiting-for-84-window-functions' />. They make writing certain kind of queries much easier.

New options have been added for declaring frames of windowing functions. Let's use this table (not having a better example…)

 marc=# SELECT * FROM salary ;
 entity     | name      | salary | start_date
 -----------+-----------+---------+---------------
 R&D        | marc      |  700.00 | 2010-02-15
 Accounting | jack      |  800.00 | 2010-05-01
 R&D        | maria     |  700.00 | 2009-01-01
 R&D        | kevin     |  500.00 | 2009-05-01
 R&D        | john      | 1000.00 | 2008-07-01
 R&D        | tom       | 1100.00 | 2005-01-01
 Accounting | millicent |  850.00 | 2006-01-01

Here is a window function example, without declaring the frame:

 marc=# SELECT entity, name, salary, start_date, avg(salary) OVER (PARTITION BY entity ORDER BY start_date) FROM salary;
 entity     | name      | salary | start_date    |          avg         
 -----------+-----------+---------+---------------+-----------------------
 Accounting | millicent |  850.00 | 2006-01-01    |  850.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    |  825.0000000000000000
 R&D        | tom       | 1100.00 | 2005-01-01    | 1100.0000000000000000
 R&D        | john      | 1000.00 | 2008-07-01    | 1050.0000000000000000
 R&D        | maria     |  700.00 | 2009-01-01    |  933.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    |  825.0000000000000000
 R&D        | marc      |  700.00 | 2010-02-15    |  800.0000000000000000

The frame is the group of records over which the window function is run. Of course, if the frame isn't explicitly declared, there is a default one.

Here is the same query, with an explicit frame:

 marc=# SELECT entity, name, salary, start_date, avg(salary) OVER (PARTITION BY entity ORDER BY start_date    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM salary;
 entity     | name      | salary | start_date    |          avg         
 -----------+-----------+---------+---------------+-----------------------
 Accounting | millicent |  850.00 | 2006-01-01    |  850.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    |  825.0000000000000000
 R&D        | tom       | 1100.00 | 2005-01-01    | 1100.0000000000000000
 R&D        | john      | 1000.00 | 2008-07-01    | 1050.0000000000000000
 R&D        | maria     |  700.00 | 2009-01-01    |  933.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    |  825.0000000000000000
 R&D        | marc      |  700.00 | 2010-02-15    |  800.0000000000000000

In this example, the frame is a 'range' frame, between the start of the partition (the group of similar rows) and the current row (not exactly the current row, but let's put that aside for now, read the documentation if you want to learn more). One can see, the average (avg) function is evaluated from the frame's first row (grouped together records) and the current row.

First new feature: as of 9.0, the frame can be declared to be between the current row and the end of the partition:

 marc=# SELECT entity, name, salary, start_date, avg(salary)  OVER (PARTITION BY entity ORDER BY start_date    RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM salary;
 entity     | name      | salary | start_date    |         avg         
 -----------+-----------+---------+---------------+----------------------
 Accounting | millicent |  850.00 | 2006-01-01    | 825.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    | 800.0000000000000000
 R&D        | tom       | 1100.00 | 2005-01-01    | 800.0000000000000000
 R&D        | john      | 1000.00 | 2008-07-01    | 725.0000000000000000
 R&D        | maria     |  700.00 | 2009-01-01    | 633.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    | 600.0000000000000000
 R&D        | marc      |  700.00 | 2010-02-15    | 700.0000000000000000

Second new feature: frames can be declared as 'x previous records to y next records'. There is no point with this example, but let's do it anyway::

 marc=# SELECT entity, name, salary, start_date, avg(salary)  OVER (PARTITION BY entity ORDER BY start_date    RANGE ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM salary;
 entity     | name      | salary | start_date    |          avg         
 -----------+-----------+---------+---------------+-----------------------
 Accounting | millicent |  850.00 | 2006-01-01    |  825.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    |  825.0000000000000000
 R&D        | tom       | 1100.00 | 2005-01-01    | 1050.0000000000000000
 R&D        | john      | 1000.00 | 2008-07-01    |  933.3333333333333333
 R&D        | maria     |  700.00 | 2009-01-01    |  733.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    |  633.3333333333333333
 R&D        | marc      |  700.00 | 2010-02-15    |  600.0000000000000000

The frame is still limited to the partition (see tom's record, for instance: jack's record isn't use for it's average).

If one wanted the same query, with a moving average on three rows, not reset on each partition switch (still no practical use):

 marc=# SELECT entity, name, salary, start_date, avg(salary) OVER (ORDER BY entity, start_date    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM salary;
 entity     | name      | salary | start_date    |         avg         
 -----------+-----------+---------+---------------+----------------------
 Accounting | millicent |  850.00 | 2006-01-01    | 825.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    | 916.6666666666666667
 R&D        | tom       | 1100.00 | 2005-01-01    | 966.6666666666666667
 R&D        | john      | 1000.00 | 2008-07-01    | 933.3333333333333333
 R&D        | maria     |  700.00 | 2009-01-01    | 733.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    | 633.3333333333333333
 R&D        | marc      |  700.00 | 2010-02-15    | 600.0000000000000000

In short, a power full tool to be mastered, even if I couldn't provide a good example.


Sort in aggregates

This feature is a subtle one: the result of an aggregate function may depend on the order it receives the data.

Of course, we're not talking about count, avg, but of array_agg, string_agg…

This is nice, as this will showcase string_agg, which is another 9.0 feature, killing two birds with one stone.

Let's start again with our salary table. We want the list of employees, concatenated as a single value, grouped by entity. It's going into a spreadsheet…

 marc=# SELECT entity,string_agg(name,', ') FROM salary GROUP BY entity;
 entity     |          string_agg          
 -----------+-------------------------------
 Accounting | jack, millicent
 R&D        | marc, maria, kevin, john, tom

That's already nice. But I want them sorted in alphabetical order, because I don't know how to write a macro in my spreadsheet to sort this data.

 marc=# SELECT entity,string_agg(name,', ' ORDER BY name) FROM salary GROUP BY entity;
 entity     |          string_agg          
 -----------+-------------------------------
 Accounting | etienne, stephanie
 R&D        | john, kevin, marc, maria, tom

To use this new feature, the sort clause must be inserted inside the aggregate function, without a comma to separate it from the parameters.


Better unique constraints error messages

With 8.4:

 marc=# INSERT INTO test VALUES (1);
 ERROR:  duplicate key value violates unique constraint "test_a_key"

With 9.0:

 marc=# INSERT INTO test VALUES (1);
 ERROR:  duplicate key value violates unique constraint "test_a_key"
 DETAIL:  Key (a)=(1) already exists.

This will make diagnosing constraint violation errors much easier.


Explain buffers, hashing statistics, xml, json, yaml, new optional explain syntax

Here is EXPLAIN ANALYZE as we all know it:

 marc=# EXPLAIN ANALYZE SELECT a, sum(c) FROM pere JOIN fils ON (pere.a = fils.b) WHERE b BETWEEN 1000 AND 300000 GROUP BY a;
                                                           QUERY PLAN                                                          
 ---------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=905.48..905.86 rows=31 width=8) (actual time=0.444..0.453 rows=6 loops=1)
   ->  Nested Loop  (cost=10.70..905.32 rows=31 width=8) (actual time=0.104..0.423 rows=6 loops=1)
         ->  Bitmap Heap Scan on fils  (cost=10.70..295.78 rows=31 width=8) (actual time=0.040..0.154 rows=30 loops=1)
               Recheck Cond: ((b >= 1000) AND (b <= 300000))
               ->  Bitmap Index Scan on fils_pkey  (cost=0.00..10.69 rows=31 width=0) (actual time=0.023..0.023 rows=30 loops=1)
                     Index Cond: ((b >= 1000) AND (b <= 300000))
         ->  Index Scan using pere_pkey on pere  (cost=0.00..19.65 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=30)
               Index Cond: (pere.a = fils.b)
 Total runtime: 0.560 ms
 (9 rows)

To get access to the new available information, use the new syntax::

 EXPLAIN [ ( { ANALYZE boolean | VERBOSE boolean | COSTS boolean | BUFFERS boolean | FORMAT { TEXT | XML | JSON | YAML } } [, ...] ) ] instruction

For instance:

 marc=# EXPLAIN (ANALYZE true, VERBOSE true, BUFFERS true) SELECT a, sum(c) FROM pere JOIN fils ON (pere.a = fils.b) WHERE b BETWEEN 1000 AND 300000 GROUP BY a;
                                                             QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=905.48..905.86 rows=31 width=8) (actual time=1.326..1.336 rows=6 loops=1)
   Output: pere.a, sum(fils.c)
   Buffers: shared hit=58 read=40
   ->  Nested Loop  (cost=10.70..905.32 rows=31 width=8) (actual time=0.278..1.288 rows=6 loops=1)
         Output: pere.a, fils.c
         Buffers: shared hit=58 read=40
         ->  Bitmap Heap Scan on public.fils  (cost=10.70..295.78 rows=31 width=8) (actual time=0.073..0.737 rows=30 loops=1)
               Output: fils.b, fils.c
               Recheck Cond: ((fils.b >= 1000) AND (fils.b <= 300000))
               Buffers: shared hit=4 read=28
               ->  Bitmap Index Scan on fils_pkey  (cost=0.00..10.69 rows=31 width=0) (actual time=0.030..0.030 rows=30 loops=1)
                     Index Cond: ((fils.b >= 1000) AND (fils.b <= 300000))
                     Buffers: shared hit=3
         ->  Index Scan using pere_pkey on public.pere  (cost=0.00..19.65 rows=1 width=4) (actual time=0.013..0.014 rows=0 loops=30)
               Output: pere.a
               Index Cond: (pere.a = fils.b)
               Buffers: shared hit=54 read=12
 Total runtime: 1.526 ms
 (18 rows)

VERBOSE displays the 'Output' lines (it already existed on 8.4).

BUFFERS displays data about buffers (input-output operations performed by the query): hit is the number of blocks obtained directly from shared_buffers, read is the number of blocs asked to the operating system. Here, there was very little data in shared_buffers.

One can also ask for another formatting than plain text. For a user, it's not useful. For people developing GUIs over EXPLAIN, it simplifies development as they can get rid of an 'explain' parser (and its potential bugs), and use a more standard one, such as XML.

Costs display can also be deactivated with COSTS false.


Unaccent filtering dictionary

Filtering dictionaries can now be set up. This is about Full Text Search dictionaries.

These dictionaries' purpose it applying a first filter on words before lexemizing them. The module presented here is the first one to use this mechanism. Filtering can consist in removing words or modifying them.

Unaccent doesn't remove words, it removes accents (all diacritic signs, as a matter of fact), replacing accentuated characters with non-accentuated ones (many people, at least in French, don't type them). Unaccent is a contrib module.

Installing it, as all contrib modules, is as easy as

 psql mydb < contribs_path/unaccent.sql.

We'll now follow unaccent's documentation, the example being filtering french words.

Let's create a new 'fr' dictionary (keeping standard 'french' dictionary clean):

 marc=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
 CREATE TEXT SEARCH CONFIGURATION

The next statement alters the 'fr' setup for word and alike lexemes. These now have to go through unaccent and french_stem instead of only french_stem.

 marc=# ALTER TEXT SEARCH CONFIGURATION fr
 >ALTER MAPPING FOR hword, hword_part, word
 >WITH unaccent, french_stem;
 >ALTER TEXT SEARCH CONFIGURATION
 
 SELECT to_tsvector('fr','Hôtels de la Mer');
 to_tsvector   
 -------------------
 'hotel':1 'mer':4
 (1 row)
 
 marc=# SELECT to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column?
 ----------
 t
 (1 row)

It's now easy, without changing even one line of code in the client application, and keeping accentuated characters in the database, to look up words without taking accents into account.


vacuumdb --analyze-only

As the parameter name's indicates, one can now use vacuumdb to run analyzes only. It may be useful for cronjobs for instance.


Hstore contrib enhancements

This already powerful contrib module has become even more powerful:

  • Keys and values size limit has been removed.
  • GROUP BY and DISTINCT can now be used.
  • New operators and functions have been added.

An example would take too long, this module has a lot of features. Read the documentation at once !


Statement logged by auto_explain

auto_explain contrib module will now print the statement with its plan, which will make it much easier to use.


Buffers accounting for pg_stat_statements

This already very useful contrib module now provides data about buffers. pg_stat_statements, as a reminder, collects statistics on the queries run on the database. Until now, it stored the query's code, number of executions, accumulated runtime, accumulated returned records. It now collects buffer operations too.

 marc=# SELECT * from pg_stat_statements order by total_time desc limit 2;
 -[ RECORD 1 ]-------+---------------------
 userid              | 10
 dbid                | 16485
 query               | SELECT * from table1 ;
 calls               | 2
 total_time          | 0.491229
 rows                | 420000
 shared_blks_hit     | 61
 shared_blks_read    | 2251
 shared_blks_written | 0
 local_blks_hit      | 0
 local_blks_read     | 0
 local_blks_written  | 0
 temp_blks_read      | 0
 temp_blks_written   | 0
 -[ RECORD 2 ]-------+---------------------
 userid              | 10
 dbid                | 16485
 query               | SELECT * from table2;
 calls               | 2
 total_time          | 0.141445
 rows                | 200000
 shared_blks_hit     | 443
 shared_blks_read    | 443
 shared_blks_written | 0
 local_blks_hit      | 0
 local_blks_read     | 0
 local_blks_written  | 0
 temp_blks_read      | 0
 temp_blks_written   | 0

When this contrib is installed, one can now answer these questions:

  • Which query has the biggest accumulated runtime ?
  • Which query generates the most IO operations ? (we still can't know if data has been found in the Operating System's cache)
  • Which query uses mostly the cache (and hence won't be faster if we make it bigger) ?
  • Which query modifies the most blocks ?
  • Who does sorting ?

'temp' and 'local' are the buffer operations relative to temporary tables and other local operations (sorts, hashs) to a database backend.


passwordcheck

This contrib module can check passwords, and prevent the worst of them from getting in. After having it installed and set up as described in the documentation, here is the result:

 marc=# ALTER USER marc password 'marc12';
 <marc%marc> ERROR:  password is too short
 <marc%marc> STATEMENT:  ALTER USER marc password 'marc12';
 ERROR:  password is too short
 marc=# ALTER USER marc password 'marc123456';
 <marc%marc> ERROR:  password must not contain user name
 <marc%marc> STATEMENT:  ALTER USER marc password 'marc123456';
 ERROR:  password must not contain user name

This module has limitations, mostly due to PostgreSQL accepting already encrypted passwords to be declared, making correct verification impossible. Nevertheless, it's a step forward in the right direction.

Moreover, its code is well documented, and can be easily adapted to suit specific needs (one can activate cracklib very easily, for instance)

<!--
NewPP limit report
Preprocessor node count: 50/1000000
Post-expand include size: 0/2097152 bytes
Template argument size: 0/2097152 bytes
#ifexist count: 0/100
-->

<!-- Saved in parser cache with key wikidb:pcache:idhash:1423-0!1!0!!en!2!edit=0 and timestamp 20100629052345 -->

在gentoo Linux上安装Trac 0.12.0

上接, 在gentoo Linux上安装Trac 0.11.5

前一篇安装基本系统,不包括中文和插件,subversion/git等,本文就以这些为主进行说明,其他基础部分参考前文。

自trac-0.12开始,加入了完整的国际化基础架构,安装国际化功能不用再像之前版本那样复杂了。

一、中文化包安装:
1. 安装依赖包cldr: (由于包依赖问题,babel能使用cldr-1.7.x版本)
svn co http://www.unicode.org/repos/cldr/tags/release-1-7-2/common/ cldr-1.7

2. 安装依赖基础包babel-0.9.5:
svn co http://svn.edgewall.org/repos/babel/tags/0.9.5 babel-0.9.5
cd babel-0.9.5
./setup.py egg_info
python ./scripts/import_cldr.py ../cldr-1.7/
python setup.py install --prefix=/serv/stow/trac/
这时重新启动tracd时,访问http://localhost:8000会出现错误:
OSError: [Errno 2] 没有那个文件或目录: '/serv/stow/trac/lib64/python2.6/site-packages/Trac-0.12-py2.6.egg/trac/locale'
也就是说trac现在会搜索i10n的Local翻译文件了。

3. 其实还依赖两个包,genshi 和setuptools,不过这两个包系统中已经安装了。
如果需要让trac支持PostgreSQL,则还需要安装包 http://pypi.python.org/pypi/psycopg2

4. 如果前面安装过trac-0.12, 则重新安装trac-0.12
cd trac-0.12
python setup.py install --prefix=/serv/stow/trac/
这一步现在能看到处理locale相关数据。

如果中间没有出现错误,安装完成, 可以启动测试。
使用中文浏览器 http://localhost:8000/mytrac显示中文了。

Syndicate content