跨平台下载软件的那些事

在国内windows一统天下的时代,讨论跨平台软件的开发的思想是不入流的,浪费才力物力人力,却无任何好处,反而由于竟争对手对windows平台的专注而可能超越自己.
不过现在情况有些复杂了,桌面操作系统不断向多元化方向发展,使用不同操作系统的用户逐渐均衡分布,大大提高对通用软件跨平台特性的要求.
呼声比较高的软件像IM,下载工具,在线视频等.
在几年前,我从中选择一个相对比较容易的下载工具,作为目标,但由于自己水平有限,时间有限,孤军作战,到现在也没有做出一个成熟的跨平台下载工具.今年实现一个跨平台命令行工具aria2c的UI项目,效果比较好,准备好好完善.并且还构思了一种方便的资源收集与集成方案,但工作生活时间紧张,成了半拉子项目.
偶尔有一天收到一个rss新闻,看到flashget出了linux版本,下载试用,运行无问题,界面简单,下载测试速度的确比较快,特别是通过P2SP资源搜索功能,对一些死链接,慢速链接相当好用.服了呀,虽然现在功能比较简单,如果继续发展,应该会成为与windows版本一样强大的工具的.
浏览了一下flashget linux版本论坛,发现有人也希望官方出命令行版本,或者提供库 (当然是闭源的),以便适应不同用户的需要.这个当然不错,如果官方也这两种形式,本人不才,愿意做一个qt界面的linux flashget,或者参与到其他基于flashget API的项目中,给喜欢用kde的人使用.如果不出,希望官方的能继续开发出更好更强的linux版flashget吧.我也继续开源版本下载工具的开发,给linux使用者多一个选择,自己也在能在空闲时间做点网络学习.

另外在im方面,TX的linux QQ估计为了占坑吧,已经停止出版本了,并且最后一个beta版本基本不能用,崩溃问题很严重.虽然有一个web版本,但缺点也有,偶尔才用用.

视频方面,曾经ppstream有一个版本,但一直没用成功,后来被官方通过协议屏蔽了,算是完了.这个方面目录还没有什么希望用上了.还得继续VM+windowsxp.

对有核心技术的官方来说,开发一个跨平台的核心模块应该比较容易,接下来界面部分,简洁更好,不用太复杂,反倒win平台这些程序的界面都搞的特别复杂.
继续支持跨平台通用软件的公司团体和个人,自己也多努力,做出一个成熟的跨平台通用软件.

PostgreSQL 9.0新功能示例

PostgreSQL 9.0 release notes

This document showcases many of the latest developments in PostgreSQL 9.0, compared to the last major release – PostgreSQL 8.4. There are more than 200 improvements in this release. While all the important advances are explained and demonstrated, some of the more esoteric changes are not covered, but these are unlikely to be known issues for most users.

Hot Standby and Streaming Replication are the two new features that mark Version 9.0 as a landmark in PostgreSQL's development.

While there are several third party replication solutions available for PostgreSQL that meet a range of specific needs, this new release brings a simple, sturdy and integrated version that will probably be used as a default in most High Availability installations using PostgreSQL.

Changes are arranged in four chapters:

  • The two new features you can't ignore
  • Other new features
  • Potential issues when upgrading existing systems
  • Other improvements


Contents

[hide]


The two features you can't ignore

The significance of these two new features is the motivation for allocating a full version number to this release – 9.0 (not 8.5).


Hot Standby

This feature allows users to create a 'Standby' database – that is, a database replaying the primary's binary log, while making it available for read-only queries. It is substantially similar to enterprise standby database features of top proprietary databases, such as Oracle's DataGuard.

Implementation of this feature took over two years and is quite complex. While read-only queries are running, the standby database has to be able to replay the binary modifications coming from the production database, decide if these modifications are conflicting with the read-only queries and which action should be taken as a consequence: pause the replay or kill some read-only queries and move forward. Hot Standby also adds some data in WAL logs for the standby database and a conflict resolution mechanism.

To setup Hot standby, you just have to set up the production database as follows:

postgresql.conf, Primary:

 wal_level = 'hot standby' # Adds the required data in the WAL logs
 # vacuum_defer_cleanup_age # You may want to set this up, but it could be complicated (see documentation)

Then, create a standby database (the same way you did with previous standby databases: pg_start_backup on primary, copy the files (on the , pg_end_backup on primary).

Then, you just have to copy logs and data files on the secondary server and add this in its postgresql.conf:

postgresql.conf, Secondary:

 hot_standby=on
 max_standby_delay=30s # -1= always wait, 0= never wait, else wait for this

and use a program such as pg_standby on the secondary to help replay the logs (this parameter goes in recovery.conf, but there are new options there too, see next feature).

max_standby_delay determines the behaviour of the standby database when conflicts between replay and read-only queries occur. In this situation, the standby database will wait at most until it's lagging max_standby_delay behind the production database before killing the conflicting read-only queries.

Of course, you're strongly encouraged to read the documentation before putting this feature in place… if only to understand the consequences of tuning max_standby_delay and vacuum_defer_cleanup_age, which are not easy to grasp.


Streaming Replication

Complimenting Hot Standby, Streaming Replication is the second half of the "great leap forward" for PostgreSQL. This time, the goal is improving the archiving mechanism to make it as continuous as possible and to not rely on file shipping. Standby databases can now connect to the master and get sent, whenever they want, what they are missing from the Write Ahead Log, not in terms of complete files ('wal segments'), but in terms of records in the WAL (you can think of them as fragments of these files).

Streaming Replication is an asynchronous mechanism; the standby database lags behind the master. But unlike other replication methods, this lag is very short, and can be as little as a single transaction, depending on network speed, database activity, and Hot Standby settings. Also, the load on the master for each slave is minimal, allowing a single master to support dozens of slaves.

Production and standby databases are identical at the binary level (well, almost, we won't get into details, but don't worry if your datafiles don't have the same checksum).

For Streaming Replication, wal_level should be 'archive' (or 'hot standby') to do continuous archiving.

postgresql.conf, Primary:

 max_wal_senders = x # Maximum 'wal_senders', processes responsible for managing a connection with a standby server
 wal_keep_segments # How many WAL segments(files) should be kept on the primary, whatever may happen (you won't have to copy them manually on the standby if the standby gets too far behind)

On the secondary:

recovery.conf, Secondary:

 standby_mode = on
 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass' # connection string to reach the primary database

postgresql.conf, Secondary:

 wal_level # same value as on the primary (you'll need this after a failover, to build a new standby)
 hot_standby=on/off # Do you want to use Hot Standby at the same time ?

pg_hba.conf file:

There must be an entry here for the replication connections. The fake database is 'replication', the designated user should be superuser. Be careful not to give broad access to this account: a lot of privileged data can be extracted from WAL records.

pg_hba.conf, Primary:

 host    replication     foo             192.168.1.100/32        md5

As for Hot Standby, this feature is rich and complex. It's advised to read the documentation. And to perform failovers and switchovers tests when everything is in place.

One thing should be stressed about these two features: you can use them together. This means you can have a near-realtime standby database, and run read-only queries on it, such as reporting queries. You can also use them independently; a standby database can be Hot Standby with file shipping only, and a Streaming Replication database can stream without accepting queries.


New features


Exclusion constraints

Exclusion constraints are very similar to unique constraints. They could be seen as unique constraints using other operators than '=': A unique constraint defines a set of columns for which two records in the table cannot be identical.

To illustrate this, we will use the example provided by this feature's author, using the temporal data type, that he also developed. This datatype stores time ranges, that is 'the time range from 10:15 to 11:15'.

First, we need to retrieve the temporal module here: http://pgfoundry.org/projects/temporal/ , then compile and install it as a contrib (run the provided sql script).

 CREATE TABLE reservation
 (
   room      TEXT,
   professor TEXT,
   during    PERIOD);
 ALTER TABLE reservation ADD CONSTRAINT test_exclude EXCLUDE   USING gist (room WITH =,during WITH &&);

Doing this, we declare that a record should be rejected (exclusion constraint) if there already is one verifying the two conditions 'the same room' and 'be in intersection for the time range' (the && operator).

 marc=# INSERT INTO reservation (professor,room,during) VALUES ( 'mark', 'tech room', period('2010-06-16 09:00:00', '2010-06-16 10:00:00'));
 INSERT 0 1
 marc=# INSERT INTO reservation (professor,room,during) VALUES ( 'john', 'chemistry room', period('2010-06-16 09:00:00', '2010-06-16 11:00:00'));
 INSERT 0 1
 marc=# INSERT INTO reservation (professor,room,during) VALUES ( 'mark', 'chemistry room', period('2010-06-16 10:00:00', '2010-06-16 11:00:00'));
 ERROR:  conflicting key value violates exclusion constraint "test_exclude"
 DETAIL:  Key (room, during)=(chemistry room, [2010-06-16 10:00:00+02, 2010-06-16 11:00:00+02)) conflicts with existing key (room, during)=(chemistry room, [2010-06-16 09:00:00+02, 2010-06-16 11:00:00+02)).

The insert is forbidden, as the chemistry room is already reserved from 9 to 11.

Exclusion constraints may also be used with arrays, geographic data, or other non-scalar data in order to implement advanced scientific and calendaring applications. No other database system has this feature.


Column triggers

Column triggers fire only when a specific column is explicitly UPDATED. They allow you to avoid adding lots of conditional logic and value comparisons in your trigger code.

Example:

 CREATE TRIGGER foo BEFORE UPDATE OF a ON t1 FOR EACH ROW EXECUTE PROCEDURE my_trigger();

This trigger fires only when t1's 'a' column of the t1 table has been updated.

Column triggers are not executed if columns are set to DEFAULT.


WHEN Triggers

Completing PostgreSQL's effort to limit IF ... THEN code in triggers, conditional triggers define simple conditions under which the trigger will be executed. This can dramatically decrease the number of trigger executions and reduce CPU load on the database server.

For example, this trigger would check that an account was correctly balanced only when the balance changes:

 CREATE TRIGGER check_update
 BEFORE UPDATE ON accounts
 FOR EACH ROW
 WHEN (OLD.balance IS DISTINCT FROM NEW.balance)
 EXECUTE PROCEDURE check_account_update();

And this trigger will only log a row update when the row actually changes. It's very helpful with framework or ORM applications, which may attempt to save unchanged rows:

 CREATE TRIGGER log_update
 AFTER UPDATE ON accounts
 FOR EACH ROW
 WHEN (OLD.* IS DISTINCT FROM NEW.*)
 EXECUTE PROCEDURE log_account_update();

You could even further than this and decide not to save a row at all if it hasn't changed:

 CREATE TRIGGER log_update
 AFTER UPDATE ON accounts
 FOR EACH ROW
 WHEN (OLD.* IS NOT DISTINCT FROM NEW.*)
 EXECUTE PROCEDURE no_op();


DEFERRABLE UNIQUE CONSTRAINTS

This feature will also be very useful. Here is an example, using a primary key instead of a simple unique key:

 marc=# CREATE TABLE test (a int primary key);
 marc=# INSERT INTO test values (1), (2);
 marc=# UPDATE test set a = a+1;
 ERROR:  duplicate key value violates unique constraint "test_pkey"
 DETAIL:  Key (a)=(2) already exists.

That's normal, but a pity nevertheless: at the end of the transaction, my data would have been consistent, as far as this constraint is concerned. Even worse, if the table had been physically sorted by descending order, the query would have worked! With 8.4, there was no easy way out, we had to find a trick to update the records in the right order.

We can now do this:

 marc=# CREATE TABLE test (a int primary key deferrable);
 marc=# INSERT INTO test values (2),(1);
 marc=# UPDATE test set a = a+1;
 ERROR:  duplicate key value violates unique constraint "test_pkey"
 DETAIL:  Key (a)=(2) already exists.

Oops, it doesn't work.

As a matter of fact, this is on purpose: It's a little reminder on deferrable/deferred constraints: a deferrable constraint CAN be checked at the end of a transaction. You still have to ask PostgreSQL to defer it.

One can, for the current session, require that all the constraints be deferred:

 marc=# SET CONSTRAINTS ALL DEFERRED;
 SET CONSTRAINTS
 marc=# UPDATE test set a = a+1;
 UPDATE 2

If one doesn't want to perform a SET CONSTRAINTS each time, the constraint can also be declared as INITIALLY DEFERRED:

 CREATE TABLE test (a int PRIMARY KEY DEFERRABLE INITIALLY DEFERRED);

Another reminder is necessary here: DEFERRED constraints are slower than IMMEDIATE constraints. Furthermore, the list of records to be checked at the end of the transaction has to be stored somewhere. So be careful of not doing this for millions of records at once. This is one of the reasons that DEFERRABLE constraints aren't INITIALLY DEFERRED by default.


Anonymous Functions

This new feature is for creating run-once functions. Effectively, this allows you to run stored procedure code on the command line or dynamically as you can on SQL Server and Oracle. Unlike those, however, PostgreSQL allows you to run an anonymous function in any procedural language which is installed of the more than a dozen which PostgreSQL supports.

This feature will be very useful for schema upgrade scripts for instance. Here is a slightly different version of the 'GRANT SELECT ON ALL TABLES' that will be seen later in this document, giving SELECT rights to a bunch of tables, depending on the table owner, and not taking into account two schemas.:

 DO language plpgsql $$
 DECLARE
 vr record;
 
 BEGIN
 
 FOR vr IN SELECT tablename FROM pg_tables WHERE tableowner = 'marc' AND schemaname NOT IN ('pg_catalog','information_schema')
 LOOP
   EXECUTE 'GRANT SELECT ON ' || vr.tablename || ' TO toto';
 END LOOP;
 END
 $$;

As of 8.4, this would have required creating a function (with CREATE FUNCTION), running it, then removing it (with DROP FUNCTION). All of this requiring having rights to do this. 9.0 simplifies performing this kind of procedures.

Anonymous functions are also called "anonymous code blocks" in the software industry.


Named parameters

Combined with the Default Parameters introduced in version 8.4, named parameters allow for dynamic calling of functions with variable numbers of arguments, much as they would be inside a programming language. Named parameters are familiar to users of SQL Server or Sybase, but PostgreSQL does one better by supporting both named parameter calls and function overloading.

The chosen syntax to name parameters is the following:

 CREATE FUNCTION test (a int, b text) RETURNS text AS $$
 DECLARE
   value text;
 BEGIN
   value := 'a is ' || a::text || ' and b is ' || b;
   RETURN value;
 END;
 $$ LANGUAGE plpgsql;

Until now, we wrote:

 SELECT test(1,'foo');
 test          
 -------------------------
 a is 1 and b is foo
 (1 row)

Now this explicit syntax can be used:

 SELECT test( b:='foo', a:=1);
 test          
 -------------------------
 a is 1 and b is foo
 (1 row)

Named parameters should eliminate the need to write many overloaded "wrapper" functions. Note that this does add a backwards compatibility issue; you are no longer able to rename function parameters using a REPLACE command, but must now drop and recreate the function.


GRANT/REVOKE IN SCHEMA

One annoying limitation in PostgreSQL has been the lack of global GRANT/REVOKE capabilities. With 9.0 it's now possible to set privileges on all tables, sequences and functions within a schema using without having to write a script or a stored procedure:

 GRANT SELECT ON ALL TABLES IN SCHEMA public TO toto;

And reverting this:

 REVOKE SELECT ON ALL TABLES IN SCHEMA public FROM toto;

See the GRANT documentation page for further details.

Note that the above only works for existing objects. However, it's now also possible to define default permissions for new objects:


ALTER DEFAULT PRIVILEGES

This feature also makes permission management more efficient.

 ALTER DEFAULT PRIVILEGES FOR ROLE marc GRANT SELECT ON TABLES TO PUBLIC;
 CREATE TABLE test_priv (a int);
 \z test_priv
                             Access privileges
 Schema |    Name    | Type  | Access privileges | Column access privileges
--------+------------+-------+-------------------+--------------------------
 public | test_priv  | table | =r/marc          +|
        |            |       | marc=arwdDxt/marc |

These new informations are stored in the pg_default_acl system table.


PL/pgSQL changes which may cause regressions

There are two changes in PL/pgSQL which may break code which works in 8.4 or earlier, meaning PL/pgSQL functions should be audited before before migrating to 9.0 to prevent possible runtime errors.


Removal of column/variable name ambiguity

In 8.4 and earlier, PL/PgSQL variables will take preference over a table or view column with the same name. While this behaviour is consistent, it is a potential source of coding errors. 9.0 will throw a runtime error if this situation occurs:

 marc=# DO LANGUAGE plpgsql
 $$
 DECLARE
   a int;
 BEGIN
   SELECT a FROM test;
 END
 $$
 ;
 ERROR:  column reference "a" is ambiguous
 LINE 1: select a from test
 DETAIL:  It could refer to either a PL/pgSQL variable or a table column.
 QUERY:  select a from test
 CONTEXT:  PL/pgSQL function "inline_code_block" line 4 at SQL statement

This behaviour can be altered globally in postgresql.conf, or on a per function basis by inserting one of these three options in the function declaration:

 #variable_conflict error        (default)
 #variable_conflict use_variable (variable name name takes precedence - pre-9.0 behaviour)
 #variable_conflict use_column   (column name takes precedence)

The manual contains more details.


Reserved words

From 9.0, use of unquoted reserved words as PL/PgSQL variable names is no longer permitted:

 marc=# DO LANGUAGE plpgsql
 $$
 DECLARE
 table int;
 BEGIN
 table :=table+1;
 END
 $$
 ;
 ERROR:  syntax error at or near "table"
 LINE 6: table :=table+1;

The correct syntax is:

 marc=# DO LANGUAGE plpgsql
 $$
 DECLARE
 "table" int;
 BEGIN
 "table" :="table"+1;
 END
 $$
 ;
 DO

Best practice is of course to avoid reserved words completely.


Improvements

The query planner got a lot of improvements in this version. So we'll start by talking about these:


Join Removal

 marc=# CREATE TABLE t1 (a int);
 CREATE TABLE
 marc=# CREATE TABLE t2 (b int);
 CREATE TABLE
 marc=# CREATE TABLE t3 (c int);
 CREATE TABLE

We put a little bit of data with a generate_series…

 marc=# EXPLAIN SELECT t1.a,t2.b from t1 join t2 on (t1.a=t2.b) left join t3 on (t1.a=t3.c);
 QUERY PLAN                                 
 ------------------------------------------------------------------------------
 Merge Right Join  (cost=506.24..6146.24 rows=345600 width=8)
     Merge Cond: (t3.c = t1.a)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: t3.c
         ->  Seq Scan on t3  (cost=0.00..34.00 rows=2400 width=4)
   ->  Materialize  (cost=337.49..853.49 rows=28800 width=8)
         ->  Merge Join  (cost=337.49..781.49 rows=28800 width=8)
               Merge Cond: (t1.a = t2.b)
               ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
                     Sort Key: t1.a
                     ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
               ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
                     Sort Key: t2.b
                     ->  Seq Scan on t2  (cost=0.00..34.00 rows=2400 width=4)

For now, everything is normal, and we have the same behavior in 8.4. But let's imagine that on t3, there is a UNIQUE constraint on the 'c' column. In this case, the join on t3 doesn't serve any purpose, theoretically speaking: the number of rows returned won't change, neither will their content. It's because the column is UNIQUE, the join is a LEFT JOIN, and no column of t3 is retrieved. If the column wasn't UNIQUE, the join could bring more rows. If that wasn't a LEFT JOIN, the join could ignore some rows.

With 9.0:

 marc=# ALTER TABLE t3 ADD UNIQUE (c);
 NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index "t3_c_key" for table "t3"
 ALTER TABLE
 marc=# EXPLAIN SELECT t1.a,t2.b from t1 join t2 on (t1.a=t2.b) left join t3 on (t1.a=t3.c);
                             QUERY PLAN                           
 ------------------------------------------------------------------
 Merge Join  (cost=337.49..781.49 rows=28800 width=8)
   Merge Cond: (t1.a = t2.b)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: t1.a
         ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
   ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
         Sort Key: t2.b
         ->  Seq Scan on t2  (cost=0.00..34.00 rows=2400 width=4)
 (8 rows)

This enhancement will sometimes be very nice, among other things when using an ORM (Object Relation Mapping). These tools have a tendency to write unnecessary joins. In this simple case, the estimated cost has been divided by 10.

This could also help a lot for programs using a lot of joins and nested views.

This feature is another reason to declare the constraints in the database: without these constraints, there is no way for the engine to be sure these rewrites can be done.


IS NOT NULL can now use indexes

For this demonstration, we will compare the 8.4 and 9.0 versions (the table I created contains mostly nulls):

With 8.4:

 marc=# EXPLAIN ANALYZE SELECT max(a) from test;
 QUERY PLAN                                                                  
 ------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.03..0.04 rows=1 width=0) (actual time=281.320..281.321 rows=1 loops=1)
 InitPlan 1 (returns $0)
   ->  Limit  (cost=0.00..0.03 rows=1 width=4) (actual time=281.311..281.313 rows=1 loops=1)
   ->  Index Scan Backward using idxa on test  (cost=0.00..29447.36 rows=1001000 width=4) (actual time=281.307..281.307 rows=1 loops=1)
 Filter: (a IS NOT NULL)
 Total runtime: 281.360 ms
 (6 rows)

With 9.0:

 marc=# EXPLAIN ANALYZE SELECT max(a) from test;
 QUERY PLAN                                                                
 --------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.08..0.09 rows=1 width=0) (actual time=0.100..0.102 rows=1 loops=1)
 InitPlan 1 (returns $0)
   ->  Limit  (cost=0.00..0.08 rows=1 width=4) (actual time=0.092..0.093 rows=1 loops=1)
   ->  Index Scan Backward using idxa on test  (cost=0.00..84148.06 rows=1001164 width=4) (actual time=0.089..0.089 rows=1 loops=1)
 Index Cond: (a IS NOT NULL)
 Total runtime: 0.139 ms
 (6 rows)

The difference is that 9.0 only scans the not-null keys in the index. 8.4 has to go check in the table (Filter step, when 9.0 uses an index condition). In this precise use case, the gain is really big.


Use of index to get better statistics on the fly

Before starting to explain this new feature, let's talk about histograms: PostgreSQL, like some other databases, uses a statistical optimizer. This means that when planning a query it has (or should have) an approximately correct idea of how many records each step of the query will bring back. In order to do this, it uses statistics, such as the approximate number of records in a table, its size, most common values, and histograms. PostgreSQL use these to get estimates about the number of records brought back by a WHERE clause on a column, depending on the value or range asked in this WHERE clause.

In some cases, these histograms are rapidly out of date, and become a problem, for certain SQL queries. For instance, a log table in which timestamped records would be inserted, and from which we would most of the time want to get the records from the last 5 minutes.

In this specific case, it was impossible before 9.0 to get correct statistics. Now, when PostgreSQL detects while planning that a query asks for a 'range scan' on a value larger than the largest of the histogram (or smaller than the smallest), that is, the largest detected value during the last statistics calculation, and this column has an index, it gets the max (or min) value for this column using the index BEFORE really executing the query, in order to get more realistic statistics. As PostgreSQL uses an index for this, there HAS to be an index, of course.

Here comes an example. The a column of the test table has already been filled with a lot of dates, all in the past. It's statistics are up to date.

It's 13:37, and I haven't inserted anything after 13:37 yet.

 marc=# EXPLAIN ANALYZE select * from test where a > '2010-06-03 13:37:00';
 QUERY PLAN                                                 
 --------------------------------------------------------------------------------------------------------------
 Index Scan using idxtsta on test  (cost=0.00..8.30 rows=1 width=8) (actual time=0.007..0.007 rows=0 loops=1)
 Index Cond: (a > '2010-06-03 13:37:00'::timestamp without time zone)
 Total runtime: 0.027 ms
 (3 rows)

Everything's normal. The upper boundary of the histogram is '2010-06-03 13:36:16.830007' (this information comes from pg_stats). There is no way of guessing how many records are larger than 13:37, and with 8.4, PostgreSQL would have continued estimating '1' until the next analyze.

 marc=# DO LANGUAGE plpgsql
 $$
 DECLARE
   i int;
 BEGIN
   FOR i IN 1..10000 LOOP
   INSERT INTO test VALUES (clock_timestamp());
   END LOOP;
 END
 $$
 ;
 DO

(I must say I really like 'DO').
We just inserted 10000 records with a date larger than 13:37.

 marc=# EXPLAIN ANALYZE SELECT * FROM test WHERE a > '2010-06-03 13:37:00';
 QUERY PLAN                                                      
 -----------------------------------------------------------------------------------------------------------------------
 Index Scan using idxtsta on test  (cost=0.00..43.98 rows=1125 width=8) (actual time=0.012..13.590 rows=10000 loops=1)
 Index Cond: (a > '2010-06-03 13:37:00'::timestamp without time zone)
 Total runtime: 23.567 ms
 (3 rows)

The estimated rows isn't 0 or 1 anymore. The statistics haven't been updated, though:

 marc=# SELECT last_autoanalyze FROM pg_stat_user_tables WHERE relname = 'test';
 last_autoanalyze       
 -------------------------------
 2010-06-03 13:36:21.553477+02
 (1 row)

We still have a one magnitude error in the evaluation (10 times). But it's not that bad: without this enhancement, it would be of four magnitudes (10,000). Anyway, a much smaller error makes it more likely we'll get a good plan out of this kind of queries.


Per tablespace seq_page_cost/random_page_cost

 marc=# ALTER TABLESPACE pg_default SET ( random_page_cost = 10, seq_page_cost=5);
 ALTER TABLESPACE

We just changed random_page_cost and seq_page_cost for all the objects contained in pg_default. What for ?

The use case is when different tablespaces have different performance: for instance, you have some critical data on a SSD drive, or historical data on an older disk array, slower than the brand new array you use for active data. This makes it possible to tell PostgreSQL that all your tablespaces don't always behave the same way, from a performance point of view. This is only useful, of course, for quite big databases.


Force distinct statistics on a column

This makes it possible to set the number of different values for a column. This mustn't be used lightly, but only when ANALYZE on this column can't get a good value.

Here's how to do this:

 marc=# ALTER TABLE test ALTER COLUMN a SET (n_distinct = 2);
 ALTER TABLE

ANALYZE has to be run again for this to be taken into account:

 marc=# ANALYZE test;
 ANALYZE

Let's try now:

 marc=# EXPLAIN SELECT distinct * from test;
 QUERY PLAN                           
 ------------------------------------------------------------------
 HashAggregate  (cost=6263.00..6263.02 rows=2 width=8)
 ->  Seq Scan on test  (cost=0.00..5338.00 rows=370000 width=8)
 (2 rows)

This is an example of what SHOULDN'T be done : there REALLY is 370 000 distinct values in my table. Now my execution plans may be very bad.

If n_distinct is positive, it's the number of distinct values.

If it's negative (between 0 and -1), it's the multiplying factor regarding the number of estimated records in the table: for instance, -0.2 means that there is a distinct value for each 5 records of the table.

0 brings the behavior back to normal (ANALYZE estimates distinct by itself).

Don't change this parameter, unless you are completely sure you have correctly diagnosed you problem. Else, be assured performance will be degraded.


Many other aspects of the database have been improved. Here are a few amongst them:


Better VACUUM FULL

Until now, VACUUM FULL was very slow. This statement can recover free space from a table to reduce its size, mostly when VACUUM itself hasn't been run frequently enough.

It was slow because of the way it operated: records were read and moved one by one from their source bloc to a bloc closer to the beginning of the table. Once the end of the table was emptied, this empty part was removed.

This strategy was very inefficient: moving records one by one creates a lot of random IO. Moreover, during this reorganization, indexes had to be maintained, making everything even more costly, and fragmenting indexes. It was therefore advised to reindex a table just after a VACUUM FULL.

The VACUUM FULL statement, as of version 9.0, creates a new table from the current one, copying all the records sequentially. Once all records are copied, index are created back, and the old table is destroyed and replaced.

This has the advantage of being much faster. VACUUM FULL still needs an AccessExclusiveLock while running though. The only drawback of this method compared to the old one, is that VACUUM FULL can use as much as two times the size of the table on disk, as it is creating a new version of it.

Let's now compare the runtimes of the two methods. In both cases, we prepare the test data as follows (for 8.4 and 9.0)

 marc=# CREATE TABLE test (a int);
 CREATE TABLE
 marc=# CREATE INDEX idxtsta on test (a);
 CREATE INDEX
 marc=# INSERT INTO test SELECT generate_series(1,1000000);
 INSERT 0 1000000
 marc=# DELETE FROM test where a%3=0; -- making holes everywhere
 DELETE 333333
 marc=# VACUUM test;
 VACUUM

With 8.4:

 marc=# \timing
 Timing is on.
 marc=# VACUUM FULL test;
 VACUUM
 Time: 6306,603 ms
 marc=# REINDEX TABLE test;
 REINDEX
 Time: 1799,998 ms

So around 8 seconds.
With 9.0:

 marc=# \timing
 Timing is on.
 marc=# VACUUM FULL test;
 VACUUM
 Time: 2563,467 ms

That still doesn't mean that VACUUM FULL is a good idea in production. If you need it, it's probably because your VACUUM policy isn't appropriate.


64 bits for windows.

There is now a 64 bits version of PostgreSQL for Windows. There is no evidence for now the 500MB shared_buffers size limit before performance degrades seen on the 32 bits version for Windows is solved with this 64 bit version, though. But there are many other reasons you may want to use this 64 bit version, such as larger work_mem.


PL/pgSQL by default

You won't have to add PL/pgSQL in databases, as it will be installed by default.


Many improvements on PL languages.

Many languages have been vastly improved, PLPerl for instance. Read the release notes if you want more details, there are too many to detail here.


ALIAS keyword

ALIAS can now be used. As its name suggests, it can be used to alias variable names to other names.

The syntax is new_name ALIAS FOR old_name. This is put in the DECLARE section of PL/pgSQL code.

It has two main use cases:

  • to give names to PL functions variables:
 myparam ALIAS FOR $0
  • to rename potentially conflicting variables. In a trigger for instance:
 new_value ALIAS FOR new
(without this, we might have conflicted with the NEW variable in the trigger function).


Message passing in NOTIFY/pg_notify

Messages can now be passed using NOTIFY. Here is how:

  • Subscribe in session 1 to the 'instant_messenging' queue.
Session 1:
 marc=# LISTEN instant_messenging;
 LISTEN
  • Send a notification through 'instant_messenging', from another session
Session 2:
 marc=# NOTIFY instant messenging, 'You just received a message';
 NOTIFY
  • Check the content of the queue in the first session
Session 1:
 marc=# LISTEN instant_messenging;
 LISTEN
 Asynchronous notification "instant_messenging" with payload "You just received a message" received from server process with PID 5943.

So we can now associate messages (payloads) with notifications, making NOTIFY even more useful.

Let's also mention the new pg_notify function. With it, the second session's code can also be:

 SELECT pg_notify('instant_messenging','You just received a message');

This can simplify some code, in the case of a program managing a lot of different queues.


get_bit and set_bit for bit strings

Here is a very simple example. This tool can manipulate bits in a bit() independently.

 marc=# SELECT set_bit('1111'::bit(4),2,0);
 set_bit
 ---------
 1101
 (1 row)


 marc=# SELECT get_bit('1101'::bit(4),2);
 get_bit
 ---------
       0
 (1 row)


application_name in pg_stat_activity

In a monitoring session:

 marc=# SELECT * from pg_stat_activity where procpid= 5991;
 datid | datname | procpid | usesysid | usename | application_name | client_addr | client_port |         backend_start         | xact_start | query_start | waiting | current_query
 ------+---------+---------+----------+---------+------------------+-------------+-------------+-------------------------------+------------+-------------+---------+----------------
 16384 | marc    |    5991 |       10 | marc    | psql             |             |          -1 | 2010-05-16 13:48:10.154113+02 |            |             | f       | <IDLE>
 (1 row)

In the '5991' session:

 marc=# SET application_name TO 'my_app';
 SET

Back to the monitoring session:

 >marc=# SELECT * from pg_stat_activity where procpid= 5991;
 datid | datname | procpid | usesysid | usename | application_name | client_addr | client_port |         backend_start         | xact_start |          query_start          | waiting | current_query
 ------+---------+---------+----------+---------+------------------+-------------+-------------+-------------------------------+------------+-------------+---------+-----------------+----------------
 16384 | marc    |    5991 |       10 | marc    | my_app           |             |          -1 | 2010-05-16 13:48:10.154113+02 |            | 2010-05-16 13:49:13.107413+02 | f       | <IDLE>
 (1 row)

It's your job to set this up correctly in your program or your sessions. Your DBA will thank you for this, at last knowing who runs what on the database easily.


Per database+role configuration

Instead of being able to set up configuration variables per database or per user, one can now set them up for a certain user in a certain database:

 marc=# ALTER ROLE marc IN database marc set log_statement to 'all';
 ALTER ROLE

To know who has which variables set-up in which user+database, there is a new psql command:

 marc=# \drds
         List of settings
 role | database |     settings
 -----+----------+-----------------
 marc | marc     | log_statement=all
 (1 row)

There was a catalog change to store this:

 Table "pg_catalog.pg_db_role_setting"
   Column    |  Type  | Modifier
 ------------+--------+----------
 setdatabase | oid    | not null
 setrole     | oid    | not null
 setconfig   | text   |


Log all changed parameters on a postgresql.conf reload

Here is an example, the log_line_prefix parameter has been changed:

 LOG:  received SIGHUP, reloading configuration files
 <%> LOG:  parameter "log_line_prefix" changed to "<%u%%%d> "


New frame options for window functions

If you don't know window functions yet, you'd better learn about them. You can start here : <a href='http://www.depesz.com/index.php/2009/01/21/waiting-for-84-window-functions' />. They make writing certain kind of queries much easier.

New options have been added for declaring frames of windowing functions. Let's use this table (not having a better example…)

 marc=# SELECT * FROM salary ;
 entity     | name      | salary | start_date
 -----------+-----------+---------+---------------
 R&D        | marc      |  700.00 | 2010-02-15
 Accounting | jack      |  800.00 | 2010-05-01
 R&D        | maria     |  700.00 | 2009-01-01
 R&D        | kevin     |  500.00 | 2009-05-01
 R&D        | john      | 1000.00 | 2008-07-01
 R&D        | tom       | 1100.00 | 2005-01-01
 Accounting | millicent |  850.00 | 2006-01-01

Here is a window function example, without declaring the frame:

 marc=# SELECT entity, name, salary, start_date, avg(salary) OVER (PARTITION BY entity ORDER BY start_date) FROM salary;
 entity     | name      | salary | start_date    |          avg         
 -----------+-----------+---------+---------------+-----------------------
 Accounting | millicent |  850.00 | 2006-01-01    |  850.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    |  825.0000000000000000
 R&D        | tom       | 1100.00 | 2005-01-01    | 1100.0000000000000000
 R&D        | john      | 1000.00 | 2008-07-01    | 1050.0000000000000000
 R&D        | maria     |  700.00 | 2009-01-01    |  933.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    |  825.0000000000000000
 R&D        | marc      |  700.00 | 2010-02-15    |  800.0000000000000000

The frame is the group of records over which the window function is run. Of course, if the frame isn't explicitly declared, there is a default one.

Here is the same query, with an explicit frame:

 marc=# SELECT entity, name, salary, start_date, avg(salary) OVER (PARTITION BY entity ORDER BY start_date    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM salary;
 entity     | name      | salary | start_date    |          avg         
 -----------+-----------+---------+---------------+-----------------------
 Accounting | millicent |  850.00 | 2006-01-01    |  850.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    |  825.0000000000000000
 R&D        | tom       | 1100.00 | 2005-01-01    | 1100.0000000000000000
 R&D        | john      | 1000.00 | 2008-07-01    | 1050.0000000000000000
 R&D        | maria     |  700.00 | 2009-01-01    |  933.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    |  825.0000000000000000
 R&D        | marc      |  700.00 | 2010-02-15    |  800.0000000000000000

In this example, the frame is a 'range' frame, between the start of the partition (the group of similar rows) and the current row (not exactly the current row, but let's put that aside for now, read the documentation if you want to learn more). One can see, the average (avg) function is evaluated from the frame's first row (grouped together records) and the current row.

First new feature: as of 9.0, the frame can be declared to be between the current row and the end of the partition:

 marc=# SELECT entity, name, salary, start_date, avg(salary)  OVER (PARTITION BY entity ORDER BY start_date    RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM salary;
 entity     | name      | salary | start_date    |         avg         
 -----------+-----------+---------+---------------+----------------------
 Accounting | millicent |  850.00 | 2006-01-01    | 825.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    | 800.0000000000000000
 R&D        | tom       | 1100.00 | 2005-01-01    | 800.0000000000000000
 R&D        | john      | 1000.00 | 2008-07-01    | 725.0000000000000000
 R&D        | maria     |  700.00 | 2009-01-01    | 633.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    | 600.0000000000000000
 R&D        | marc      |  700.00 | 2010-02-15    | 700.0000000000000000

Second new feature: frames can be declared as 'x previous records to y next records'. There is no point with this example, but let's do it anyway::

 marc=# SELECT entity, name, salary, start_date, avg(salary)  OVER (PARTITION BY entity ORDER BY start_date    RANGE ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM salary;
 entity     | name      | salary | start_date    |          avg         
 -----------+-----------+---------+---------------+-----------------------
 Accounting | millicent |  850.00 | 2006-01-01    |  825.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    |  825.0000000000000000
 R&D        | tom       | 1100.00 | 2005-01-01    | 1050.0000000000000000
 R&D        | john      | 1000.00 | 2008-07-01    |  933.3333333333333333
 R&D        | maria     |  700.00 | 2009-01-01    |  733.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    |  633.3333333333333333
 R&D        | marc      |  700.00 | 2010-02-15    |  600.0000000000000000

The frame is still limited to the partition (see tom's record, for instance: jack's record isn't use for it's average).

If one wanted the same query, with a moving average on three rows, not reset on each partition switch (still no practical use):

 marc=# SELECT entity, name, salary, start_date, avg(salary) OVER (ORDER BY entity, start_date    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM salary;
 entity     | name      | salary | start_date    |         avg         
 -----------+-----------+---------+---------------+----------------------
 Accounting | millicent |  850.00 | 2006-01-01    | 825.0000000000000000
 Accounting | jack      |  800.00 | 2010-05-01    | 916.6666666666666667
 R&D        | tom       | 1100.00 | 2005-01-01    | 966.6666666666666667
 R&D        | john      | 1000.00 | 2008-07-01    | 933.3333333333333333
 R&D        | maria     |  700.00 | 2009-01-01    | 733.3333333333333333
 R&D        | kevin     |  500.00 | 2009-05-01    | 633.3333333333333333
 R&D        | marc      |  700.00 | 2010-02-15    | 600.0000000000000000

In short, a power full tool to be mastered, even if I couldn't provide a good example.


Sort in aggregates

This feature is a subtle one: the result of an aggregate function may depend on the order it receives the data.

Of course, we're not talking about count, avg, but of array_agg, string_agg…

This is nice, as this will showcase string_agg, which is another 9.0 feature, killing two birds with one stone.

Let's start again with our salary table. We want the list of employees, concatenated as a single value, grouped by entity. It's going into a spreadsheet…

 marc=# SELECT entity,string_agg(name,', ') FROM salary GROUP BY entity;
 entity     |          string_agg          
 -----------+-------------------------------
 Accounting | jack, millicent
 R&D        | marc, maria, kevin, john, tom

That's already nice. But I want them sorted in alphabetical order, because I don't know how to write a macro in my spreadsheet to sort this data.

 marc=# SELECT entity,string_agg(name,', ' ORDER BY name) FROM salary GROUP BY entity;
 entity     |          string_agg          
 -----------+-------------------------------
 Accounting | etienne, stephanie
 R&D        | john, kevin, marc, maria, tom

To use this new feature, the sort clause must be inserted inside the aggregate function, without a comma to separate it from the parameters.


Better unique constraints error messages

With 8.4:

 marc=# INSERT INTO test VALUES (1);
 ERROR:  duplicate key value violates unique constraint "test_a_key"

With 9.0:

 marc=# INSERT INTO test VALUES (1);
 ERROR:  duplicate key value violates unique constraint "test_a_key"
 DETAIL:  Key (a)=(1) already exists.

This will make diagnosing constraint violation errors much easier.


Explain buffers, hashing statistics, xml, json, yaml, new optional explain syntax

Here is EXPLAIN ANALYZE as we all know it:

 marc=# EXPLAIN ANALYZE SELECT a, sum(c) FROM pere JOIN fils ON (pere.a = fils.b) WHERE b BETWEEN 1000 AND 300000 GROUP BY a;
                                                           QUERY PLAN                                                          
 ---------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=905.48..905.86 rows=31 width=8) (actual time=0.444..0.453 rows=6 loops=1)
   ->  Nested Loop  (cost=10.70..905.32 rows=31 width=8) (actual time=0.104..0.423 rows=6 loops=1)
         ->  Bitmap Heap Scan on fils  (cost=10.70..295.78 rows=31 width=8) (actual time=0.040..0.154 rows=30 loops=1)
               Recheck Cond: ((b >= 1000) AND (b <= 300000))
               ->  Bitmap Index Scan on fils_pkey  (cost=0.00..10.69 rows=31 width=0) (actual time=0.023..0.023 rows=30 loops=1)
                     Index Cond: ((b >= 1000) AND (b <= 300000))
         ->  Index Scan using pere_pkey on pere  (cost=0.00..19.65 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=30)
               Index Cond: (pere.a = fils.b)
 Total runtime: 0.560 ms
 (9 rows)

To get access to the new available information, use the new syntax::

 EXPLAIN [ ( { ANALYZE boolean | VERBOSE boolean | COSTS boolean | BUFFERS boolean | FORMAT { TEXT | XML | JSON | YAML } } [, ...] ) ] instruction

For instance:

 marc=# EXPLAIN (ANALYZE true, VERBOSE true, BUFFERS true) SELECT a, sum(c) FROM pere JOIN fils ON (pere.a = fils.b) WHERE b BETWEEN 1000 AND 300000 GROUP BY a;
                                                             QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=905.48..905.86 rows=31 width=8) (actual time=1.326..1.336 rows=6 loops=1)
   Output: pere.a, sum(fils.c)
   Buffers: shared hit=58 read=40
   ->  Nested Loop  (cost=10.70..905.32 rows=31 width=8) (actual time=0.278..1.288 rows=6 loops=1)
         Output: pere.a, fils.c
         Buffers: shared hit=58 read=40
         ->  Bitmap Heap Scan on public.fils  (cost=10.70..295.78 rows=31 width=8) (actual time=0.073..0.737 rows=30 loops=1)
               Output: fils.b, fils.c
               Recheck Cond: ((fils.b >= 1000) AND (fils.b <= 300000))
               Buffers: shared hit=4 read=28
               ->  Bitmap Index Scan on fils_pkey  (cost=0.00..10.69 rows=31 width=0) (actual time=0.030..0.030 rows=30 loops=1)
                     Index Cond: ((fils.b >= 1000) AND (fils.b <= 300000))
                     Buffers: shared hit=3
         ->  Index Scan using pere_pkey on public.pere  (cost=0.00..19.65 rows=1 width=4) (actual time=0.013..0.014 rows=0 loops=30)
               Output: pere.a
               Index Cond: (pere.a = fils.b)
               Buffers: shared hit=54 read=12
 Total runtime: 1.526 ms
 (18 rows)

VERBOSE displays the 'Output' lines (it already existed on 8.4).

BUFFERS displays data about buffers (input-output operations performed by the query): hit is the number of blocks obtained directly from shared_buffers, read is the number of blocs asked to the operating system. Here, there was very little data in shared_buffers.

One can also ask for another formatting than plain text. For a user, it's not useful. For people developing GUIs over EXPLAIN, it simplifies development as they can get rid of an 'explain' parser (and its potential bugs), and use a more standard one, such as XML.

Costs display can also be deactivated with COSTS false.


Unaccent filtering dictionary

Filtering dictionaries can now be set up. This is about Full Text Search dictionaries.

These dictionaries' purpose it applying a first filter on words before lexemizing them. The module presented here is the first one to use this mechanism. Filtering can consist in removing words or modifying them.

Unaccent doesn't remove words, it removes accents (all diacritic signs, as a matter of fact), replacing accentuated characters with non-accentuated ones (many people, at least in French, don't type them). Unaccent is a contrib module.

Installing it, as all contrib modules, is as easy as

 psql mydb < contribs_path/unaccent.sql.

We'll now follow unaccent's documentation, the example being filtering french words.

Let's create a new 'fr' dictionary (keeping standard 'french' dictionary clean):

 marc=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
 CREATE TEXT SEARCH CONFIGURATION

The next statement alters the 'fr' setup for word and alike lexemes. These now have to go through unaccent and french_stem instead of only french_stem.

 marc=# ALTER TEXT SEARCH CONFIGURATION fr
 >ALTER MAPPING FOR hword, hword_part, word
 >WITH unaccent, french_stem;
 >ALTER TEXT SEARCH CONFIGURATION
 
 SELECT to_tsvector('fr','Hôtels de la Mer');
 to_tsvector   
 -------------------
 'hotel':1 'mer':4
 (1 row)
 
 marc=# SELECT to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column?
 ----------
 t
 (1 row)

It's now easy, without changing even one line of code in the client application, and keeping accentuated characters in the database, to look up words without taking accents into account.


vacuumdb --analyze-only

As the parameter name's indicates, one can now use vacuumdb to run analyzes only. It may be useful for cronjobs for instance.


Hstore contrib enhancements

This already powerful contrib module has become even more powerful:

  • Keys and values size limit has been removed.
  • GROUP BY and DISTINCT can now be used.
  • New operators and functions have been added.

An example would take too long, this module has a lot of features. Read the documentation at once !


Statement logged by auto_explain

auto_explain contrib module will now print the statement with its plan, which will make it much easier to use.


Buffers accounting for pg_stat_statements

This already very useful contrib module now provides data about buffers. pg_stat_statements, as a reminder, collects statistics on the queries run on the database. Until now, it stored the query's code, number of executions, accumulated runtime, accumulated returned records. It now collects buffer operations too.

 marc=# SELECT * from pg_stat_statements order by total_time desc limit 2;
 -[ RECORD 1 ]-------+---------------------
 userid              | 10
 dbid                | 16485
 query               | SELECT * from table1 ;
 calls               | 2
 total_time          | 0.491229
 rows                | 420000
 shared_blks_hit     | 61
 shared_blks_read    | 2251
 shared_blks_written | 0
 local_blks_hit      | 0
 local_blks_read     | 0
 local_blks_written  | 0
 temp_blks_read      | 0
 temp_blks_written   | 0
 -[ RECORD 2 ]-------+---------------------
 userid              | 10
 dbid                | 16485
 query               | SELECT * from table2;
 calls               | 2
 total_time          | 0.141445
 rows                | 200000
 shared_blks_hit     | 443
 shared_blks_read    | 443
 shared_blks_written | 0
 local_blks_hit      | 0
 local_blks_read     | 0
 local_blks_written  | 0
 temp_blks_read      | 0
 temp_blks_written   | 0

When this contrib is installed, one can now answer these questions:

  • Which query has the biggest accumulated runtime ?
  • Which query generates the most IO operations ? (we still can't know if data has been found in the Operating System's cache)
  • Which query uses mostly the cache (and hence won't be faster if we make it bigger) ?
  • Which query modifies the most blocks ?
  • Who does sorting ?

'temp' and 'local' are the buffer operations relative to temporary tables and other local operations (sorts, hashs) to a database backend.


passwordcheck

This contrib module can check passwords, and prevent the worst of them from getting in. After having it installed and set up as described in the documentation, here is the result:

 marc=# ALTER USER marc password 'marc12';
 <marc%marc> ERROR:  password is too short
 <marc%marc> STATEMENT:  ALTER USER marc password 'marc12';
 ERROR:  password is too short
 marc=# ALTER USER marc password 'marc123456';
 <marc%marc> ERROR:  password must not contain user name
 <marc%marc> STATEMENT:  ALTER USER marc password 'marc123456';
 ERROR:  password must not contain user name

This module has limitations, mostly due to PostgreSQL accepting already encrypted passwords to be declared, making correct verification impossible. Nevertheless, it's a step forward in the right direction.

Moreover, its code is well documented, and can be easily adapted to suit specific needs (one can activate cracklib very easily, for instance)

<!--
NewPP limit report
Preprocessor node count: 50/1000000
Post-expand include size: 0/2097152 bytes
Template argument size: 0/2097152 bytes
#ifexist count: 0/100
-->

<!-- Saved in parser cache with key wikidb:pcache:idhash:1423-0!1!0!!en!2!edit=0 and timestamp 20100629052345 -->

在gentoo Linux上安装Trac 0.12.0

上接, 在gentoo Linux上安装Trac 0.11.5

前一篇安装基本系统,不包括中文和插件,subversion/git等,本文就以这些为主进行说明,其他基础部分参考前文。

自trac-0.12开始,加入了完整的国际化基础架构,安装国际化功能不用再像之前版本那样复杂了。

一、中文化包安装:
1. 安装依赖包cldr: (由于包依赖问题,babel能使用cldr-1.7.x版本)
svn co http://www.unicode.org/repos/cldr/tags/release-1-7-2/common/ cldr-1.7

2. 安装依赖基础包babel-0.9.5:
svn co http://svn.edgewall.org/repos/babel/tags/0.9.5 babel-0.9.5
cd babel-0.9.5
./setup.py egg_info
python ./scripts/import_cldr.py ../cldr-1.7/
python setup.py install --prefix=/serv/stow/trac/
这时重新启动tracd时,访问http://localhost:8000会出现错误:
OSError: [Errno 2] 没有那个文件或目录: '/serv/stow/trac/lib64/python2.6/site-packages/Trac-0.12-py2.6.egg/trac/locale'
也就是说trac现在会搜索i10n的Local翻译文件了。

3. 其实还依赖两个包,genshi 和setuptools,不过这两个包系统中已经安装了。
如果需要让trac支持PostgreSQL,则还需要安装包 http://pypi.python.org/pypi/psycopg2

4. 如果前面安装过trac-0.12, 则重新安装trac-0.12
cd trac-0.12
python setup.py install --prefix=/serv/stow/trac/
这一步现在能看到处理locale相关数据。

如果中间没有出现错误,安装完成, 可以启动测试。
使用中文浏览器 http://localhost:8000/mytrac显示中文了。

LLVM,Clang编译工具键项目介绍

llvm, LLVM 是 Low Level Virtual Machine 的简称,这个库提供了与编译器相关的支持,能够进行程序语言的编译期优化、链接优化、在线编译优化、代码生成。简而言之,可以作为多种语言编译器的后台来使用。
clang, Clang 是一个 C++ 编写、基于 LLVM、发布于 LLVM BSD 许可证下的 C/C++/Objective C/Objective C++ 编译器,其目标(之一)就是超越 GCC。
libc++, 如同每个C++编译器套装都自带一份对应的C++标准库一样,clang 在可使用GNU libstdc++作为标准库之后,开始了自己的C++标准库编写项目,这就是libc++。
libc++以C++0x 为目标,同样以BSD风格的llvm许可证发布,其特性及目标除了“编译/执行速度快”和“占用内存小”之外,还可在如异常对象、 RTTI(Runtime Type Identification 运行时类型识别)和内存分配等低级特性上与GCC的libstdc++保持 ABI兼容性。
lldb, 基于LLVM的 debugger新一代高性能调试器,集成LLVM反编译器和Clang表达式解析器等高阶组件,用于C/C++ /Objective-C 程序的调试,同样以LLVM许可证发布。

这几个项目之间的关系,clang是这些编译工具键组件的集合名称, 与 GCC工具键中的GCC对应,libc++ 与GCC工具键GNU libstdc++库对应, lldb与GCC工具键中的gdb对应。

虽然clang的终极目标是能取代gcc,但就日前来说,还有差距。从功能上来说,缺少gcc 中的gfortron编译器,ada编译器,java编译器。
clang项目主要由apple支持,在成为完善通用能取代gcc之前,可能会侧重Object-C。

llvm的未来,clang中的几个项目都使用BSD风格的llvm 许可证,与GCC的GPL很不相同,是否在成熟之后改成闭源也未可知。但就目前来说,开源界C/C++工具键多了一个选择。

接下来尝试创建这个工具键编译环境,测试些库或者软件用llvm编译的情况。

使用ffmpeg录音及桌面录像

in

仅录制声音:

ffmpeg -f oss -ar 44100 -i /dev/dsp -acodec mp2 -ab 128k test.mp3

桌面录像:
ffmpeg -f oss -i /dev/dsp -f x11grab -r 30 -s 1024x768 -i :0.0 -acodec pcm_s16le -vcodec libx264 -vpre lossless_ultrafast -threads 0 output.mkv

本系列文章导航

Windows下FFmpeg快速入门

ffmpeg参数解释

mencoder和ffmpeg参数详解(Java处理视频)

Java 生成视频缩略图(ffmpeg)

使用ffmpeg进行视频文件转换成FLV整理

java 视频处理 mencoder

java 视频处理 ffmped+mencoder

例子:ffmpeg -y -i "1.avi" -title "Test" -vcodec xvid -s 368x208 -r 29.97 -b 1500 -acodec aac -ac 2 -ar 24000 -ab 128 -vol 200 -f psp -muxvb 768 "output.wmv"

解释:以上命令可以在Dos命令行中输入,也可以创建到批处理文件中运行。不过,前提是:要在ffmpeg所在的目录中执行(转换君所在目录下面的cores子目录)。
参数:

-y 覆盖输出文件,即如果 output.wmv 文件已经存在的话,不经提示就覆盖掉
-i "1.avi" 输入文件是和ffmpeg在同一目录下的1.avi文件,可以自己加路径,改名字
-title "Test" 在PSP中显示的影片的标题
-vcodec xvid 使用XVID编码压缩视频,不能改的
-s 368x208 输出的分辨率为368x208,注意片源一定要是16:9的不然会变形
-r 29.97 帧数,一般就用这个吧
-b 1500 视频数据流量,用-b xxxx的指令则使用固定码率,数字随便改,1500以上没效果;还可以用动态码率如:-qscale 4和-qscale 6,4的质量比6高
-acodec aac 音频编码用AAC
-ac 2 声道数1或2
-ar 24000 声音的采样频率,好像PSP只能支持24000Hz
-ab 128 音频数据流量,一般选择32、64、96、128
-vol 200 200%的音量,自己改
-muxvb 768 好像是给PSP机器识别的码率,一般选择384、512和768,我改成1500,PSP就说文件损坏了
-f psp 输出psp专用格式
"output.wmv" 输出文件名,也可以加路径改文件名

示例1:
截取一张352x240尺寸大小的,格式为jpg的图片:
ffmpeg -i test.asf -y -f image2 -t 0.001 -s 352x240 a.jpg

示例2:
把视频的前30帧转换成一个Animated Gif :
ffmpeg -i test.asf -vframes 30 -y -f gif a.gif

示例3:
在视频的第8.01秒处截取 352*240 的缩略图
ffmpeg -i test2.asf -y -f image2 -ss 08.010 -t 0.001 -s 352x240 b.jpg
**************************************************************************************
a) 通用选项

-L license

-h 帮助

-fromats 显示可用的格式,编解码的,协议的。。。

-f fmt 强迫采用格式fmt

-I filename 输入文件

-y 覆盖输出文件

-t duration 设置纪录时间 hh:mm:ss[.xxx]格式的记录时间也支持

-ss position 搜索到指定的时间 [-]hh:mm:ss[.xxx]的格式也支持

-title string 设置标题

-author string 设置作者

-copyright string 设置版权

-comment string 设置评论

-target type 设置目标文件类型(vcd,svcd,dvd) 所有的格式选项(比特率,编解码以及缓冲区大小)自动设置 ,只需要输入如下的就可以了:
ffmpeg -i myfile.avi -target vcd /tmp/vcd.mpg

-hq 激活高质量设置

-itsoffset offset 设置以秒为基准的时间偏移,该选项影响所有后面的输入文件。该偏移被加到输入文件的时戳,定义一个正偏移意味着相应的流被延迟了 offset秒。 [-]hh:mm:ss[.xxx]的格式也支持

b) 视频选项

-b bitrate 设置比特率,缺省200kb/s

-r fps 设置帧频 缺省25

-s size 设置帧大小 格式为WXH 缺省160X128.下面的简写也可以直接使用:
Sqcif 128X96 qcif 176X144 cif 252X288 4cif 704X576

-aspect aspect 设置横纵比 4:3 16:9 或 1.3333 1.7777

-croptop size 设置顶部切除带大小 像素单位

-cropbottom size –cropleft size –cropright size

-padtop size 设置顶部补齐的大小 像素单位

-padbottom size –padleft size –padright size –padcolor color 设置补齐条颜色(hex,6个16进制的数,红:绿:兰排列,比如 000000代表黑色)

-vn 不做视频记录

-bt tolerance 设置视频码率容忍度kbit/s

-maxrate bitrate设置最大视频码率容忍度

-minrate bitreate 设置最小视频码率容忍度

-bufsize size 设置码率控制缓冲区大小

-vcodec codec 强制使用codec编解码方式。 如果用copy表示原始编解码数据必须被拷贝。

-sameq 使用同样视频质量作为源(VBR)

-pass n 选择处理遍数(1或者2)。两遍编码非常有用。第一遍生成统计信息,第二遍生成精确的请求的码率

-passlogfile file 选择两遍的纪录文件名为file


c)高级视频选项

-g gop_size 设置图像组大小

-intra 仅适用帧内编码

-qscale q 使用固定的视频量化标度(VBR)

-qmin q 最小视频量化标度(VBR)

-qmax q 最大视频量化标度(VBR)

-qdiff q 量化标度间最大偏差 (VBR)

-qblur blur 视频量化标度柔化(VBR)

-qcomp compression 视频量化标度压缩(VBR)

-rc_init_cplx complexity 一遍编码的初始复杂度

-b_qfactor factor 在p和b帧间的qp因子

-i_qfactor factor 在p和i帧间的qp因子

-b_qoffset offset 在p和b帧间的qp偏差

-i_qoffset offset 在p和i帧间的qp偏差

-rc_eq equation 设置码率控制方程 默认tex^qComp

-rc_override override 特定间隔下的速率控制重载

-me method 设置运动估计的方法 可用方法有 zero phods log x1 epzs(缺省) full

-dct_algo algo 设置dct的算法 可用的有 0 FF_DCT_AUTO 缺省的DCT 1 FF_DCT_FASTINT 2 FF_DCT_INT 3 FF_DCT_MMX 4 FF_DCT_MLIB 5 FF_DCT_ALTIVEC

-idct_algo algo 设置idct算法。可用的有 0 FF_IDCT_AUTO 缺省的IDCT 1 FF_IDCT_INT 2 FF_IDCT_SIMPLE 3 FF_IDCT_SIMPLEMMX 4 FF_IDCT_LIBMPEG2MMX 5 FF_IDCT_PS2 6 FF_IDCT_MLIB 7 FF_IDCT_ARM 8 FF_IDCT_ALTIVEC 9 FF_IDCT_SH4 10 FF_IDCT_SIMPLEARM

-er n 设置错误残留为n 1 FF_ER_CAREFULL 缺省 2 FF_ER_COMPLIANT 3 FF_ER_AGGRESSIVE 4 FF_ER_VERY_AGGRESSIVE

-ec bit_mask 设置错误掩蔽为bit_mask,该值为如下值的位掩码 1 FF_EC_GUESS_MVS (default=enabled) 2 FF_EC_DEBLOCK (default=enabled)

-bf frames 使用frames B 帧,支持mpeg1,mpeg2,mpeg4

-mbd mode 宏块决策 0 FF_MB_DECISION_SIMPLE 使用mb_cmp 1 FF_MB_DECISION_BITS 2 FF_MB_DECISION_RD

-4mv 使用4个运动矢量 仅用于mpeg4

-part 使用数据划分 仅用于mpeg4

-bug param 绕过没有被自动监测到编码器的问题

-strict strictness 跟标准的严格性

-aic 使能高级帧内编码 h263+

-umv 使能无限运动矢量 h263+

-deinterlace 不采用交织方法

-interlace 强迫交织法编码 仅对mpeg2和mpeg4有效。当你的输入是交织的并且你想要保持交织以最小图像损失的时候采用该选项。可选的方法是不交织,但是损失更大

-psnr 计算压缩帧的psnr

-vstats 输出视频编码统计到vstats_hhmmss.log

-vhook module 插入视频处理模块 module 包括了模块名和参数,用空格分开

d)音频选项

-ab bitrate 设置音频码率

-ar freq 设置音频采样率

-ac channels 设置通道 缺省为1

-an 不使能音频纪录

-acodec codec 使用codec编解码

e)音频/视频捕获选项

-vd device 设置视频捕获设备。比如/dev/video0

-vc channel 设置视频捕获通道 DV1394专用

-tvstd standard 设置电视标准 NTSC PAL(SECAM)

-dv1394 设置DV1394捕获

-av device 设置音频设备 比如/dev/dsp


f)高级选项

-map file:stream 设置输入流映射

-debug 打印特定调试信息

-benchmark 为基准测试加入时间

-hex 倾倒每一个输入包

-bitexact 仅使用位精确算法 用于编解码测试

-ps size 设置包大小,以bits为单位

-re 以本地帧频读数据,主要用于模拟捕获设备

-loop 循环输入流。只工作于图像流,用于ffserver测试

nullfxp 新功能:文件管理器导航栏

in

最近有时间,为nullfxp添加了许多新功能,界面上的上主要改变是,添加一个与文件管理器类似的管理功能,如地址栏,前进,后退,向上导航按钮。
另一个是目录视图模式设置功能,可以修改目录视图的大小,适应不同环境应用。

技术总结,为显示不同视图模式,使用Qt中的MVC架构,不同视图模式下共享model数据,共享QSelectionModel相关知识。

nullfxp最新界面抓图,

把radioget的直播列表转换为smplayer直播列表的脚本

radioget是一个linux 下收听广播的小软件,可以媲美龙卷风很方便。
使用Qt4开发,移植性很好。
看了一下代码,发现广播列表在一个标准XML文件中,非常清晰。
本人经常用的是smplayer,所以写了一个脚本,把这个广播列表转换为smplayer的广播列表。
但smplayer不支持列表分类,最后的结果就是smplayer的广播列表非常长,但用起来还是相当方便的。

使用方法,下载并执行这个脚本,启动smplayer,在播放菜单中找广播列表。
执行的时候,如果smplayer正在运行,最好先退出。
随机测试了一下,大部分地址都还有效。

radioget_xml_to_smplayer_radio_list.sh:

#!/bin/sh
 
# format 
# #EXTM3U
# #EXTINF:0,vvv,
# mms://fff
# #EXTINF:0,fff,
# mms://ttt
 
# http://radioget.googlecode.com/svn/trunk/radiolist.xml
radio_get_xml=$HOME/.config/smplayer/radiolist.xml
smplayer_audio_file=$HOME/.config/smplayer/radio.m3u8
temp_audio_file=/tmp/radio.m3u8
 
if [ -f $radio_get_xml ] ; then
    mv -v $radio_get_xml ${radio_get_xml}.bak
fi
 
wget -O $radio_get_xml <a href="http://radioget.googlecode.com/svn/trunk/radiolist.xml</p>
<p>echo" title="http://radioget.googlecode.com/svn/trunk/radiolist.xml</p>
<p>echo">http://radioget.googlecode.com/svn/trunk/radiolist.xml</p>
<p>echo</p></a> "#EXTM3U" > $temp_audio_file
 
radio_caty=
while read radio
do
    # echo $radio
    xml_ver_line=`echo $radio|grep "xml version"`
    xml_root_line=`echo $radio|grep "RadioGet"`
    radio_cat_line=`echo $radio|grep "radiotag"`
    radio_addr_line=`echo $radio|grep "url"`
 
    if [ x"$xml_ver_line" = x"" ] ; then
        true;
    else
        continue;
    fi
 
    if [ x"$xml_root_line" = x"" ] ; then
        true;
    else
        continue;
    fi
 
    if [ x"$radio_cat_line" = x"" ] ; then
        true;
    else
        radio_caty=`echo $radio|awk -F\" '{print $2}'`
 
        if [ x"$radio_caty" = x"" ] ; then
            # echo $radio_caty
            true;
        fi
        continue;
    fi
 
    if [ x"$radio_addr_line" = x"" ] ; then
        true;
    else
        radio_addr=`echo $radio|awk -F\" '{print $4}'`
        radio_name=`echo $radio|awk -F\" '{print $2}'`
 
        echo $radio_caty "->" $radio_name "->" $radio_addr
        echo "#EXTINF:0,${radio_caty}->${radio_name}," >> $temp_audio_file
        echo "$radio_addr" >> $temp_audio_file
    fi
 
done < $radio_get_xml
 
mv -v $smplayer_audio_file ${smplayer_audio_file}.bak
cp -v $temp_audio_file ${smplayer_audio_file}

儿子幼儿园两天体验

in

前几日听说上奥有一幼儿园,可以先上两天体验课再决定要不要入园。于是,我决定把儿子送去试入园两天看看他的状态,如果他还比较能适应的话,就把他送幼儿园,我也可以工作了。

跟朋友商量好后,我们一起把俩孩子送去,幼儿园老师把儿子抱进去的时候,他还哇哇哭着叫妈妈,我跟他笑笑招招手说,没关、系,去玩吧。待老师把儿子抱进教室后,没多久就没再听见他的哭声了。我对儿子的适应能力是有信心的,因为他平时,就算我没在他身边,他也可以玩得挺好的,反而是如果我呆在他身边的话,他就总是会找我跟他一起玩。

不一会儿,就见老师出来跟我说我儿子现在没哭了,在玩积木,玩得挺好的,我跟老师交待了下翰林现在生活自理程度的状况。待朋友把她的儿子也放进幼儿园后,我们俩在外面呆了一会儿就去逛街了。

下午去接儿子时,老师跟我说了下儿子在幼儿园的情况,她告诉我说翰林在幼儿园里挺好的,就是十点钟时就困了,把他抱去睡觉,一直睡到下午一点钟,也没有哭,玩得也挺好。我看看翰林的裤子还是上午时我给他穿的那条,居然没尿湿裤子。我对这家幼儿园感觉挺不错的。问问儿子,明天还要不要再来幼儿园?翰林说,要来幼儿园。上楼的时候,翰林突然说觉觉,然后作趴桌子上睡的姿势。我挺意外的,因为他以前说觉觉都会趴在我肩上。而且他坐沙发上,我擦地板时,他又伸出手指来数数。他以前也会数,可是都没有掰过手指数过。我觉得他虽然还小,可以幼儿园里还是能学到东西的。

晚上,我就努力地想说服他爸把翰林送幼儿园。可翰林爸不同意,说他太小。

第二天,我又把儿子送去,这次把他一放下,跟他说进去吧,他就头也不回地往里走,快走到教室门口时,他回头看看我伸手叫妈妈,要我也去。我说你自己去玩吧。他正要往我这儿走,老师出来了,我让他跟老师进教室去,他向老师张开胳膊,老师顺势就把他抱进教室了。这一次也挺好。

中午的时候,幼儿园老师打电话来跟我说儿子在幼儿园的情况,说翰林中午吃完午饭后,自己跟别的小朋友一样,没用哄就在床上睡着了。我问她翰林有没有尿湿裤子,因为儿子就入园第一天我才给穿闭档裤的,之前都一直穿开档裤,有大小便都自己蹲下来解决的,所以我一直担心他不会说要大小便,会习惯性地蹲下,如果老师稍不注意,他会就尿湿裤子或者大便拉到裤子里。老师告诉我说,中午的时候尿湿过一次。我听着感觉挺好,我总觉得翰林现在的状态,有尿湿裤子才算正常。

下午去接儿子时,老师告诉我说要等一会儿,翰林正在拉大便,而且还有一个老师陪着他。我总算放下心来,这小子没把大便拉裤子里,而且对这儿的老师也更增一份好感。过了会儿,陪着翰林的老师牵着翰林出来边向我走来边说,翰林蹲着尿的,可能把裤子尿湿了一点点。而我看见翰林穿的还是那条我上午给穿的裤子,老师也告诉我说,中午尿湿过一次,难道没换吗?我把翰林接过来一看裤子,裤子后面湿了一大片,哪里是一点点啊,心里挺不是滋味的。我跟老师说,你把我带来的备用裤子拿给我吧,我自己给他换。老师这才进去把翰林的裤子拿了出来。我一把翰林身上的裤子脱下来,就闻到一股臭味,检查了一下,原来裤子里还有一坨大便。我眉头一皱,之前的好感一下子荡然无存。老师看见了也一个劲向我道歉。我把翰林裤子换好之后,跟老师说,我们回去再考虑下,明天再联系吧。其实,我已经决定不把翰林送到这儿了。还是翰林爸说的对,现在送确实早了点。

儿童专用Linux发行版-Qimo 2.0 发布

in

专门为儿童设计的 Linux 发行版 Qimo
已发布 2.0 版本。Qimo 2.0 有了一个新的角色,她的名字叫
Illa,一个聪明可爱的小北极熊;使用 Laby 替换了 eToys 游戏;如果你是 Ubuntu
10.04 用户的话,无需全新安装 Qimo,只要添加 qimo-session 包即可获得
Qimo。

下面是一些关于 Qimo 2.0 的截图:

你可以从这里下载 Qimo 2.0 的 ISO 映像

Read More:

nullfxp 中一个顽固的bug

in

自从本机上的Qt升级到 4.7.9999,这个bug显示出来,具体表现为,连接到ssh主机后列目录时挂掉,gdb的backtrace显示与QSortFilterProxyModel有关,出现mapToSource时异常。

调试了几天也没有找到问题关键,应该是原来的基础代码写的太乱了,如果再找不到考虑重新写下基础的RemoteDirModel这个类。

系统上装的非release版本软件太多,是不是这些问题也不得而知
gcc 4.5.0
qt-4.7.0-beta
glibc-2.11.1

AA:
几经调试,终于发现,还是原来的代码基础问题,没有严格按照Qt 4 的view/model架构修改model数据,导致数据更新后,打破有序执行序列,导致程序崩溃。
现在重写了代码,严格按照要求修改model中的数据,没有问题了。
改动比较大,nullfxp程序功能还没有完全恢复原有功能。

Syndicate content