PostgreSQL for Data Storage
PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness. It runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It is fully ACID compliant, has full support for foreign keys, joins, views, triggers, and stored procedures (in multiple languages). It includes most SQL:2008 data types, including INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl, ODBC, among others, and exceptional documentation.
An enterprise class database, PostgreSQL boasts sophisticated features such as Multi-Version Concurrency Control (MVCC), point in time recovery, tablespaces, asynchronous replication, nested transactions (savepoints), online/hot backups, a sophisticated query planner/optimizer, and write ahead logging for fault tolerance. It supports international character sets, multibyte character encodings, Unicode, and it is locale-aware for sorting, case-sensitivity, and formatting. It is highly scalable both in the sheer quantity of data it can manage and in the number of concurrent users it can accommodate. There are active PostgreSQL systems in production environments that manage in excess of 4 terabytes of data. Some general PostgreSQL limits are included in the table below.
|Maximum Database Size||Unlimited|
|Maximum Table Size||32 TB|
|Maximum Row Size||1.6 TB|
|Maximum Field Size||1 GB|
|Maximum Rows per Table||Unlimited|
|Maximum Columns per Table||250 – 1600 depending on column types|
|Maximum Indexes per Table||Unlimited|
PostgreSQL has won praise from its users and industry recognition, including the Linux New Media Award for Best Database System and five time winner of the The Linux Journal Editors’ Choice Award for best DBMS.
Multiversion concurrency control (MVCC)
PostgreSQL manages concurrency through a system known as multiversion concurrency control (MVCC), which gives each transaction a “snapshot” of the database, allowing changes to be made without being visible to other transactions until the changes are committed. This largely eliminates the need for read locks, and ensures the database maintains the ACID (atomicity, consistency, isolation, durability) principles in an efficient manner. PostgreSQL offers three levels of transaction isolation: Read Committed, Repeatable Read and Serializable. Because PostgreSQL is immune to dirty reads, requesting a Read Uncommitted transaction isolation level provides read committed instead. Prior to PostgreSQL 9.1, requesting Serializable provided the same isolation level as Repeatable Read. PostgreSQL 9.1 and later support full serializability via the serializable snapshot isolation (SSI) technique.
PostgreSQL is available for the following operating systems: Linux (all recent distributions), Windows (Windows 2000 SP4 and later) (compilable by e.g. Visual Studio, now with up to most recent 2015 version), DragonFly BSD, FreeBSD, OpenBSD, NetBSD, Mac OS X, AIX, BSD/OS, HP-UX, IRIX, OpenIndiana, OpenSolaris, SCO OpenServer, SCO UnixWare, Solaris and Tru64 Unix. In 2012, support for the following obsolete systems was removed (still supported in 9.1): DG/UX, NeXTSTEP, SunOS 4, SVR4, Ultrix 4, and Univel. Most other Unix-like systems should also work.
PostgreSQL works on any of the following instruction set architectures: x86 and x86-64 on Windows and other operating systems; these are supported on other than Windows: IA-64 Itanium, PowerPC, PowerPC 64, S/390, S/390x, SPARC, SPARC 64, Alpha, ARMv8-A (64-bit) and older ARM (32-bit, including older such as ARMv6 in Raspberry Pi), MIPS, MIPSel, M68k, and PA-RISC. It is also known to work on M32R, NS32k, and VAX. In addition to these, it is possible to build PostgreSQL for an unsupported CPU by disabling spinlocks.
Storage and replication
PostgreSQL, beginning with version 9.0, includes built-in binary replication, based on shipping the changes (write-ahead logs) to replica nodes asynchronously.
Version 9.0 also introduced the ability to run read-only queries against these replicated nodes, where earlier versions would only allow that after promoting them to be a new master. This allows splitting read traffic among multiple nodes efficiently. Earlier replication software that allowed similar read scaling normally relied on adding replication triggers to the master, introducing additional load onto it.
Beginning from version 9.1, PostgreSQL also includes built-in synchronous replication that ensures that, for each write transaction, the master waits until at least one replica node has written the data to its transaction log. Unlike other database systems, the durability of a transaction (whether it is asynchronous or synchronous) can be specified per-database, per-user, per-session or even per-transaction. This can be useful for work loads that do not require such guarantees, and may not be wanted for all data as it will have some negative effect on performance due to the requirement of the confirmation of the transaction reaching the synchronous standby.
There can be a mixture of synchronous and asynchronous standby servers. A list of synchronous standby servers can be specified in the configuration which determines which servers are candidates for synchronous replication. The first in the list which is currently connected and actively streaming is the one that will be used as the current synchronous server. When this fails, it falls to the next in line.
Synchronous multi-master replication is currently not included in the PostgreSQL core. Postgres-XC which is based on PostgreSQL provides scalable synchronous multi-master replication, available in version 1.2.1 (April 2015 version) is licensed under the same license as PostgreSQL. A similar project is called Postgres-XL and is available under the Mozilla Public License. Postgres-R is yet another older fork. Bi-Directional Replication (BDR) is an asynchronous multi-master replication system for PostgreSQL.
The community has also written some tools to make managing replication clusters easier, such as repmgr.
There are also several asynchronous trigger-based replication packages for PostgreSQL. These remain useful even after introduction of the expanded core capabilities, for situations where binary replication of an entire database cluster is not the appropriate approach:
- Londiste, part of SkyTools (developed by Skype)
- Bucardo multi-master replication (developed by Backcountry.com)
- SymmetricDS multi-master, multi-tier replication
PostgreSQL includes built-in support for regular B-tree and hash indexes, and four index access methods: generalized search trees (GiST), generalized inverted indexes (GIN), Space-Partitioned GiST (SP-GiST) and Block Range Indexes (BRIN). Hash indexes are implemented, but discouraged because they cannot be recovered after a crash or power loss. In addition, user-defined index methods can be created, although this is quite an involved process. Indexes in PostgreSQL also support the following features:
- Expression indexes can be created with an index of the result of an expression or function, instead of simply the value of a column.
- Partial indexes, which only index part of a table, can be created by adding a WHERE clause to the end of the CREATE INDEX statement. This allows a smaller index to be created.
- The planner is capable of using multiple indexes together to satisfy complex queries, using temporary in-memory bitmap index operations.
- As of PostgreSQL 9.1, k-nearest neighbors (k-NN) indexing (also referred to KNN-GiST) provides efficient searching of “closest values” to that specified, useful to finding similar words, or close objects or locations with geospatial data. This is achieved without exhaustive matching of values.
- In PostgreSQL 9.2 and above, index-only scans often allow the system to fetch data from indexes without ever having to access the main table.
- PostgreSQL 9.5 introduced Block Range Indexes (BRIN).
In PostgreSQL, a schema holds all objects (with the exception of roles and tablespaces). Schemas effectively act like namespaces, allowing objects of the same name to co-exist in the same database.
By default, newly created databases have a schema called “public”, but any additional schemas can be added, and the public schema isn’t mandatory. A “search_path” determines the order in which the system checks schemas for unqualified objects (those without a prefixed schema); one can configure search paths on a database or role level. The search path, by default, contains the special schema name of “$user”, which first looks for a schema named after the connected database user (e.g. if the user “dave” were connected, it would first look for a schema also named “dave” when referring to any objects). If such a schema is not found, it then proceeds to the next listed schema. New objects are created in whichever valid schema (one that presently exists) appears first in the search path.
A wide variety of native data types are supported, including:
- Arbitrary precision numerics
- Character (text, varchar, char)
- Date/time (timestamp/time with/without timezone, date, interval)
- Bit strings
- Text search type
- HStore (an extension enabled key-value store within PostgreSQL)
- Arrays (variable length and can be of any data type, including text and composite types) up to 1 GB
- in total storage size
- Geometric primitives
- IPv4 and IPv6 addresses
- CIDR blocks and MAC addresses
- XML supporting XPath queries
- JSON (since version 9.2), and a faster binary JSONB (since version 9.4; not the same as BSON
In addition, users can create their own data types which can usually be made fully indexable via PostgreSQL’s indexing infrastructures – GiST, GIN, SP-GiST. Examples of these include the geographic information system (GIS) data types from the PostGIS project for PostgreSQL.
There is also a data type called a “domain”, which is the same as any other data type but with optional constraints defined by the creator of that domain. This means any data entered into a column using the domain will have to conform to whichever constraints were defined as part of the domain.
Starting with PostgreSQL 9.2, a data type that represents a range of data can be used which are called range types. These can be discrete ranges (e.g. all integer values 1 to 10) or continuous ranges (e.g. any point in time between 10:00 am and 11:00 am). The built-in range types available include ranges of integers, big integers, decimal numbers, time stamps (with and without time zone) and dates.
Custom range types can be created to make new types of ranges available, such as IP address ranges using the inet type as a base, or float ranges using the float data type as a base. Range types support inclusive and exclusive range boundaries using the  and () characters respectively. (e.g. ‘[4,9)’ represents all integers starting from and including 4 up to but not including 9.) Range types are also compatible with existing operators used to check for overlap, containment, right of etc.
New types of almost all objects inside the database can be created, including:
- Data types
- Functions, including aggregate functions and window functions
- Indexes including custom indexes for custom types
- Operators (existing ones can be overloaded)
- Procedural languages
Tables can be set to inherit their characteristics from a “parent” table. Data in child tables will appear to exist in the parent tables, unless data is selected from the parent table using the ONLY keyword, i.e. SELECT * FROM ONLY parent_table;. Adding a column in the parent table will cause that column to appear in the child table.
Inheritance can be used to implement table partitioning, using either triggers or rules to direct inserts to the parent table into the proper child tables.
As of 2010, this feature is not fully supported yet—in particular, table constraints are not currently inheritable. All check constraints and not-null constraints on a parent table are automatically inherited by its children. Other types of constraints (unique, primary key, and foreign key constraints) are not inherited.
Inheritance provides a way to map the features of generalization hierarchies depicted in Entity Relationship Diagrams (ERD) directly into the PostgreSQL database.
Other storage feature
- Referential integrity constraints including foreign key constraints, column constraints, and row checks
- Binary and textual large-object storage
- Per-column collation (from 9.1)
- Online backup
- Point-in-time recovery, implemented using write-ahead logging
- In-place upgrades with pg_upgrade for less downtime (supports upgrades from 8.3.x and later)
PostgreSQL offers many advantages for your company or business over other database systems.
Immunity to over-deployment
Over-deployment is what some proprietary database vendors regard as their #1 licence compliance problem. With PostgreSQL, no-one can sue you for breaking licensing agreements, as there is no associated licensing cost for the software.
This has several additional advantages:
- More profitable business models with wide-scale deployment.
- No possibility of being audited for license compliance at any stage.
- Flexibility to do concept research and trial deployments without needing to include additional licensing costs.
Better support than the proprietary vendors
In addition to our strong support offerings, we have a vibrant community of PostgreSQL professionals and enthusiasts that your staff can draw upon and contribute to.
Significant saving on staffing costs
Our software has been designed and created to have much lower maintenance and tuning requirements than the leading proprietary databases, yet still retain all of the features, stability, and performance.
In addition to this, our training programs are generally regarded as being far more cost effective, manageable, and practical in the real world than that of the leading proprietary database vendors.
Legendary reliability and stability
Unlike many proprietary databases, it is extremely common for companies to report that PostgreSQL has never, ever crashed for them in several years of high activity operation. Not even once. It just works.
The source code is available to all at no charge. If your staff have a need to customise or extend PostgreSQL in any way then they are able to do so with a minimum of effort, and with no attached costs. This is complemented by the community of PostgreSQL professionals and enthusiasts around the globe that also actively extend PostgreSQL on a daily basis.
PostgreSQL is available for almost every brand of Unix (34 platforms with the latest stable release), and Windows compatibility is available via the Cygwin framework. Native Windows compatibility is also available with version 8.0 and above.
Designed for high volume environments
We use a multiple row data storage strategy called MVCC to make PostgreSQL extremely responsive in high volume environments. The leading proprietary database vendor uses this technology as well, for the same reasons.
GUI database design and administration tools
There are many high-quality GUI Tools available for PostgreSQL from both open source developers and commercial providers. A list is available on our wiki as a community guide to PostgreSQL GUI Tools.
PostgreSQL manages its internal security on a per-role basis. A role is generally regarded to be a user (a role that can log in), or a group (a role of which other roles are members). Permissions can be granted or revoked on any object down to the column level, and can also allow/prevent the creation of new objects at the database, schema or table levels.
The sepgsql extension (provided with PostgreSQL as of version 9.1) provides an additional layer of security by integrating with SELinux. This utilises PostgreSQL’s SECURITY LABEL feature.
PostgreSQL natively supports a broad number of external authentication mechanisms, including:
- password (either MD5 or plain-text)
- ident (maps O/S user-name as provided by an ident server to database user-name)
- peer (maps local user name to database user name)
- Active Directory
The GSSAPI, SSPI, Kerberos, peer, ident and certificate methods can also use a specified “map” file that lists which users matched by that authentication system are allowed to connect as a specific database user.
These methods are specified in the cluster’s host-based authentication configuration file (pg_hba.conf), which determines what connections are allowed. This allows control over which user can connect to which database, where they can connect from (IP address/IP address range/domain socket), which authentication system will be enforced, and whether the connection must use TLS.