PostgreSQL的clog属于日志还是数据，需要遵守write-WAL-before-data吗？

来源:腾讯云时间：2023年03月05日

(资料图片仅供参考)

总结

从原理上来看，MVCC需要给定事务ID后，能查询到事务的状态。

在PG中事务状态可以从几个路径获取：

在快照中查询（活跃事务）在元组头的状态为查询（不活跃事务）在CLOG中查询（不活跃事务）

如果不看实现只看概念，不活跃事务提交状态也可以在XLOG中查询，CLOG可以视作一种XLOG commit/rollback日志的缓存、映射，一种事务提交状态的快速查询方式。

所以在write-WAL-before-data中，CLOG也会按照data来处理，只有XLOG属于WAL。

Postgresql中clog写盘实现SlruPhysicalWritePage

postgresql中clog使用SLRU机制读写，在Slru写盘前，会有保证xlog先写的机制：

group_lsn表示32个事务一组中最大的日志序列号（LSN）。group_lsn主要用于事务提交非同步落盘的场景。

static boolSlruPhysicalWritePage(SlruCtl ctl, int pageno, int slotno, SlruWriteAll fdata){...if (shared->group_lsn != NULL){/* * We must determine the largest async-commit LSN for the page. This * is a bit tedious, but since this entire function is a slow path * anyway, it seems better to do this here than to maintain a per-page * LSN variable (which"d need an extra comparison in the * transaction-commit path). */XLogRecPtrmax_lsn;intlsnindex,lsnoff;lsnindex = slotno * shared->lsn_groups_per_page;max_lsn = shared->group_lsn[lsnindex++];for (lsnoff = 1; lsnoff < shared->lsn_groups_per_page; lsnoff++){XLogRecPtrthis_lsn = shared->group_lsn[lsnindex++];if (max_lsn < this_lsn)max_lsn = this_lsn;    <<<<<<<<<<<<<<<<<<<<<<<<< 找到最大的LSN}if (!XLogRecPtrIsInvalid(max_lsn)){/* * As noted above, elog(ERROR) is not acceptable here, so if * XLogFlush were to fail, we must PANIC.  This isn"t much of a * restriction because XLogFlush is just about all critical * section anyway, but let"s make sure. */START_CRIT_SECTION();XLogFlush(max_lsn);      <<<<<<<<<<<<<<<<<<<<<<<<< 先保证XLOG写到这个位点！END_CRIT_SECTION();}}  ...  if (pg_pwrite(fd, shared->page_buffer[slotno], BLCKSZ, offset) != BLCKSZ)  {    ...  }}

Postgresql中用户数据写盘实现FlushBuffer

数据页面同理，也是先找到页面lsn，刷xlog，在写数据。

static voidFlushBuffer(BufferDesc *buf, SMgrRelation reln){...buf_state = LockBufHdr(buf);/* * Run PageGetLSN while holding header lock, since we don"t have the * buffer locked exclusively in all cases. */recptr = BufferGetLSN(buf);   <<<<<<<<<<<<<<<<<<<<<<<<< 找到页面的LSN/* To check if block content changes while flushing. - vadim 01/17/97 */buf_state &= ~BM_JUST_DIRTIED;UnlockBufHdr(buf, buf_state);/* * Force XLOG flush up to buffer"s LSN.  This implements the basic WAL * rule that log updates must hit disk before any of the data-file changes * they describe do. * * However, this rule does not apply to unlogged relations, which will be * lost after a crash anyway.  Most unlogged relation pages do not bear * LSNs since we never emit WAL records for them, and therefore flushing * up through the buffer LSN would be useless, but harmless.  However, * GiST indexes use LSNs internally to track page-splits, and therefore * unlogged GiST pages bear "fake" LSNs generated by * GetFakeLSNForUnloggedRel.  It is unlikely but possible that the fake * LSN counter could advance past the WAL insertion point; and if it did * happen, attempting to flush WAL through that location would fail, with * disastrous system-wide consequences.  To make sure that can"t happen, * skip the flush if the buffer isn"t permanent. */if (buf_state & BM_PERMANENT)XLogFlush(recptr);         <<<<<<<<<<<<<<<<<<<<<<<<< 先保证XLOG写到这个位点！    ...smgrwrite(reln,  BufTagGetForkNum(&buf->tag),  buf->tag.blockNum,  bufToWrite,  false);  ...}

标签： PostgreSQL