fix: ensure embedded PostgreSQL databases use UTF-8 encoding
On macOS, `initdb` defaults to SQL_ASCII encoding because it infers locale from the system environment. When `ensurePostgresDatabase()` creates a database without specifying encoding, the new database inherits SQL_ASCII from the cluster. This causes string functions like `left()` to operate on bytes instead of characters, producing invalid UTF-8 when multi-byte characters are truncated. Two-part fix: 1. Pass `--encoding=UTF8 --locale=C` via `initdbFlags` to all EmbeddedPostgres constructors so the cluster defaults to UTF-8. 2. Explicitly set `encoding 'UTF8'` in the CREATE DATABASE statement with `template template0` (required because template1 may already have a different encoding) and `C` locale for portability. Existing databases created with SQL_ASCII are NOT automatically fixed; users must delete their local `data/db` directory and restart to re-initialize the cluster. Relates to #636 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -730,7 +730,7 @@ export async function ensurePostgresDatabase(
|
||||
`;
|
||||
if (existing.length > 0) return "exists";
|
||||
|
||||
await sql.unsafe(`create database "${databaseName}"`);
|
||||
await sql.unsafe(`create database "${databaseName}" encoding 'UTF8' lc_collate 'C' lc_ctype 'C' template template0`);
|
||||
return "created";
|
||||
} finally {
|
||||
await sql.end();
|
||||
|
||||
Reference in New Issue
Block a user