fix: ensure embedded PostgreSQL databases use UTF-8 encoding
On macOS, `initdb` defaults to SQL_ASCII encoding because it infers locale from the system environment. When `ensurePostgresDatabase()` creates a database without specifying encoding, the new database inherits SQL_ASCII from the cluster. This causes string functions like `left()` to operate on bytes instead of characters, producing invalid UTF-8 when multi-byte characters are truncated. Two-part fix: 1. Pass `--encoding=UTF8 --locale=C` via `initdbFlags` to all EmbeddedPostgres constructors so the cluster defaults to UTF-8. 2. Explicitly set `encoding 'UTF8'` in the CREATE DATABASE statement with `template template0` (required because template1 may already have a different encoding) and `C` locale for portability. Existing databases created with SQL_ASCII are NOT automatically fixed; users must delete their local `data/db` directory and restart to re-initialize the cluster. Relates to #636 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -334,6 +334,7 @@ export async function startServer(): Promise<StartedServer> {
|
||||
password: "paperclip",
|
||||
port,
|
||||
persistent: true,
|
||||
initdbFlags: ["--encoding=UTF8", "--locale=C"],
|
||||
onLog: appendEmbeddedPostgresLog,
|
||||
onError: appendEmbeddedPostgresLog,
|
||||
});
|
||||
|
||||
Reference in New Issue
Block a user