DEV Community

Rafael Bernard Araújo
Rafael Bernard Araújo

Posted on • Originally published at rafael.bernard-araujo.com on

Recover and replace win1252 content to utf-8 for a PostgreSQL database

I shouldn't be the first to need to export data from a PostgreSQL database installed on Windows with a win1252 code to a database with UTF-8 code (in my case, on a Linux server).

It is not enough to transform the transfer file to UTF-8, as the characters of win1252 (left double quote, right double quote, single quote and dash) will be there, with a weird value in your database.

In my experience, I had to import data as-is and afterwards perform an update using the function to rewrite data for UTF-8 characters correctly.

The following code examples are for: 1 - transform to HTML characters; 2 - transform to single characters.

HTML (https://gist.github.com/rafaelbernard/1029869):

-- DROP FUNCTION substitui_win1252_html(texto);

CREATE OR REPLACE FUNCTION substitui_win1252_html(texto text)
    RETURNS text AS
$$
BEGIN

    texto := replace(replace(replace(replace(replace(texto, '’', '’'), '“', '“'), '”', '”'), '•','•'), '–', '–')::text; 
    RETURN texto;

END;
$$
LANGUAGE plpgsql;
Enter fullscreen mode Exit fullscreen mode

Simple (https://gist.github.com/rafaelbernard/1029899):

- DROP FUNCTION substitui_win1252(texto);

CREATE OR REPLACE FUNCTION substitui_win1252(texto text)
    RETURNS text AS
$$
BEGIN

    texto := replace(replace(replace(replace(replace(texto, '’', ''''), '“', '"'), '”', '"'), '•','•'), '–', '-')::text;  
    RETURN texto;

END;
$$
LANGUAGE plpgsql;
Enter fullscreen mode Exit fullscreen mode

Don't worry about the squares that you might see. If you copy it to a good text editor, you will their actual value.

Top comments (0)