Java 18+ Output Unicode Characters on Windows Showing Question Marks: Reason and Solution

On Windows, Java 18 and later can print question marks in place of Unicode characters. This article explains the reason for this and how to fix it.

Description

When a Unicode character is printed to the console on Windows, a question mark can appear in its place. For example, the following line (run with Java 21) prints the emoji U+1F602, written as a surrogate pair:

System.out.println("\uD83D\uDE02");

The output is a single question mark, which points to an encoding problem.

Reason

Java 18 implemented a change, JEP 400: UTF-8 by Default:

Specify UTF-8 as the default charset of the standard Java APIs. With this change, APIs that depend upon the default charset will behave consistently across all implementations, operating systems, locales, and configurations.

That sounds good, but one of the stated goals of this change includes an exception:

Standardize on UTF-8 throughout the standard Java APIs, except for console I/O.

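This exception is what causes the problem. Even when the console itself is set to UTF-8, the PrintStream behind System.out does not follow the new UTF-8 default charset. It can still encode output with the platform's native encoding, which on Windows may be GBK or Windows-1252. Characters that this encoding cannot represent are replaced with question marks.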
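To make the substitution concrete, here is a minimal sketch that encodes the same emoji with UTF-8 and with windows-1252 and prints the resulting bytes. The class name EncodingDemo and the helper hex are illustrative names, not part of the original example, and windows-1252 only stands in for whatever native encoding a particular Windows machine uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "F0 9F 98 82".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // Encodes the text with the named charset; unmappable characters are
    // replaced by the charset's replacement byte ('?' for windows-1252).
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE02"; // U+1F602 as a surrogate pair

        // UTF-8 can represent the emoji: F0 9F 98 82
        System.out.println("UTF-8:        " + hex(emoji.getBytes(StandardCharsets.UTF_8)));

        // windows-1252 has no mapping for it, so 0x3F ('?') is written instead
        System.out.println("windows-1252: " + hex(encode(emoji, "windows-1252")));
    }
}
```

The same loss happens inside System.out when its charset cannot represent a character: the original bytes never reach the console, so changing the console font or code page afterwards cannot bring the character back.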

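To confirm the encoding that is actually used for output, evaluate ((OutputStreamWriter) System.out.textOut.out).se.cs in a debugger. These are private fields, so the expression works only in a debugger, not in ordinary code.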
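If a debugger is not at hand, a rough picture can also be obtained from public APIs. The sketch below is an assumption-laden alternative to the debugger expression, not the method used above: it prints the default charset together with the native.encoding and stdout.encoding system properties. As far as I know, native.encoding is available from Java 17 and stdout.encoding from Java 19; on older versions the property simply reads as null. The class name ConsoleEncodingCheck is hypothetical.

```java
import java.nio.charset.Charset;

public class ConsoleEncodingCheck {
    // Collects the charset-related settings that are visible through public APIs.
    static String report() {
        String nl = System.lineSeparator();
        return "default charset = " + Charset.defaultCharset() + nl
             // Encoding of the underlying platform (Java 17+), null if absent
             + "native.encoding = " + System.getProperty("native.encoding") + nl
             // Encoding used for System.out (Java 19+), null if absent
             + "stdout.encoding = " + System.getProperty("stdout.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

On an affected machine one would expect the default charset to be UTF-8 while the other values name a legacy encoding. On Java 18 and later, System.out.charset() also returns the charset of the stream directly.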

More detailed discussions can be found in this answer[1].

Solution

  1. Downgrade to Java 17 or below.
  2. Replace System.out with a PrintStream that encodes its output as UTF-8:

System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8));

After this change, the character is displayed correctly.
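For completeness, here is the second solution as a small self-contained program with the imports it needs. It is a sketch rather than a drop-in fix: the class name Utf8Console and the helper utf8Stream are made up for illustration, and the program assumes the console is really able to display UTF-8.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Wraps any byte stream in an auto-flushing PrintStream that encodes as UTF-8.
    static PrintStream utf8Stream(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bypass the charset chosen by the JVM and write UTF-8 bytes
        // straight to the process's standard output.
        System.setOut(utf8Stream(new FileOutputStream(FileDescriptor.out)));

        System.out.println("\uD83D\uDE02");
    }
}
```

The call to System.setOut should run before anything else is printed, for example as the first statement in main, because output written earlier has already been encoded with the old charset.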

References


https://blog.zhanganzhi.com/en/2024/04/da97145e767c/
Author: Andy Zhang
Posted on: April 11, 2024