
Using Unicode in JavaScript

In JavaScript we can use the line of code below (which uses a Unicode escape) to display the copyright symbol:

var x = "\u00A9 RPeripherals";

Why can't we type the copyright symbol directly using its ALT code (Alt+0169), like below?

var x = "© RPeripherals";

What is the difference between these two methods?


1 Answer


Why can't we type the copyright symbol directly using its ALT code (Alt+0169), like below?

Who says so? Of course you can. Just configure your code editor to use UTF-8 encoding for source files; you should never use anything else to begin with.
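
As a minimal sketch (assuming the file is saved as UTF-8 and decoded as UTF-8, e.g. served with charset=utf-8), the literal symbol and the escape produce the very same string value:

// Typed directly in a UTF-8 source file:
var literal = "© RPeripherals";
// Spelled with a Unicode escape:
var escaped = "\u00A9 RPeripherals";
console.log(literal === escaped); // true (indistinguishable at runtime)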

What is the difference between these two methods?

The difference is that with the \uXXXX scheme you transmit, at best, 2 and, at worst, 5 extra bytes on the wire. This kind of spelling may help when you need to embed characters in your source code that your font cannot display properly. For example, the font I use for programming has no traditional Chinese glyphs, so if I type Chinese characters into my code editor, I see a bunch of question marks, or rectangles showing the Unicode code point digits, instead of the actual characters. Someone whose font does include Chinese glyphs wouldn't have that problem.
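
To see the byte overhead concretely, here is a rough sketch, assuming a UTF-8 source file and a runtime that provides TextEncoder (a newer API than this answer):

// The literal © occupies 2 bytes in UTF-8 (0xC2 0xA9); the escape
// \u00A9 occupies 6 ASCII bytes in the source file:
console.log(new TextEncoder().encode("©").length);      // 2
console.log(new TextEncoder().encode("\\u00A9").length); // 6 (the six characters of the escape)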

If that person and I want to share our source code, it would be preferable for them to use the \uXXXX scheme, since I could verify which character it is by looking it up in a Unicode table. That's about all the difference.
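
For instance (a small sketch), you can recover the code point of an unfamiliar glyph directly and then look it up in the Unicode code charts:

// Recover the code point of a glyph your font cannot render:
var ch = "漢";
console.log(ch.charCodeAt(0).toString(16).toUpperCase()); // "6F22", i.e. U+6F22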

EDIT

The ECMAScript standard (ECMA-262, edition 5.1) specifically says:

A conforming implementation of this Standard shall interpret characters in conformance with the Unicode Standard, Version 3.0 or later and ISO/IEC 10646-1 with either UCS-2 or UTF-16 as the adopted encoding form, implementation level 3. If the adopted ISO/IEC 10646-1 subset is not otherwise specified, it is presumed to be the BMP subset, collection 300. If the adopted encoding form is not otherwise specified, it is presumed to be the UTF-16 encoding form.

So the standard guarantees that the character encoding is Unicode, and mandates UTF-16 (which surprised me, as I thought it was UTF-8), but I don't think that this is what happens in practice... I believe browsers use UTF-8 by default. Note, though, that this clause concerns the internal representation of string values (UTF-16 code units); the encoding of the source file itself, which is typically UTF-8 nowadays, is a separate matter. Perhaps this has changed in later standards, but this one is the last universally accepted.
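
One observable consequence of the UTF-16 requirement (a minimal sketch): a character outside the Basic Multilingual Plane occupies two code units in a string, even if the source file itself is saved as UTF-8:

// U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so strings
// store it as a surrogate pair, i.e. two UTF-16 code units:
var clef = "\uD834\uDD1E";
console.log(clef.length);                     // 2
console.log(clef.charCodeAt(0).toString(16)); // "d834" (the high surrogate)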

