There are many types of "hyphen".

2020/05/12

This is Yugeta, whose brain works like a regular expression: hearing "*" (asterisk) makes me feel all-powerful.

I was surprised by the variety of hyphens

You may have noticed that when you press the "-" (minus) key in full-width input mode on your computer and the list of conversion candidates appears, it contains a lot of "-"-like characters.

Writing them out, together with a few similar-looking characters, gives the following list.

1. ー : long sound mark
2. ➖ : emoji (not a hyphen)
3. 〰 : [full] emoji (not a hyphen)
4. - : [half] hyphen-minus
5. − : [full] hyphen-minus
6. 〜 : [full] wave dash (not a hyphen)
7. ~ : [half] wave dash (not a hyphen)
8. ― : [full] dash
9. ‐ : [full] hyphen
10. ー : [full]
11. - : [half]
12. − : [full]
13. ｰ : [half] katakana

Because this list was displayed on a Mac, some characters may appear garbled depending on the OS you are viewing it on. Some of them are not hyphens at all, and I notice that they go by several names, such as hyphen, dash, and minus. Naturally, their character codes differ as well. (The codes below are Unicode code points, given in decimal and in hexadecimal.)

[ー : 12540, 0x30FC] long sound mark
[- : 45, 0x2D] [half] hyphen-minus
[− : 8722, 0x2212] [full] hyphen-minus
[― : 8213, 0x2015] [full] dash
[‐ : 8208, 0x2010] [full] hyphen
[ー : 12540, 0x30FC] [full]
[- : 45, 0x2D] [half]
[− : 8722, 0x2212] [full]
[ｰ : 65392, 0xFF70] [half] katakana

You can see that the unlabeled [half] and [full] entries at the bottom have the same codes as entries above them, such as the "hyphen, minus" ones, but all the others are different. As far as I can tell in a Mac terminal, the half-width katakana mark and the half-width hyphen cannot be told apart by eye.
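If you want to check the codes yourself, a minimal sketch like the following prints each character in the same "[char : decimal, 0xHEX]" form as the table above. The names `lookalikes` and `describe` are made up for this example, and the characters are written as escapes so the snippet does not depend on how your editor displays them.

```javascript
// Hyphen look-alikes from the table above, written as escapes.
const lookalikes = [
  "\u30FC", // long sound mark
  "\u002D", // [half] hyphen-minus
  "\u2212", // [full] hyphen-minus
  "\u2015", // [full] dash
  "\u2010", // [full] hyphen
  "\uFF70", // [half] katakana long sound mark
];

// Format one character as "[char : decimal, 0xHEX]".
function describe(ch) {
  const code = ch.codePointAt(0);
  return `[${ch} : ${code}, 0x${code.toString(16).toUpperCase()}]`;
}

for (const ch of lookalikes) {
  console.log(describe(ch));
}
```

Running it in a browser console or in Node.js should be enough to confirm which candidate your own input method actually produced.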

Hyphen troubles that system engineers may encounter

In ordinary prose, about the only thing you need to think about is whether a character is half-width or full-width. Character strings that you register in a system, however, need more care. For example, when an e-commerce site stores a user's postal code or phone number, written as 000-0000 or 03-0000-0000, the hyphens matter, and frankly there is no telling which character code the user's conversion will produce for them. In fact, when we tried this on a smartphone site, we found that users enter not only half-width and full-width hyphens but sometimes even an underscore "_". Considering the work needed to split such input into parts such as the area code, it is a specification that makes engineers cry.
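One possible way to cope is to normalize the input before storing it. The following is only a sketch under my own assumptions: the function name `normalizeNumberInput` is hypothetical, the set of hyphen-like characters covers just the ones listed in this article plus the underscore, and it is meant for number fields only (applied to ordinary Japanese text it would also rewrite every long sound mark).

```javascript
// Hyphen look-alikes from this article, plus the underscore some users type.
// Assumption: this set is illustrative, not exhaustive.
const HYPHEN_LIKE = /[\u30FC\u2212\u2015\u2010\uFF70_]/g;

// Full-width digits U+FF10..U+FF19 sit at a fixed offset (0xFEE0) from ASCII 0..9.
const FULLWIDTH_DIGIT = /[\uFF10-\uFF19]/g;

// Convert full-width digits to ASCII digits and hyphen look-alikes to "-".
function normalizeNumberInput(input) {
  return input
    .replace(FULLWIDTH_DIGIT, (d) => String.fromCharCode(d.charCodeAt(0) - 0xFEE0))
    .replace(HYPHEN_LIKE, "-");
}

// Full-width "03", a long sound mark, then an underscore as separators.
console.log(normalizeNumberInput("\uFF10\uFF13\u30FC0000_0000")); // "03-0000-0000"
```

After this step the stored value uses only the half-width hyphen-minus (0x2D), so later splitting does not have to care which variant the user typed.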

So-so usable regular expression samples (for Japan)

Postal code
/([0-9０-９]{3}).*?([0-9０-９]{4})/

Phone number
/([0-9０-９]{2,4}).*?([0-9０-９]{2,4}).*?([0-9０-９]{2,4})/

Regular expression samples using JavaScript code

"0ー1-2−3―4‐5ー6ｰ7".match(/(\u30FC).*?(\x2D).*?(\u2212).*?(\u2015).*?(\u2010).*?(\u30FC).*?(\uFF70)/)

```
0: "ー1-2−3―4‐5ー6ｰ"
1: "ー"
2: "-"
3: "−"
4: "―"
5: "‐"
6: "ー"
7: "ｰ"
```

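All of them were matched successfully. Note that the "\x" escape takes two hexadecimal digits, so it can only express codes up to 0xFF, while the "\u" escape takes four hexadecimal digits and is needed for the other characters here.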
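The postal code and phone number patterns from this section can be tried in the same way. This is a rough sketch: the helper names `splitPostalCode` and `splitPhoneNumber` are made up for the example, and the full-width digits are written as the range \uFF10-\uFF19 instead of typing them literally.

```javascript
// Same patterns as in this section; [0-9\uFF10-\uFF19] accepts half- and full-width digits.
const POSTAL_CODE = /([0-9\uFF10-\uFF19]{3}).*?([0-9\uFF10-\uFF19]{4})/;
const PHONE_NUMBER =
  /([0-9\uFF10-\uFF19]{2,4}).*?([0-9\uFF10-\uFF19]{2,4}).*?([0-9\uFF10-\uFF19]{2,4})/;

// Return the two digit groups of a postal code, or null if nothing matches.
function splitPostalCode(input) {
  const m = input.match(POSTAL_CODE);
  return m ? [m[1], m[2]] : null;
}

// Return the three digit groups of a phone number, or null if nothing matches.
function splitPhoneNumber(input) {
  const m = input.match(PHONE_NUMBER);
  return m ? [m[1], m[2], m[3]] : null;
}

console.log(splitPostalCode("000\u22120000"));     // [ '000', '0000' ]
console.log(splitPhoneNumber("03\u30FC0000_0000")); // [ '03', '0000', '0000' ]
console.log(splitPhoneNumber("0312345678"));        // [ '0312', '3456', '78' ]
```

Because the separator is matched by ".*?", any of the hyphen variants (or an underscore) works. The last call shows why these are only "so-so" usable: with no separator at all, the greedy digit groups split the number as 4-4-2, which may not correspond to the real area code.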
