Why Twitter sometimes counts the Arabic words for “peace be upon him” as a single character

This video from the programmer and linguist Tom Scott points out something I’d never realised about Twitter:

Essentially, what Twitter counts as a character is not necessarily a single letter or punctuation mark. Unicode, the standard used to encode text consistently across programs and websites, sometimes assigns a single character (a ligature) to a particular common combination of letters. عليه السلام is apparently one of them.
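To see the idea in action, here is a minimal Python sketch that scans the Unicode database bundled with Python's `unicodedata` module for single code points whose compatibility (NFKC) expansion contains عليه ("upon him"). It assumes, for illustration only, that "one character" means one Unicode code point; it does not reproduce Twitter's own counting rules. `ligatures_containing` is a hypothetical helper name, and the results depend on which Unicode version your Python build ships with.

```python
import unicodedata


def ligatures_containing(fragment: str) -> list[str]:
    """Return every single code point whose NFKC expansion contains `fragment`."""
    hits = []
    for cp in range(0x110000):
        ch = chr(cp)
        # Only characters with a decomposition mapping can expand into several
        # letters; testing this first also skips unassigned code points.
        if unicodedata.decomposition(ch) and fragment in unicodedata.normalize("NFKC", ch):
            hits.append(ch)
    return hits


# "عليه" ("upon him"), written with escapes so the source reads the same in any editor.
ALAYHI = "\u0639\u0644\u064A\u0647"

print("Unicode version:", unicodedata.unidata_version)
for ch in ligatures_containing(ALAYHI):
    expanded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}: 1 code point, {len(expanded)} after NFKC")
```

Each line printed is one code point that a code-point counter would treat as a single character, even though it expands into a whole phrase. Related honorific ligatures, such as U+FDFA (ﷺ), may show up alongside any dedicated عليه السلام ligature your Unicode version includes.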

Disclaimer: I’ve tried and failed to replicate this myself. However, I did manage something similar but less elegant with some English words.
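The post doesn't say which English words were used, so this is only one way to get a similar effect, not necessarily the one described: Unicode's Latin compatibility ligatures such as ﬃ (U+FB03, LATIN SMALL LIGATURE FFI) pack several letters into one code point. In this minimal Python sketch, `LIGATURES` and `squeeze` are hypothetical names. NFC normalisation, which applies only canonical mappings, leaves these ligatures intact, while NFKC expands them back into plain letters. Whether a particular site normalises text, and which form it uses, is an assumption you would need to check.

```python
import unicodedata

# Latin compatibility ligatures, each a single code point.
# Listed longest first so "ffi" is tried before "ff" or "fi".
LIGATURES = {
    "ffi": "\uFB03",  # LATIN SMALL LIGATURE FFI
    "ffl": "\uFB04",  # LATIN SMALL LIGATURE FFL
    "ff": "\uFB00",   # LATIN SMALL LIGATURE FF
    "fi": "\uFB01",   # LATIN SMALL LIGATURE FI
    "fl": "\uFB02",   # LATIN SMALL LIGATURE FL
}


def squeeze(word: str) -> str:
    """Replace letter runs with single-code-point ligatures, longest runs first."""
    for letters, lig in sorted(LIGATURES.items(), key=lambda kv: -len(kv[0])):
        word = word.replace(letters, lig)
    return word


for word in ("office", "waffle", "baffled"):
    short = squeeze(word)
    # NFC leaves the ligature alone; NFKC turns it back into ordinary letters.
    unchanged_by_nfc = unicodedata.normalize("NFC", short) == short
    restored_by_nfkc = unicodedata.normalize("NFKC", short) == word
    print(f"{word}: {len(word)} -> {len(short)} code points, "
          f"NFC unchanged: {unchanged_by_nfc}, NFKC restores: {restored_by_nfkc}")
# office: 6 -> 4, waffle: 6 -> 4, baffled: 7 -> 5 (both checks True for each)
```

This is "less elegant" than the Arabic case: each ligature saves only one or two characters, compared with a whole phrase.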
