Treatment of Unicode canonical decomposition among operating systems

11/28/2017
by Efstratios Rappos, et al.

This article shows how text characters that have multiple representations under the Unicode standard are treated by popular operating systems. Whilst most characters have a unique representation in Unicode, some characters, such as accented European letters, can have multiple representations due to a feature of Unicode called normalization. Operating systems treat these characters differently, which creates additional challenges for the interoperability of computer programs.
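As a small illustration of the multiple representations described above, the following Python sketch compares the precomposed and decomposed forms of the letter é. It uses only the standard `unicodedata` module; the helper name `same_text` is hypothetical and is not taken from the article, and the sketch does not reproduce any operating system's actual behaviour.

```python
import unicodedata

# Two representations of the same visible character "é".
composed = "\u00e9"      # single precomposed code point U+00E9
decomposed = "e\u0301"   # "e" followed by combining acute accent U+0301


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw strings differ in code points and in their UTF-8 bytes.
print(composed == decomposed)          # False
print(composed.encode("utf-8"))        # b'\xc3\xa9'
print(decomposed.encode("utf-8"))      # b'e\xcc\x81'

# After normalization to a common form they compare equal.
print(same_text(composed, decomposed))  # True
```

A program that exchanges such strings between systems (for example as file names) may therefore need to normalize both sides to the same form before comparing them.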


