<?xml version="1.0" encoding="UTF-8"?>
Apart from these, there is also a BOM issue when saving XML files that contain Unicode characters. Many Windows-based text editors add the bytes 0xEF, 0xBB, 0xBF at the start of a document saved in UTF-8. This byte sequence is the Unicode byte-order mark (BOM), although in UTF-8 it has nothing to do with byte order. The BOM can also appear when text in another encoding that uses a BOM is converted to UTF-8 without stripping it.
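As a minimal sketch in Python, the BOM can be detected and stripped at the byte level before the file reaches software that does not expect it. The helper names `has_utf8_bom` and `strip_utf8_bom` are hypothetical and only for illustration. `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs

UTF8_BOM = b"\xef\xbb\xbf"  # same value as codecs.BOM_UTF8

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a single leading UTF-8 BOM, if present."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data

# Simulate what a BOM-adding editor writes for a small XML file.
xml_text = '<?xml version="1.0" encoding="UTF-8"?><note>café</note>'
saved = UTF8_BOM + xml_text.encode("utf-8")

print(has_utf8_bom(saved))                    # True
print(saved[:3] == codecs.BOM_UTF8)           # True
cleaned = strip_utf8_bom(saved)
print(has_utf8_bom(cleaned))                  # False
# The "utf-8-sig" codec also drops a leading BOM while decoding.
print(saved.decode("utf-8-sig") == xml_text)  # True
```

Working on bytes rather than decoded text keeps the check independent of whichever decoder the downstream tool uses.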
The presence of the UTF-8 BOM may cause interoperability problems with existing software that could otherwise handle UTF-8, for example:
- Older text editors may display the BOM as "ï»¿" (the three bytes read as ISO-8859-1 or Windows-1252) at the start of the document, even if the UTF-8 file contains only ASCII and would otherwise display correctly.
- Programming language parsers can often handle UTF-8 in string constants and comments, but cannot parse the BOM at the start of the file.
- Programs that identify file types by their leading characters may fail to identify the file if a BOM is present, even if the program that consumes the file could skip the BOM. Conversely, they may identify the file even though the consuming program cannot handle the BOM. An example is the Unix shebang syntax: a BOM placed before "#!" prevents the interpreter line from being recognized.
- Programs that insert information at the start of a file produce a file with the BOM somewhere in the middle (this is also a problem with the UTF-16 BOM). One example is offline browsers that add the originating URL to the start of the file.
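A short Python sketch can reproduce three of the problems listed above. The script contents, the page markup and the `Source:` header line are hypothetical stand-ins, not taken from any real tool.

```python
BOM_BYTES = "\ufeff".encode("utf-8")  # U+FEFF encoded in UTF-8: EF BB BF

# 1. A program that decodes the file as Windows-1252 shows the BOM as mojibake.
mojibake = BOM_BYTES.decode("cp1252")
print(mojibake == "ï»¿")  # True

# 2. A leading-characters check, such as looking for a "#!" shebang, fails
#    because the first two bytes are now EF BB instead of "#!".
script = BOM_BYTES + b"#!/bin/sh\necho hello\n"
print(script.startswith(b"#!"))  # False

# 3. Prepending a header (hypothetical text standing in for a saved-from URL)
#    pushes the BOM into the middle of the file.
page = BOM_BYTES + "<html><body>café</body></html>".encode("utf-8")
header = b"Source: http://example.com/\n"
saved = header + page
print(saved.find(BOM_BYTES) == len(header))  # True
```

In the third case, stripping only a leading BOM no longer helps, because the BOM now sits right after the inserted header.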