Archive

Posts Tagged ‘unicode’

Sample Tomcat7 setup

February 8th, 2013 skotfred No comments

There are a few steps that I generally take to setup a new Tomcat server instance, this enables the following:

  • The manager console
  • HTTP compression
  • UTF-8 encoding

Steps:

  1. tomcat-users.xml – add to bottom:

    <role rolename="manager-gui"/>
    <user username="tomcat" password="s3cr3t" roles="manager-gui"/>

  2. server.xml – add compression and URIEncoding, change port if desired:

    <Connector port="8080" protocol="HTTP/1.1"
    connectionTimeout="20000"
    redirectPort="8443" compression="on" URIEncoding="UTF-8" />

  3. server.xml – relocate webapps by adding ../ to appBase

    <Host name="localhost" appBase="../webapps"
    unpackWARs="true" autoDeploy="true">

  4. Restart your server, on Ubuntu use:

    sudo service tomcat7 restart

Browser performance impact of charset/codepage assignment

November 17th, 2012 skotfred No comments

Most developers (myself included) are often unaware of the performance impact of the Content-Type / charset of a web page. Ideally you should set this as an HTTP Header vs. using META http-equiv. It’s often though that this only helps with the transport and display of data, however, the browser also makes use of it when parsing CSS & JS assets. Tags related to those provide an optional ‘charset‘ attribute should they ever need to vary from your content.

General guidance is to set this at the very top of the <head> before <title&gt; and within the first 1024 bytes, though there are reports that Firefox will look at the first 2048 bytes of the page for this META information.

Not doing so may cause the browser to do a codepage restart to re-parse assets that were interpreted in the potentially incorrect codepage.

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

REFERENCES:

nbsp; and other common entities do not validate as HTML5!

October 4th, 2012 skotfred No comments

The only built-in entities in XML are &, <, >, " and ' XHTML added the others via a DTD that is not a part of HTML5. As such, validators will report them as errors.

Safe replacements are the decimal notation: &#160; or the character itself U+00A0;

Quite a few other common symbols are not available without similar changes.

  • &lt; = &#60;
  • &gt; = &#62;
  • &amp; = &#38;
  • &apos; = &#39;
  • &quot; = &#34;
  • &nbsp;&#160;
  • &copy; = &#169;
  • &reg; = &#174;
  • &trade; = &#8482;

REFERENCES:

Defining word-break and word-wrap in CSS

November 29th, 2011 skotfred No comments

I recently found a case where WebKit (Chromium, and Safari) was acting as if ‘overflow-x:visible;‘ was set in cases where text could not be wrapped inside a DIV due to a lack of spaces or hyphenation as it was a java stack trace. In this case I had to explicitly set the ‘word-wrap:break-word;‘ attribute for the problematic DIV.


.breakword { word-wrap: break-word; }

Also, for Unicode languages where there are other rules to complex to describe here…

.wordbreak { word-break: keep-all; }

Rupee

November 28th, 2011 skotfred No comments

I’ve done a lot of Internationalization(I18N) and Localization (L10N) work in my various development positions. One particularly troubling area is currency support. Support of number formats is generally well supported (or can be accomplished with some trivial input translation). However, the tricky area come with support for currency symbols, western currencies such as USD (US$) and CAD(C$) and the Euro (EUR or €) are well supported across character sets and fonts some are not. One particular item is for the Indian Rupee (INR). Ubuntu 10.10 is the first operating system to ship with a font that supports this character ₹

Unicode = &#x20b9;

JavaScript TextNode for special characters

November 22nd, 2011 skotfred No comments

It can be difficult to create or output some characters as JavaScript TextNodes. Typically you might try to use the ampersand notation, unfortunately the ampersand itself gets escaped to & as such   becomes &nbsp; Use of the Unicode notation can easily resolve this issue.

Example for NBSP:
var nsbptextnode = document.createTextNode('\u00A0');

Eclipse ResourceBundle Editor

February 21st, 2008 skotfred No comments

I typically use the open-source Eclipse IDE for most of my Java and PHP work. For my corporate work, this means that I use IBM’s packaged RAD and WSAD offerings that are based on various versions of the Eclipse framework.

When working on Internationalized (I18n) applications, most experienced Java architects rely on ResourceBundles to store the various text that is needed for different languages, problem is that editing these files becomes problematic, especially when dealing with multi-byte character sets as are often used in Unicode (non Latin-1, aka ISO-8859-1) languages.

The best editor I’ve found for this case is, as you may have guessed, free for download.

Here’s the links:

Cheers!

Downloadable WebFonts

February 15th, 2008 skotfred No comments

To maintain accessibility and SEO (Search Engine Optimization), there’s often a need to be creative with fonts. This is sometimes due to aesthetics, but often to meet technical needs like foreign non-Latin languages that have unique characters/glyphs not normally installed on workstations. Producing images for each character would be very time consuming, bandwidth intensive and destroy search engine rankings.

Create embedded fonts using one of 2 available formats:

1. Portable Font Resources (.pfr): TrueDoc technology was developed by Bitstream and licensed by Netscape. It can be viewed by Navigator 4.0+ and Explorer 4.0+ on Windows, Mac, and Unix platforms.

<link rel = “fontdef” src=”myfont.pfr” />

2. Embeddable Open Type (.eot): Compatible only with Explorer 4.0+ on the Windows platform. Create .eot files using Microsoft’s free Web Embedding Font Tool (WEFT).

<style type=”text/css”>
<–!
@font-face {
src:url(/fonts/myfont.eot);
}
–>
</style>

References:

Tooling:

Tutorials:

Cheers!

UTF-8 (BOM) prevents java compilation

August 20th, 2007 skotfred No comments

I had an adventure tracking this one down lately, it seems that if your IDE saves files as UTF-8, the java compiler can’t always resolve the files.

Here’s the errors from the console output:

[INFO] ————————————————————————
[ERROR] BUILD FAILURE
[INFO] ————————————————————————
[INFO] Compilation failure

C:\Sandbox\Jars\example.jar\src\main\java\com\giantgeek\Example.java:[1,0] ‘class’ or ‘interface’ expected

C:\Sandbox\Jars\example.jar\src\main\java\com\giantgeek\Example.java:[1,1] illegal character: \187

C:\Sandbox\Jars\example.jar\src\main\java\com\giantgeek\Example.java:[1,2] illegal character: \191

Those character codes (\187 \191) may look a little familiar to some people, as they represent the Byte Order Mark (BOM) that prefixes a UTF-8 formatted file. If you look at them in a file editor (or text editor that doesn’t interpret UTF-8) they will look odd.

They look like “an i (two dots over), double right arrow, upside down question mark”.

Simple solution is to re-edit and save the file as ISO-8859-1.

An alternate approach that is available in some instances is to use the arguments to javac to allow the file encoding.

References:

Cheers!

Categories: Work Tags: , , ,