2011-07-20

How to trim non-breaking spaces (&nbsp;) when parsing HTML

The non-breaking space character (U+00A0) is not removed by the trim function in Java or Scala, because trim only strips characters with code points up to U+0020. In Java you can use a regexp (see this blogpost) to do this, but Scala has a better way.

In Scala you can use higher-order functions on Strings: a String can be treated as a sequence of Chars, so you can use the filterNot function. Note that this removes every non-breaking space in the string, not only those at the ends.

scala> val ns = <div>1 2\u00A0 3&#160;</div>
ns: scala.xml.Elem = <div>1 2? 3?</div>

scala> ns.text.filterNot(_ == '\u00A0')
res1: String = 1 2 3
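
If you only want to strip non-breaking spaces from the ends of the text, as trim does for ordinary whitespace, the same higher-order style still works. The following is a minimal sketch, not part of any library: the names NbspTrim, stripAll and trimEnds are made up for illustration.

```scala
object NbspTrim {
  // The non-breaking space character, U+00A0.
  private val Nbsp = '\u00A0'

  // Remove every non-breaking space, wherever it occurs
  // (the filterNot approach shown above).
  def stripAll(s: String): String = s.filterNot(_ == Nbsp)

  // Remove ordinary whitespace and non-breaking spaces from both ends only,
  // leaving any non-breaking spaces inside the text untouched.
  def trimEnds(s: String): String = {
    def blank(c: Char): Boolean = c.isWhitespace || c == Nbsp
    s.dropWhile(blank).reverse.dropWhile(blank).reverse
  }
}
```

With the element from the example, NbspTrim.trimEnds(ns.text) would drop the trailing non-breaking space but keep the one between 2 and 3, while NbspTrim.stripAll(ns.text) removes both.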