Tuesday, April 03, 2007

PowerShell, Export-CsvCultureAware

I stole this joke in an entry http://blogs.msdn.com/powershell/archive/2007/01/06/cliff-notes-of-convertto-html-other-language-blogs.aspx by Jeffry Snover -

  • How do you describe someone that can speak multiple languages?
    Multi-lingual.
  • How do you describe someone that can speak two languages?
    Bi-lingual.
  • How do you describe someone that can speak one language?
    American

And this also goes for PowerShell as well!

To be fair, I have only found few problems: Export-Csv and Import-Csv are not culture-aware!

In Denmark, amongst many other cultures, comma is used between integers and decimals and dot is used to make integers more readable. Semi-colons are used in argument list like between Excel function arguments and when saving a CSV-file from Excel. The normal American 1,234.56 is 1.234,56 here and 1,2,3 becomes 1;2;3. What does this mean? This means that Excel is using these culture rules when parsing input and consequently, arguments in Excel functions and fields in CSV-files are expected to be separated with semi-colon.


So when I do –


PS> Something | export-csv x.csv; ii x.csv


Excel is not able to parse my CSV-file directly. I either have to
1) change the regional settings before starting Excel or
2) use the text-to-data wizard in Excel.


The worst part is that PowerShell knows the concept of cultures – as it is based on .Net – and has the info –



PS> (Get-Culture).textinfo | select listSeparator

ListSeparator
-------------
;

So it should have been culture-aware or at least provided a -seperator argument.

I also tried setting $OFS - the output field seperator, but that did not change the result.



This work-around works (wrapped for formatting reasons) -



PS> something | % {$h=@()} { if(!$h) {$_.psobject.properties | % {}
{$h+=$_.name} {[string]::join(";",$h)}}; $r=$_;$_.psobject.properties |
% {$f=@()} {$name=$_.name;$f+=$r.$name} {[string]::join(";",$f)} }

Dirty ;)
But as always, the quick-timers can be converted to a nice function. The function also includes quoting of all values, but does not handle values with quotes -



function Export-CsvCultureAware($file=$(throw "File must be specified")) {
# First, construct the header by picking up all the property names
$sep=((get-culture).textinfo | select listseparator).listSeparator
$sep='"'+$sep+'"'
$header=@()
$input | % {
if (!$header) {
$_.psobject.properties | `
% {} `
{$header+=$_.name} `
{'"'+[string]::join($sep,$header)+'"'} `
}
# Next, process all the data records/objects
$record=$_
$record.psobject.properties | `
% {$fields=@()} `
{$name=$_.name;$fields+=$record.$name} `
{'"'+[string]::join($sep,$fields)+'"'} `

} | out-file $file -encoding ascii
# Only ASCII files seems to be supported in Excel for CSV
}


PS. The blogger editor gives me a hard time formatting these code
examples, so a vertical bar or such may have slipped. Any suggestions
of how to improve editing are highly welcome. I currently use pre-tags
but if I switch to Compose view, strange things happens.

2 comments:

Per Østergaard said...

It seems like CodeHTMLer at http://puzzleware.net/codehtmler/default.aspx is the solution to my formatting problems. It will try that the next time.

/\/\o\/\/ said...

CodeHTMLer will work if you use convert whitespace option and ...do NOT touch it in the webinterface after posting !.

PSA will not work correctly (all one line as pasting)

After figuring that all out I still had the problem to not been able to change anything after posting, as blogger will mess up the formatting again (e.g. removing pipeline symbols etc.).

My final solution was moving my blog to CS 2.1

here everything does work and the HTML does not get changed by the editor.
Greetings /\/\o\/\/