Monday, June 14, 2010

Get-Line and Got-NewJob

Hi. It has been a while, as I have been busy with my new job. I’m now working at LEGO as Senior Solution Architect for www.lego.com. Do visit a great web site; any feedback is welcome.

Consequently, I’m not using PowerShell as much as I used to, at least not for such complicated solutions. On the other hand, I’m using it almost daily, as it is so useful for testing web-related things.

But my new colleagues know that I know PowerShell, so today Michael asked me how to get a few lines from a text file. It sounds easy, but if it were that easy, Michael would have figured it out himself. The problem is that the file is huge (e.g. 1.5 GB), so using Get-Content with Select-Object or similar would be very memory intensive and thus slow. I said that I would either call .NET directly or embed some C# code in a script.
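As a rough illustration of the first option, calling .NET directly, here is a minimal PowerShell sketch. It is not the script this post ends up with: the function name Get-SingleLine and its parameters are hypothetical, and it fetches only one line. It reads the file with System.IO.StreamReader one line at a time, so only the current line is held in memory, however large the file is.

```powershell
# Hypothetical helper (not part of Get-Line): return line number $LineNumber
# (1-based) from $Path by streaming through the file with a StreamReader.
function Get-SingleLine {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [long] $LineNumber
    )
    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so pass it a full file system path.
    $fullPath = (Resolve-Path $Path).ProviderPath
    $reader = New-Object System.IO.StreamReader($fullPath)
    try {
        [long]$counter = 0
        # ReadLine returns $null at end of file; empty lines come back as ''.
        while ($null -ne ($line = $reader.ReadLine())) {
            $counter++
            if ($counter -eq $LineNumber) { return $line }
        }
        # Falling out of the loop means the file has fewer lines: no output.
    }
    finally {
        # Always release the file handle, even when returning early.
        $reader.Close()
    }
}
```

Usage: `Get-SingleLine $env:temp\lines.txt 23`. Fetching several lines this way means one pass per line, which is why a single-pass approach over a sorted list of line numbers pays off for huge files.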

Well, now it is evening, I’m watching the Italy-Paraguay match and, hey, why not do a little blogging again!

No sooner said than done. Here’s my solution. It took a little more than an hour, including the help text and validation. For newcomers: use Get-Line -? to display the help, nicely formatted.

The script:

<#
.Synopsis
Get line or lines from a text file
.Description
Get one or more lines from the specified file. Line numbers are positive and the first line is number 1.
.Inputs
Path
.Outputs
Array of strings
.Example
Get-Line $env:temp\lines.txt 23,897,45
Get lines 23, 45 and 897. Lines are returned in increasing order. E.g. line 23 is returned first, then line 45 and finally, line 897

#>
param(
    [Parameter(Mandatory=$true)]
    [Alias("file", "fullname", "name")]
    [string]
    [ValidateScript({Test-Path $_})]
    # The path of the file to get the lines from
    $path,

    [Parameter(Mandatory=$true)]
    [Alias("numbers")]
    [int64[]]
    [ValidateScript({$_ -gt 0})]
    # One or more line numbers (the first line is 1) to retrieve from the file
    $lines
)

Add-Type -TypeDefinition @'
using System;
using System.Collections.Generic;
using System.IO;

namespace Per
{
    public class FileFunctions
    {
        // Returns the requested lines in a single forward pass over the file.
        // 'lines' must hold unique, positive line numbers in increasing order
        // (the calling script sorts them before calling).
        public List<string> GetLines(string file, Queue<long> lines)
        {
            List<string> linesFound = new List<string>();
            if (lines.Count == 0)
            {
                return linesFound;
            }

            // 10 MB read buffer; FileShare.Read lets other readers open the file too.
            using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read, 10 * 1024 * 1024))
            using (StreamReader sr = new StreamReader(fs))
            {
                long lineCounter = 0;   // long, to match the Int64 line numbers
                long findLine = lines.Dequeue();
                string line;
                // Only the current line is kept in memory, so huge files are fine.
                while ((line = sr.ReadLine()) != null)
                {
                    lineCounter++;
                    if (lineCounter == findLine)
                    {
                        linesFound.Add(line);
                        if (lines.Count == 0)
                        {
                            break;   // all requested lines found; stop reading
                        }
                        findLine = lines.Dequeue();
                    }
                }
            }   // leaving the using blocks closes the reader and the stream

            return linesFound;
        }
    }
}
'@
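$c = New-Object Per.FileFunctions
# Sort and de-duplicate so the C# code can find all lines in one pass
[int64[]]$sortedLines = $lines | Sort-Object -Unique | Where-Object { $_ -gt 0 }
# Pass a plain file system path (ProviderPath): .NET does not understand PowerShell
# drive names, and it resolves relative paths against the process directory
$c.GetLines((Resolve-Path $path).ProviderPath, $sortedLines)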





As always: Have fun!

3 comments:

Andy Tearle said...

Thanx Per.
Any idea why, when Get-Line works like a dream in PowerShell,
it fails in the ISE with " Exception calling "GetLines" with "2" argument(s): "The given path's format is not supported." " at ...
$c.GetLines((Resolve-Path $path),$sortedLines)
Regards
Andy

msgoodies said...

Could you provide me with an example of how you are calling it? It works fine for me. Per.

Hugues BOUTET said...

There is also the -TotalCount parameter, which is useful for reading huge logs in some cases, or the beginning of large binary files.