Saturday, July 09, 2016

AWS: Snowball fight... Part II :o) - Multiple Parallel copies

So, I have previously written an article about the basic commands and process needed to copy data to AWS Snowball devices.

In this article I provide a script I used to get round an issue with corrupt files in the source location when trying to copy to the Snowball device.

We had a large SAN system that was going to be transferred to AWS via the Snowball device. This SAN had been running for years.

Initially I just tried to copy the entire root folder (recursively). However, I soon discovered that, before actually performing the copy, the snowball copy process scans and analyses the entire folder structure, and if it encounters an error the whole process is brought to a halt. At first I tried to fix the offending files, which turned out to have spurious characters in their names (such as trailing spaces). However, the scan would take hours to run, only to fall over each time.
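Because the scan only fails after hours of work, it can be quicker to look for suspect names up front. The sketch below is a hypothetical pair of helpers (`Test-SuspectName`, `Find-SuspectItem`), not part of the Snowball client. It assumes "suspect" means leading or trailing whitespace or control characters; the client's real validation rules may flag other names too.

```powershell
# Hypothetical helpers, not part of the Snowball client.
# ASSUMPTION: a "suspect" name has leading/trailing whitespace or control
# characters; the Snowball scan may reject other names as well.
function Test-SuspectName {
    param([Parameter(Mandatory = $true)][string]$Name)
    # A name that changes when trimmed has leading or trailing whitespace.
    if ($Name -ne $Name.Trim()) { return $true }
    # Control characters (0x00-0x1F, e.g. tabs) are another likely culprit.
    return $Name -match '[\x00-\x1F]'
}

function Find-SuspectItem {
    param([Parameter(Mandatory = $true)][string]$Path)
    # -LiteralPath stops [ and ] in names being treated as wildcards;
    # -Force includes hidden and system items.
    Get-ChildItem -LiteralPath $Path -Recurse -Force -ErrorAction SilentlyContinue -ErrorVariable enumErrors |
        Where-Object { Test-SuspectName -Name $_.Name } |
        Select-Object -ExpandProperty FullName
    # Report anything that could not be enumerated instead of hiding it.
    foreach ($e in $enumErrors) { Write-Warning $e.Exception.Message }
}
```

For example, `Find-SuspectItem -Path '\\sourceserver\subfolder1\subfolder2'` lists the full path of every suspect item under the source folder. Folders that cannot be opened are reported as warnings rather than searched, so a clean result does not guarantee the Snowball scan will pass.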

The folders within the root folder were organised by client, so I decided we should try copying one client folder at a time. In addition, I was hoping to run multiple copies at the same time.

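So I created the script below. It runs at most ten copies at a time, and it only attempts to copy each client folder once: before starting a copy it checks for a pre-existing log file for that folder and skips the folder if one exists. This let me run 10 copies in parallel while ensuring I only made one pass through the folder structure. Because the script does not wait for a free slot, any folder it reaches while ten copies are already running is skipped without a log file, so it is picked up the next time the script is run. I could then work out which client folders had failed and attack them individually.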
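Since each log is named after its client folder (`<folder name>.log` in the script's own folder), comparing the client folders against the log files shows which folders have not been attempted yet. The helper below, `Get-PendingClientFolder`, is a hypothetical sketch built on that naming convention; it cannot tell whether a copy succeeded, which still means reading the log itself.

```powershell
# Hypothetical helper: list client folders that have no "<name>.log" yet,
# i.e. folders the copy script has not attempted (for example because ten
# jobs were already running when the script reached them).
function Get-PendingClientFolder {
    param(
        [Parameter(Mandatory = $true)][string]$SourceFolder,
        [Parameter(Mandatory = $true)][string]$LogFolder
    )
    # Same filter as the copy script: immediate subfolders of the source only.
    Get-ChildItem -LiteralPath $SourceFolder | Where-Object { $_.PSIsContainer } |
        Where-Object { -not (Test-Path -LiteralPath (Join-Path $LogFolder ($_.Name + '.log'))) } |
        Select-Object -ExpandProperty Name
}
```

Run it with `-LogFolder` pointing at the folder the copy script lives in; an empty result means every client folder has been attempted at least once.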

The script assumes the initial security setup to the Snowball has been performed; please see my previous article for more details.

```powershell
##This script assumes that the connection to the snowball device has been established previously
##Run it from an interactive PowerShell session: jobs started with Start-Job are stopped when that session exits.

function Get-ScriptDirectory
{
    #Determine the folder in which the script lives.
    $Invocation = (Get-Variable MyInvocation -Scope 1).Value
    Split-Path $Invocation.MyCommand.Path
}

$scriptPath = Get-ScriptDirectory

[String]$scriptCurrentDateTime = Get-Date -format "yyyyMMddHHmmss";
[String]$computerName = $env:computername;
[string]$sourceFolder = '\\sourceserver\subfolder1\subfolder2';

#remember to stop-transcript (last command in script).
Start-Transcript -Path "$scriptPath\psOutput_$scriptCurrentDateTime.log" -NoClobber

#Set the amount of jobs to run in parallel
[int]$maxRunningJobs = 10;

try
{
    ForEach ($item in (Get-ChildItem -Path $sourceFolder | ?{ $_.PSIsContainer }))
    {
        $running = @(Get-Job | Where-Object { $_.State -eq 'Running' });
        [string]$logLocation = "1>`"$scriptPath\" + $item.Name + ".log`" 2>&1"
        [string]$logLocationPath = "$scriptPath\" + $item.Name + ".log"
        #only start a copy if fewer than $maxRunningJobs jobs are running (-lt, since -le would allow an 11th)
        #and no log file exists for this folder yet. Skipped folders have no log, so a re-run picks them up.
        if ($running.Count -lt $maxRunningJobs -and -not (Test-Path -LiteralPath $logLocationPath))
        {
            [string]$destinationFolder = 's3://awsSnowballJobName/subfolder1/subfolder2/' + $item.Name;
            $debugblock = {
                [string]$snowballProgram = 'C:\Program Files (x86)\SnowballClient\bin\snowball.bat';
                #builds: "snowball.bat" cp --recursive <source> <destination> 1>"<log>" 2>&1
                #note: <source> and <destination> are not quoted, so names containing spaces are split.
                $commandToRun = "`"$snowballProgram`" $($args[0]) $($args[1]) $($args[2]) $($args[3]) $($args[4])";
                #debug: record each command (every job appends to the same file)
                $commandToRun | Add-Content -Path 'e:\test.txt';
                #Invoke-Expression also parses the 1>"..." 2>&1 redirection passed in $args[4]
                Invoke-Expression "& $commandToRun";
            }
            Start-Job -ScriptBlock $debugblock -ArgumentList "cp","--recursive",$item.FullName,$destinationFolder,"$logLocation";
        }
    }
}
catch
{
    $MyError = $_
    Throw $MyError
}

#Clear Up completed jobs
Get-Job | Where-Object { $_.State -eq 'Completed' } | Remove-Job

#stop the transcript
Stop-Transcript

trap
{
    #$CRLF added because the usual `r`n in a string does not work within the trap.
    [string]$CRLF = [char]13 + [char]10
    $script:errorMessage += 'Error: {0}' -f $_.Exception.Message + $CRLF;
}
```

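As written, the script above checks the running-job count once per folder and skips the folder when the limit is reached, so it has to be re-run until every folder has a log. If you would rather have a single run process every folder, one alternative is to block with `Wait-Job -Any` until a slot frees up. `Invoke-Throttled` below is a hypothetical helper, not part of the original script or the Snowball client, and the example runs a trivial script block in place of the `snowball.bat` call.

```powershell
# Hypothetical helper (not part of the original script): start one background
# job per item, never more than $MaxRunningJobs at once. Unlike the script,
# it WAITS for a free slot instead of skipping the item.
function Invoke-Throttled {
    param(
        [Parameter(Mandatory = $true)][object[]]$Items,
        [Parameter(Mandatory = $true)][scriptblock]$ScriptBlock,
        [int]$MaxRunningJobs = 10,
        # Items for which this returns $true are skipped (e.g. a log already exists).
        [scriptblock]$AlreadyDone = { param($i) $false }
    )
    $jobs = @()
    foreach ($item in $Items) {
        if (& $AlreadyDone $item) { continue }
        # Block until fewer than $MaxRunningJobs of our jobs are still active.
        while ($true) {
            $active = @($jobs | Where-Object { $_.State -eq 'NotStarted' -or $_.State -eq 'Running' })
            if ($active.Count -lt $MaxRunningJobs) { break }
            # Returns as soon as any active job finishes; no busy polling.
            Wait-Job -Job $active -Any | Out-Null
        }
        $jobs += Start-Job -ScriptBlock $ScriptBlock -ArgumentList $item
    }
    if ($jobs.Count -gt 0) {
        # Wait for the remaining jobs, emit their output, then clean up.
        $jobs | Wait-Job | Receive-Job
        $jobs | Remove-Job
    }
}
```

For example, `Invoke-Throttled -Items 1,2,3,4,5 -MaxRunningJobs 2 -ScriptBlock { param($n) $n * 2 }` returns the five doubled values, in whatever order the jobs finish. The helper passes each item as the job's only argument, so using it for the Snowball copy would mean building the five arguments `$debugblock` expects for each client folder.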