Conversation

@6v4tq8mhlO23a

This fixes a performance issue in Import-Excel that occurs with larger files. #1701

As scriptingstudio suggested in the issue, using a list instead of an array is much more efficient, especially for larger xlsx files (the more rows and columns, the bigger the gain).
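The core of the change can be sketched as follows (a minimal illustration of the two approaches; the variable names are illustrative, not the actual Import-Excel internals):

```powershell
# Array approach: += allocates a new array and copies every existing element
# on each addition, so building n rows costs O(n^2) overall.
$rows = @()
foreach ($i in 1..100000) {
    $rows += [pscustomobject]@{ Row = $i }   # full copy each iteration
}

# List approach: Add() appends in amortized O(1), so the loop stays O(n).
$rows = [System.Collections.Generic.List[object]]::new()
foreach ($i in 1..100000) {
    $rows.Add([pscustomobject]@{ Row = $i })
}
```

This is why the gap widens so dramatically with file size in the benchmarks below: the array version's cost grows quadratically with the number of cells, while the list version's grows linearly.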

@dfinke
Owner

dfinke commented May 25, 2025

Cool. Interesting. Does it run faster in the newer version of PowerShell where they optimized += on the array?

Also, did we chat about this by email? Do the tests run locally?

@6v4tq8mhlO23a
Author

6v4tq8mhlO23a commented May 26, 2025

Hi Doug,

this is Ed. Yes, we did chat about this by email. I was able to get the test scripts running in my environment after fixing some minor issues caused by Pester.

The results are obviously volatile, but my tests showed a clear improvement regardless of the version I tested on.

I ran the tests a couple of times and took the average. For the test I just used Import-Excel and saved the result into a variable:

foreach ($file in $files) {
    $time = Measure-Command {
        $data = Import-Excel -Path $file.FullName
    }
    Write-Host $file.Name
    Write-Host "Seconds: $($time.TotalSeconds)"
}

With the fix (all files have 3 rows; times in seconds):

| PSVersion | 10 cols | 100 cols | 1000 cols | 10000 cols | 100000 cols |
|-----------|---------|----------|-----------|------------|-------------|
| 7.5.1     | 0.01    | 0.01     | 0.04      | 0.17       | 1.73        |
| 7.4.1     | 0.01    | 0.02     | 0.05      | 0.17       | 1.75        |
| 5.1.19041 | 0.01    | 0.02     | 0.05      | 0.27       | 2.07        |

Without the fix (all files have 3 rows; times in seconds):

| PSVersion | 10 cols | 100 cols | 1000 cols | 10000 cols | 100000 cols |
|-----------|---------|----------|-----------|------------|-------------|
| 7.5.1     | 0.01    | 0.01     | 0.06      | 0.28       | 20.04       |
| 7.4.1     | 0.01    | 0.02     | 0.05      | 1.16       | 110         |
| 5.1.19041 | 0.02    | 0.01     | 0.08      | 2.2        | 300         |

Note: I included PowerShell 7.4.1 because it is a PowerShell Core version that does not yet have the optimizations for += on arrays.

Hardware I tested this on:
CPU: AMD Ryzen 5 5600X 6-Core 3.70 GHz
RAM: 16 GB

Let me know if you need something else or if anything is unclear.

@dfinke dfinke requested a review from Copilot May 27, 2025 13:03

Copilot AI left a comment


Pull Request Overview

This PR addresses a performance issue in Import-Excel by replacing inefficient array usage with a more performant list-based implementation.

  • Replaced array initialization with instantiation of a generic List for each worksheet.
  • Replaced array concatenation with the List.Add() method for adding rows.

@dfinke dfinke requested a review from Copilot May 27, 2025 13:14
@dfinke dfinke assigned dfinke and unassigned 6v4tq8mhlO23a May 27, 2025

Copilot AI left a comment


Pull Request Overview

This PR addresses a performance issue in Import-Excel by replacing inefficient array operations with a more efficient list-based approach.

  • Replaces array initialization with a generic List for storing sheet data.
  • Uses the List.Add() method instead of array concatenation for improved handling of large datasets.


  $targetSheetname = $sheet.Name
- $xlBook["$targetSheetname"] = @()
+ $xlBook["$targetSheetname"] = [System.Collections.Generic.List[object]]::new()

Copilot AI May 27, 2025


Consider preallocating the list capacity if the expected number of rows can be estimated to further optimize performance and reduce potential reallocations.

Suggested change:

- $xlBook["$targetSheetname"] = [System.Collections.Generic.List[object]]::new()
+ $rowCount = $sheet.Dimension.End.Row - $StartRow + 1
+ $xlBook["$targetSheetname"] = [System.Collections.Generic.List[object]]::new($rowCount)

Contributor

@scriptingstudio scriptingstudio May 27, 2025


Hmm, good suggestion. Never thought of it.

Contributor


Made some tests using preallocation; no difference.

Owner


Just a thought: back in the Visual Basic days, before classes, when you did a ReDim on an already allocated array, we found it was better to grow the array by 50% each time rather than by one element. Size 10, use it up, then ReDim to 15, not 11; then 22, 33, and so on. Very effective.

No clue if that can play here.
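For what it's worth, this 50%-growth heuristic is essentially what .NET's List<T> already does internally: when the backing array fills up, the list grows its capacity geometrically (doubling) rather than by one element, which keeps the number of reallocations logarithmic in the element count. That likely explains why manual preallocation showed no measurable difference above. A sketch of the idea in PowerShell (illustrative only, not part of the Import-Excel change):

```powershell
# Geometric growth: grow the backing array by 50% whenever it fills up.
# 10 -> 15 -> 22 -> 33 -> ... so n appends trigger only O(log n) copies.
$script:capacity = 10
$script:buffer   = New-Object object[] $script:capacity
$script:count    = 0

function Add-Item($item) {
    if ($script:count -eq $script:capacity) {
        $script:capacity = [int]($script:capacity * 1.5)
        $new = New-Object object[] $script:capacity
        [Array]::Copy($script:buffer, $new, $script:count)   # one copy per growth step
        $script:buffer = $new
    }
    $script:buffer[$script:count] = $item
    $script:count++
}
```

In practice there is no need to hand-roll this: List<T>.Add gives the same amortized O(1) behavior out of the box.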

@stale

stale bot commented Jun 27, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jun 27, 2025
@6v4tq8mhlO23a 6v4tq8mhlO23a requested a review from dfinke June 28, 2025 19:53
@stale stale bot removed the wontfix label Jun 28, 2025
@dfinke dfinke closed this Jun 28, 2025