Concerning Excel PowerPivot, the question is often asked how is it possible for Excel to now handle tens or hundreds of millions of rows – the way native Excel handles tens or hundreds of thousands of rows.
How you ask?
Well, the reason why is because Excel (with the PowerPivot add-in) is using the new In-Memory BI (IMBI) database. It utilizes a column-based store which has been seen academically and empirically to have some interesting characteristics within BI scenarios…namely, it can calculate many (but not all) BI aggregate very quickly. Because it is in-memory, it does not utilize disk I/O and performs all of its scans in-memory – i.e. much faster because you do not have disk contention. As well, it has a great compression algorithm in order to significantly reduce the size of the overall data stored. By the way, this new database architecture is called VertiPaq for all those interested in the trivia!
So while Vertipaq is really cool, what does this have to do with Excel PowerPivot? Well, the reason is because the PowerPivot database is actually in the workbook itself. To prove it, here’s what you can do:
Take your Excel PowerPivot workbook, which you’ll notice is typically much larger than your typical native Excel workbooks
Copy the Excel file and rename it using a .zip extension instead of a .xlsx. Note, the reason why is because .xlsx is actually XML
When you open up the .zip file, you’ll notice a lot of various folders within it (as noted, it’s all XML).
Open up the xl folder, and then you’ll notice the customData folder. When you navigate to it, you’ll notice the item1.data file which contains the bulk of the size of the Excel file
This item1.data file is your PowerPivot database stored right IN the Excel file.