Monday, November 11, 2019

Power Query and The Curious Case of Keep Top Rows

The Power Query Editor in Power BI has many transformations that can be set from the UI. Keep Top Rows is one of them. I'd like to take you on my journey through this curious case I ran into a couple of weeks back when preparing for my session on Query Folding in Power BI.

Transformations in Power Query

Problem

I was preparing my demo's for my session at dataMinds Connect on Query Folding in Power BI (slides). If you're not familiar with query folding: transformations in Power Query are transformed into the native (data source) language and executed at the source for better performance. For a more thorough overview have a look at the post How Query Folding Works from Matt Allington (B | @ExceleratorBI).

If we look at SQL Server, some transformations can be folded (like Sorting, a simple Group By and Keep Top Rows), others cannot (merging two columns from the UI, using the Buffer() M-function and Keep Top Rows). See what I did there? :) Keep Top Rows on its own is a foldable transformation, it is (obviously) transformed into the TOP operator in SQL.
However, when using Keep Top Rows in combination with some other transformations it (looks like it) is not folding anymore. Your best chance is to use the View Native Query option in Power Query SQL Server Profiler to check if the query is folding or not. More on that later :)
A bit of contemplationI do think the real use case of the Keep Top Rows transformation might be mostly in development environments, where you can limit the number of rows to import for example. I still think the findings were rather odd and worth sharing. So, let's have a look at how it works.

Setup

I used import mode for the Fact.Movement table from the WideWorldImportersDW database. I then added the Keep Top (100) Rows transformation in the Power Query Editor.
Keep Top Rows


As you can see it folds to SQL and all is good for now.

View Native Query with TOP 100






























Adding transformations


Let's add a transformation on the Quantity column and apply an absolute value to it.
Add Absolute value of Quantity

So now the option to View Native Query is gone. Let's see if Power Query is speaking the truth (spoiler: No) 😀
I started SQL Server Profiler and captured the queries running against the database after I Close and Apply my changes. As you can see, now both the TOP operator and ABS function are folded.
SQL Server Profiler: TOP 100 and ABS

Now let's try something different and add a filter on the Date Key. Something strange has happened, because I already filtered the first 100 rows, so only 2 dates remained after that. But when I click on the Calculated Absolute Value query step I suddenly have a lot more dates that shouldn't be there in my opinion.
Filter on Date Key


Let's add a field from the Customer dimension this time and see what happens.
Expand Customer Dimension

The query for Fact.Movement is still the same, so the TOP operator and ABS function are still folding. And this is what is sent to the database for the Customer dimension:
SQL Server Profiler: Customer Dimension with WHERE clause

The Curious Case

Now watch closely to the where clause: so it actually already filters out the Customer Key based on the steps performed earlier in the Query Editor. But this Customer dimension is a separate query that returns 1 row, which will be joined to the other query inside Power Query.
Back to that WHERE clause: that Customer Key = 0 is valid for my (broken) Calculated Absolute Value query step, not for the Kept First 100 Rows as you see in the picture above. So I actually get the wrong a different result back into Power Query than I would expect.
I see you're thinking: "But you didn't add an explicit order, right?" So the top 100 is a little arbitrary. That's true, so let's add that order by.
Add Sorted Rows

This is a slightly better result in my opinion. Because now the Power Query steps are in sync with each other. The issue with View Native Query still remains the same by the way. It stops working at the Calculated Absolute Value query step, but the query send to the database is the following, so including the ABS function:

View Native Query after Sort

Recap

I showed several variations in the Power Query Editor of using Keep Top Rows and showed that adding subsequent steps brakes the View Native Query option in the editor, although query folding sometimes does take place. And adding an order to your query before applying a TOP operator is always a good idea, not only in SQL.

The steps I used might not all be best practices when creating a data model in Power BI. In case of the join to the customer dimension for example, it might be better to model it as a separate dimension, because STAR SCHEMA ALL THE THINGS. But I used those steps to highlight the problem.

I already mentioned this 
bug feature to Patrick and Adam (members of the Power BI CAT team) from Guy in a Cube (B | @GuyInACube) when I was at the Precon at Data Saturday Holland. Patrick LeBlanc (B | @patrickdba) also hadn't seen this before and hopeully will take it up with one of his team members. I thought I'd make a write up for him to easily reproduce it.

No comments:

Post a Comment