My colleague Lavan came to me with a problem at work, and it was the perfect use case for Power Query (also know as Get & Transform).
Every month he has to get some data from a system and use that in his month end finance work.
This part of the system has no export button. The only way to get the data is to highlight it on screen and copy and paste the data into an Excel sheet.
Instead of copying into a nice table like it’s displayed on screen it copies into a single vertical column. He then needs to turn this single column back into a table to continue the rest of his work.
In this post, we’ll use power query to create a transformation that turns a single column into a table.
The Single Column of Data
In this example, we’ll look at some addresses that are stacked in a single column.
Each record in the column consists of 5 rows of data: the name, street address, city, country and a postal code.
Our transformation will rely on the fact that each record set has exactly 5 rows. In this example, it’s 5, but this will work for any number, the key is that it’s a fixed constant for each record.
Turn the Single Column into an Excel Table
We are going to be using power query to transform this data later using a From Table/Range query. This will automatically turn our data into an Excel Table, but we might as well do this step first and name our table appropriately.
Select the data and press Ctrl + T to turn the data into a table. You can also create a table from the Insert tab with the Table command.
Our data has the column heading Contact Information, so check off the My table has headers box in the Create Table dialog box.
Go to the Table Tools Design tab and change the name to InputData.
Create a From Table/Range Query
Select your InputData table and go to the Data tab and press the From Table/Range query button. This will open the query editor with our single column table loaded.
Create an ID Column to Group Address Records
The goal here is to create a column with a grouped index that will increment by 1 every time there is a new address record in the data. Because our data always has exactly 5 rows per address, this means we want to create a column that increments by 1 every 5 rows.
The first step is to add an index column. Go to the Add Column tab of the query editor and select the Index command. Press on the small black arrow to the right to reveal more options and choose From 1. This will start the index column at 1 in the first row instead of the default of 0.
Now we can use the Number.Mod function to find the remainder of our index when we divide by 5 (the number of rows per record). Go to the Add Column tab and select the Custom Column command.
Enter the =Number.Mod([Index],5)
in the custom column formula editor and rename the new column to Remainder. This formula will return the remainder of the index column on division by 5. This will result in a repeating sequence of {1,2,3,4,0} for each of the address records in our data.
Notice that the first row of each record has a 1 in the remainder column.
Now let’s add a column that indicates the starting row of an address record. We can add another custom column and create the formula if [Remainder]=1 then 1 else 0
in the custom column formula editor and rename the new column to Indicator.
This will contain a 1 when the record is the start of an address and 0 everywhere else.
Now let’s add an ID column that will be the same for all rows related to a given address record.
We can add another custom column and create the formula List.Sum(List.Range(#"Added Custom1"[Indicator],0,[Index]))
in the formula editor and rename the new column to ID.
This creates a running total based on the indicator column which will be our grouped index number.
Pivot the Data Based on the Remainder and ID Column
Now we are ready to pivot our data. This is where we turn the single column of data into a proper 5 column table of data with one row per address record.
Hold the Ctrl key to select both the Index and Indicator columns. Then right click on either column and choose Remove Columns from the menu.
Now we can pivot our data based on the remainder column. Select the Remainder column and go to the Transform tab and press the Pivot Column command.
Now we can select our values to pivot as the addresses in the Contact Information column and in the advanced options select Don’t Aggregate as the aggregate value function. Press the Ok button and data should be a familiar table format.
Now we can remove the ID column as it’s done its job. We can also rename the columns as Name, Address, City, Country and Postal Code respectively then Close & Load the query into an Excel table.
Conclusions
Now we have a proper table of address data where each row contains one address and each part of the data is in its own column.
The great thing about power query is now that we have built this query, we can use it each time we need to copy the data from our system.
All we need to do is paste the new data into the InputData table and hit the Refresh button in the Data tab.
We now have an easily repeatable process to clean up our data every month.
The Power Query revolution! I’m taking Ken Pul’s Power Query Academy this year. It’s great! Thanks John for the post! By the way…what version of Excel are you using?
Thanks
Kevin Lehrbass
https://www.myspreadsheetlab.com/blog/
Yes, power query and power pivot is amazing and it’s all I want to write about recently. I’m learning power BI now.
Office 365 pro plus monthly update track.
ok thanks John. I’m using Excel 2016 (non Office 365) and I received a message that the query was built in a newer version of Excel but fortunately everything worked. I’m curious how to solve this if it’s not always the same number of rows per set.
I think you would need your original data to have an ID column and a attribute column.
Where the ID is the same for each related row and the attribute column identities if it’s a name, Street, city, county, postal code.
Then it’s just a matter of pivoting the data.
But without that, its just random data and you wouldn’t be able to guarantee the right things end up in the right columns.
Thank you for sharing your knowledge!!!
I just need to ask that how to do when no of transaction will vary.
for example!
there might be 6 headers and then no of transaction are sometimes 4 ,sometimes it will be 1, sometime it will be 2 or 5 or 8.
how we can tackle this type of situation. As per above scenario we can guest the modulus 5 but what when it will vary.. ?
Can you please look into the above scenario, which i have shared with you.
Thank you