How to Deal with Changing Data Formats in Power Query

Power Query is an amazing tool since it allows you to set up a query once and then refresh it whenever you get new data files.

But what if your data format keeps changing? Then you either need to change the query to adapt to the new format or change the source data files before they get consumed by Power Query. Neither option is ideal, since the whole point of using Power Query is to automate the data transformation process.

I recently came across this situation in my own work. At first, in order to get things done, I changed the data each time, but I obviously wanted to come up with a solution that wouldn't require changing the data with every new file.

The Changing Data Format I Encountered

The problem I encountered was two-fold.

I had some columns that were sometimes missing from the data. Some files would contain only Product ID 1, some would contain Product ID 1 and Product ID 2, and others would contain Product ID 1, Product ID 2 and Product ID 3. I needed to refer to the last of these values in a custom column, and had tried using a try otherwise expression to reference a field that might not be there.
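
The custom column formula was something along these lines, falling back through the Product ID columns with try otherwise whenever a field is missing. This is a simplified sketch rather than the exact formula.

= try [Product ID 3] otherwise try [Product ID 2] otherwise [Product ID 1]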

I found this slowed down my queries too much to be effective, so I needed to create the column if it didn’t exist.

The second bit of changing format was the actual column heading for one of the columns. Sometimes it would appear as Revenue ($), sometimes as Revenue (EUR) or Revenue (USD), or as some other unknown variation.

Importing the Source Data

The first step is to import the source data based on one of the sample CSV files. For this we need to go to the Data tab and use the From Text/CSV command found in the Get & Transform Data section.

Navigate to the location of the data files and select any of the files to import.

Choose the Edit option after selecting your sample file to import.

Excel will automatically create some basic steps for the Power Query data import, but we will need to review and adjust where necessary to make sure they will work for all the possible data formats we are dealing with.

= Csv.Document(File.Contents("C:\Users\John\Google Drive - Excel\Excel Website\Get & Transform\How to Deal with Changing Data Formats in Power Query\Report Sample 1.csv"),[Delimiter=",", Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None])

Our source data will have a maximum of 6 columns but may have fewer. If the file you import as a sample has fewer, you will need to edit the Source step of the query. It should have Columns=N, where N is the maximum number of columns. This will ensure the query successfully imports all the data columns from our CSV files.

We can change this step by typing over the number in the formula bar and then pressing Enter to confirm the change.

When there are fewer columns in the data than the 6 specified in the Source step, the import will include extra columns with generic headings like Column5, Column6, etc. We will deal with these extra columns by removing them in a later step.

= Table.PromoteHeaders(Source, [PromoteAllScalars=true])

After the Source step of the query, Excel will create a Promote Headers step since the first row in our files contains column headings. Looking at the M code that is produced for this step, we can see it doesn’t reference any columns and we will want to promote the first row for all our files, so it’s ok to keep.

Excel will also try to guess the data type of each column during the import and change the types accordingly. Looking at the M code for the Changed Type step which was created, we see that it references the Product ID and Revenue ($) columns to change their data types. Since these columns may not be present in all our data, we need to delete this step from the query, as it will produce errors if the columns don't exist in the data.
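
For reference, the auto-generated Changed Type step will look something like the line below; the exact columns and detected types will vary depending on which sample file you imported. This is the step we are going to delete.

= Table.TransformColumnTypes(#"Promoted Headers",{{"Trans Date", type date}, {"Product ID 1", Int64.Type}, {"Quantity", Int64.Type}, {"Revenue ($)", type number}})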

Delete the step with a left click on the X to the left hand side of the step name in the Applied Steps area.

We now only have a Source step and a Promote Headers step in the query. Change the query name to Source Data and then use Close & Load To to load it as a connection only query.
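
At this point the full Source Data query should look something like this, with just the two steps (the file path will be whatever sample file you selected):

let
    Source = Csv.Document(File.Contents("C:\Users\John\Google Drive - Excel\Excel Website\Get & Transform\How to Deal with Changing Data Formats in Power Query\Report Sample 1.csv"),[Delimiter=",", Columns=6, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true])
in
    #"Promoted Headers"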

This imports our files in the most generic way possible and we will be using this query to reference the data going forward.

Getting the Changing Column Name

We will also need to create a query that produces the name of the revenue column heading that changes.

To do this we will reference the Source Data query and transform the data to a single value output of the column heading. Go to the Queries & Connections pane and right click on the Source Data query and select Reference from the menu.

This will create a new query whose source is the Source Data query. Rename the query to ColumnToRename; we will refer to it later when we rename our revenue column heading. The goal here is to create a query that produces a single value of the revenue column heading, regardless of what it's called.

We need to do a few steps on this data to get to our single value.

First, we will get rid of all the data in the query since we are only interested in the column headings. To do this we can go to the Home tab and select Keep Rows then select Keep Top Rows.

In the input box to Specify how many rows to keep, we can enter 0 and then press the OK button. This means we will keep none of the data.

Now that we’re only left the column headers and no data, we can go demote the headers to the first row of data. Go to the Home tab and left click on the small downward arrow icon next to the Use First Row as Headers command then choose the Use Headers as First Row option.

This will create one row of data with the text that was the column names. Our new column names will be generic, but we don’t need to bother renaming them.

Now we can transpose this data from a row to a single column. Go to the Transform tab and press the Transpose command. Now we can rename our single column heading to something more meaningful like Column Headings.

We should now have something like this in the data preview area: a single column of data that contains all the column headings found in the CSV data file.

Now we only care about the revenue column that keeps changing. Fortunately, it always starts with Revenue so we can filter on any text that starts with Revenue to isolate this column heading name.

Left click on the filter icon and choose Text Filters from the menu, then choose the Begins With filter.

Now we can filter on Revenue and we should be left with one row which contains our revenue column heading name.

Right now it’s a table with one column that has one row of data. We need to convert it from a table to a value. Right click on the value in the data preview area and select Drill Down.

let
    Source = #"Source Data",
    #"Kept First Rows" = Table.FirstN(Source,0),
    #"Demoted Headers" = Table.DemoteHeaders(#"Kept First Rows"),
    #"Transposed Table" = Table.Transpose(#"Demoted Headers"),
    #"Renamed Columns" = Table.RenameColumns(#"Transposed Table",{{"Column1", "Column Headings"}}),
    #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Column Headings], "Revenue")),
    #"Column Headings" = #"Filtered Rows"{0}[Column Headings]
in
    #"Column Headings"

The M code should look something like the above. Notice that the filtering step is essential: it ensures the revenue heading is always in the first row, so the drill down can always reference that first row.

Now we can close and load this as a connection only type query.

Understanding How Append Works When Queries Have Different Columns

Let's take a look at a simple example of appending two tables together to see what happens.

In our example, we have TableA and TableB. Each table has a common column that is labeled with Common as the column header. And each table has a column that's not in the other table (Table A Only and Table B Only column headings).

When we append these two tables together, we see that the common column gets appended properly as expected. The two other columns also appear in the results from appending but notice they contain null items where there was no data.
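
If you want to test this behaviour yourself, here is a minimal sketch you can paste into a blank query. The table names match the example above, but the values are made up for illustration.

let
    // Two small tables sharing only the Common column
    TableA = #table({"Common", "Table A Only"}, {{"a1", 1}, {"a2", 2}}),
    TableB = #table({"Common", "Table B Only"}, {{"b1", 3}}),
    // Appending fills the columns missing from each table with null
    Appended = Table.Combine({TableA, TableB})
in
    Appended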

We are going to use this to ensure our data has all the needed columns by appending it to a template of column headings. When our data is missing a column, the append will contain the missing column and it will just contain null values.

Creating Column Headings to Append our Data Onto

Now we're going to create a query that contains all the possible column headings that our data might have. We can then append our source data to this and any missing columns will just be appended as null/blank data.

We can do this by creating a list in Excel of all the possible column headings that our data might contain. Notice that I haven't used Revenue ($) in the last row; instead, I have used the column heading I want to appear for all my data. We will change this revenue heading in our data before appending.

The data might contain up to 3 Product IDs and will have some form of revenue column heading, so our list should include Trans Date, Product ID 1, Product ID 2, Product ID 3, Quantity and Revenue as potential column headings.

We can then use a From Table/Range query to pull this list into the Power Query editor.

We can then go to the Transform tab and use the Transpose command to turn the single column of data into a row.

Now we can promote this row of data to headers using the Use First Row as Headers command found in the Home tab. We now have a query that contains all the columns that could possibly be in our data files, with no data rows.

Let's rename this query to StandardHeadings. We will eventually append our source data to this query.
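
For reference, the StandardHeadings query should look something like this in the Advanced Editor. This is a minimal sketch that assumes the heading list was loaded from an Excel table named Headings; your table name will likely differ, and any auto-generated Changed Type step can simply be removed.

let
    // Assumes the list of headings was loaded from an Excel table named Headings
    Source = Excel.CurrentWorkbook(){[Name="Headings"]}[Content],
    // Turn the single column of headings into a single row
    #"Transposed Table" = Table.Transpose(Source),
    // Promote that row to become the column headers, leaving a table with headers and no data
    #"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true])
in
    #"Promoted Headers"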


When we closed and loaded the ColumnToRename query earlier as a connection only query, it appeared in the Queries & Connections pane with an ABC icon to the left instead of the usual table icon. This means it's a single value.

We can now reference this single value in other queries.

Changing the Column Heading Using Our Single Value Query

Now we are ready to change the revenue column heading in our data in a dynamic way by using our single value query which contains the value of the column heading we want to rename.

We can create a new query that references the Source Data query by right clicking on it from the Queries & Connections window pane and selecting the Reference option.

Now we can double left click on the Revenue ($) column heading to rename it to just show Revenue.

= Table.RenameColumns(Source,{{"Revenue ($)", "Revenue"}})

The M code will look something like the above. Notice that Revenue ($) is a hardcoded static value. If the column is named differently in the data, this step will produce an error and the query won't work.

We need to edit the M code a bit and replace "Revenue ($)", including the quotes, with ColumnToRename, the name of our single value query that contains the value of the column heading.

= Table.RenameColumns(Source,{{ColumnToRename, "Revenue"}})

After editing, the M code should look like the above. Now our query is dynamic. Rename this query to Data with Renamed Column, then close and load it as a connection only query, since this is still an intermediate step in the process.

Appending Data to the Template

Now we are ready for the last piece of the puzzle and we can append our data to the template query containing the full list of column headings.

In the Data tab, go to the Get Data command then navigate to Combine Queries and then Append.

Now append the Data with Renamed Column query to the StandardHeadings query and hit the Edit button.

We might get extra columns here, so we need to edit the query. Select all the columns that should be in the query (select the first column then hold Shift and select the last column). Then right click on any of the selected columns and choose Remove Other Columns.

let
    Source = Table.Combine({StandardHeadings, #"Data with Renamed Column"}),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Trans Date", type date}, {"Product ID 1", type text}, {"Product ID 2", type text}, {"Product ID 3", type text}, {"Quantity", Int64.Type}, {"Revenue", Currency.Type}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Trans Date", "Product ID 1", "Product ID 2", "Product ID 3", "Quantity", "Revenue"})
in
    #"Removed Other Columns"

The M code should look something like the above. Note that the query also sets the data types (the Changed Type step), which is safe to do now that every column is guaranteed to exist, and then keeps only the columns appearing in our template list, removing the others. We now have a query that will work on all our files, regardless of how many Product ID columns there are or what is in the revenue column header.

Conclusions

If your data is changing too much, it might not be possible to generalize and adapt the queries to the change. Some sort of consistency is needed.

In my case, I was able to rely on the changing column heading always starting with Revenue. If it had changed from Revenue ($) to Amount USD to something else entirely, then filtering on headings that begin with Revenue wouldn't have worked, and I would have needed some other consistent feature to anchor on.

We were also able to leverage the append query to do something it wasn't necessarily intended for, and this worked because we knew all the possible columns we might encounter in the data. If new columns are always coming in, then this wouldn't work.

You will need to find the consistencies in the data and use those as the anchors to adapt your queries.

Good luck and I would love to hear about any solutions you've come up with.

About the Author

John MacDougall

John is a Microsoft MVP and qualified actuary with over 15 years of experience. He has worked in a variety of industries, including insurance, ad tech, and most recently Power Platform consulting. He is a keen problem solver and has a passion for using technology to make businesses more efficient.


Comments


  1. Paul

    Hi,

    great article thank you !

    I sometimes have this problem but I guess the columns SHOULD always have the same name…

    By the way, I think if you omit the “columns=N” parameter in the Csv.Document function, Power Query automatically guesses the number of columns.

    Have a great day !

    Paul.

    • John

      Thanks Paul, I’ll try that out.

  2. p45cal

    regarding Getting the Changing column Name

    Perhaps you’re sticking to the user interface, perhaps a new function has appeared since you wrote, but another way to get that header:

    let
    Source = #"Source Data",
    ColumnHeading = List.Select(Table.ColumnNames(Source), each Text.StartsWith(_, "Revenue")){0}
    in
    ColumnHeading

    • John

      Yes, Table.ColumnNames is a great function! Going beyond the UI commands can definitely be much more efficient.
