Goal

To parse a Parquet file stored in Azure Blob Storage via an Azure Function, and return the result to Power Automate for use in a flow.

Assumptions

  • This article assumes you have access to an instance of Power Automate
  • Reader is familiar with Azure Functions and has the appropriate tooling installed (see: Creating a C# Azure Function)
  • Reader is familiar with Azure Storage Accounts
  • Reader as starting from a point of completing the above tutorial

Steps

In this post, I’ll show how to parse a Parquet file stored in Azure Blob Storage via an Azure Function written in C# and return this to Power Automate for use.

To make things simple, we’ve assumed we are in a position where we have just completed the tutorial linked in the Assumptions above.

The first thing we need to do is ensure we have our source of Parquet files in Azure Blob Storage. I have uploaded them as part of the repo here:

Parquet Files

This tutorial will get you started creating an Azure Blob Storage Account:

Blob Storage Tutorial

Once you’ve set-up your files within the Storage Account, get the connection string via Access Keys down the left-hand side of the Azure Portal when viewing the Storage Account:

Try

And then click Show Keys to view your connection string:

Try

Make a note of this as you’ll need it later.

Head over to your Azure Function ou created as part of the tutorial in Assumptions.

Firstly, install the package in VS Code by adding the following to your .csproj file:

<PackageReference Include="Parquet.Net" Version="3.8.4" />
<PackageReference Include="Azure.Storage.Blobs" Version="12.7.0" />

Replace the HTTP Trigger Azure Function with the following:

NOTE: The below code assumes the Parquet files are stored in a container called files

public static async Task<IActionResult> Run(
            [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
            ILogger log)
        {
            log.LogInformation("C# HTTP trigger function processed a request.");

            // Connect to Blob Storage using a connection string
            // Assumes there is a container called 'files'
            string connectionString = "<CONNECTION STRING>";
            BlobContainerClient container = new BlobContainerClient(connectionString, "files");

            // Initialise our list of objects
            List<UserData> userDataList = new List<UserData>();

            // Loop through each of the files in the container
            foreach (var file in container.GetBlobs())
            {

                // Initialise a BlobClient so we can worj with the named file
                var blockBlob = container.GetBlobClient(file.Name);

                // Initialise a unique temporary path to hold the Parquet file
                var tempPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());

                // Download the file to the temporary path
                using (var fileStream = File.OpenWrite(tempPath + blockBlob.Name))
                {
                    blockBlob.DownloadTo(fileStream);
                }

                // Open file stream
                using (Stream fileStream = File.OpenRead(tempPath + blockBlob.Name))
                {
                    // Open parquet file reader
                    using (var parquetReader = new ParquetReader(fileStream))
                    {

                        // Enumerate through row groups in this file
                        for (int i = 0; i < parquetReader.RowGroupCount; i++)
                        {
                            // Create row group reader
                            using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
                            {

                                // Deserialize the rows into an array
                                UserData[] v1structures = ParquetConvert.Deserialize<UserData>(fileStream, i);

                                // Add each item of our array to our overall list
                                foreach (var row in v1structures)
                                {
                                    userDataList.Add(row);
                                }
                            }
                        }
                    }
                }

            }

            // Check we have rows
            if (userDataList.Count > 0)
            {

                // Return the list as a JSON array
                string json = JsonConvert.SerializeObject(userDataList.ToArray(), Formatting.Indented);

                return new OkObjectResult(json);
            }
            else
            {

                return new BadRequestResult();
            }        
        }

You can find the above in the repo for this blog:

GitHub

Here is a link to the Nuget package for Parquet:

Parquet.Net

And the GitHub repo itself:

GitHub Repo

Once you have the above function working locally, deploy it to Azure and re-test.

Assuming all is well, navigate over to Power Automate and create a new flow with a manual trigger.

Next, add a new HTTP Request action and populate it as below:

HTTP Request

Method: GET

URI: YOUR FUNCTION URL

Next, add a Parse JSON action and populate it as below:

Parse JSON

Content: This is the Body content of the HTTP Request action

Schema: See Below…

{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "registration_dttm": {
                "type": [
                    "string",
                    "null"
                ]
            },
            "id": {
                "type": [
                    "number",
                    "null"
                ]
            },
            "first_name": {
                "type": [
                    "string",
                    "null"
                ]
            },
            "last_name": {
                "type": [
                    "string",
                    "null"
                ]
            },
            "email": {
                "type": [
                    "string",
                    "null"
                ]
            },
            "gender": {
                "type": [
                    "string",
                    "null"
                ]
            },
            "ip_address": {
                "type": "string"
            },
            "cc": {
                "type": "string"
            },
            "country": {
                "type": [
                    "string",
                    "null"
                ]
            },
            "birthdate": {
                "type": [
                    "string",
                    "null"
                ]
            },
            "salary": {
                "type": [
                    "number",
                    "null"
                ]
            },
            "title": {
                "type": [
                    "string",
                    "null"
                ]
            },
            "comments": {
                "type": [
                    "string",
                    "null"
                ]
            }
        },
        "required": [
            "registration_dttm",
            "id",
            "first_name",
            "last_name",
            "email",
            "gender",
            "ip_address",
            "cc",
            "country",
            "birthdate",
            "salary",
            "title",
            "comments"
        ]
    }
}

Done – run your flow to test it works. You can now leverage the values from the Parquet file in your flow.

©2021 TDG - We Are Back Baby!

CONTACT US

We're not around right now. But you can send us an email and we'll get back to you, asap.

Sending
or

Log in with your credentials

or    

Forgot your details?

or

Create Account