Goal
To parse a Parquet file stored in Azure Blob Storage via an Azure Function, and return the result to Power Automate for use in a flow.
Assumptions
- This article assumes you have access to an instance of Power Automate
- Reader is familiar with Azure Functions and has the appropriate tooling installed (see: Creating a C# Azure Function)
- Reader is familiar with Azure Storage Accounts
- Reader as starting from a point of completing the above tutorial
Steps
In this post, I’ll show how to parse a Parquet file stored in Azure Blob Storage via an Azure Function written in C# and return this to Power Automate for use.
To make things simple, we’ve assumed we are in a position where we have just completed the tutorial linked in the Assumptions above.
The first thing we need to do is ensure we have our source of Parquet files in Azure Blob Storage. I have uploaded them as part of the repo here:
This tutorial will get you started creating an Azure Blob Storage Account:
Once you’ve set-up your files within the Storage Account, get the connection string via Access Keys down the left-hand side of the Azure Portal when viewing the Storage Account:
And then click Show Keys to view your connection string:
Make a note of this as you’ll need it later.
Head over to your Azure Function ou created as part of the tutorial in Assumptions.
Firstly, install the package in VS Code by adding the following to your .csproj file:
<PackageReference Include="Parquet.Net" Version="3.8.4" />
<PackageReference Include="Azure.Storage.Blobs" Version="12.7.0" />
Replace the HTTP Trigger Azure Function with the following:
NOTE: The below code assumes the Parquet files are stored in a container called files
public static async Task<IActionResult> Run(
[HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
ILogger log)
{
log.LogInformation("C# HTTP trigger function processed a request.");
// Connect to Blob Storage using a connection string
// Assumes there is a container called 'files'
string connectionString = "<CONNECTION STRING>";
BlobContainerClient container = new BlobContainerClient(connectionString, "files");
// Initialise our list of objects
List<UserData> userDataList = new List<UserData>();
// Loop through each of the files in the container
foreach (var file in container.GetBlobs())
{
// Initialise a BlobClient so we can worj with the named file
var blockBlob = container.GetBlobClient(file.Name);
// Initialise a unique temporary path to hold the Parquet file
var tempPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
// Download the file to the temporary path
using (var fileStream = File.OpenWrite(tempPath + blockBlob.Name))
{
blockBlob.DownloadTo(fileStream);
}
// Open file stream
using (Stream fileStream = File.OpenRead(tempPath + blockBlob.Name))
{
// Open parquet file reader
using (var parquetReader = new ParquetReader(fileStream))
{
// Enumerate through row groups in this file
for (int i = 0; i < parquetReader.RowGroupCount; i++)
{
// Create row group reader
using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
{
// Deserialize the rows into an array
UserData[] v1structures = ParquetConvert.Deserialize<UserData>(fileStream, i);
// Add each item of our array to our overall list
foreach (var row in v1structures)
{
userDataList.Add(row);
}
}
}
}
}
}
// Check we have rows
if (userDataList.Count > 0)
{
// Return the list as a JSON array
string json = JsonConvert.SerializeObject(userDataList.ToArray(), Formatting.Indented);
return new OkObjectResult(json);
}
else
{
return new BadRequestResult();
}
}
You can find the above in the repo for this blog:
Here is a link to the Nuget package for Parquet:
And the GitHub repo itself:
Once you have the above function working locally, deploy it to Azure and re-test.
Assuming all is well, navigate over to Power Automate and create a new flow with a manual trigger.
Next, add a new HTTP Request action and populate it as below:
Method: GET
URI: YOUR FUNCTION URL
Next, add a Parse JSON action and populate it as below:
Content: This is the Body content of the HTTP Request action
Schema: See Below…
{
"type": "array",
"items": {
"type": "object",
"properties": {
"registration_dttm": {
"type": [
"string",
"null"
]
},
"id": {
"type": [
"number",
"null"
]
},
"first_name": {
"type": [
"string",
"null"
]
},
"last_name": {
"type": [
"string",
"null"
]
},
"email": {
"type": [
"string",
"null"
]
},
"gender": {
"type": [
"string",
"null"
]
},
"ip_address": {
"type": "string"
},
"cc": {
"type": "string"
},
"country": {
"type": [
"string",
"null"
]
},
"birthdate": {
"type": [
"string",
"null"
]
},
"salary": {
"type": [
"number",
"null"
]
},
"title": {
"type": [
"string",
"null"
]
},
"comments": {
"type": [
"string",
"null"
]
}
},
"required": [
"registration_dttm",
"id",
"first_name",
"last_name",
"email",
"gender",
"ip_address",
"cc",
"country",
"birthdate",
"salary",
"title",
"comments"
]
}
}
Done – run your flow to test it works. You can now leverage the values from the Parquet file in your flow.