fastparquet Python (How It Works For Developers)

fastparquet is a Python library designed to handle the Parquet file format, which is commonly used in big data workflows. It integrates well with other Python-based data processing tools like Dask and Pandas. Let’s explore its features and see some code examples. Later in this article, we will also learn about IronPDF, a PDF generation library from Iron Software.

Overview of fastparquet

fastparquet is efficient and supports a wide range of Parquet features. Some of its key features include:

Reading and Writing Parquet Files

Read Parquet files into memory and write data back out in Parquet format.

Integration with Pandas and Dask

Seamlessly work with Pandas DataFrames and Dask for parallel processing.

Compression Support

Supports several compression algorithms, including gzip, snappy, brotli, lz4, and zstandard.

Efficient Storage

Optimized for storing and retrieving large datasets in the columnar Parquet format, including datasets split across several files that are tied together by a metadata file.

Installation

You can install fastparquet using pip:

pip install fastparquet

Or using conda:

conda install -c conda-forge fastparquet

Basic Usage

Here’s a simple example to get you started with fastparquet.

Writing a Parquet File

You can write a Pandas DataFrame to a Parquet file:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Los Angeles', 'Chicago']
})

# Write the DataFrame to a Parquet file
df.to_parquet('example.parquet', engine='fastparquet')

# Display confirmation message
print("DataFrame successfully written to 'example.parquet'.")

Output

fastparquet Python (How It Works For Developers): Figure 1 - Console output

Reading a Parquet File

You can read a Parquet file into a Pandas DataFrame:

import pandas as pd

# Read a Parquet file
df = pd.read_parquet('example.parquet', engine='fastparquet')

# Display the DataFrame
print(df.head())

Output

fastparquet Python (How It Works For Developers): Figure 2 - Console output

Displaying Parquet File Metadata

import fastparquet as fp

# Reading metadata from Parquet file
meta = fp.ParquetFile('example.parquet').metadata
print("Parquet file metadata:")
print(meta)

Output

fastparquet Python (How It Works For Developers): Figure 3 - Console output

Advanced Features

Using Dask for Parallel Processing

fastparquet integrates well with Dask for handling large datasets in parallel:

import dask.dataframe as dd

# Read a Parquet file into a Dask DataFrame
ddf = dd.read_parquet('example.parquet', engine='fastparquet')

# Compute the mean age per name; selecting the numeric 'age' column
# keeps the string column 'city' out of the aggregation
result = ddf.groupby('name')['age'].mean().compute()

# Display the result
print(result)

Customizing Compression

You can specify different compression algorithms when writing Parquet files:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
})

# Write the DataFrame to a Parquet file with gzip compression
df.to_parquet('example.parquet', engine='fastparquet', compression='gzip')

Introducing IronPDF

fastparquet Python (How It Works For Developers): Figure 4 - IronPDF for Python: The Python PDF Library

IronPDF is a robust Python library crafted for generating, modifying, and digitally signing PDF documents derived from HTML, CSS, images, and JavaScript. It excels in performance while maintaining a minimal memory footprint. Here are its key features:

1. HTML to PDF Conversion

Convert HTML files, HTML strings, and URLs into PDF documents with IronPDF. For instance, effortlessly render webpages into PDFs using the Chrome PDF renderer.

2. Cross-Platform Support

Compatible with Python 3+ across Windows, Mac, Linux, and various cloud platforms. IronPDF is also available for .NET, Java, and Node.js environments.

3. Editing and Signing

Modify document properties, enhance security with password protection and permissions, and integrate digital signatures into your PDFs using IronPDF.

4. Page Templates and Settings

Tailor PDFs with customized headers, footers, page numbers, and adjustable margins. It supports responsive layouts and accommodates custom paper sizes.

5. Standards Compliance

Conforms to PDF standards like PDF/A and PDF/UA. It handles UTF-8 character encoding and manages assets such as images, CSS stylesheets, and fonts effectively.

Generate PDF Documents using IronPDF and fastparquet

IronPDF for Python prerequisites

  1. .NET 6.0 runtime: IronPDF relies on .NET 6.0 as its underlying technology, so make sure the .NET 6.0 runtime is installed on your system.
  2. Python 3.0+: Ensure you have Python version 3 or later installed.
  3. pip: Install pip, the Python package installer, to install the IronPDF package.

Installation

# Install latest version of the libraries
pip install fastparquet
pip install pandas
pip install ironpdf

Code example

The following code example demonstrates the use of fastparquet and IronPDF together in Python:

import pandas as pd
import fastparquet as fp
from ironpdf import ChromePdfRenderer, License

# Apply your license key for IronPDF
License.LicenseKey = "your Key"

# Sample DataFrame
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Writing DataFrame to a Parquet file
fp.write('example.parquet', df)

# Reading from Parquet file into DataFrame
df_read = fp.ParquetFile('example.parquet').to_pandas()

# Displaying the read DataFrame
print("Original DataFrame:")
print(df)
print("\nDataFrame read from Parquet file:")
print(df_read)

# Initialize a ChromePdfRenderer instance
renderer = ChromePdfRenderer()

# Create a PDF from a HTML string using IronPDF
content = "<h1>Awesome Iron PDF with FastParquet</h1>"
content += "<p> Original DataFrame:</p>"
content += "<p>" + f"{str(df)}" + "</p>"
content += "<p> DataFrame read from Parquet file:</p>"
content += "<p>" + f"{str(df_read)}" + "</p>"

# Render the HTML content to a PDF
pdf = renderer.RenderHtmlAsPdf(content)

# Export the PDF to a file
pdf.SaveAs("Demo-FastParquet.pdf")

Code Explanation

This code snippet demonstrates how to utilize several Python libraries to manipulate data and generate a PDF document from HTML content.

  1. Imports and Setup: Import necessary libraries for data manipulation, reading and writing Parquet files, and PDF generation.

  2. Setting License Key: Set the license key for IronPDF, enabling its full features.

  3. Creating a Sample DataFrame: Define a sample DataFrame (df) containing information about individuals (name, age, city).

  4. Writing DataFrame to Parquet: Write the DataFrame df to a Parquet file named example.parquet.

  5. Reading from Parquet File: Read data from the Parquet file (example.parquet) back into a DataFrame (df_read).

  6. Generating PDF from HTML:
    • Initialize a ChromePdfRenderer instance using IronPDF.
    • Construct an HTML string (content) that includes a heading (<h1>) and paragraphs (<p>) displaying the original DataFrame (df) and the DataFrame read from the Parquet file (df_read).
    • Render the HTML content as a PDF document using IronPDF.
    • Save the generated PDF document as Demo-FastParquet.pdf.

This example combines fastparquet's data handling with PDF generation, which is useful for creating reports or documents from data stored in Parquet files.

OUTPUT

fastparquet Python (How It Works For Developers): Figure 5 - Console output displaying data from the original Dataframe and the Dataframe read from Parquet file.

OUTPUT PDF

fastparquet Python (How It Works For Developers): Figure 6 - Output PDF generated using IronPDF library
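
The example above places `str(df)` inside `<p>` tags, where HTML collapses the whitespace that keeps the columns aligned. One possible alternative, sketched here with pandas' `DataFrame.to_html`, is to emit a real HTML table. The sample data repeats the article's DataFrame, and the IronPDF rendering step is left out so the sketch depends only on pandas.

```python
import pandas as pd

# Illustrative sample data, mirroring the DataFrame used in the example above
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'Los Angeles', 'Chicago']
})

# Build the HTML document body; to_html() produces a <table> element,
# so rows and columns keep their structure when rendered
content = "<h1>Awesome Iron PDF with FastParquet</h1>"
content += "<p>DataFrame read from Parquet file:</p>"
content += df.to_html(index=False)

print(content)
```

The resulting `content` string could then be passed to `renderer.RenderHtmlAsPdf(content)` in the same way as in the example above.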

IronPDF License

For license information, visit the IronPDF licensing page.

Place the license key at the start of the script, before using the IronPDF package:

from ironpdf import License

# Apply your license key
License.LicenseKey = "your Key"

Conclusion

fastparquet is a powerful and efficient library for working with Parquet files in Python. Its integration with Pandas and Dask makes it a good choice for handling large datasets in a Python-based big data workflow.

IronPDF is a Python library for creating, manipulating, and rendering PDF documents directly from Python applications. It simplifies tasks such as converting HTML content into PDF documents, creating interactive forms, and performing PDF manipulations like merging files or adding watermarks. It fits into existing Python frameworks and environments, giving developers a versatile way to generate and customize PDF documents dynamically. Used together, fastparquet handles the data stored in Parquet files and IronPDF turns the results into PDF documents.

IronPDF offers comprehensive documentation and code examples to help developers make the most of its features. For more information, refer to the documentation and code example pages.

Chaknith Bin
Software Engineer
Chaknith works on IronXL and IronBarcode. He has deep expertise in C# and .NET, helping improve the software and support customers. His insights from user interactions contribute to better products, documentation, and overall experience.