Pandas Integration API

pyopenxlsx offers deep and native support for Python’s most popular data analysis library: pandas.

By combining the C++ memory-mapped engine of pyopenxlsx with the vectorized operations of pandas, you can achieve unprecedented speeds when importing and exporting huge datasets (e.g., writing 1 million styled rows in under 8 seconds).

Exporting a DataFrame (Writing to Excel)

Use Worksheet.write_dataframe() to instantly dump a pandas.DataFrame into an Excel sheet.

import pandas as pd
import datetime
from pyopenxlsx import Workbook

wb = Workbook()
ws = wb.active

df = pd.DataFrame({
    "ID": [101, 102, 103],
    "Department": ["Sales", "Engineering", "HR"],
    "Salary": [85000.50, 120000.00, 75000.75],
    "Hire Date": [datetime.date(2021, 5, 1), datetime.date(2019, 8, 15), datetime.date(2022, 1, 10)]
})

# Basic export (writes headers automatically)
ws.write_dataframe(df, start_row=1, start_col=1)

wb.save("pandas_export.xlsx")

High-Performance Column Styling

A common pain point when exporting DataFrames is applying Excel formatting (like Currency $ or Date yyyy-mm-dd) to specific columns without looping over millions of cells in Python (which is extremely slow).

pyopenxlsx solves this with the column_styles parameter. When provided, the engine automatically switches to the C++ XLStreamWriter, injecting your styles natively during the streaming process with Zero Overhead.

from pyopenxlsx import Workbook

wb = Workbook()
ws = wb.active

# 1. Create the styles you need
currency_style = wb.add_style(number_format="$#,##0.00")
date_style = wb.add_style(number_format="yyyy-mm-dd")

# 2. Export and style in ONE operation!
# You can map by column name or 0-based column index
ws.write_dataframe(df, column_styles={
    "Salary": currency_style,
    "Hire Date": date_style
})

wb.save("styled_export.xlsx")

Note: Using column_styles is capable of styling 1,000,000 rows in just ~3 seconds!

Importing to a DataFrame (Reading from Excel)

To extract data back into a pandas.DataFrame with maximum performance, use Worksheet.read_dataframe().

Instead of allocating Python Cell objects for the entire document, this method uses the underlying C++ XML XLStreamReader to extract raw values instantly.

from pyopenxlsx import load_workbook
import pandas as pd

wb = load_workbook("styled_export.xlsx")
ws = wb.active

# 1. Read the data directly into a DataFrame
# You can specify the exact bounding box, or let it read the entire used range
df_read = ws.read_dataframe(header=True)

# 2. Restore Date formats
# Since stream reading extracts raw Excel serial numbers (e.g., 44317.0) for maximum speed,
# you should use pandas' native vectorized functions to restore datetime objects:
if "Hire Date" in df_read.columns:
    df_read["Hire Date"] = pd.to_datetime(
        df_read["Hire Date"], 
        unit='D', 
        origin='1899-12-30'
    ).dt.date

print(df_read)

Async Pandas Operations

If you are building a web backend (like FastAPI) that exports or imports reports, you can use the async variants to completely offload the CPU-bound conversions and disk I/O to a threadpool, keeping your event loop responsive.

import asyncio
from pyopenxlsx import Workbook

async def export_report(df):
    wb = Workbook()
    ws = wb.active
    
    # Non-blocking DataFrame write
    await ws.write_dataframe_async(df)
    
    # Non-blocking zip compression and save
    await wb.save_async("async_report.xlsx")