officeconvert
officeconvert is a multi-module conversion toolkit that turns presentation files into
typed SlideDeck artifacts with rendered slide images and notes. The repository is
organized around Protocol Buffer schemas, with ConnectRPC code generation keeping the
server and clients compatible.
Modules
- proto/ contains protobuf schemas and RPC definitions.
- gen/python and gen/go contain generated protocol and Connect code.
- python/packages/officeconvert is the core conversion library (PPTX -> PDF -> images + notes).
- python/packages/server is the ConnectRPC Python server with SeaweedFS (S3-compatible) orchestration.
- clients/go is the first client library with layered orchestration helpers.
- deploy/ contains production-ish and dev Docker Compose files.
Supported Document Types
MVP currently supports PPTX only and produces a SlideDeck result containing:
- ordered slide image URLs
- plain-text notes per slide
Quick Commands
Use the root Makefile:
- make buf-lint to lint protobufs
- make buf-generate to regenerate Go and Python types
- make py-sync to sync Python workspace dependencies with uv
- make go-test to run Go client tests
- make compose-up to run server + SeaweedFS
- make compose-up-dev to run SeaweedFS only
- make run-server to start host uvicorn with .env (if present) plus defaults
Development Server Workflow
This is the recommended local workflow for iterating on the Python server and conversion library while keeping SeaweedFS in Docker.
1) Prerequisites
- buf on your PATH
- uv on your PATH
- Docker + Docker Compose
- Local tools if running server on host (not in container):
  - LibreOffice (soffice)
  - Poppler (pdftoppm)
2) Generate typed API code
From repo root:
make buf-lint
make buf-generate
3) Sync Python workspace dependencies
From repo root:
make py-sync
4) Start SeaweedFS dependency stack (dev compose)
From repo root:
make compose-up-dev
SeaweedFS endpoints:
- S3 API: http://localhost:8333
- Master API: http://localhost:9333
- Filer API: http://localhost:8888
- Default S3 creds: minioadmin/minioadmin
5) Start Connect server (host process)
In a separate terminal, from repo root:
make run-server
make run-server behavior:
- loads .env automatically if present
- applies reasonable defaults when values are not set
- defaults S3 endpoint to localhost:8333 for host-based development
- auto-normalizes seaweedfs:8333 to localhost:8333 for host runs
- supports optional UVICORN_HOST and UVICORN_PORT overrides
- exposes conversion timeout tuning vars (CONVERSION_PPTX_TO_PDF_TIMEOUT_SECONDS, CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS)
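The endpoint handling described above can be illustrated with a small Python sketch. This is not the Makefile's actual implementation, which is not shown here; the function name normalize_s3_endpoint is hypothetical, and the handling of blank values is an assumption.

```python
def normalize_s3_endpoint(endpoint, default="localhost:8333"):
    """Return the S3 endpoint a host-run server should talk to.

    Hypothetical helper: mirrors the documented behavior of defaulting to
    localhost:8333 and rewriting the compose hostname for host runs.
    """
    value = (endpoint or "").strip()
    if not value:
        # Assumption: an unset or blank endpoint falls back to the default.
        return default
    host, sep, port = value.partition(":")
    if host == "seaweedfs":
        # The compose service name is not resolvable from the host.
        return "localhost" + sep + port
    return value
```

Any other endpoint, such as an AWS regional hostname, passes through unchanged.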
Server endpoint base URL:
http://localhost:8080
6) Quick smoke test
Create a conversion request:
curl \
--header "Content-Type: application/json" \
--data '{
"sourceFilename":"example.pptx",
"full":{"resolution":"CONVERSION_RESOLUTION_FHD","jpeg":{"quality":85}},
"thumbnail":{"resolution":"CONVERSION_RESOLUTION_SD","jpeg":{"quality":75}}
}' \
http://localhost:8080/officeconvertapi.v1.ConversionService/CreateConversion
Then:
- Upload the PPTX to the returned uploadUrl using HTTP PUT.
- Call StartConversion with the returned conversionId.
- Poll GetConversionStatus until CONVERSION_STATUS_SUCCEEDED.
- Call GetSlideDeck and download each imageUrl.
- Optionally call DeleteConversion for early cleanup.
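The smoke test above can also be scripted. The following is a minimal Python sketch using only the standard library, not the project's client. It assumes that every RPC accepts a JSON body of the form {"conversionId": ...}, that GetConversionStatus returns its enum in a field named status, and that the presigned upload accepts a PPTX Content-Type header. None of these are confirmed by this README, so check them against the schemas in proto/.

```python
import json
import os
import time
import urllib.request

BASE_URL = "http://localhost:8080"
SERVICE = "officeconvertapi.v1.ConversionService"
PPTX_MIME = "application/vnd.openxmlformats-officedocument.presentationml.presentation"


def build_create_request(filename):
    """Build the same CreateConversion body as the curl example."""
    return {
        "sourceFilename": filename,
        "full": {"resolution": "CONVERSION_RESOLUTION_FHD", "jpeg": {"quality": 85}},
        "thumbnail": {"resolution": "CONVERSION_RESOLUTION_SD", "jpeg": {"quality": 75}},
    }


def rpc_url(method, base_url=BASE_URL):
    """Connect unary RPCs are plain POSTs to <base>/<service>/<method>."""
    return f"{base_url}/{SERVICE}/{method}"


def call(method, payload, base_url=BASE_URL):
    """POST a JSON request and return the decoded JSON response."""
    request = urllib.request.Request(
        rpc_url(method, base_url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def upload(upload_url, path):
    """PUT the local PPTX to the presigned uploadUrl."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        upload_url, data=body, method="PUT",
        headers={"Content-Type": PPTX_MIME},  # assumption: header is accepted
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def convert(path, poll_seconds=2.0, deadline_seconds=600.0):
    """Create, upload, start, poll, and fetch the slide deck."""
    created = call("CreateConversion", build_create_request(os.path.basename(path)))
    upload(created["uploadUrl"], path)
    ref = {"conversionId": created["conversionId"]}  # assumed request shape
    call("StartConversion", ref)
    deadline = time.monotonic() + deadline_seconds
    while time.monotonic() < deadline:
        status = call("GetConversionStatus", ref).get("status")  # assumed field
        if status == "CONVERSION_STATUS_SUCCEEDED":
            return call("GetSlideDeck", ref)
        time.sleep(poll_seconds)
    raise TimeoutError("conversion did not succeed before the deadline")
```

Calling convert("example.pptx") against a running server returns the decoded GetSlideDeck response. The loop gives up after deadline_seconds rather than waiting forever, because the failure status values are not listed in this README.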
7) Full container workflow (optional)
If you want to run both server and SeaweedFS in Docker:
make compose-up
Use .env.example as your baseline env configuration.
Storage Backend Notes
- Local development defaults to SeaweedFS (S3-compatible) via Docker Compose.
- Production can use any S3-compatible provider; AWS S3 is the expected choice.
- The Python server uses the minio Python SDK against the S3 API.
- Runtime configuration uses S3_* environment variables.
- All conversions share one bucket (S3_BUCKET, required). Each conversion's objects live under a {conversion_id}/ key prefix (for example {conversion_id}/input/source.pptx and {conversion_id}/output/slide-0001.jpg).
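The key layout can be sketched with a few hypothetical helpers. The function names are illustrative, and the 1-based numbering with four-digit zero padding is inferred from the slide-0001.jpg example rather than confirmed by the server code.

```python
def conversion_prefix(conversion_id):
    """Prefix shared by every object that belongs to one conversion."""
    return f"{conversion_id}/"


def input_key(conversion_id):
    """Key of the uploaded source presentation."""
    return f"{conversion_id}/input/source.pptx"


def slide_key(conversion_id, slide_number):
    """Key of a rendered slide image.

    Assumption: slides are numbered from 1 and padded to four digits.
    """
    return f"{conversion_id}/output/slide-{slide_number:04d}.jpg"
```

Because every key starts with conversion_prefix(conversion_id), cleaning up a conversion amounts to listing and deleting that one prefix.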
AWS setup
Bucket
- Create one bucket (for example officeconvert-prod) in the region where the server runs.
- Leave Block Public Access enabled. Presigned URLs work without a public bucket.
- Optional: add a lifecycle rule to expire objects after a few days as a safety net if cleanup fails.
Server environment
Set at minimum:
S3_BUCKET=officeconvert-prod
S3_ENDPOINT=s3.us-east-1.amazonaws.com
S3_PUBLIC_ENDPOINT=s3.us-east-1.amazonaws.com
S3_REGION=us-east-1
S3_USE_SSL=true
S3_PUBLIC_USE_SSL=true
S3_ACCESS_KEY=...
S3_SECRET_KEY=...
Use your bucket's regional hostname for both endpoints unless you deliberately split internal vs client-facing access. S3_PUBLIC_ENDPOINT must be reachable by whatever uploads and downloads via presigned URLs (clients, not just the server).
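As an illustration of how these variables might be read, here is a minimal Python sketch. It is not the server's actual settings code. The S3Settings class, the accepted boolean spellings, the localhost:8333 fallback, and the rule that the public endpoint and public SSL flag fall back to their internal counterparts when unset or blank are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings


def parse_bool(raw, default):
    """Treat a missing or blank value as unset and return the default."""
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() in _TRUTHY


@dataclass(frozen=True)
class S3Settings:
    bucket: str
    endpoint: str
    public_endpoint: str
    region: Optional[str]
    use_ssl: bool
    public_use_ssl: bool
    access_key: Optional[str]
    secret_key: Optional[str]


def load_s3_settings(env=None):
    """Build settings from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    bucket = env.get("S3_BUCKET", "").strip()
    if not bucket:
        raise ValueError("S3_BUCKET is required")
    endpoint = env.get("S3_ENDPOINT", "").strip() or "localhost:8333"
    use_ssl = parse_bool(env.get("S3_USE_SSL"), default=False)
    return S3Settings(
        bucket=bucket,
        endpoint=endpoint,
        # Assumption: the public values mirror the internal ones when unset.
        public_endpoint=env.get("S3_PUBLIC_ENDPOINT", "").strip() or endpoint,
        region=env.get("S3_REGION", "").strip() or None,
        use_ssl=use_ssl,
        public_use_ssl=parse_bool(env.get("S3_PUBLIC_USE_SSL"), default=use_ssl),
        access_key=env.get("S3_ACCESS_KEY") or None,
        secret_key=env.get("S3_SECRET_KEY") or None,
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in tests.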
On startup the server calls CreateBucket if the bucket is missing. In AWS it is simpler to pre-create the bucket and grant object permissions only (see IAM below).
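The startup check can be sketched as follows. This is an assumption about the shape of the logic, not the server's code. The client argument is duck-typed: the minio SDK's Minio client provides bucket_exists(bucket_name) and make_bucket(bucket_name, location=...), while the allow_create flag is hypothetical.

```python
def ensure_bucket(client, bucket, region=None, allow_create=True):
    """Make sure the bucket exists; return True only if it was created.

    `client` needs bucket_exists() and make_bucket(), as the minio SDK's
    Minio client has. `allow_create` is a hypothetical switch for
    deployments whose IAM policy omits s3:CreateBucket.
    """
    if client.bucket_exists(bucket):
        # Pre-created bucket: no CreateBucket call is made.
        return False
    if not allow_create:
        raise RuntimeError(f"bucket {bucket!r} is missing and creation is disabled")
    client.make_bucket(bucket, location=region)
    return True
```

With a pre-created bucket the function returns False without ever calling make_bucket, so object-only permissions are sufficient.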
IAM permissions
Scope access to the single bucket. Every conversion stores its objects under its own key prefix inside that bucket, so the list and delete permissions can cover the whole bucket:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": "arn:aws:s3:::officeconvert-prod"
},
{
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
"Resource": "arn:aws:s3:::officeconvert-prod/*"
}
]
}
Add s3:CreateBucket on arn:aws:s3:::officeconvert-prod only if you want the server to create the bucket on first boot.
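If you manage several environments, the policy can be generated rather than hand-edited. The helper below is a hypothetical sketch. It reproduces the document shown above for a given bucket name and optionally adds s3:CreateBucket to the bucket-level statement.

```python
import json


def build_policy(bucket, allow_create_bucket=False):
    """Return the IAM policy document for one officeconvert bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    bucket_actions = ["s3:ListBucket"]
    if allow_create_bucket:
        # Only needed when the server should create the bucket itself.
        bucket_actions.append("s3:CreateBucket")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": bucket_actions, "Resource": bucket_arn},
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy("officeconvert-prod"), indent=2))
```

Bucket-level actions target the bucket ARN, while object-level actions target the /* resource beneath it.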
CORS
Required only if uploads or downloads go directly from a browser to presigned URLs. Server-side clients (curl, the Go client) do not need CORS. Allow PUT and GET for your web origin on the bucket.
IAM roles vs IAM users
AWS recommends roles over long-lived IAM user access keys when the server runs on AWS compute (ECS, EC2, Lambda): a role grants temporary credentials that rotate automatically, with no static keys to store or leak.
For this project today, the server reads explicit S3_ACCESS_KEY and S3_SECRET_KEY via the MinIO SDK. That maps cleanly to:
| Where you run | Practical choice |
|---|---|
| Docker on a VPS, bare metal, or outside AWS | IAM user with the policy above; store keys in env or a secrets manager. Fine for a single service at low volume. |
| ECS / EC2 / EKS on AWS | Prefer an IAM role attached to the task or instance. Your orchestrator injects short-lived credentials; you still pass them into S3_ACCESS_KEY / S3_SECRET_KEY (and a session token if your runtime provides one — the server does not yet read a dedicated S3_SESSION_TOKEN env var). |
Conversion Tuning Notes
If conversion fails on larger decks, tune these request fields and environment variables:
- CreateConversionRequest.full.resolution controls full-size output dimensions via presets: SD, HD, FHD, QHD, UHD.
- CreateConversionRequest.thumbnail.resolution controls thumbnail output dimensions with the same presets.
- Omitting full/thumbnail resolution (or sending CONVERSION_RESOLUTION_UNSPECIFIED) defaults to FHD for full and SD for thumbnail.
- Output is JPEG-only for now; set CreateConversionRequest.full.jpeg.quality and CreateConversionRequest.thumbnail.jpeg.quality to 1..100 (0 or omitted uses server defaults: full 85, thumbnail 75).
- Rasterization DPI is inferred automatically from source slide size and selected full/thumbnail output dimensions.
- CONVERSION_PPTX_TO_PDF_TIMEOUT_SECONDS (default 180): timeout for LibreOffice export.
- CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS (default 1800): timeout for Poppler rasterization.
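The defaulting rules above can be summarized in a short Python sketch. The helper names are hypothetical and this is not the server's code. Rejecting qualities outside 1..100 and treating a blank timeout variable as unset are assumptions.

```python
import os

DEFAULT_QUALITY = {"full": 85, "thumbnail": 75}
DEFAULT_RESOLUTION = {"full": "FHD", "thumbnail": "SD"}
DEFAULT_TIMEOUTS = {
    "CONVERSION_PPTX_TO_PDF_TIMEOUT_SECONDS": 180,
    "CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS": 1800,
}
_PREFIX = "CONVERSION_RESOLUTION_"


def effective_quality(kind, requested=None):
    """0 or omitted selects the default for 'full' or 'thumbnail'."""
    if not requested:
        return DEFAULT_QUALITY[kind]
    if not 1 <= requested <= 100:
        # Assumption: out-of-range values are rejected, not clamped.
        raise ValueError("JPEG quality must be in 1..100")
    return requested


def effective_resolution(kind, requested=None):
    """Omitted or UNSPECIFIED selects FHD (full) or SD (thumbnail)."""
    if not requested or requested == _PREFIX + "UNSPECIFIED":
        return DEFAULT_RESOLUTION[kind]
    if requested.startswith(_PREFIX):
        return requested[len(_PREFIX):]
    return requested


def timeout_seconds(name, env=None):
    """Read a timeout variable, falling back to its documented default."""
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    return float(raw) if raw else float(DEFAULT_TIMEOUTS[name])
```

For example, raising CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS in .env changes only the Poppler step, while the LibreOffice export keeps its 180-second default.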