
officeconvert

officeconvert is a multi-module conversion toolkit that turns presentation files into typed SlideDeck artifacts with rendered slide images and notes. The repository is organized around Protocol Buffer schemas, with ConnectRPC code generation keeping the server and clients compatible.

Modules

  • proto/ contains protobuf schemas and RPC definitions.
  • gen/python and gen/go contain generated protocol and Connect code.
  • python/packages/officeconvert is the core conversion library (PPTX -> PDF -> images + notes).
  • python/packages/server is the ConnectRPC Python server with SeaweedFS (S3-compatible) orchestration.
  • clients/go is the first client library with layered orchestration helpers.
  • deploy/ contains production-ish and dev Docker Compose files.

Supported Document Types

MVP currently supports PPTX only and produces a SlideDeck result containing:

  • ordered slide image URLs
  • plain-text notes per slide

Quick Commands

Use the root Makefile:

  • make buf-lint to lint protobufs
  • make buf-generate to regenerate Go and Python types
  • make py-sync to sync Python workspace dependencies with uv
  • make go-test to run Go client tests
  • make compose-up to run server + SeaweedFS
  • make compose-up-dev to run SeaweedFS only
  • make run-server to start host uvicorn with .env (if present) plus defaults

Development Server Workflow

This is the recommended local workflow for iterating on the Python server and conversion library while keeping SeaweedFS in Docker.

1) Prerequisites

  • buf on your PATH
  • uv on your PATH
  • Docker + Docker Compose
  • Local tools if running server on host (not in container):
    • LibreOffice (soffice)
    • Poppler (pdftoppm)

2) Generate typed API code

From repo root:

make buf-lint
make buf-generate

3) Sync Python workspace dependencies

From repo root:

make py-sync

4) Start SeaweedFS dependency stack (dev compose)

From repo root:

make compose-up-dev

SeaweedFS endpoints:

  • S3 API: http://localhost:8333
  • Master API: http://localhost:9333
  • Filer API: http://localhost:8888
  • Default S3 creds: minioadmin / minioadmin

5) Start Connect server (host process)

In a separate terminal, from repo root:

make run-server

make run-server behavior:

  • loads .env automatically if present
  • applies reasonable defaults when values are not set
  • defaults S3 endpoint to localhost:8333 for host-based development
  • auto-normalizes seaweedfs:8333 to localhost:8333 for host runs
  • supports optional UVICORN_HOST and UVICORN_PORT overrides
  • exposes conversion timeout tuning vars (CONVERSION_PPTX_TO_PDF_TIMEOUT_SECONDS, CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS)
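
The endpoint handling described above can be sketched in Python. This is an illustration only, not the Makefile's actual logic: the function names are hypothetical, the default port 8080 is taken from the server base URL documented below, and the default host 127.0.0.1 is an assumption.

```python
# Illustrative sketch of the documented `make run-server` defaults.
# Function names and the default UVICORN_HOST are assumptions.

DEFAULT_S3_ENDPOINT = "localhost:8333"
COMPOSE_S3_ENDPOINT = "seaweedfs:8333"  # hostname used inside Docker Compose


def resolve_s3_endpoint(env):
    """Pick the S3 endpoint for a server running on the host."""
    endpoint = env.get("S3_ENDPOINT", "").strip()
    if not endpoint:
        # Nothing configured: fall back to the host-side SeaweedFS port.
        return DEFAULT_S3_ENDPOINT
    if endpoint == COMPOSE_S3_ENDPOINT:
        # The compose-internal hostname does not resolve from the host.
        return DEFAULT_S3_ENDPOINT
    return endpoint


def resolve_uvicorn_bind(env):
    """Return (host, port), honoring the optional overrides."""
    host = env.get("UVICORN_HOST", "127.0.0.1")  # assumed default
    port = int(env.get("UVICORN_PORT", "8080"))
    return host, port
```

Passing os.environ as env reproduces the behavior for a real shell; passing a plain dict makes the rules easy to check in isolation.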

Server endpoint base URL:

  • http://localhost:8080

6) Quick smoke test

Create a conversion request:

curl \
  --header "Content-Type: application/json" \
  --data '{
    "sourceFilename":"example.pptx",
    "full":{"resolution":"CONVERSION_RESOLUTION_FHD","jpeg":{"quality":85}},
    "thumbnail":{"resolution":"CONVERSION_RESOLUTION_SD","jpeg":{"quality":75}}
  }' \
  http://localhost:8080/officeconvertapi.v1.ConversionService/CreateConversion

Then:

  1. Upload the PPTX to the returned uploadUrl using HTTP PUT.
  2. Call StartConversion with the returned conversionId.
  3. Poll GetConversionStatus until CONVERSION_STATUS_SUCCEEDED.
  4. Call GetSlideDeck and download each imageUrl.
  5. Optionally call DeleteConversion for early cleanup.
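
The steps above can be scripted with the Python standard library. The sketch below assumes every RPC accepts the same JSON-over-POST shape as the curl call. The helper names, the failure status name, and the Content-Type used for the upload are assumptions, so check them against the schemas in proto/.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8080"
SERVICE = "officeconvertapi.v1.ConversionService"
PPTX_MIME = "application/vnd.openxmlformats-officedocument.presentationml.presentation"

# Assumption: the proto defines a failure status with this name.
ASSUMED_FAILURE_STATUSES = {"CONVERSION_STATUS_FAILED"}


def build_rpc_request(method, payload, base_url=BASE_URL):
    """Build a JSON POST for one unary RPC, mirroring the curl call."""
    return urllib.request.Request(
        url=f"{base_url}/{SERVICE}/{method}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_rpc(method, payload, base_url=BASE_URL):
    """Send the RPC and decode the JSON response body."""
    with urllib.request.urlopen(build_rpc_request(method, payload, base_url)) as resp:
        return json.load(resp)


def build_upload_request(upload_url, pptx_bytes):
    """Build the HTTP PUT that sends the PPTX to the presigned uploadUrl."""
    return urllib.request.Request(
        url=upload_url,
        data=pptx_bytes,
        headers={"Content-Type": PPTX_MIME},  # assumed to be acceptable
        method="PUT",
    )


def poll_until_succeeded(get_status, interval_seconds=2.0, max_attempts=300,
                         sleep=time.sleep):
    """Call get_status() until it returns CONVERSION_STATUS_SUCCEEDED."""
    for _ in range(max_attempts):
        status = get_status()
        if status == "CONVERSION_STATUS_SUCCEEDED":
            return status
        if status in ASSUMED_FAILURE_STATUSES:
            raise RuntimeError(f"conversion ended with {status}")
        sleep(interval_seconds)
    raise TimeoutError("conversion did not finish within the polling budget")
```

A typical run calls call_rpc("CreateConversion", ...), sends build_upload_request(...) through urllib.request.urlopen, calls StartConversion, and then passes poll_until_succeeded a small function that calls GetConversionStatus and returns the status string. The request payloads for the later RPCs (for example a conversionId field) and the name of the status field in the response are not shown in this README, so treat them as assumptions until confirmed in the proto files.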

7) Full container workflow (optional)

If you want to run both server and SeaweedFS in Docker:

make compose-up

Use .env.example as your baseline env configuration.

Storage Backend Notes

  • Local development defaults to SeaweedFS (S3-compatible) via Docker Compose.
  • Production can use any S3-compatible provider; AWS S3 is the expected choice.
  • The Python server uses the minio Python SDK against the S3 API.
  • Runtime configuration uses S3_* environment variables.
  • All conversions share one bucket (S3_BUCKET, required). Each conversion's objects live under a {conversion_id}/ key prefix (for example {conversion_id}/input/source.pptx and {conversion_id}/output/slide-0001.jpg).
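
As an illustration of the key layout, the helpers below build keys that match the two examples above. The function names are hypothetical, and the 1-based, four-digit zero-padded slide number is inferred from the slide-0001.jpg example rather than confirmed by the server code.

```python
def conversion_prefix(conversion_id):
    """Prefix shared by every object that belongs to one conversion."""
    return f"{conversion_id}/"


def input_key(conversion_id):
    """Key of the uploaded source presentation."""
    return f"{conversion_prefix(conversion_id)}input/source.pptx"


def slide_key(conversion_id, slide_number):
    """Key of one rendered slide image (assumed 1-based, 4-digit padding)."""
    if slide_number < 1:
        raise ValueError("slide_number starts at 1")
    return f"{conversion_prefix(conversion_id)}output/slide-{slide_number:04d}.jpg"
```

Because every key starts with the conversion ID, listing or deleting by conversion_prefix(...) covers all of a conversion's objects without touching any other conversion.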

AWS setup

Bucket

  1. Create one bucket (for example officeconvert-prod) in the region where the server runs.
  2. Leave Block Public Access enabled. Presigned URLs work without a public bucket.
  3. Optional: add a lifecycle rule to expire objects after a few days as a safety net if cleanup fails.
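
For the optional lifecycle rule in step 3, here is a minimal sketch that builds a lifecycle configuration in the JSON shape the S3 API expects. The rule ID and the 7-day window are placeholder assumptions; pick whatever "a few days" means for your deployment.

```python
import json


def build_expiry_lifecycle(days, rule_id="expire-stale-conversions"):
    """Build an S3 lifecycle configuration expiring every object after `days`."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": rule_id,                 # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},      # empty prefix: whole bucket
                "Expiration": {"Days": days},
            }
        ]
    }


# Print the JSON so it can be saved to a file or pasted into tooling.
print(json.dumps(build_expiry_lifecycle(7), indent=2))
```

The printed document can be applied with your usual AWS tooling. The rule is only a safety net, so it should be longer than any conversion is expected to live.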

Server environment

Set at minimum:

S3_BUCKET=officeconvert-prod
S3_ENDPOINT=s3.us-east-1.amazonaws.com
S3_PUBLIC_ENDPOINT=s3.us-east-1.amazonaws.com
S3_REGION=us-east-1
S3_USE_SSL=true
S3_PUBLIC_USE_SSL=true
S3_ACCESS_KEY=...
S3_SECRET_KEY=...

Use your bucket's regional hostname for both endpoints unless you deliberately split internal vs client-facing access. S3_PUBLIC_ENDPOINT must be reachable by whatever uploads and downloads via presigned URLs (clients, not just the server).
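
To catch a missing variable before the server starts, a small validation step can help. The sketch below is not the server's own configuration loader: the S3Settings class, the function names, and the accepted boolean spellings are assumptions made for illustration.

```python
from dataclasses import dataclass

REQUIRED_VARS = (
    "S3_BUCKET", "S3_ENDPOINT", "S3_PUBLIC_ENDPOINT", "S3_REGION",
    "S3_USE_SSL", "S3_PUBLIC_USE_SSL", "S3_ACCESS_KEY", "S3_SECRET_KEY",
)


@dataclass(frozen=True)
class S3Settings:
    bucket: str
    endpoint: str
    public_endpoint: str
    region: str
    use_ssl: bool
    public_use_ssl: bool
    access_key: str
    secret_key: str


def parse_bool(value):
    """Interpret common truthy spellings (assumed set, not the server's)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_s3_settings(env):
    """Validate that the minimum S3_* variables are present and parse them."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise ValueError("missing S3 settings: " + ", ".join(missing))
    return S3Settings(
        bucket=env["S3_BUCKET"],
        endpoint=env["S3_ENDPOINT"],
        public_endpoint=env["S3_PUBLIC_ENDPOINT"],
        region=env["S3_REGION"],
        use_ssl=parse_bool(env["S3_USE_SSL"]),
        public_use_ssl=parse_bool(env["S3_PUBLIC_USE_SSL"]),
        access_key=env["S3_ACCESS_KEY"],
        secret_key=env["S3_SECRET_KEY"],
    )
```

Calling load_s3_settings(os.environ) in a deploy check fails fast with the names of every unset variable.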

On startup the server calls CreateBucket if the bucket is missing. In AWS it is simpler to pre-create the bucket and grant object permissions only (see IAM below).

IAM permissions

Scope access to the single bucket. Every object key sits under a per-conversion prefix inside it, so the list and delete permissions can cover the whole bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::officeconvert-prod"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::officeconvert-prod/*"
    }
  ]
}

Add s3:CreateBucket on arn:aws:s3:::officeconvert-prod only if you want the server to create the bucket on first boot.
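
If you manage several environments, the policy can be generated per bucket instead of edited by hand. The sketch below reproduces the policy above and adds the optional CreateBucket statement on request; the function name and the allow_create_bucket flag are hypothetical.

```python
import json


def build_bucket_policy(bucket, allow_create_bucket=False):
    """Build the IAM policy document for one officeconvert bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    statements = [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": bucket_arn,            # bucket-level action
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"{bucket_arn}/*",     # object-level actions
        },
    ]
    if allow_create_bucket:
        # Only needed when the server should create the bucket on first boot.
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["s3:CreateBucket"],
                "Resource": bucket_arn,
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


print(json.dumps(build_bucket_policy("officeconvert-prod"), indent=2))
```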

CORS

Required only if uploads or downloads go directly from a browser to presigned URLs. Server-side clients (curl, the Go client) do not need CORS. Allow PUT and GET for your web origin on the bucket.
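
A minimal sketch of such a CORS configuration, built in Python, in the rule shape the S3 API uses. The origin is a placeholder, and allowing all request headers is an assumption you may want to tighten.

```python
import json


def build_cors_rules(web_origin):
    """Build an S3 CORS configuration allowing browser PUT and GET."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [web_origin],
                "AllowedMethods": ["PUT", "GET"],
                "AllowedHeaders": ["*"],  # assumption: accept any request header
            }
        ]
    }


# https://app.example.com is a placeholder origin.
print(json.dumps(build_cors_rules("https://app.example.com"), indent=2))
```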

IAM roles vs IAM users

AWS recommends roles over long-lived IAM user access keys when the server runs on AWS compute (ECS, EC2, Lambda): a role grants temporary credentials that rotate automatically, with no static keys to store or leak.

For this project today, the server reads explicit S3_ACCESS_KEY and S3_SECRET_KEY via the MinIO SDK. That maps cleanly to:

  • Docker on a VPS, bare metal, or anywhere outside AWS: use an IAM user with the policy above; store keys in env or a secrets manager. Fine for a single service at low volume.
  • ECS / EC2 / EKS on AWS: prefer an IAM role attached to the task or instance. Your orchestrator injects short-lived credentials; you still pass them into S3_ACCESS_KEY / S3_SECRET_KEY, plus a session token if your runtime provides one. Note that the server does not yet read a dedicated S3_SESSION_TOKEN env var.

Conversion Tuning Notes

If conversion fails on larger decks, tune these request fields and environment variables:

  • CreateConversionRequest.full.resolution controls full-size output dimensions via presets: SD, HD, FHD, QHD, UHD.
  • CreateConversionRequest.thumbnail.resolution controls thumbnail output dimensions with the same presets.
  • Omitting full/thumbnail resolution (or sending CONVERSION_RESOLUTION_UNSPECIFIED) defaults to FHD for full and SD for thumbnail.
  • Output is JPEG-only for now; set CreateConversionRequest.full.jpeg.quality and CreateConversionRequest.thumbnail.jpeg.quality to 1..100 (0 or omitted uses server defaults: full 85, thumbnail 75).
  • Rasterization DPI is inferred automatically from source slide size and selected full/thumbnail output dimensions.
  • CONVERSION_PPTX_TO_PDF_TIMEOUT_SECONDS (default 180): timeout for LibreOffice export.
  • CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS (default 1800): timeout for Poppler rasterization.
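
The defaulting rules above can be summarized in a short Python sketch. It restates the documented behavior and is not the server's implementation; the function names and the dictionary keys are hypothetical.

```python
DEFAULT_JPEG_QUALITY = {"full": 85, "thumbnail": 75}
DEFAULT_RESOLUTION = {
    "full": "CONVERSION_RESOLUTION_FHD",
    "thumbnail": "CONVERSION_RESOLUTION_SD",
}


def resolve_jpeg_quality(kind, requested=None):
    """Return the effective JPEG quality for 'full' or 'thumbnail'."""
    if not requested:
        # None (omitted) and 0 both select the documented default.
        return DEFAULT_JPEG_QUALITY[kind]
    if not 1 <= requested <= 100:
        raise ValueError("jpeg quality must be in 1..100")
    return requested


def resolve_resolution(kind, requested=None):
    """Return the effective resolution preset for 'full' or 'thumbnail'."""
    if requested in (None, "", "CONVERSION_RESOLUTION_UNSPECIFIED"):
        return DEFAULT_RESOLUTION[kind]
    return requested


def load_timeouts(env):
    """Read both conversion timeouts in seconds, applying documented defaults."""
    return {
        "pptx_to_pdf": float(env.get("CONVERSION_PPTX_TO_PDF_TIMEOUT_SECONDS", "180")),
        "pdf_to_images": float(env.get("CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS", "1800")),
    }
```

When a large deck times out, raising only the stage that failed (LibreOffice export or Poppler rasterization) keeps the other limit useful as a guard against runaway processes.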