# officeconvert
officeconvert is a multi-module conversion toolkit that turns presentation files into
typed `SlideDeck` artifacts with rendered slide images and notes. The repository is
organized around Protocol Buffer schemas, with ConnectRPC code generated from them so
that the server and its clients share one typed API.
## Modules
- `proto/` contains protobuf schemas and RPC definitions.
- `gen/python` and `gen/go` contain generated protocol and Connect code.
- `python/packages/officeconvert` is the core conversion library (PPTX -> PDF -> images + notes).
- `python/packages/server` is the ConnectRPC Python server with SeaweedFS (S3-compatible) orchestration.
- `clients/go` is the first client library with layered orchestration helpers.
- `deploy/` contains production-ish and dev Docker Compose files.
## Supported Document Types
The MVP currently supports **PPTX only** and produces a `SlideDeck` result containing:
- ordered slide image URLs
- plain-text notes per slide
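
As a rough illustration of the result shape described above, the sketch below models a deck as an ordered list of slides, each with an image URL and plain-text notes. The class names, the `slides` and `notes` JSON keys, and the `deck_from_json` helper are hypothetical; only `imageUrl` appears elsewhere in this README, and the authoritative definition is the schema under `proto/`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Slide:
    image_url: str
    notes: str = ""  # plain-text speaker notes; empty when a slide has none


@dataclass(frozen=True)
class SlideDeck:
    # List order is slide order.
    slides: List[Slide] = field(default_factory=list)


def deck_from_json(payload):
    """Build a SlideDeck from a decoded JSON object (assumed key names)."""
    return SlideDeck(
        [Slide(item["imageUrl"], item.get("notes", "")) for item in payload.get("slides", [])]
    )
```

The generated types in `gen/python` should be preferred in real code; this sketch only shows what "ordered image URLs plus notes" could look like once decoded.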
## Quick Commands
Use the root `Makefile`:
- `make buf-lint` to lint protobufs
- `make buf-generate` to regenerate Go and Python types
- `make py-sync` to sync Python workspace dependencies with uv
- `make go-test` to run Go client tests
- `make compose-up` to run server + SeaweedFS
- `make compose-up-dev` to run SeaweedFS only
- `make run-server` to start `uvicorn` on the host, loading `.env` (if present) and applying defaults
## Development Server Workflow
This is the recommended local workflow for iterating on the Python server and conversion
library while keeping SeaweedFS in Docker.
### 1) Prerequisites
- `buf` on your `PATH`
- `uv` on your `PATH`
- Docker + Docker Compose
- Local tools if running the server on the host (not in a container):
  - LibreOffice (`soffice`)
  - Poppler (`pdftoppm`)
### 2) Generate typed API code
From repo root:
```bash
make buf-lint
make buf-generate
```
### 3) Sync Python workspace dependencies
From repo root:
```bash
make py-sync
```
### 4) Start SeaweedFS dependency stack (dev compose)
From repo root:
```bash
make compose-up-dev
```
SeaweedFS endpoints:
- S3 API: `http://localhost:8333`
- Master API: `http://localhost:9333`
- Filer API: `http://localhost:8888`
- Default S3 creds: `minioadmin` / `minioadmin`
### 5) Start Connect server (host process)
In a separate terminal, from repo root:
```bash
make run-server
```
`make run-server` behavior:
- loads `.env` automatically if present
- applies reasonable defaults when values are not set
- defaults S3 endpoint to `localhost:8333` for host-based development
- auto-normalizes `seaweedfs:8333` to `localhost:8333` for host runs
- supports optional `UVICORN_HOST` and `UVICORN_PORT` overrides
- exposes conversion timeout tuning vars (`CONVERSION_PPTX_TO_PDF_TIMEOUT_SECONDS`, `CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS`)

Server endpoint base URL:
- `http://localhost:8080`
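
The `make run-server` behavior listed above can be pictured as a small settings resolver. This is a minimal sketch, not the project's actual startup code. The variable name `S3_ENDPOINT`, the `127.0.0.1` host default, and the precedence order (process environment over `.env` over defaults) are assumptions; the port default is inferred from the base URL above, and the timeout defaults come from the tuning notes in this README.

```python
import os

# Defaults applied when a value is set neither in .env nor in the environment.
DEFAULTS = {
    "S3_ENDPOINT": "localhost:8333",   # variable name is an assumption
    "UVICORN_HOST": "127.0.0.1",       # assumed default host
    "UVICORN_PORT": "8080",
    "CONVERSION_PPTX_TO_PDF_TIMEOUT_SECONDS": "180",
    "CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS": "1800",
}


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(env_file_text="", environ=None):
    """Merge defaults, .env contents, and the process environment."""
    environ = dict(os.environ if environ is None else environ)
    settings = dict(DEFAULTS)
    settings.update(parse_env_file(env_file_text))
    settings.update({k: v for k, v in environ.items() if k in settings})
    # The compose hostname is not resolvable from the host, so rewrite it.
    # Assumes the scheme-less host:port form shown in this README.
    host, sep, port = settings["S3_ENDPOINT"].partition(":")
    if host == "seaweedfs":
        settings["S3_ENDPOINT"] = "localhost" + sep + port
    return settings
```

For example, a `.env` copied from a compose setup that contains `S3_ENDPOINT=seaweedfs:8333` would resolve to `localhost:8333` under this sketch, while an exported `UVICORN_PORT=9000` would override the default port.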
### 6) Quick smoke test
Create a conversion request:
```bash
curl \
--header "Content-Type: application/json" \
--data '{
"sourceFilename":"example.pptx",
"full":{"resolution":"CONVERSION_RESOLUTION_FHD","jpeg":{"quality":85}},
"thumbnail":{"resolution":"CONVERSION_RESOLUTION_SD","jpeg":{"quality":75}}
}' \
http://localhost:8080/officeconvertapi.v1.ConversionService/CreateConversion
```
Then:
1. Upload the PPTX to the returned `uploadUrl` using HTTP `PUT`.
2. Call `StartConversion` with the returned `conversionId`.
3. Poll `GetConversionStatus` until `CONVERSION_STATUS_SUCCEEDED`.
4. Call `GetSlideDeck` and download each `imageUrl`.
5. Optionally call `DeleteConversion` for early cleanup.
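
The steps above can also be scripted. The following is a hedged sketch using only the Python standard library rather than the generated Connect client. It assumes that `StartConversion`, `GetConversionStatus`, and `GetSlideDeck` each accept `{"conversionId": ...}` and that the status reply carries a `status` field; neither is confirmed by this README, so check `proto/` for the real message shapes. Failure handling is omitted because no failure status is named here, so the poll simply times out.

```python
import json
import os
import time
import urllib.request

BASE_URL = "http://localhost:8080"
SERVICE = "officeconvertapi.v1.ConversionService"


def connect_call(method, payload, base_url=BASE_URL):
    """POST one unary RPC as JSON, the same shape as the curl call above."""
    request = urllib.request.Request(
        f"{base_url}/{SERVICE}/{method}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def upload_pptx(upload_url, path):
    """Step 1: PUT the file bytes to the returned upload URL."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(upload_url, data=body, method="PUT")
    with urllib.request.urlopen(request) as response:
        return response.status


def wait_for_success(conversion_id, call, sleep, interval=2.0, max_polls=300):
    """Step 3: poll until the status reads CONVERSION_STATUS_SUCCEEDED."""
    for _ in range(max_polls):
        reply = call("GetConversionStatus", {"conversionId": conversion_id})
        # "status" is an assumed response field name.
        if reply.get("status") == "CONVERSION_STATUS_SUCCEEDED":
            return reply
        sleep(interval)
    raise TimeoutError(f"conversion {conversion_id} did not succeed in time")


def convert(path, call=connect_call, upload=upload_pptx, sleep=time.sleep):
    """Run create, upload, start, poll, and fetch for one PPTX file."""
    created = call("CreateConversion", {
        "sourceFilename": os.path.basename(path),
        "full": {"resolution": "CONVERSION_RESOLUTION_FHD", "jpeg": {"quality": 85}},
        "thumbnail": {"resolution": "CONVERSION_RESOLUTION_SD", "jpeg": {"quality": 75}},
    })
    conversion_id = created["conversionId"]
    upload(created["uploadUrl"], path)
    call("StartConversion", {"conversionId": conversion_id})  # step 2
    wait_for_success(conversion_id, call, sleep)
    return call("GetSlideDeck", {"conversionId": conversion_id})  # step 4
```

With the server and SeaweedFS running, `convert("example.pptx")` would return the decoded `GetSlideDeck` reply. The `call`, `upload`, and `sleep` parameters exist so the flow can be exercised with stand-ins and no network.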
### 7) Full container workflow (optional)
If you want to run both server and SeaweedFS in Docker:
```bash
make compose-up
```
Use `.env.example` as your baseline env configuration.
## Storage Backend Notes
- This project defaults to **SeaweedFS S3 API** for object transit in development and compose deployments.
- The Python server uses the `minio` Python SDK, which is intentional because SeaweedFS is S3-compatible.
- Runtime configuration uses `S3_*` environment variables.
- All conversions share one bucket (`S3_BUCKET`, required). Each conversion's objects live under a `{conversion_id}/` key prefix (for example `{conversion_id}/input/source.pptx` and `{conversion_id}/output/slide-0001.jpg`).
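
To make the key layout concrete, here is a small sketch of helpers that build object keys under a conversion's prefix. The function names are hypothetical, and the 1-based, four-digit slide numbering is inferred from the `slide-0001.jpg` example rather than from the server code.

```python
def conversion_prefix(conversion_id):
    """Prefix shared by every object that belongs to one conversion."""
    return f"{conversion_id}/"


def input_key(conversion_id, filename="source.pptx"):
    """Key of the uploaded source file."""
    return f"{conversion_prefix(conversion_id)}input/{filename}"


def slide_key(conversion_id, slide_number):
    """Key of a rendered slide image (assumed 1-based, zero-padded to 4 digits)."""
    if slide_number < 1:
        raise ValueError("slide_number starts at 1")
    return f"{conversion_prefix(conversion_id)}output/slide-{slide_number:04d}.jpg"
```

Because every object sits under `{conversion_id}/`, cleaning up a conversion amounts to deleting all keys that start with that prefix in `S3_BUCKET`.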
## Conversion Tuning Notes
If conversion fails or times out on larger decks, adjust the request options and environment variables below:
- `CreateConversionRequest.full.resolution` controls full-size output dimensions via presets: `SD`, `HD`, `FHD`, `QHD`, `UHD`.
- `CreateConversionRequest.thumbnail.resolution` controls thumbnail output dimensions with the same presets.
- Omitting full/thumbnail resolution (or sending `CONVERSION_RESOLUTION_UNSPECIFIED`) defaults to `FHD` for full and `SD` for thumbnail.
- Output is JPEG-only for now; set `CreateConversionRequest.full.jpeg.quality` and `CreateConversionRequest.thumbnail.jpeg.quality` to `1..100` (`0` or omitted uses server defaults: full `85`, thumbnail `75`).
- Rasterization DPI is inferred automatically from source slide size and selected full/thumbnail output dimensions.
- `CONVERSION_PPTX_TO_PDF_TIMEOUT_SECONDS` (default `180`): timeout for LibreOffice export.
- `CONVERSION_PDF_TO_IMAGES_TIMEOUT_SECONDS` (default `1800`): timeout for Poppler rasterization.
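
The defaulting rules above can be summarized in a short sketch. It is illustrative only: `effective_output` and `infer_dpi` are hypothetical names, and the DPI formula (target pixel width divided by slide width in inches) is one plausible reading of "inferred automatically", not the server's confirmed method. The only outside fact used is that PPTX slide sizes are stored in EMUs, with 914400 EMUs per inch.

```python
PREFIX = "CONVERSION_RESOLUTION_"
PRESETS = ("SD", "HD", "FHD", "QHD", "UHD")
FULL_DEFAULTS = {"resolution": "FHD", "quality": 85}
THUMBNAIL_DEFAULTS = {"resolution": "SD", "quality": 75}
EMU_PER_INCH = 914400  # PPTX slide dimensions are stored in EMUs


def effective_output(requested, defaults):
    """Apply the documented defaults to a `full` or `thumbnail` request block."""
    requested = requested or {}
    name = requested.get("resolution") or PREFIX + "UNSPECIFIED"
    if name.startswith(PREFIX):
        name = name[len(PREFIX):]
    if name == "UNSPECIFIED":
        name = defaults["resolution"]
    elif name not in PRESETS:
        raise ValueError(f"unknown resolution preset: {name}")
    # A quality of 0 (or an omitted value) means "use the server default".
    quality = (requested.get("jpeg") or {}).get("quality") or 0
    if quality == 0:
        quality = defaults["quality"]
    elif not 1 <= quality <= 100:
        raise ValueError("jpeg quality must be in 1..100")
    return {"resolution": name, "quality": quality}


def infer_dpi(slide_width_emu, target_width_px):
    """Assumed rule: DPI that renders the slide at the target pixel width."""
    return target_width_px * EMU_PER_INCH / slide_width_emu
```

As a worked case under these assumptions, a 16:9 slide that is 12192000 EMU wide (about 13.33 inches) rendered to a 1920-pixel-wide image would need 144 DPI. Higher presets raise the DPI and with it the rasterization time, which is where the Poppler timeout becomes relevant.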