Compare commits

2 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| end | 4d8da4c910 | add S3 policy docs to README (CI: Docker server image / build-and-push (push), successful in 1m31s) | 2026-06-17 11:14:44 -07:00 |
| end | 42b9ae85a8 | use a single bucket rather than one per conversion | 2026-06-17 11:09:52 -07:00 |
10 changed files with 146 additions and 53 deletions
+1
@@ -1,5 +1,6 @@
S3_ENDPOINT=seaweedfs:8333
S3_PUBLIC_ENDPOINT=localhost:8333
S3_BUCKET=officeconvert
S3_USE_SSL=false
# Presigned URLs; omit to match S3_USE_SSL (internal client uses S3_ENDPOINT).
S3_PUBLIC_USE_SSL=false
+1
@@ -41,6 +41,7 @@ run-server:
if [ "$${S3_PUBLIC_ENDPOINT:-}" = "seaweedfs:8333" ]; then S3_PUBLIC_ENDPOINT=localhost:8333; fi; \
export S3_ENDPOINT="$${S3_ENDPOINT:-localhost:8333}"; \
export S3_PUBLIC_ENDPOINT="$${S3_PUBLIC_ENDPOINT:-localhost:8333}"; \
export S3_BUCKET="$${S3_BUCKET:-officeconvert}"; \
export S3_USE_SSL="$${S3_USE_SSL:-false}"; \
export S3_ACCESS_KEY="$${S3_ACCESS_KEY:-minioadmin}"; \
export S3_SECRET_KEY="$${S3_SECRET_KEY:-minioadmin}"; \
+70 -2
@@ -135,9 +135,77 @@ Use `.env.example` as your baseline env configuration.
## Storage Backend Notes
- This project defaults to **SeaweedFS S3 API** for object transit in development and compose deployments.
- The Python server uses the `minio` Python SDK, which is intentional because SeaweedFS is S3-compatible.
- Local development defaults to **SeaweedFS** (S3-compatible) via Docker Compose.
- Production can use any S3-compatible provider; **AWS S3** is the expected choice.
- The Python server uses the `minio` Python SDK against the S3 API.
- Runtime configuration uses `S3_*` environment variables.
- All conversions share one bucket (`S3_BUCKET`, required). Each conversion's objects live under a `{conversion_id}/` key prefix (for example `{conversion_id}/input/source.pptx` and `{conversion_id}/output/slide-0001.jpg`).
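The key layout described above can be sketched as a few small helpers. This is only an illustration of the naming scheme: the function names `object_prefix`, `upload_key`, and `slide_key` are hypothetical and do not exist in the server, which builds these strings inline.

```python
def object_prefix(conversion_id: str) -> str:
    """Return the per-conversion key prefix, always ending in '/'."""
    return f"{conversion_id}/"


def upload_key(conversion_id: str) -> str:
    """Key of the uploaded source deck for one conversion."""
    return f"{object_prefix(conversion_id)}input/source.pptx"


def slide_key(
    conversion_id: str,
    index: int,
    suffix: str = ".jpg",
    thumbnail: bool = False,
) -> str:
    """Key of a rendered slide (or its thumbnail) for one conversion."""
    folder = "output/thumb" if thumbnail else "output"
    # Slide numbers are zero-padded to four digits, e.g. slide-0001.jpg.
    return f"{object_prefix(conversion_id)}{folder}/slide-{index:04d}{suffix}"
```

Because every key starts with the conversion's prefix, cleanup only has to list and delete under that prefix rather than remove a whole bucket.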
### AWS setup
**Bucket**
1. Create one bucket (for example `officeconvert-prod`) in the region where the server runs.
2. Leave **Block Public Access** enabled. Presigned URLs work without a public bucket.
3. Optional: add a lifecycle rule to expire objects after a few days as a safety net if cleanup fails.
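As a rough sketch of the optional lifecycle rule in step 3, the snippet below builds a bucket-wide expiration rule in the JSON shape that `aws s3api put-bucket-lifecycle-configuration` accepts. The three-day window, the rule ID, and the helper name `lifecycle_configuration` are assumptions chosen for illustration; pick a window longer than your session TTL and cleanup delay.

```python
import json


def lifecycle_configuration(days: int) -> dict:
    """Build a lifecycle configuration that expires every object after `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "expire-conversion-objects",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Write the output to a file and pass it via --lifecycle-configuration file://...
    print(json.dumps(lifecycle_configuration(3), indent=2))
```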
**Server environment**
Set at minimum:
```bash
S3_BUCKET=officeconvert-prod
S3_ENDPOINT=s3.us-east-1.amazonaws.com
S3_PUBLIC_ENDPOINT=s3.us-east-1.amazonaws.com
S3_REGION=us-east-1
S3_USE_SSL=true
S3_PUBLIC_USE_SSL=true
S3_ACCESS_KEY=...
S3_SECRET_KEY=...
```
Use your bucket's regional hostname for both endpoints unless you deliberately split internal and client-facing access. `S3_PUBLIC_ENDPOINT` must be reachable by every client that uploads or downloads through presigned URLs, not only by the server.
On startup the server calls `CreateBucket` if the bucket is missing. In AWS it is simpler to **pre-create the bucket** and grant object permissions only (see IAM below).
**IAM permissions**
Scope access to the single bucket. Objects are grouped under per-conversion key prefixes inside it, so the list and delete permissions apply bucket-wide:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": "arn:aws:s3:::officeconvert-prod"
},
{
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
"Resource": "arn:aws:s3:::officeconvert-prod/*"
}
]
}
```
Add `s3:CreateBucket` on `arn:aws:s3:::officeconvert-prod` only if you want the server to create the bucket on first boot.
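If you manage several environments, the policy above can be generated per bucket name. The following is a minimal sketch, assuming a hypothetical helper `build_policy`; it reproduces the statements shown above and adds `s3:CreateBucket` to the bucket-level statement only on request.

```python
import json


def build_policy(bucket: str, allow_create_bucket: bool = False) -> dict:
    """Build the least-privilege policy for one conversion bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # Bucket-level actions apply to the bucket ARN itself.
    bucket_actions = ["s3:ListBucket"]
    if allow_create_bucket:
        bucket_actions.append("s3:CreateBucket")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": bucket_actions,
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy("officeconvert-prod"), indent=2))
```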
**CORS**
Required only if uploads or downloads go **directly from a browser** to presigned URLs. Server-side clients (`curl`, the Go client) do not need CORS. Allow `PUT` and `GET` for your web origin on the bucket.
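For the browser case, a CORS configuration along these lines could be applied with `aws s3api put-bucket-cors`. This is a sketch under assumptions: the origin `https://app.example.com` is a placeholder, the helper name `cors_configuration` is hypothetical, and allowing all request headers and a 3000-second preflight cache are illustrative choices you may want to tighten.

```python
import json


def cors_configuration(origin: str) -> dict:
    """Build a CORS configuration allowing presigned PUT and GET from one web origin."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [origin],
                "AllowedMethods": ["PUT", "GET"],
                # Browsers send headers such as Content-Type on upload preflights.
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


if __name__ == "__main__":
    # Write the output to a file and pass it via --cors-configuration file://...
    print(json.dumps(cors_configuration("https://app.example.com"), indent=2))
```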
**IAM roles vs IAM users**
AWS recommends **roles** over long-lived **IAM user** access keys when the server runs on AWS compute (ECS, EC2, Lambda): a role grants **temporary** credentials that rotate automatically, with no static keys to store or leak.
Today the server reads explicit `S3_ACCESS_KEY` and `S3_SECRET_KEY` values and passes them to the MinIO SDK. In practice that means:
| Where you run | Practical choice |
|---------------|------------------|
| Docker on a VPS, bare metal, or outside AWS | IAM **user** with the policy above; store keys in env or a secrets manager. Fine for a single service at low volume. |
| ECS / EC2 / EKS on AWS | Prefer an IAM **role** attached to the task or instance. The runtime supplies short-lived credentials, which you would still have to pass into `S3_ACCESS_KEY` / `S3_SECRET_KEY`. Temporary credentials are only valid together with their session token, and the server does not yet read a dedicated `S3_SESSION_TOKEN` env var, so role credentials cannot be used as-is today. |
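
One way the missing session-token support could look is sketched below. This is not the server's current behavior: `S3_SESSION_TOKEN` is the hypothetical variable named in the table, and `s3_credentials_from_env` is an illustrative helper. The returned keys are chosen to match the `access_key`, `secret_key`, and `session_token` keyword arguments of the `minio` client constructor, which is an assumption to verify against the SDK version in use.

```python
import os
from collections.abc import Mapping


def s3_credentials_from_env(environ: Mapping[str, str] = os.environ) -> dict:
    """Collect S3 credentials, adding a session token only when one is set."""
    creds = {
        # Defaults mirror the development defaults used elsewhere in this project.
        "access_key": environ.get("S3_ACCESS_KEY", "minioadmin"),
        "secret_key": environ.get("S3_SECRET_KEY", "minioadmin"),
    }
    # Hypothetical variable: temporary role credentials need their token.
    token = environ.get("S3_SESSION_TOKEN", "").strip()
    if token:
        creds["session_token"] = token
    return creds
```

With static IAM user keys the token is simply absent, so the same code path would serve both rows of the table.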
## Conversion Tuning Notes
+1
@@ -23,6 +23,7 @@ services:
environment:
S3_ENDPOINT: ${S3_ENDPOINT:-seaweedfs:8333}
S3_PUBLIC_ENDPOINT: ${S3_PUBLIC_ENDPOINT:-localhost:8333}
S3_BUCKET: ${S3_BUCKET:-officeconvert}
S3_USE_SSL: ${S3_USE_SSL:-false}
S3_ACCESS_KEY: ${S3_ACCESS_KEY:-minioadmin}
S3_SECRET_KEY: ${S3_SECRET_KEY:-minioadmin}
+1
@@ -22,6 +22,7 @@ services:
environment:
S3_ENDPOINT: ${S3_ENDPOINT:-seaweedfs:8333}
S3_PUBLIC_ENDPOINT: ${S3_PUBLIC_ENDPOINT:-localhost:8333}
S3_BUCKET: ${S3_BUCKET:-officeconvert}
S3_USE_SSL: ${S3_USE_SSL:-false}
S3_ACCESS_KEY: ${S3_ACCESS_KEY:-minioadmin}
S3_SECRET_KEY: ${S3_SECRET_KEY:-minioadmin}
@@ -10,7 +10,9 @@ from officeconvertapi.v1.conversion_connect import ConversionServiceASGIApplicat
from officeconvert_server.config import load_server_config
from officeconvert_server.service import ConversionServiceImpl
from officeconvert_server.storage import S3Store
from officeconvert_server.storage import S3Store, log_s3_error
from minio.error import S3Error
logger = logging.getLogger(__name__)
@@ -55,6 +57,16 @@ def create_app() -> ConversionServiceASGIApplication:
if os.getenv("OFFICECONVERT_S3_TRACE", "").lower() in ("1", "true", "yes"):
store.enable_http_trace(sys.stderr)
logger.warning("OFFICECONVERT_S3_TRACE enabled: S3 HTTP dumps on stderr")
try:
store.ensure_bucket(config.s3_bucket)
except S3Error as exc:
log_s3_error(
"ensure_bucket",
endpoint=config.s3_endpoint,
secure=config.s3_secure,
exc=exc,
)
raise
service = ConversionServiceImpl(config=config, store=store)
return ConversionServiceASGIApplication(service)
@@ -10,6 +10,7 @@ import os
class ServerConfig:
"""Defines environment-driven settings for server orchestration."""
s3_bucket: str
s3_endpoint: str
s3_access_key: str
s3_secret_key: str
@@ -37,7 +38,11 @@ def load_server_config() -> ServerConfig:
else s3_secure
)
region_env = os.getenv("S3_REGION", "").strip()
s3_bucket = os.getenv("S3_BUCKET", "").strip()
if not s3_bucket:
raise ValueError("S3_BUCKET is required")
return ServerConfig(
s3_bucket=s3_bucket,
s3_endpoint=os.getenv("S3_ENDPOINT", "localhost:8333"),
s3_access_key=os.getenv("S3_ACCESS_KEY", "minioadmin"),
s3_secret_key=os.getenv("S3_SECRET_KEY", "minioadmin"),
@@ -23,7 +23,7 @@ class ConversionSession:
thumbnail_resolution: conversion_pb2.ConversionResolution
full_jpeg_quality: int
thumbnail_jpeg_quality: int
bucket_name: str
object_prefix: str
upload_object_key: str
status: conversion_pb2.ConversionStatus
notes: conversion_pb2.NotesOptions | None = None
@@ -122,23 +122,13 @@ class ConversionServiceImpl(conversion_connect.ConversionService):
ksuid = Ksuid()
conversion_id = str(ksuid)
bucket_name = f"oc-{bytes(ksuid).hex()}"
upload_key = "input/source.pptx"
object_prefix = f"{conversion_id}/"
upload_key = f"{object_prefix}input/source.pptx"
expires_at = utc_now() + timedelta(seconds=self._config.s3_session_ttl_seconds)
try:
self._store.ensure_bucket(bucket_name)
except S3Error as exc:
log_s3_error(
"ensure_bucket",
endpoint=self._config.s3_endpoint,
secure=self._config.s3_secure,
exc=exc,
)
raise
try:
upload_url = self._store.presigned_put_url(
bucket_name,
self._config.s3_bucket,
upload_key,
ttl_seconds=self._config.s3_session_ttl_seconds,
)
@@ -159,7 +149,7 @@ class ConversionServiceImpl(conversion_connect.ConversionService):
full_jpeg_quality=full_jpeg_quality,
thumbnail_jpeg_quality=thumbnail_jpeg_quality,
notes=request.notes if request.HasField("notes") else None,
bucket_name=bucket_name,
object_prefix=object_prefix,
upload_object_key=upload_key,
status=conversion_pb2.CONVERSION_STATUS_PENDING,
)
@@ -168,7 +158,7 @@ class ConversionServiceImpl(conversion_connect.ConversionService):
return conversion_pb2.CreateConversionResponse(
conversion_id=conversion_id,
upload_bucket=bucket_name,
upload_bucket=self._config.s3_bucket,
upload_object_key=upload_key,
upload_url=upload_url,
expires_at=_to_timestamp(expires_at),
@@ -265,7 +255,11 @@ class ConversionServiceImpl(conversion_connect.ConversionService):
if session.conversion_task is not None and not session.conversion_task.done():
session.conversion_task.cancel()
await self._cleanup_local_artifacts(session)
await asyncio.to_thread(self._store.remove_bucket_tree, session.bucket_name)
await asyncio.to_thread(
self._store.remove_prefix,
self._config.s3_bucket,
session.object_prefix,
)
return conversion_pb2.DeleteConversionResponse(
conversion_id=session.conversion_id,
deleted=True,
@@ -295,7 +289,7 @@ class ConversionServiceImpl(conversion_connect.ConversionService):
try:
await asyncio.to_thread(
self._store.fget_object,
session.bucket_name,
self._config.s3_bucket,
session.upload_object_key,
source_path,
)
@@ -436,10 +430,12 @@ class ConversionServiceImpl(conversion_connect.ConversionService):
upload_total = slide_total * 2
upload_index = 0
for slide in slides:
object_key = f"output/slide-{slide.index:04d}{slide.image_path.suffix}"
self._store.fput_object(session.bucket_name, object_key, slide.image_path)
object_key = (
f"{session.object_prefix}output/slide-{slide.index:04d}{slide.image_path.suffix}"
)
self._store.fput_object(self._config.s3_bucket, object_key, slide.image_path)
image_url = self._store.presigned_get_url(
session.bucket_name,
self._config.s3_bucket,
object_key,
ttl_seconds=self._config.s3_session_ttl_seconds,
)
@@ -447,15 +443,16 @@ class ConversionServiceImpl(conversion_connect.ConversionService):
if progress_callback is not None:
progress_callback(upload_index, upload_total)
thumbnail_object_key = (
f"output/thumb/slide-{slide.index:04d}{slide.thumbnail_path.suffix}"
f"{session.object_prefix}output/thumb/slide-{slide.index:04d}"
f"{slide.thumbnail_path.suffix}"
)
self._store.fput_object(
session.bucket_name,
self._config.s3_bucket,
thumbnail_object_key,
slide.thumbnail_path,
)
thumbnail_image_url = self._store.presigned_get_url(
session.bucket_name,
self._config.s3_bucket,
thumbnail_object_key,
ttl_seconds=self._config.s3_session_ttl_seconds,
)
@@ -515,7 +512,11 @@ class ConversionServiceImpl(conversion_connect.ConversionService):
"""Delete storage resources after the configured session retention period."""
try:
await asyncio.sleep(self._config.conversion_cleanup_delay_seconds)
await asyncio.to_thread(self._store.remove_bucket_tree, session.bucket_name)
await asyncio.to_thread(
self._store.remove_prefix,
self._config.s3_bucket,
session.object_prefix,
)
except asyncio.CancelledError:
return
finally:
@@ -123,35 +123,38 @@ class S3Store:
"""Upload one local filesystem object to storage."""
self._client.fput_object(bucket_name, object_key, str(source_path))
def remove_bucket_tree(self, bucket_name: str) -> None:
"""Remove all objects in a bucket and then delete the bucket."""
objects = list(self._client.list_objects(bucket_name, recursive=True))
if objects:
delete_requests: list[DeleteObject] = []
for obj in objects:
object_name = obj.object_name
if object_name is None:
raise RuntimeError(
"encountered unnamed object while removing bucket contents"
)
delete_requests.append(DeleteObject(object_name))
errors = self._client.remove_objects(
def remove_prefix(self, bucket_name: str, prefix: str) -> None:
"""Remove all objects under a key prefix within a bucket."""
normalized_prefix = prefix if prefix.endswith("/") else f"{prefix}/"
objects = list(
self._client.list_objects(
bucket_name,
delete_requests,
prefix=normalized_prefix,
recursive=True,
)
for err in errors:
object_name = err.name or "<unknown>"
message = err.message or err.code
)
if not objects:
return
delete_requests: list[DeleteObject] = []
for obj in objects:
object_name = obj.object_name
if object_name is None:
raise RuntimeError(
f"failed to delete object {object_name}: {message}"
"encountered unnamed object while removing prefix contents"
)
try:
self._client.remove_bucket(bucket_name)
except S3Error as exc:
# Concurrent cleanup paths may race to remove the same bucket.
if exc.code != "NoSuchBucket":
raise
delete_requests.append(DeleteObject(object_name))
errors = self._client.remove_objects(
bucket_name,
delete_requests,
)
for err in errors:
object_name = err.name or "<unknown>"
message = err.message or err.code
raise RuntimeError(
f"failed to delete object {object_name}: {message}"
)
def object_key_from_presigned_url(url: str) -> str: