A Hybrid LLM and Relational Approach for Personalized Movie Recommendation
CS 5614 - Database Management Systems · Virginia Tech · Spring 2026
CineMatch is a hybrid movie recommendation system where each component handles the part of the problem it is best suited for. A Go backend serves a REST API with JWT auth, an LLM converts natural-language questions into safe SQL, Apache AGE runs graph traversals for related-title discovery, and a React/TypeScript frontend renders the experience. PostgreSQL 16 with PostGIS manages all structured storage, and an enrichment trigger pipeline auto-queues new movies for LLM-generated mood/theme tagging.
The system collects structured preference data at signup so recommendations are personalized from the first session, addressing the cold-start problem. Mood is treated as a real feature in the user profile, updated through feedback.
Checkpoint 3 (Coming Soon)
Proposal (PDF) Repository Live Demo YouTube Demo
Numbers below are pulled from the tokei CLI
tool.
| Area | Files | Code Lines |
|---|---|---|
| Go | 41 | 4,471 |
| Frontend (TS/TSX/CSS/HTML) | 22 | 4,128 |
| SQL | 3 | 229 |
| Shell scripts | 6 | 1,001 |
| Total project | 98 | 13,276 |
Routes below are pulled from
internal/infra/http/server/server.go. The
current API surface has 20 HTTP endpoints total (19 under
/api + 1 health route).
GET /healthPOST /api/auth/registerPOST /api/auth/signup (register + onboarding in one transaction)POST /api/auth/loginPOST /api/auth/refreshPOST /api/auth/logoutGET /api/movies (supports q, genre, limit, offset)GET /api/movies/searchGET /api/movies/top-rated/all-timeGET /api/movies/trending/weekGET /api/movies/{id}GET /api/movies/{id}/crewGET /api/movies/{id}/relatedGET /api/movies/{id}/graph-relatedPOST /api/chatPOST /api/users/onboardGET /api/feedback/{movieID}POST /api/feedbackPOST /api/feedback/not-interestedGET /api/recommendations/graphThis matrix separates what is live in backend routes, what currently runs with mock data in UI, and what is wired in frontend but not exposed in the backend yet.
| Area | Capability | Status | Notes |
|---|---|---|---|
| API | GET /health |
Live | Health check route is registered and returns 200 |
| API | Auth + refresh rotation | Live | /api/auth/register|signup|login|refresh|logout |
| API | Movie catalog + search + details + crew | Live | /api/movies* routes are active |
| API | Onboarding save endpoint | Live | POST /api/users/onboard behind JWT middleware |
| API | Chat NL -> SQL -> query execution | Live | /api/chat with prompt + SQL guard |
| API | Graph recommendations | Live | /api/recommendations/graph + graph-related movie endpoint |
| API | Feedback + not interested loop | Planned | Writes both relational and graph edges in one transaction |
| Frontend | Home / Search / Movie / Chat routes | Live | Connected to realApi and live backend routes |
| Frontend | 5-step onboarding signup flow | Live | Uses atomic signup endpoint /api/auth/signup |
| Frontend | Dashboard analytics page | UI Demo | Dashboard page is currently bound to mockApi |
| LLM | Schema-grounded system prompt | Live | schema_prompt.go defines full SQL contract + graph mapping rules |
| LLM | SQL safety guard | Live | SELECT-only, blocked DML/DDL keywords, required LIMIT |
| LLM | Retry + multi-model fallback chain | Planned | Primary retries x5, then 4 fallback models |
| Data | 14-table relational schema + enums | Live | schema-01.sql with domain-specific enums and constraints |
| Data | Movie enrichment trigger queue | Live | trg_movie_enrich inserts into enrichment_queue on new movies |
| Data | Apache AGE graph support | Live | AGE loaded in DB connections and graph build script available |
| Ops | CLI workflows (app, dl, migrate, psql) | Live | run.sh + Makefile commands wired |
| Ops | Graceful shutdown | Live | Signal handling with 10s server shutdown timeout |
| Ops | Migration utility | Live | Supports reset|drop|create|indexes|status |
| Page | Mermaid rendering bootstrap | Live | main.js added; diagrams initialized client-side |
| API Gap | /api/analytics/overview |
Planned | Frontend client references it, backend route is not registered |
| API Gap | /api/search/nl + /api/analytics/nl |
Planned | Frontend client includes calls, backend currently uses /api/chat |
| API Gap | /location ingest endpoint |
Planned | Frontend sends geolocation, backend route not present yet |
API startup and shutdown behavior is explicit in
cmd/cenimatch and internal/container.
| Lifecycle Step | Behavior | Code Path |
|---|---|---|
| Boot | Loads environment, parses typed config, constructs container, wires DB, services, and HTTP server | cmd/cenimatch/main.go, internal/container/container.go |
| Serve | Starts HTTP server in a goroutine and keeps main process on signal wait | App.Start(), App.Run() |
| Shutdown trigger | Listens for SIGINT and SIGTERM |
signal.Notify(...) |
| Shutdown window | Graceful HTTP shutdown with 10-second timeout, then DB pool close | container.Shutdown() |
Each new DB connection is prepared for both relational SQL and AGE graph calls before request handling starts.
| Step | Purpose | Implementation |
|---|---|---|
| Parse URL + init pool config | Standardized pgxpool setup from DATABASE_URL |
database.NewConnection() |
| Register enum types | Registers tag_source, crew_role, mood_type in type map |
AfterConnect hook |
| Activate AGE context | Runs LOAD 'age' and sets search_path = ag_catalog, "$user", public |
AfterConnect hook |
| Health gate | Pool Ping must succeed before container starts serving |
pool.Ping(...) |
Config loading has separate behavior for dev and production to speed local work while enforcing stricter prod settings.
| Setting | Dev Mode | Production Mode |
|---|---|---|
BCRYPT_COST |
Default 10 | Required from env |
JWT_EXPIRATION |
Default 24h | Required from env |
REFRESH_TOKEN_EXPIRATION |
Default 30 days | Required from env |
| Core infra vars | DATABASE_URL, PORT, JWT_SECRET, JWT_ISSUER required |
Same required set |
CORS_ALLOWED_ORIGINS |
Optional CSV, merged with local defaults | Optional CSV, appended to default allowed origins |
OPENROUTER_API_KEY |
Optional | Optional |
The project ships with executable workflows for app runtime, downloads, and migration operations.
| Command | What It Does | Entry Point |
|---|---|---|
./run.sh app |
Builds and runs API server binary | cmd/cenimatch |
./run.sh dl ... |
Builds and runs downloader with source filters, worker count, and output directory | cmd/download |
./run.sh migrate reset|drop|create|indexes|status |
Runs migration utility commands | cmd/migrate |
./run.sh migrate seed |
Runs seed pipeline script | migration/seed.sh |
make db / make db-stop |
Starts or stops database stack with Docker Compose | Makefile |
./run.sh psql |
Opens interactive psql session inside running DB container | run.sh |
Migration behavior is catalog-driven and explicit about destructive operations.
| Behavior | Details | Source |
|---|---|---|
| Drop sequencing | Drops tables first, then enums, then sequences using Postgres catalogs | internal/migrator/migrator.go |
| Sync pauses | Includes short waits between drop phases to avoid catalog timing issues | time.Sleep(500ms) |
| Schema split | Tables from schema-01.sql, indexes from schema-02-indexes.sql |
CreateTables(), CreateIndexes() |
| Status introspection | Shows connection, table count, and table list | migrate status |
| Command timeouts | create/indexes/reset use 30m context; lighter commands use 30s |
timeoutForCommand() |
The download pipeline is generic and source-agnostic, with built-in protection for common ingestion failures.
| Capability | Implementation Detail |
|---|---|
| Pluggable auth strategies | NoAuth, BearerToken, BasicAuth, APIKey via shared AuthMethod interface |
| Controlled concurrency | Goroutines with worker semaphore to cap parallel downloads |
| Archive handling | Supports direct file downloads, .gz extraction, and zip extraction |
| Zip-slip protection | Validates extracted path stays under target directory before writing |
| Resume-friendly behavior | Skips already downloaded/extracted files where possible |
| Source selection UX | --list and source alias expansion (imdb, tmdb, netflix, etc.) |
Access and refresh tokens are handled as separate concerns with rotation and server-side revocation checks.
| Control | Current Behavior | Implementation |
|---|---|---|
| Access tokens | HS256 JWT with sub, username, iss, iat, exp |
internal/infra/security/jwt.go |
| Refresh format | uuid:hex_secret; database stores sha256(secret), not raw secret |
internal/infra/security/refresh.go |
| Rotation + revoke | Refresh revokes previous token before issuing new pair | internal/service/auth.go |
| Metadata capture | Refresh tokens can store request IP and User-Agent | tokenMeta() + StoreRefreshToken() |
| Auth middleware | Accepts cookie or bearer token, validates JWT, injects user context | internal/infra/http/middleware/auth.go |
API responses follow a consistent envelope structure so frontend parsing stays uniform across endpoints.
{
"success": true,
"data": { "...": "..." }
}
{
"success": false,
"error": { "code": "INVALID_REQUEST" }
}
This table tracks calls referenced in frontend API clients against routes currently registered in backend server setup.
| Route | Frontend Usage | Backend Route Status | Notes |
|---|---|---|---|
/api/chat |
Used in Chat page | Live | Core NL query path in production frontend |
/api/analytics/overview |
Referenced in realApi |
Planned | Dashboard currently uses mockApi |
/api/search/nl |
Referenced in Search page client | Planned | Backend currently exposes /api/chat for NL SQL |
/api/analytics/nl |
Referenced in dashboard client helpers | Planned | No server route registration yet |
/location |
Location send call in Home page client | Planned | No server route registration yet |
main.js is now included to initialize diagrams client-side.The Go backend follows a strict layered architecture: handlers only decode/encode HTTP, services own all business logic, and repositories own SQL. Security interfaces (bcrypt, JWT, refresh tokens) are injected via ports.
WithTx is used for multi-step writes (like signup + onboarding) so partial writes are rolled back on failure.Multi-step writes are wrapped in a shared transaction helper so profile, feedback, and graph state stay consistent.
| Flow | What Happens in One Transaction | Failure Behavior |
|---|---|---|
POST /api/auth/signup |
Create user, preference JSONB, mood profile, graph user node, liked/disliked graph edges | Any failure rolls back all writes |
POST /api/users/onboard |
Update preferences + mood profile, clear old graph edges, rebuild WATCHED/RATED edges | No partial profile or graph drift |
POST /api/feedback |
Upsert watch history + user feedback, then sync graph WATCHED/RATED edges | Relational and graph signals stay aligned |
POST /api/feedback/not-interested |
Upsert not_interested relational flag and update graph edge attributes |
Preference exclusion is applied consistently |
14 tables across four domains: movie catalog, user identity,
interaction tracking, and infrastructure. Movies are keyed
on tmdb_id, users on UUID. Three
PostgreSQL enums enforce type safety:
tag_source, crew_role,
mood_type. PostGIS GEOGRAPHY
columns on feedback and locations enable spatial queries.
JSONB stores preference weights, arrays track mood IDs, and PostGIS GEOGRAPHY supports location-aware features.trg_movie_enrich.Users type natural-language questions. The backend wraps them with a schema prompt, sends to OpenRouter (Llama 3.3 70B with multi-model fallback), validates the returned SQL through a strict guard, then executes it read-only against PostgreSQL.
SELECT is allowed, write operations are blocked, and a LIMIT clause is required.| Layer | Rule | Implementation |
|---|---|---|
| Prompt contract | Model must output one raw SELECT, no markdown, max LIMIT 50, and UNSAFE: for disallowed asks |
internal/llm/schema_prompt.go |
| Post-generation guard | Strip code fences, block DML/DDL keywords, require SELECT, require LIMIT |
internal/llm/query_guard.go |
| Model resiliency | Retry primary model 5 times, then rotate through 4 free fallback models | internal/llm/openrouter.go |
| Execution boundary | LLM call timeout 20s, SQL execution timeout 10s, chat history trimmed to last 20 messages | internal/service/chat.go |
The prompt in internal/llm/schema_prompt.go
is a full SQL contract, not a short instruction. It
includes a table-by-table schema reference, graph-to-SQL
mapping rules, and strict output constraints so the model
returns one executable PostgreSQL SELECT.
LIMIT with max 50; unsafe requests must return UNSAFE: <reason>.tmdb_id, title, release_year, vote_avg, and poster_path.
JWT access tokens (HS256, configurable expiry) plus
refresh-token rotation. The refresh token format is
uuid:hex_secret — only
sha256(secret) is stored. Old tokens are
revoked on rotation, preventing replay.
The movie_graph is built from relational tables
and queried via Cypher through
ag_catalog.cypher(). It supports genre-based
recommendations, collaborative filtering, actor/director
traversals, and explainability paths.
| Recipe | Cypher Pattern | Where It Appears |
|---|---|---|
| Connected by director | (Movie)<-[:DIRECTED]-(Person)-[:DIRECTED]->(Movie) |
GET /api/movies/{id}/graph-related |
| Connected by cast | (Movie)<-[:ACTED_IN]-(Person)-[:ACTED_IN]->(Movie) |
GET /api/movies/{id}/graph-related |
| Theme overlap | (Movie)-[:IN_GENRE]->(Genre)<-[:IN_GENRE]-(Movie) with overlap ranking |
GET /api/movies/{id}/graph-related |
| Collaborative filtering | Users with overlapping high ratings, then unseen highly-rated titles from similar users | GET /api/recommendations/graph |
CineMatch uses a concurrent Go pipeline to ingest, clean, and normalize movie data from 7 sources including IMDb, TMDB, and Kaggle.
COPY into staging tables, then CTE-based normalization and index migrations for faster queries.
Two Docker Compose files: docker-compose.yml
runs the database (custom PG16 image with AGE + PostGIS),
docker-compose.deploy.yml runs the Go backend,
React frontend (Nginx), and a Cloudflared tunnel for
zero-trust HTTPS access.
GET /api/me so frontend can fetch current user profile without decoding token client-side.after_id/after_score) to avoid large offset scans./api/chat and auth routes to protect free-tier model and login endpoints.title.principals.tsv — directors, actors,
producers in movie_crew
Abdulrahman Alamoudi
Gary Young
Jack Lyons
Team Leader
Yazeed Alharthi
Yordanos Tessema