Node.js OCR Middleware for Optical CMS — Full Problem & Solution Deep Dive


(A complete, real-world GovTech integration story for developers)

In agency projects for GovTech, MCCY, MOM, BCA and the like, when we build content-centric applications on Optical CMS, one requirement comes up again and again:

“The user uploads a document → extract the data from it with AI/OCR and update the profile.”

Because of this requirement, we built a separate Node.js “Passport OCR Middleware” during the BCA POC.
This blog explains why we did that, and why this work should not be done inside Optical.


🧩 📌 Problem: Passport OCR integration inside Optical CMS

In the BCA POC (construction worker onboarding flow), we had to:

  1. Allow user/facilitator to upload Passport (PDF/JPG)

  2. Extract information such as:

    • Passport Number

    • Passport Expiry

    • Full Name

    • Nationality

    • Date of Birth

  3. Update extracted data automatically into the worker profile stored inside Optical CMS (Directus)

  4. Avoid manual entry and human error
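
To make the target of the extraction concrete, here is a minimal sketch of how the five fields above might be turned into a partial profile update. The field keys (`passport_number`, `passport_expiry`, and so on) are assumptions for illustration; the actual Directus collection schema and the OCR response shape are not shown in this post.

```javascript
// Hypothetical field keys; the real worker collection schema may differ.
const PASSPORT_FIELDS = [
  "passport_number",
  "passport_expiry",
  "full_name",
  "nationality",
  "date_of_birth",
];

// Build a partial update containing only known, non-empty string fields,
// so an incomplete OCR result never overwrites profile data with blanks.
function toWorkerPatch(extracted) {
  const patch = {};
  for (const key of PASSPORT_FIELDS) {
    const value = extracted?.[key];
    if (typeof value === "string" && value.trim() !== "") {
      patch[key] = value.trim();
    }
  }
  return patch;
}
```

Whitelisting the keys keeps unexpected OCR output out of the profile, which supports the goal of avoiding human and machine error alike.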

GovTech had provided the AISAY / KIE OCR API for passport OCR.

On the surface this looks very straightforward:

“User uploads passport → Optical CMS → Run script → Call OCR → Save data.”

But in practice it was not that simple.

We started the design by calling AISAY OCR directly from inside an Optical Flow script, but it quickly became clear that this path was bound to break.


🚨 📌 What went wrong — Why Optical alone couldn’t do this?

1. AISAY requires sending file as “multipart/form-data” (binary stream)

The JS step in Optical Flow:

  • is lightweight

  • cannot install any external npm module

  • does not handle binary streams properly

  • cannot import heavier utilities such as form-data, axios, or node-fetch

  • makes file streaming and buffer manipulation difficult

But the AISAY OCR endpoint demanded exactly that:

    POST /v1/extract
    Content-Type: multipart/form-data

    file=<pdf-file>
    model_type=EXTRACT_SPECIFIC
    document_type=PASSPORT

👉 Directus Flow = JSON-oriented
👉 AISAY = binary/form-data API
👉 The two lived in different worlds.
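
For comparison, here is roughly what the same request looks like from a Node.js process. This is a sketch, assuming Node.js 18+ (which ships global `fetch`, `FormData` and `Blob`). The base URL and the `x-api-key` header are placeholders, since the post does not show how AISAY authenticates requests; only the three form fields come from the request above.

```javascript
// Assumes Node.js 18+ (global fetch, FormData, Blob).
// AISAY_URL and the "x-api-key" header are placeholders, not confirmed AISAY details.
const AISAY_URL = process.env.AISAY_URL || "https://aisay.example.invalid/v1/extract";

// Wrap the raw file bytes in the multipart fields the endpoint expects.
// The MIME type is fixed to PDF here; a JPG upload would need its own type.
function buildAisayForm(fileBuffer, filename = "passport.pdf") {
  const form = new FormData();
  form.append("file", new Blob([fileBuffer], { type: "application/pdf" }), filename);
  form.append("model_type", "EXTRACT_SPECIFIC");
  form.append("document_type", "PASSPORT");
  return form;
}

// POST the form; fetch generates the multipart boundary header itself.
async function runAisayOCR(fileBuffer, apiKey) {
  const response = await fetch(AISAY_URL, {
    method: "POST",
    headers: { "x-api-key": apiKey },
    body: buildAisayForm(fileBuffer),
  });
  if (!response.ok) {
    throw new Error(`AISAY OCR failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Nothing here is exotic, but every piece (binary buffers, multipart encoding, outbound HTTP with custom headers) is something the Flow JS step struggles with.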


2. File handling inside SaaS Optical = Limited & risky

Optical is a managed GovTech platform:

  • No control over direct file I/O

  • No custom server-level code

  • No streaming libraries

  • No long-running processes

  • No large PDF handling

  • Memory limits inside Flow engine

Large passport PDFs → Flow crash → CMS service impact = possible risk.

In a GovCloud environment, stability is the primary concern.
Putting OCR processing inside the CMS was considered unsafe.


3. External integration logic does not belong inside Optical

Best practice in GovStack architecture:

  • CMS = content + data layer

  • Business rules / integrations = external microservices

Putting integration code inside Optical leads to:

❌ tight-coupling
❌ system instability
❌ vendor lock-in
❌ difficult debugging
❌ API keys exposure
❌ not reusable for other projects

According to GovTech's IE2025 stack, OCR logic should be a separate service.


4. Do not keep sensitive secrets (OCR keys) in the CMS

In Optical:

  • Flow variables remain visible

  • An admin can see them by accident

  • No audit trails are created

  • No secure secret vault

The OCR key was highly sensitive → keeping it in the middleware, in SSM / env vars, was the safer option.
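
A minimal sketch of the env-var approach, assuming the secrets are injected by the host (for example from SSM at deploy time). The variable names `DIRECTUS_URL`, `DIRECTUS_TOKEN` and `AISAY_API_KEY` are hypothetical; use whatever your deployment defines.

```javascript
// Hypothetical variable names, chosen for illustration only.
const REQUIRED_VARS = ["DIRECTUS_URL", "DIRECTUS_TOKEN", "AISAY_API_KEY"];

// Read secrets from the environment and fail fast if any are missing,
// so a misconfigured deployment never starts serving requests.
function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    directusUrl: env.DIRECTUS_URL,
    directusToken: env.DIRECTUS_TOKEN,
    aisayApiKey: env.AISAY_API_KEY,
  };
}
```

The error message lists only the names of the missing variables, never their values, so it is safe to log.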


5. AISAY may require proxy / TLS configurations

Some Gov internal APIs are accessible only from VPC / SG-restricted networks.

Optical SaaS networking is fixed → it cannot use a custom proxy/agent.

In the Node.js middleware:

  • custom proxy agent

  • retry logic

  • timeout handling

were all under our control.
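
As an illustration of the retry and timeout part, here is a small sketch, assuming Node.js 18+ for `AbortSignal.timeout`. The helper names `withRetry` and `backoffDelay` and the default values are made up for this example. The proxy agent is left out because it depends on a third-party package and on the network setup.

```javascript
// Exponential backoff: 1x, 2x, 4x ... the base delay.
function backoffDelay(attempt, baseDelayMs) {
  return baseDelayMs * 2 ** attempt;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call fn(signal) up to retries + 1 times. Each attempt gets a fresh timeout
// signal that fn is expected to pass on to fetch. This sketch retries on any
// error; real code would probably retry only transient failures.
async function withRetry(fn, { retries = 2, baseDelayMs = 500, timeoutMs = 10000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(AbortSignal.timeout(timeoutMs));
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await sleep(backoffDelay(attempt, baseDelayMs));
      }
    }
  }
  throw lastError;
}
```

A call site might look like `withRetry((signal) => fetch(url, { method: "POST", body, signal }))`, so a hung OCR request is aborted and retried instead of blocking the caller.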


6. OCR logic should be reusable across agencies

Today it was being used in the BCA POC;
tomorrow it could be:

  • MCCY PRJ

  • SLA

  • MOM

  • ICA-PR flows

Any of these systems may need the same OCR.

If the logic had been stuck inside Optical → it would have to be copy-pasted into every project.

Middleware = central reusable capability.


🧠 📌 Final architecture decision

We built a simple but powerful middleware:

    [Frontend / Directus Flow]
            ↓ (PASSPORT_ASSET_ID)
            ↓
    [Node.js OCR Middleware]
            ↓
      - Download asset from Directus
      - Send to AISAY/KIE OCR
      - Parse AI response
      - Update worker profile in Directus
            ↓
    [Optical CMS]

Any client can call the middleware:

  • Directus Flow

  • Directus webhook

  • Admin UI button

  • External system

  • Next.js frontend

It became a generic, robust, reusable layer.


🚀 📌 Why Node.js Middleware was the Perfect Choice

Requirement              | Optical Flow | Node.js Middleware
Handle large PDFs        | no           | yes
Multipart/form-data      | no           | yes
External npm modules     | no           | yes
Proxy agent              | no           | yes
AES / Secure API keys    | no           | yes
Error logging            | limited      | full
Reusable across apps     | no           | yes
Stable under load        | risky        | safe
Decoupled architecture   | no           | perfect fit

Middleware clearly wins.


🔧 📌 Setup of the middleware app

Folder structure:

    PASSPORT-OCR-MIDDLEWARE/
    ├── app.js
    ├── package.json
    └── node_modules/

In app.js:

  1. Receive request from Directus

  2. Fetch passport asset from Directus storage URL

  3. Convert to buffer/stream

  4. POST file → AISAY OCR API

  5. Extract fields

  6. Update Optical CMS

  7. Return success response

Simple example:

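    const express = require("express");
    const app = express();
    app.use(express.json()); // parse JSON request bodies

    // fetchFileFromDirectus, runAisayOCR and updateWorker are helpers defined elsewhere in app.js
    app.post("/passport-ocr", async (req, res) => {
      const { worker_id, passport_asset_id } = req.body;
      try {
        // 1. Fetch asset from Directus
        const file = await fetchFileFromDirectus(passport_asset_id);
        // 2. Send to AISAY
        const extracted = await runAisayOCR(file);
        // 3. Update Directus
        await updateWorker(worker_id, extracted);
        res.json({ worker_id, extracted });
      } catch (err) {
        // Report upstream failures instead of leaving the request hanging
        res.status(502).json({ error: err.message });
      }
    });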
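
The two Directus helpers used in the example could look something like the sketch below. It assumes Node.js 18+ (global `fetch`), a static Directus token, and a collection named `workers`; the collection name, the environment variable names and the optional `fetchImpl` parameter (added here only to make the helpers testable) are all assumptions. The `/assets/:id` and `/items/:collection/:id` paths are the standard Directus REST endpoints.

```javascript
// Assumes Node.js 18+ (global fetch). DIRECTUS_URL, DIRECTUS_TOKEN and the
// "workers" collection name are placeholders for illustration.
const DIRECTUS_URL = process.env.DIRECTUS_URL || "https://cms.example.invalid";
const authHeader = () => ({ Authorization: `Bearer ${process.env.DIRECTUS_TOKEN}` });

// Download the uploaded file through the Directus assets endpoint.
async function fetchFileFromDirectus(assetId, fetchImpl = fetch) {
  const res = await fetchImpl(`${DIRECTUS_URL}/assets/${assetId}`, {
    headers: authHeader(),
  });
  if (!res.ok) throw new Error(`Asset download failed with HTTP ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}

// Patch the extracted fields onto the worker item.
async function updateWorker(workerId, fields, fetchImpl = fetch) {
  const res = await fetchImpl(`${DIRECTUS_URL}/items/workers/${workerId}`, {
    method: "PATCH",
    headers: { ...authHeader(), "Content-Type": "application/json" },
    body: JSON.stringify(fields),
  });
  if (!res.ok) throw new Error(`Worker update failed with HTTP ${res.status}`);
  return res.json();
}
```

Both helpers throw on a non-2xx response, so a failed download or update surfaces as an error to the route handler rather than as a silent partial update.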

🔐 📌 Hosting the middleware for staging

Two valid options:

✔ Option 1: AWS Lambda + API Gateway

(best for PRJ/BCA/Gov projects)

Steps:

  1. Convert app.js → Lambda handler

  2. Deploy to Lambda

  3. Front it with API Gateway

  4. Add Directus token + AISAY key as env vars

  5. Hit it from Directus Flow
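
Step 1 might look roughly like this, assuming API Gateway's Lambda proxy integration (the request body arrives as a string in `event.body`, and the handler returns `statusCode` plus a string `body`). The `makeHandler` factory is a hypothetical helper: it takes the three pipeline functions as arguments so the handler can be exercised without network access. Base64-encoded bodies are not handled in this sketch.

```javascript
// Build a Lambda handler around the three pipeline steps.
function makeHandler({ fetchFileFromDirectus, runAisayOCR, updateWorker }) {
  return async function handler(event) {
    let payload;
    try {
      payload = JSON.parse(event.body || "{}");
    } catch {
      return { statusCode: 400, body: JSON.stringify({ error: "Invalid JSON body" }) };
    }

    const { worker_id, passport_asset_id } = payload;
    if (!worker_id || !passport_asset_id) {
      return {
        statusCode: 400,
        body: JSON.stringify({ error: "worker_id and passport_asset_id are required" }),
      };
    }

    try {
      const file = await fetchFileFromDirectus(passport_asset_id);
      const extracted = await runAisayOCR(file);
      await updateWorker(worker_id, extracted);
      return { statusCode: 200, body: JSON.stringify({ worker_id, extracted }) };
    } catch (err) {
      // Upstream failure (Directus or OCR): report it as a bad gateway.
      return { statusCode: 502, body: JSON.stringify({ error: err.message }) };
    }
  };
}

// In the deployed file, something like:
// exports.handler = makeHandler({ fetchFileFromDirectus, runAisayOCR, updateWorker });
```

The logic is the same as the Express route; only the request and response envelope changes, which is why the conversion is usually quick.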

✔ Option 2: Docker container on Tooling VM

(a simple solution for staging setups)

Steps:

  1. Dockerize Node app

  2. Run it on the same VPC as Optical

  3. Expose via Nginx reverse proxy

  4. Lock down access using SG rules

  5. From the Directus Flow, just hit https://passport-ocr.bcapoc.staging.optical.gov.sg

Both approaches are GovTech-compliant and production-grade.


📌 Conclusion — The Middleware Was the Correct Architectural Choice

Creating the OCR middleware was not a workaround —
it was the right architectural decision:

  • Security

  • Maintainability

  • Scalability

  • Reusability

  • Separation of concerns

  • GovCloud best practices

  • Stability of Optical CMS

Let Optical CMS do its job (content + data).
Let OCR do its job (AI extraction + heavy file logic).
The integration middleware connects the two cleanly.

This is a standard pattern today, and in the 2025 GovTech stack as well.
