🧑‍💻 Tech & AI Stack
Belto’s infrastructure is engineered for scalability, security, and performance, built entirely around delivering safe and structured AI in classrooms.
🧠 Core AI Engine
LLaMA.cpp + GGUF Models: Belto runs open-source large language models (LLMs) via llama.cpp, optimized for low-latency inference using quantized GGUF models.
DeepSeek & LLaMA Support: We support both Meta’s LLaMA family and DeepSeek models, chosen according to the teacher’s selection (e.g., reasoning, generation, or coding) to balance performance and token efficiency; see the sketch after this list.
Modular Server Architecture: Our system supports the parallel deployment of multiple LLMs across different backends. Workloads are distributed dynamically based on load and model suitability.
Local Infrastructure: Hosted on-premises across three physical servers equipped with 3× RTX 3060 GPUs and 1× RTX 4090, optimized for inference throughput.
Hybrid-Cloud Ready: In high-load scenarios or for redundancy, we spin up cloud instances on AWS and Azure, enabling seamless hybrid operation.
Token Efficiency: All generations are capped and throttled to comply with teacher-set token rules, ensuring performance and budget control.
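As a rough illustration of how per-task model selection and teacher-set token caps can fit together on top of llama.cpp, here is a minimal sketch using the llama-cpp-python bindings. The model paths, task names, context size, and the hard ceiling are hypothetical placeholders, not Belto’s actual configuration, and the dynamic load distribution described above is out of scope here.

```python
# Minimal sketch of per-task model selection with capped generation, using
# the llama-cpp-python bindings. Paths, task names, context size, and the
# hard ceiling are illustrative assumptions, not Belto's real setup.
from llama_cpp import Llama

# Hypothetical mapping from a teacher-selected task to a quantized GGUF model.
MODEL_PATHS = {
    "reasoning": "models/deepseek-r1-distill-7b.Q4_K_M.gguf",
    "generation": "models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    "coding": "models/deepseek-coder-6.7b.Q4_K_M.gguf",
}

_loaded: dict[str, Llama] = {}

def get_model(task: str) -> Llama:
    """Lazily load and cache the model mapped to a task."""
    if task not in _loaded:
        _loaded[task] = Llama(
            model_path=MODEL_PATHS[task],
            n_ctx=4096,       # context window
            n_gpu_layers=-1,  # offload every layer to the GPU
        )
    return _loaded[task]

def generate(task: str, prompt: str, teacher_token_cap: int) -> str:
    """Generate a completion without exceeding the teacher-set token cap."""
    out = get_model(task).create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=min(teacher_token_cap, 512),  # hard ceiling as a safety net
    )
    return out["choices"][0]["message"]["content"]
```

For example, `generate("coding", "Explain this error message", teacher_token_cap=200)` loads the coding model once, reuses it on later calls, and never emits more than 200 tokens.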
🧩 System Architecture
Frontend: Built with Next.js, hosted on Vercel
Backend: FastAPI w/ async endpoints, exposed via secure API Gateway
Database: MongoDB Atlas for scalability, with schema enforcement per classroom
Webhook System: Stripe + internal services for payments, authentication, and AI usage logging; a minimal sketch of these backend pieces follows this list
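To make the backend pieces concrete, the sketch below wires a FastAPI async endpoint to a MongoDB collection with server-side $jsonSchema validation and adds a signature-verified Stripe webhook. The routes, collection names, schema fields, and environment variable names are assumptions for illustration, not Belto’s actual API surface.

```python
# Illustrative sketch only: routes, collection names, schema fields, and
# environment variables are assumptions, not Belto's actual API surface.
import os

import stripe
from fastapi import FastAPI, HTTPException, Request
from motor.motor_asyncio import AsyncIOMotorClient
from pymongo.errors import CollectionInvalid

app = FastAPI()
db = AsyncIOMotorClient(os.environ["MONGODB_URI"])["belto"]
stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

# Server-side schema that MongoDB enforces on every insert.
CHAT_SCHEMA = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["classroom_id", "student_id", "prompt", "tokens_used"],
        "properties": {
            "classroom_id": {"bsonType": "string"},
            "student_id": {"bsonType": "string"},
            "prompt": {"bsonType": "string"},
            "tokens_used": {"bsonType": "int", "minimum": 0},
        },
    }
}

@app.on_event("startup")
async def ensure_schema() -> None:
    try:
        await db.create_collection("classroom_chats", validator=CHAT_SCHEMA)
    except CollectionInvalid:
        pass  # collection already exists with its validator attached

@app.post("/classrooms/{classroom_id}/chat")
async def chat(classroom_id: str, request: Request) -> dict:
    doc = await request.json()
    # ...route the prompt to an LLM backend here, then log the usage...
    await db["classroom_chats"].insert_one({"classroom_id": classroom_id, **doc})
    return {"status": "ok"}

@app.post("/webhooks/stripe")
async def stripe_webhook(request: Request) -> dict:
    payload = await request.body()
    sig = request.headers.get("stripe-signature", "")
    try:
        event = stripe.Webhook.construct_event(
            payload, sig, os.environ["STRIPE_WEBHOOK_SECRET"]
        )
    except (ValueError, stripe.error.SignatureVerificationError):
        raise HTTPException(status_code=400, detail="invalid signature")
    # Dispatch on event["type"] for payment and usage events.
    return {"received": True}
```

Verifying the stripe-signature header against the endpoint’s webhook secret is the standard Stripe pattern: forged payloads are rejected before any payment or usage state is touched.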