🧑‍💻 Tech & AI Stack

Belto’s infrastructure is engineered for scalability, security, and performance, built entirely around delivering safe and structured AI in classrooms.

🧠 Core AI Engine
  • LLaMA.cpp + GGUF Models: Belto runs open-source large language models (LLMs) via llama.cpp, optimized for low-latency inference using quantized GGUF models.

  • DeepSeek & LLaMA Support: We support both Meta’s LLaMA family and DeepSeek models, chosen per the teacher’s selected task (e.g., reasoning, generation, or coding) to balance performance and token efficiency.
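As a rough sketch of that task-to-model routing, the mapping below pairs each teacher-selected task type with a quantized GGUF model file. The model filenames and the task taxonomy here are illustrative assumptions, not Belto’s actual configuration:

```python
# Hypothetical task-to-model routing table. Model filenames are
# placeholders, not Belto's real deployment config.
TASK_TO_MODEL = {
    "reasoning": "deepseek-r1-distill-qwen-7b.Q4_K_M.gguf",
    "generation": "llama-3.1-8b-instruct.Q4_K_M.gguf",
    "coding": "deepseek-coder-6.7b-instruct.Q4_K_M.gguf",
}

def pick_model(task: str) -> str:
    """Return the GGUF model file for a task, falling back to generation."""
    return TASK_TO_MODEL.get(task, TASK_TO_MODEL["generation"])
```

The fallback keeps unknown task labels from failing a request outright; they simply route to the general-purpose model.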

  • Modular Server Architecture: Our system supports the parallel deployment of multiple LLMs across different backends. Workloads are distributed dynamically based on load and model suitability.
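One minimal way to sketch that dispatch logic: route each request to the least-loaded backend that actually hosts a suitable model. The `Backend` structure and its fields are assumptions made for illustration, not Belto’s internal scheduler:

```python
from dataclasses import dataclass, field

@dataclass
class Backend:
    """Illustrative view of one inference server; fields are assumptions."""
    name: str
    models: set = field(default_factory=set)  # models this backend serves
    active_requests: int = 0                  # rough load signal

def dispatch(backends: list, model: str):
    """Pick the least-loaded backend serving `model`; None if unavailable."""
    candidates = [b for b in backends if model in b.models]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda b: b.active_requests)
    chosen.active_requests += 1
    return chosen
```

A production scheduler would also weigh GPU memory and queue depth, but the shape of the decision (filter by suitability, then pick by load) is the same.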

  • Local Infrastructure: Hosted on-premises across three physical servers equipped with 3× RTX 3060 and 1× RTX 4090 GPUs, optimized for inference throughput.

  • Hybrid-Cloud Ready: In high-load scenarios or for redundancy, we spin up cloud instances on AWS and Azure, enabling seamless hybrid operation.

  • Token Efficiency: All generations are capped and throttled to comply with teacher-set token rules, ensuring performance and budget control.
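A minimal sketch of how such teacher-set token rules might be enforced, assuming a per-request cap plus a daily budget (both limits and class names here are hypothetical):

```python
class TokenBudget:
    """Hypothetical enforcement of teacher-set token rules:
    cap each generation, and count usage against a daily limit."""

    def __init__(self, per_request_cap: int, daily_limit: int):
        self.per_request_cap = per_request_cap
        self.daily_limit = daily_limit
        self.used_today = 0

    def allowed_tokens(self, requested: int) -> int:
        """Tokens this request may generate (0 means fully throttled)."""
        remaining = self.daily_limit - self.used_today
        return max(0, min(requested, self.per_request_cap, remaining))

    def record(self, generated: int) -> None:
        """Count a completed generation against today's budget."""
        self.used_today += generated
```

Capping at generation time (rather than after the fact) is what makes both latency and spend predictable per classroom.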

🧩 System Architecture
  • Frontend: Built with Next.js, hosted on Vercel

  • Backend: FastAPI with async endpoints, exposed via a secure API Gateway

  • Database: MongoDB Atlas for scalability, with schema enforcement per classroom

  • Webhook System: Stripe + internal services for payment, authentication, and AI usage logging
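To make the AI-usage logging concrete, here is an illustrative shape for one usage event such a webhook might emit. The field names and event type are assumptions for the sketch; the real schema may differ:

```python
import json
from datetime import datetime, timezone

def usage_event(classroom_id: str, student_id: str, model: str,
                tokens_in: int, tokens_out: int) -> str:
    """Serialize one AI-usage event for a logging webhook.
    Field names are hypothetical, not Belto's actual schema."""
    return json.dumps({
        "type": "ai.usage",
        "classroom_id": classroom_id,
        "student_id": student_id,
        "model": model,
        "tokens": {"input": tokens_in, "output": tokens_out},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
```

Emitting per-request events like this is what lets token rules, billing, and classroom analytics all read from the same stream.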

🏫 LMS & Classroom Integration
  • Supports Canvas, Blackboard, Moodle, Google Classroom, and others (via OAuth + LTI coming soon)

  • Teachers sync rosters, upload lecture files, and control AI permissions per class

  • Future: SIS integrations for district-wide scaling
