Intelligent Middleware

Route each prompt to Perfect Expert.

l3mcore acts as the central brain between your users and Artificial Intelligence. Analyze what you need and redirect the conversation to the ideal model in milliseconds, whether in the cloud or on your own local servers.

Quick Start Characteristics

lemoe

Watch it in action

Here we see the console and Open WebUI. We are using 4 experts: 1 local ONNX model (Malbec), 1 on Ollama, and 2 API calls with Groq.

Why use l3mcore?

Designed for speed, privacy and maximum flexibility.

Smart Routing

100% local semantic decision engine. Understand the real context of each message instantly and select the right model without sending your data to the cloud.

Extreme Efficiency

Super optimized systems. In real stress tests with 15 experts available in the system, the kernel consumes only 1.5GB RAM.

Audited Security

Being from open source and auditable, we guarantee transparency. Prevents Path Traversal, SSRF and obfuscates sensitive logs automatically.

Multi-Backend

Connect local Ollama models, ultra-light inference in RAM (ONNX), Llama.cpp and external APIs (Groq, OpenAI) into a single central system.

Plugin System

Extend l3mcore capabilities to your needs. Discover, download and create custom modules in the Plugin Directory.

See all features

How magic works

A solid architecture that decides in milliseconds.

Frontend (UI)

"command to start nginx on port 80"

l3mcore Router

Vectorization E5 + Softmax (Score: 0.98)

External API (OpenAI Compatible)

Legal Expert / Copywriter

Local ONNX (T5)

DevOps Expert (malbec)

Local Ollama

Python programmer

Solving Real Problems

How l3mcore fits into your infrastructure.

AI switchboard

A single bot that routes customer questions to specialized models (legal, support, shipping) in milliseconds.

Zero Data Leak

It keeps your code and secrets on secure local servers, while pushing only trivial queries to the public cloud.

Smart Routing

Save thousands of dollars by submitting easy tasks to local free models and using premium APIs only when necessary.

Business Scale

For the user, there is only one "model". All the complexity of orchestrating 15 or 100 experts behind them is 100% invisible to them.

Explore Use Cases

Deployment Options

Open License. Ready to adapt to your Artificial Intelligence adoption level.

Community

Free / Self-hosted

Important notice: The current license exclusively allows non-commercial use of l3mcore.

Target audience: Solo developers, students, and very small startups (1-5 employees).

Internal use exclusively (Non-commercial)
Full source code on GitHub
Community support

Download Code

Frequently Asked Questions

We resolve typical doubts before you have them.

Do I need a powerful graphics card (GPU) to use ONNX? +

No. l3mcore's ONNXRunner is designed to run small model inference on CPU by loading them directly into system RAM. In fact, it is so optimized that it works perfectly on modest hardware.

Can I connect to Anthropic or Gemini instead of OpenAI? +

l3mcore speaks the universal dialect of OpenAI (/v1/chat/completions). You can connect third-party APIs without problem using proxies that translate the API (like LiteLLM) or directly use those that are already supported natively (like Groq, Together, etc.).

How many experts can I put in? +

Practically unlimited. The router compares mathematical vectors using cosine similarity ultra-fast. Having 50 or 100 experts will only add a few extra milliseconds to the decision phase, being imperceptible to the human user.

Can I use a custom routing model? +

Yes. Although by default l3mcore uses HuggingFace fast models like E5-small, you can configure your own model or routing algorithm on the backend to tailor the decision logic to your exact needs.

What happens if my server runs out of RAM? +

For local models (ONNX), l3mcore implements a system of LRU cache (Least Recently Used). You can limit, for example, that there are only 2 models loaded at a time. When the third party is called, l3mcore automatically evicts the model that has not been used the longest from memory.