# MLX Framework Cheatsheet

## Overview

MLX is an array framework for machine learning on Apple silicon, designed by Apple machine learning research. It offers high performance, familiar APIs, and seamless integration with Apple's ecosystem.

## Core Features

- Familiar APIs: a Python API that closely follows NumPy, plus C++ and Swift interfaces
- Composable function transformations: for automatic differentiation, vectorization, and graph optimization
- Lazy computation: arrays are only materialized when needed (see the sketch after this list)
- Dynamic graph construction: changing input shapes does not trigger slow recompilation
- Unified memory model: operations can run across devices without copying data
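
The last two points are easiest to see in code. A minimal sketch of lazy evaluation and per-operation device selection (the `stream=` argument follows MLX's documented device API):

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = a @ a   # builds the computation graph; nothing runs yet
mx.eval(b)  # materializes b

# Unified memory: the same arrays can be used on CPU and GPU
# without copying, by selecting a device per operation
c = mx.add(a, b, stream=mx.cpu)
d = mx.multiply(a, b, stream=mx.gpu)
mx.eval(c, d)
```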

## Installation

```bash
# Install MLX
pip install mlx

# Install MLX-LM for language models
pip install mlx-lm
```

## MLX Core Components

### Arrays and Basic Operations

```python
import mlx.core as mx

# Create arrays
a = mx.array([1, 2, 3])
b = mx.zeros((3, 3))
c = mx.ones((2, 4))
d = mx.random.normal((2, 2))

# Basic operations
result = a + b
result = mx.matmul(b, b)

# Evaluate lazily computed arrays
mx.eval(result)
```

### Function Transformations

```python
import mlx.core as mx

# Gradient computation
def f(x):
    return mx.sum(x ** 2)

grad_f = mx.grad(f)
x = mx.array([1.0, 2.0, 3.0])
grad_value = grad_f(x)  # [2.0, 4.0, 6.0]

# Vectorization
def scalar_fn(x):
    return x ** 2

vector_fn = mx.vmap(scalar_fn)
vector_fn(mx.array([1.0, 2.0, 3.0]))  # [1.0, 4.0, 9.0]

# Combined transformations: vectorize the gradient of a scalar function
# (note: mx.grad requires a scalar output, so grad goes inside vmap)
grad_vector_fn = mx.vmap(mx.grad(scalar_fn))
```
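
For training you typically want the loss and its gradient together. MLX provides `mx.value_and_grad` for this (the training loop later in this sheet uses the `nn.value_and_grad` counterpart); a minimal sketch:

```python
import mlx.core as mx

def f(x):
    return mx.sum(x ** 2)

# Returns both f(x) and df/dx in a single call
value_and_grad_f = mx.value_and_grad(f)
value, grad = value_and_grad_f(mx.array([1.0, 2.0, 3.0]))
# value = 14.0, grad = [2.0, 4.0, 6.0]
```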

### Compilation

```python
from functools import partial

import mlx.core as mx

@mx.compile
def optimized_fn(x):
    return mx.sum(x ** 2)

# With state tracking: captured arrays must be declared via inputs/outputs
state = [mx.array(1.0)]

@partial(mx.compile, inputs=state, outputs=state)
def stateful_fn(x):
    result = x + state[0]
    state[0] = result
    return result
```

## Neural Networks (mlx.nn)

### Building a Basic Neural Network

```python
import mlx.core as mx
import mlx.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dims, hidden_dims, out_dims):
        super().__init__()
        self.layers = [
            nn.Linear(in_dims, hidden_dims),
            nn.Linear(hidden_dims, out_dims),
        ]

    def __call__(self, x):
        for layer in self.layers[:-1]:
            x = layer(x)
            x = mx.maximum(x, 0)  # ReLU activation
        return self.layers[-1](x)

# Create model
model = MLP(10, 128, 1)

# Initialize parameters
mx.eval(model.parameters())

# Access parameters
params = model.parameters()
```

### Common Layers

```python
import mlx.nn as nn

# Linear layer
linear = nn.Linear(input_dim, output_dim)

# Convolutional layer
conv = nn.Conv2d(in_channels, out_channels, kernel_size=3)

# Layer normalization
norm = nn.LayerNorm(dim)

# Dropout (for training)
dropout = nn.Dropout(p=0.5)

# Multi-head attention
attention = nn.MultiHeadAttention(dim, num_heads)
```
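
For a quick feed-forward stack without subclassing `nn.Module`, mlx.nn also ships a `Sequential` container; a minimal sketch with arbitrary dimensions:

```python
import mlx.core as mx
import mlx.nn as nn

# Compose layers without a custom Module subclass
net = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)

x = mx.random.normal((8, 10))  # batch of 8
y = net(x)                     # shape (8, 1)
mx.eval(y)
```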

### Loss Functions

```python
import mlx.nn.losses as losses

# Common loss functions
mse_loss = losses.mse_loss(predictions, targets)
bce_loss = losses.binary_cross_entropy(predictions, targets)
# cross_entropy expects unnormalized logits and integer class targets
ce_loss = losses.cross_entropy(predictions, targets)
```

## Optimizers (mlx.optimizers)

```python
import mlx.optimizers as optim

# Create optimizer
optimizer = optim.SGD(learning_rate=0.01)
# Or
optimizer = optim.Adam(learning_rate=0.001, betas=(0.9, 0.999))

# Update model with gradients
optimizer.update(model, gradients)

# Evaluate optimizer state and model parameters
mx.eval(optimizer.state, model.parameters())
```

## Training Loop Pattern

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Create model
model = MyModel()
mx.eval(model.parameters())

# Define loss function
def loss_fn(model, x, y):
    y_pred = model(x)
    return nn.losses.mse_loss(y_pred, y)

# Create gradient function and optimizer
loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
optimizer = optim.Adam(learning_rate=0.001)

# Training loop
for epoch in range(num_epochs):
    for x_batch, y_batch in data_loader:
        # Forward and backward pass
        loss, grads = loss_and_grad_fn(model, x_batch, y_batch)

        # Update model parameters
        optimizer.update(model, grads)

        # Evaluate parameters and optimizer state
        mx.eval(model.parameters(), optimizer.state)
```

## MLX-LM Commands

### Model Generation

```bash
# Generate text with a model (output streams to the terminal as it is generated)
mlx_lm.generate --model mistralai/Mistral-7B-Instruct-v0.3 --prompt "hello"

# Set generation parameters
mlx_lm.generate --model <model_name> --prompt "hello" --max-tokens 100 --temp 0.7 --top-p 0.9
```
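
The CLI has a Python counterpart in the mlx_lm package; a minimal sketch using its `load`/`generate` helpers:

```python
from mlx_lm import load, generate

# Downloads the model from Hugging Face on first use, then loads from cache
model, tokenizer = load("mistralai/Mistral-7B-Instruct-v0.3")

response = generate(model, tokenizer, prompt="hello", max_tokens=100)
print(response)
```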

### Model Conversion

```bash
# Convert Hugging Face model to MLX format
mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3

# Convert and quantize to 4-bit
mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q

# Convert, quantize, and upload to Hugging Face
mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q --upload-repo <username>/<repo-name>
```

### Interactive Chat

```bash
# Start interactive chat with a model
mlx_lm.chat --model mistralai/Mistral-7B-Instruct-v0.3

# Use a local model
mlx_lm.chat --model ./path/to/local/model
```

### Fine-tuning with LoRA

```bash
# Basic LoRA fine-tuning
# (--data expects a folder containing train.jsonl and valid.jsonl)
mlx_lm.lora --model mistralai/Mistral-7B-v0.1 --train --data ./my_data_folder

# Set specific parameters
mlx_lm.lora \
    --model mistralai/Mistral-7B-v0.1 \
    --train \
    --data ./my_data_folder \
    --batch-size 1 \
    --num-layers 4 \
    --iters 500

# Use a quantized model (QLoRA)
mlx_lm.lora --model <quantized_model_path> --train --data ./my_data_folder

# Test a fine-tuned model
mlx_lm.lora \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --data <path_to_data> \
    --test

# Generate with a fine-tuned model
mlx_lm.generate \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --prompt "<your_prompt>"
```

### Fusing Adapters

```bash
# Fuse LoRA adapters with the original model
mlx_lm.fuse \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --save-path <output_path>

# Fuse and upload to Hugging Face
mlx_lm.fuse \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --save-path <output_path> \
    --upload-repo <username>/<repo-name>

# Export to GGUF format
mlx_lm.fuse \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --export-gguf
```

### Model Management

```bash
# Scan all locally cached models
mlx_lm.manage --scan

# Delete specific models
mlx_lm.manage --delete --pattern <model_name_pattern>
```

### API Server

```bash
# Run an OpenAI-compatible API server (defaults to localhost:8080)
mlx_lm.server

# Interact with the server
curl localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "max_completion_tokens": 2000,
    "messages": [{"role": "user", "content": "Hello there"}]
  }'
```
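
Since the server exposes the OpenAI chat-completions protocol, any OpenAI-compatible client works; a minimal sketch with the openai Python package (the api_key is a placeholder, as the local server does not authenticate):

```python
from openai import OpenAI

# Point the client at the local mlx_lm.server instance
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[{"role": "user", "content": "Hello there"}],
)
print(response.choices[0].message.content)
```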

## Swift MLX Integration

```swift
// Add dependency in Package.swift
dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift", from: "0.10.0")
]

// Import packages
import MLX
import MLXNN
import MLXOptimizers
import MLXRandom
```