# MLX Framework Cheatsheet

## Overview

MLX is an array framework for machine learning on Apple silicon, designed by Apple machine learning research. It offers high performance, familiar APIs, and seamless integration with Apple's ecosystem.

## Core Features

- **Familiar APIs**: Python API based on NumPy, with C++ and Swift interfaces
- **Composable function transformations**: for automatic differentiation, vectorization, and optimization
- **Lazy computation**: arrays are only materialized when needed
- **Dynamic graph construction**: no slow recompilations when shapes change
- **Unified memory model**: operations across devices without data copies (see the sketch below)
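As a quick illustration of the last two points, here is a minimal sketch (array sizes are arbitrary) showing lazy evaluation and running ops on different devices over the same arrays:

```python
import mlx.core as mx

# Lazy computation: this records the matmul but does not run it yet
a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = mx.matmul(a, b)

# Materialize the result (runs on the default device, the GPU)
mx.eval(c)

# Unified memory: run an op on the CPU over the same arrays,
# with no explicit data transfer
d = mx.add(a, b, stream=mx.cpu)
mx.eval(d)
```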
## Installation

```bash
# Install MLX
pip install mlx

# Install MLX-LM for language models
pip install mlx-lm
```

## MLX Core Components

### Arrays and Basic Operations

```python
import mlx.core as mx

# Create arrays
a = mx.array([1, 2, 3])
b = mx.zeros((3, 3))
c = mx.ones((2, 4))
d = mx.random.normal((2, 2))

# Basic operations
result = a + b
result = mx.matmul(b, b)

# Evaluate lazily computed arrays
mx.eval(result)
```

### Function Transformations

```python
import mlx.core as mx

# Gradient computation
def f(x):
    return mx.sum(x ** 2)

grad_f = mx.grad(f)
x = mx.array([1.0, 2.0, 3.0])
grad_value = grad_f(x)  # [2.0, 4.0, 6.0]

# Vectorization
def scalar_fn(x):
    return x ** 2

vector_fn = mx.vmap(scalar_fn)
vector_fn(mx.array([1.0, 2.0, 3.0]))  # [1.0, 4.0, 9.0]

# Combined transformations: mx.grad needs a scalar-valued function,
# so compose vmap around grad for per-element gradients
grad_vector_fn = mx.vmap(mx.grad(scalar_fn))
```

### Compilation

```python
import mlx.core as mx
from functools import partial

@mx.compile
def optimized_fn(x):
    return mx.sum(x ** 2)

# With state tracking (partial is used so the decorator
# passes the function as the first argument to mx.compile)
state = [mx.array(1.0)]

@partial(mx.compile, inputs=state, outputs=state)
def stateful_fn(x):
    result = x + state[0]
    state[0] = result
    return result
```
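A brief usage note (values are illustrative): the function is traced and compiled on the first call, and the compiled graph is cached and reused for later calls with the same shapes and dtypes:

```python
x = mx.array([1.0, 2.0, 3.0])
print(optimized_fn(x))  # traced and compiled on the first call
print(optimized_fn(x))  # reuses the cached compiled graph

print(stateful_fn(mx.array(2.0)))  # 3.0
print(state[0])                    # the state update was captured: 3.0
```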

## Neural Networks (mlx.nn)

### Building a Basic Neural Network

```python
import mlx.core as mx
import mlx.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dims, hidden_dims, out_dims):
        super().__init__()
        self.layers = [
            nn.Linear(in_dims, hidden_dims),
            nn.Linear(hidden_dims, out_dims)
        ]

    def __call__(self, x):
        for layer in self.layers[:-1]:
            x = layer(x)
            x = mx.maximum(x, 0)  # ReLU activation
        return self.layers[-1](x)

# Create model
model = MLP(10, 128, 1)

# Initialize parameters
mx.eval(model.parameters())

# Access parameters
params = model.parameters()
```

### Common Layers

```python
import mlx.nn as nn

# Linear layer
linear = nn.Linear(input_dim, output_dim)

# Convolutional layer
conv = nn.Conv2d(in_channels, out_channels, kernel_size=3)

# Layer normalization
norm = nn.LayerNorm(dim)

# Dropout (for training)
dropout = nn.Dropout(p=0.5)

# Multi-head attention
attention = nn.MultiHeadAttention(dim, num_heads)
```
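As a sketch of how these building blocks compose (the sizes here are made up), `nn.Sequential` chains modules into a single callable:

```python
import mlx.core as mx
import mlx.nn as nn

# Hypothetical network: 64 -> 128 -> 10 with ReLU and dropout
net = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(128, 10),
)

x = mx.random.normal((8, 64))  # a batch of 8 examples
mx.eval(net(x))
```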

### Loss Functions

```python
import mlx.nn.losses as losses

# Common loss functions
mse_loss = losses.mse_loss(predictions, targets)
bce_loss = losses.binary_cross_entropy(predictions, targets)
ce_loss = losses.cross_entropy(predictions, targets)
```
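Note that these losses accept a `reduction` argument (`'none'`, `'mean'`, or `'sum'`), and `cross_entropy` expects unnormalized logits with integer class targets. A small sketch with made-up values:

```python
import mlx.core as mx
import mlx.nn as nn

logits = mx.array([[2.0, -1.0], [-1.0, 2.0]])  # (batch, num_classes)
targets = mx.array([0, 1])                     # integer class labels

# reduction="mean" collapses the per-example losses to a scalar
loss = nn.losses.cross_entropy(logits, targets, reduction="mean")
mx.eval(loss)
```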

## Optimizers (mlx.optimizers)

```python
import mlx.core as mx
import mlx.optimizers as optim

# Create optimizer
optimizer = optim.SGD(learning_rate=0.01)
# Or
optimizer = optim.Adam(learning_rate=0.001, betas=(0.9, 0.999))

# Update model with gradients
optimizer.update(model, gradients)

# Evaluate optimizer state and model parameters
mx.eval(optimizer.state, model.parameters())
```
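Optimizers can also take a schedule in place of a fixed learning rate; a minimal sketch using `optim.cosine_decay` (the step count is illustrative):

```python
import mlx.optimizers as optim

# Cosine-decay the learning rate from 1e-3 over 10,000 update steps
schedule = optim.cosine_decay(1e-3, 10_000)
optimizer = optim.Adam(learning_rate=schedule)
```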

## Training Loop Pattern

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Create model
model = MyModel()
mx.eval(model.parameters())

# Define loss function (must return a scalar for value_and_grad)
def loss_fn(model, x, y):
    y_pred = model(x)
    return nn.losses.mse_loss(y_pred, y, reduction="mean")

# Create gradient function and optimizer
loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
optimizer = optim.Adam(learning_rate=0.001)

# Training loop
for epoch in range(num_epochs):
    for x_batch, y_batch in data_loader:
        # Forward and backward pass
        loss, grads = loss_and_grad_fn(model, x_batch, y_batch)

        # Update model parameters
        optimizer.update(model, grads)

        # Evaluate parameters and optimizer state
        mx.eval(model.parameters(), optimizer.state)
```

## MLX-LM Commands

### Model Generation

```bash
# Generate text with a model
mlx_lm.generate --model mistralai/Mistral-7B-Instruct-v0.3 --prompt "hello"

# Stream text generation
mlx_lm.generate --model mistralai/Mistral-7B-Instruct-v0.3 --prompt "hello" --stream

# Set generation parameters
mlx_lm.generate --model <model_name> --prompt "hello" --max-tokens 100 --temp 0.7 --top-p 0.9
```
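The same functionality is available from Python via the `mlx_lm` package; a minimal sketch (the 4-bit community model name is an example, any MLX model works):

```python
from mlx_lm import load, generate

# Download from the Hugging Face Hub (or load from the local cache)
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# Generate a completion; verbose=True prints tokens as they are produced
text = generate(model, tokenizer, prompt="hello", verbose=True)
```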

### Model Conversion

```bash
# Convert Hugging Face model to MLX format
mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3

# Convert and quantize to 4-bit
mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q

# Convert, quantize, and upload to Hugging Face
mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.3 -q --upload-repo <username>/<repo-name>
```

### Interactive Chat

```bash
# Start interactive chat with a model
mlx_lm.chat --model mistralai/Mistral-7B-Instruct-v0.3

# Use a local model
mlx_lm.chat --model ./path/to/local/model
```

### Fine-tuning with LoRA

```bash
# Basic LoRA fine-tuning
mlx_lm.lora --model mistralai/Mistral-7B-v0.1 --train --data ./my_data_folder

# Set specific parameters
mlx_lm.lora \
    --model mistralai/Mistral-7B-v0.1 \
    --train \
    --data ./my_data_folder \
    --batch-size 1 \
    --num-layers 4 \
    --iters 500

# Use quantized model (QLoRA)
mlx_lm.lora --model <quantized_model_path> --train --data ./my_data_folder

# Test a fine-tuned model
mlx_lm.lora \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --data <path_to_data> \
    --test

# Generate with a fine-tuned model
mlx_lm.generate \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --prompt "<your_prompt>"
```
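The folder passed to `--data` is expected to contain `train.jsonl` and `valid.jsonl`; as a sketch, each line is one example in one of the supported formats (plain text, prompt/completion, or chat):

```jsonl
{"text": "A raw training example as plain text."}
{"prompt": "What is MLX?", "completion": "An array framework for Apple silicon."}
{"messages": [{"role": "user", "content": "What is MLX?"}, {"role": "assistant", "content": "An array framework for Apple silicon."}]}
```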

### Fusing Adapters

```bash
# Fuse LoRA adapters with the original model
mlx_lm.fuse \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --save-path <output_path>

# Fuse and upload to Hugging Face
mlx_lm.fuse \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --save-path <output_path> \
    --upload-name <username>/<repo-name>

# Export to GGUF format
mlx_lm.fuse \
    --model <path_to_model> \
    --adapter-path <path_to_adapters> \
    --export-gguf
```

### Model Management

```bash
# Scan all locally cached models
mlx_lm.manage --scan

# Delete specific models
mlx_lm.manage --delete --pattern <model_name_pattern>
```

### API Server

```bash
# Run OpenAI-compatible API server
mlx_lm.server

# Interact with the server
curl localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "max_completion_tokens": 2000,
    "messages": [{"role": "user", "content": "Hello there"}]
  }'
```
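Because the server implements the OpenAI chat-completions API, standard OpenAI clients can talk to it; a sketch using the `openai` Python package (the API key is a placeholder, the local server does not check it):

```python
from openai import OpenAI

# Point the client at the local mlx_lm.server instance
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",
    messages=[{"role": "user", "content": "Hello there"}],
)
print(response.choices[0].message.content)
```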

## Swift MLX Integration

```swift
// Add dependency in Package.swift
dependencies: [
    .package(url: "https://github.com/ml-explore/mlx-swift", from: "0.10.0")
]

// Import packages
import MLX
import MLXNN
import MLXOptimizers
import MLXRandom
```

## Resource Links

- [MLX Documentation](https://ml-explore.github.io/mlx/)
- [MLX GitHub Repository](https://github.com/ml-explore/mlx)
- [MLX Examples Repository](https://github.com/ml-explore/mlx-examples)
- [MLX-LM Repository](https://github.com/ml-explore/mlx-lm)
- [MLX Community Models](https://huggingface.co/mlx-community)