
Commit fe04e8b

Impl simple mamba model (#1480)

Authored by drbh and Narsil
This draft PR is a work-in-progress implementation of the mamba model. It currently loads weights and produces correct logits after a single pass. The PR still needs to integrate the model so that it produces tokens as expected, and to apply optimizations that avoid unnecessary copies and operations at runtime.

#### Helpful resources

- [Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Albert Gu and Tri Dao)](https://arxiv.org/abs/2312.00752)
- https://github.com/johnma2006/mamba-minimal
- https://github.com/huggingface/candle/blob/main/candle-examples/examples/mamba-minimal/model.rs
- huggingface/transformers#28094

Notes: this dev work currently targets `state-spaces/mamba-130m`, so if you want to test, please use that model. Additionally, when starting the router, prefill needs to be limited: `cargo run -- --max-batch-prefill-tokens 768 --max-input-length 768`

## Update / Current State

Integration tests have been added, and basic functionality such as model loading is supported.

```bash
cd integration-tests
pytest -vv models/test_fused_kernel_mamba.py
```

- [x] add tests
- [x] load model
- [x] make simple request
- [ ] resolve warmup issue
- [ ] resolve output issues

Fetch the models tested during development:

```bash
text-generation-server download-weights state-spaces/mamba-130m
text-generation-server download-weights state-spaces/mamba-1.4b
text-generation-server download-weights state-spaces/mamba-2.8b
```

Run the server:

```bash
cd server
MASTER_ADDR=127.0.0.1 MASTER_PORT=5555 python text_generation_server/cli.py serve state-spaces/mamba-2.8b
```

Start the router:

```bash
cargo run
```

Make a request:

```bash
curl -s localhost:3000/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json' | jq
```

Response:

```json
{
  "generated_text": "\n\nDeep learning is a machine learning technique that uses a deep neural network to learn from data."
}
```

---------

Co-authored-by: Nicolas Patry <[email protected]>
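For context, the fused kernels built by this PR implement Mamba's selective scan. Below is a minimal NumPy sketch of the underlying recurrence, following the mamba-minimal reference linked above; the function name, shapes, and the lack of batching are illustrative assumptions, not the PR's actual code:

```python
import numpy as np

def selective_scan(u, delta, A, B, C, D):
    """Sequential reference of the selective scan (single sequence, no batch).

    u:     (L, d_in)        input sequence
    delta: (L, d_in)        input-dependent step sizes
    A:     (d_in, d_state)  state-transition parameters
    B, C:  (L, d_state)     input-dependent projections
    D:     (d_in,)          skip connection
    """
    L, d_in = u.shape
    # Discretize: deltaA[t] = exp(delta[t] * A), deltaB_u[t] = delta[t] * B[t] * u[t]
    deltaA = np.exp(delta[:, :, None] * A[None])                    # (L, d_in, d_state)
    deltaB_u = delta[:, :, None] * B[:, None, :] * u[:, :, None]    # (L, d_in, d_state)
    h = np.zeros_like(deltaA[0])
    ys = []
    for t in range(L):                       # the O(L) recurrence the kernels fuse
        h = deltaA[t] * h + deltaB_u[t]      # h_t = Ā_t h_{t-1} + B̄_t x_t
        ys.append((h * C[t][None, :]).sum(-1))  # y_t = C_t h_t
    return np.stack(ys) + u * D[None]        # skip connection
```

The fused CUDA kernel computes the same recurrence without materializing the per-step intermediates, which is why the PR builds it from `Makefile-selective-scan` rather than using a Python loop.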
1 parent e97d7e8 commit fe04e8b

File tree

11 files changed: +1547 −1 lines changed


Dockerfile

+10
```diff
@@ -154,6 +154,12 @@ COPY server/Makefile-vllm Makefile
 # Build specific version of vllm
 RUN make build-vllm-cuda
 
+# Build mamba kernels
+FROM kernel-builder as mamba-builder
+WORKDIR /usr/src
+COPY server/Makefile-selective-scan Makefile
+RUN make build-all
+
 # Build megablocks
 FROM kernel-builder as megablocks-builder

@@ -205,6 +211,10 @@ COPY --from=eetq-kernels-builder /usr/src/eetq/build/lib.linux-x86_64-cpython-31
 # Copy builds artifacts from vllm builder
 COPY --from=vllm-builder /usr/src/vllm/build/lib.linux-x86_64-cpython-310 /opt/conda/lib/python3.10/site-packages
 
+# Copy build artifacts from mamba builder
+COPY --from=mamba-builder /usr/src/mamba/build/lib.linux-x86_64-cpython-310/ /opt/conda/lib/python3.10/site-packages
+COPY --from=mamba-builder /usr/src/causal-conv1d/build/lib.linux-x86_64-cpython-310/ /opt/conda/lib/python3.10/site-packages
+
 # Install flash-attention dependencies
 RUN pip install einops --no-cache-dir
```
New file (+73 lines):

```json
{
  "details": {
    "best_of_sequences": null,
    "finish_reason": "length",
    "generated_tokens": 10,
    "prefill": [],
    "seed": null,
    "tokens": [
      { "id": 187, "logprob": -0.3552246, "special": false, "text": "\n" },
      { "id": 187, "logprob": -0.38378906, "special": false, "text": "\n" },
      { "id": 30763, "logprob": -1.140625, "special": false, "text": "Deep" },
      { "id": 4715, "logprob": -0.5551758, "special": false, "text": " learning" },
      { "id": 310, "logprob": -0.59033203, "special": false, "text": " is" },
      { "id": 247, "logprob": -0.70654297, "special": false, "text": " a" },
      { "id": 747, "logprob": -2.0410156, "special": false, "text": " new" },
      { "id": 1511, "logprob": -2.3789062, "special": false, "text": " type" },
      { "id": 273, "logprob": -0.0026435852, "special": false, "text": " of" },
      { "id": 5145, "logprob": -1.2841797, "special": false, "text": " machine" }
    ],
    "top_tokens": null
  },
  "generated_text": "\n\nDeep learning is a new type of machine"
}
```
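In these snapshot fixtures, `generated_text` is simply the concatenation of the decode tokens' `text` fields. A quick sanity check over the snapshot above (token texts copied inline):

```python
# Decode-token texts from the snapshot above; generated_text should be
# exactly their concatenation.
token_texts = ["\n", "\n", "Deep", " learning", " is", " a",
               " new", " type", " of", " machine"]
generated_text = "\n\nDeep learning is a new type of machine"
assert "".join(token_texts) == generated_text
```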
New file (+99 lines):

```json
{
  "details": {
    "best_of_sequences": null,
    "finish_reason": "length",
    "generated_tokens": 10,
    "prefill": [
      { "id": 2502, "logprob": null, "text": " red" },
      { "id": 13, "logprob": -2.5234375, "text": "," },
      { "id": 8862, "logprob": -3.4433594, "text": " yellow" },
      { "id": 13, "logprob": -0.43017578, "text": "," },
      { "id": 209, "logprob": -8.21875, "text": " " }
    ],
    "seed": 0,
    "tokens": [
      { "id": 187, "logprob": 0.0, "special": false, "text": "\n" },
      { "id": 395, "logprob": -0.46411133, "special": false, "text": "and" },
      { "id": 13735, "logprob": -2.1132812, "special": false, "text": " orange" },
      { "id": 313, "logprob": -1.2128906, "special": false, "text": " (" },
      { "id": 249, "logprob": -2.3671875, "special": false, "text": "in" },
      { "id": 253, "logprob": 0.0, "special": false, "text": " the" },
      { "id": 1340, "logprob": -1.640625, "special": false, "text": " order" },
      { "id": 597, "logprob": -0.5488281, "special": false, "text": " they" },
      { "id": 3176, "logprob": -0.48608398, "special": false, "text": " appear" },
      { "id": 275, "logprob": 0.0, "special": false, "text": " in" }
    ],
    "top_tokens": null
  },
  "generated_text": "blue, red, yellow, \nand orange (in the order they appear in"
}
```
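Unlike the first snapshot, this one has non-empty `prefill` entries carrying per-token logprobs (the first is `null` because it has no preceding context to be scored against). As an aside, these values are enough to compute a rough prompt negative log-likelihood; a small sketch using the numbers above (the NLL/perplexity computation is illustrative, not something the fixture itself asserts):

```python
import math

# Prefill logprobs from the snapshot above; the first token has no context,
# so its logprob is null (None here) and is skipped.
prefill_logprobs = [None, -2.5234375, -3.4433594, -0.43017578, -8.21875]
scored = [lp for lp in prefill_logprobs if lp is not None]
avg_nll = -sum(scored) / len(scored)   # average negative log-likelihood
prompt_ppl = math.exp(avg_nll)         # prompt perplexity
```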
