* feat: starting layout implementation
fix: namespace of common modules
chore: remove not needed test file
fix: op name being registered
chore: can compile the cuda kernel
fix: segmentation fault
chore: wip - paste test code just to check if everything passes
feat: wip - adding layout. unpack not working
fix: circular import
feat: wip - can almost revert
feat: can unpack. just needs cleanup
chore: improve layout code
chore: wip - mm needs work
feat: wip - something seems wrong
fix: e2e test
feat: wip - add group param
fix: unpack weights
feat: marlin is implemented and correct
chore: rebase
chore: remove old import
feat: use int4 instead of dequantizing
chore: remove unused fn
feat: add checks and validation
feat: add new kernel and refactor code (#1)
* feat: wip - adding new kernel
* feat: wip - continue working on the unpack
* feat: wip - working on unpacking
* feat: remove old op
* feat: more code changes
* chore: remove old code
* feat: more code
* chore: more code changes
* chore: more code changes
* feat: add more documentation
* fix: dataclass
* feat: add more docs
* feat: remove assert
chore: block 8 bits
chore: update comment
feat: refactor dispatch
chore: add validation on group size
chore: wip - working on fixing unpack
feat: add small readme with sources
feat: add checks
feat: tests pass & can execute llama2
* compile kind of working
* fix: batching and layout outputs correct results
* fix: torch.compile
* wip
* feat: wip
* chore: cleanup
* chore: review
* chore: review v2
* update benchmarks + README
---------
Co-authored-by: Jesse Cai <[email protected]>
torchao/csrc/sparse_marlin.cpp (+1 -1)
@@ -5,4 +5,4 @@
TORCH_LIBRARY_FRAGMENT(torchao, m) {
m.impl_abstract_pystub("torchao.ops");
m.def("marlin_24_gemm(Tensor x, Tensor weight_marlin, Tensor meta, Tensor s, Tensor workspace, int bits, int size_m, int size_n, int size_k) -> Tensor");
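The "24" in `marlin_24_gemm` refers to 2:4 structured sparsity, at least on our reading of the op name and of `sparse_marlin.cpp`: in every group of four consecutive weights along the reduction dimension, at most two are nonzero. The following is a minimal sketch of that pattern, not torchao's code. The helpers `prune_2_4` and `is_2_4_sparse` are hypothetical. They operate on a plain row-major `std::vector<float>` and assume that each row runs along the reduction dimension and that pruning keeps the two largest-magnitude values per group. The real op instead takes packed `weight_marlin` and `meta` tensors, whose layout this diff does not show.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of torchao): in each group of four
// consecutive elements of every row, zero the two smallest-magnitude values,
// leaving at most two nonzeros per group (the 2:4 pattern).
// Assumes a row-major rows x cols matrix with cols divisible by 4.
void prune_2_4(std::vector<float>& w, std::size_t rows, std::size_t cols) {
  assert(cols % 4 == 0 && w.size() == rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; c += 4) {
      float* group = &w[r * cols + c];
      std::array<std::size_t, 4> idx = {0, 1, 2, 3};
      // Order the group's indices by ascending magnitude.
      std::sort(idx.begin(), idx.end(), [group](std::size_t a, std::size_t b) {
        return std::fabs(group[a]) < std::fabs(group[b]);
      });
      // Drop the two smallest-magnitude entries.
      group[idx[0]] = 0.0f;
      group[idx[1]] = 0.0f;
    }
  }
}

// Hypothetical checker: true if every group of four consecutive elements
// in every row has at most two nonzeros.
bool is_2_4_sparse(const std::vector<float>& w, std::size_t rows, std::size_t cols) {
  if (cols % 4 != 0 || w.size() != rows * cols) return false;
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; c += 4) {
      int nonzeros = 0;
      for (std::size_t i = 0; i < 4; ++i) {
        if (w[r * cols + c + i] != 0.0f) ++nonzeros;
      }
      if (nonzeros > 2) return false;
    }
  }
  return true;
}
```

For example, pruning the single row {1, -3, 2, 0.5, 4, 0, 0, 5} yields {0, -3, 2, 0, 4, 0, 0, 5}, which passes `is_2_4_sparse`. Magnitude-based selection is only one way to choose which two values survive in each group. It is used here because it is simple.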