## 🐛 Bug

I'm working through the basic sentiment classification example at https://www.dinolabs.ai/271, searching around and fixing errors as I go.

I'm running it in Colab.
```
!pip install mxnet
!pip install gluonnlp pandas tqdm
!pip install sentencepiece==0.1.91
!pip install transformers==4.8.2
!pip install torch
!pip install gluonnlp==0.10.0
!pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'
```
```python
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import gluonnlp as nlp
from tqdm.notebook import tqdm
from kobert_tokenizer import KoBERTTokenizer
from transformers import BertModel, AdamW
from transformers.optimization import get_cosine_schedule_with_warmup

tokenizer = KoBERTTokenizer.from_pretrained('skt/kobert-base-v1')
bertmodel = BertModel.from_pretrained('skt/kobert-base-v1', return_dict=False)
vocab = nlp.vocab.BERTVocab.from_sentencepiece(tokenizer.vocab_file, padding_token='[PAD]')

def get_kobert_model(model_path, vocab_file, ctx="cpu"):
    bertmodel = BertModel.from_pretrained(model_path)
    device = torch.device(ctx)
    bertmodel.to(device)
    bertmodel.eval()
    vocab_b_obj = nlp.vocab.BERTVocab.from_sentencepiece(vocab_file, padding_token='[PAD]')
    return bertmodel, vocab_b_obj

bertmodel, vocab = get_kobert_model('skt/kobert-base-v1', tokenizer.vocab_file)
```
```python
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd
chatbot_data = pd.read_excel('drive/MyDrive/korean.xlsx')

from sklearn.model_selection import train_test_split
dataset_train, dataset_test = train_test_split(data_list, test_size=0.25, random_state=0)
```
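Note that `data_list` is never defined in the snippets above, so `train_test_split` would raise a `NameError` before reaching the error reported below; presumably it is built from `chatbot_data`. A hypothetical construction, with the column names `sentence` and `label` assumed rather than taken from the post:

```python
# Hypothetical: build [sentence, label] pairs from the loaded dataframe.
# The column names 'sentence' and 'label' are assumptions, not from the post.
data_list = [
    [str(sent), str(lab)]
    for sent, lab in zip(chatbot_data['sentence'], chatbot_data['label'])
]
```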
```python
class BERTDataset(Dataset):
    def __init__(self, dataset, sent_idx, label_idx, bert_tokenizer, max_len, pad, pair):
        transform = nlp.data.BERTSentenceTransform(
            bert_tokenizer, max_seq_length=max_len, pad=pad, pair=pair)
        self.sentences = [transform([i[sent_idx]]) for i in dataset]
        self.labels = [np.int32(i[label_idx]) for i in dataset]

    def __getitem__(self, i):
        return (self.sentences[i] + (self.labels[i],))

    def __len__(self):
        return len(self.labels)

tok = nlp.data.BERTSPTokenizer(tokenizer, vocab, lower=False)
data_train = BERTDataset(dataset_train, 0, 1, tok, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok, max_len, True, False)
```
The error occurs in this last part:
```
TypeError                                 Traceback (most recent call last)
<ipython-input-60-1574bbdbfa0b> in <cell line: 2>()
      1 tok = nlp.data.BERTSPTokenizer(tokenizer, vocab, lower=False)
----> 2 data_train = BERTDataset(dataset_train, 0, 1, tok, max_len, True, False)
      3 data_test = BERTDataset(dataset_test, 0, 1, tok, max_len, True, False)

10 frames
/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py in LoadFromFile(self, arg)
    308
    309     def LoadFromFile(self, arg):
--> 310         return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
    311
    312     def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):

TypeError: not a string
```

## To Reproduce
<!-- If you have a code sample, error message, or stack trace, please attach it. -->
Please write the steps to reproduce the bug.
1. -
2. -
3. -

## Expected behavior
<!-- Describe what result you expected when running the code before the bug appeared. -->

## Environment
google colab tpu

## Additional context
<!-- Add any additional information here. -->
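For reference, the `not a string` raised inside `SentencePieceProcessor_LoadFromFile` suggests that `nlp.data.BERTSPTokenizer` received the `KoBERTTokenizer` object itself, while its first argument is expected to be a path string to a sentencepiece model file. A sketch of a possible fix, not confirmed in this thread, is to pass `tokenizer.vocab_file` instead; the `max_len` value is an example, since the post never defines it:

```python
# Possible fix (unconfirmed): BERTSPTokenizer loads a sentencepiece model
# from a file path, so pass the path string rather than the tokenizer object.
tok = nlp.data.BERTSPTokenizer(tokenizer.vocab_file, vocab, lower=False)

max_len = 64  # example value; not defined anywhere in the original post
data_train = BERTDataset(dataset_train, 0, 1, tok, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok, max_len, True, False)
```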
@kotran88 You might find the following code helpful: https://github.com/ChangZero/koBERT-finetuning-demo/blob/main/kobert_colab.ipynb
Did you manage to solve this by any chance? I'm stuck on the same error too...