[BUG] #107

kotran88 · 2023-05-30T05:05:41Z

🐛 Bug

I'm working from the basic sentiment classification example and this one:
https://www.dinolabs.ai/271
looking things up and fixing errors as I go.

I'm running this on Colab.

!pip install mxnet
!pip install gluonnlp pandas tqdm
!pip install sentencepiece==0.1.91
!pip install transformers==4.8.2
!pip install torch
!pip install gluonnlp==0.10.0

!pip install 'git+https://github.com/SKTBrain/KoBERT.git#egg=kobert_tokenizer&subdirectory=kobert_hf'

import torch
from torch import nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import gluonnlp as nlp
import numpy as np
from tqdm import tqdm, tqdm_notebook
from kobert_tokenizer import KoBERTTokenizer
from transformers import BertModel
from transformers import AdamW
from transformers.optimization import get_cosine_schedule_with_warmup
from tqdm.notebook import tqdm
tokenizer = KoBERTTokenizer.from_pretrained('skt/kobert-base-v1')
bertmodel = BertModel.from_pretrained('skt/kobert-base-v1', return_dict=False)
vocab = nlp.vocab.BERTVocab.from_sentencepiece(tokenizer.vocab_file, padding_token='[PAD]')

def get_kobert_model(model_path, vocab_file, ctx="cpu"):
    bertmodel = BertModel.from_pretrained(model_path)
    device = torch.device(ctx)
    bertmodel.to(device)
    bertmodel.eval()
    vocab_b_obj = nlp.vocab.BERTVocab.from_sentencepiece(vocab_file,
                                                         padding_token='[PAD]')
    return bertmodel, vocab_b_obj
bertmodel, vocab = get_kobert_model('skt/kobert-base-v1',tokenizer.vocab_file)


from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
chatbot_data = pd.read_excel('drive/MyDrive/korean.xlsx')

from sklearn.model_selection import train_test_split
dataset_train, dataset_test = train_test_split(data_list, test_size=0.25, random_state=0)

class BERTDataset(Dataset):
    def __init__(self, dataset, sent_idx, label_idx, bert_tokenizer, max_len,
                 pad, pair):
        transform = nlp.data.BERTSentenceTransform(
            bert_tokenizer, max_seq_length=max_len, pad=pad, pair=pair)
        self.sentences = [transform([i[sent_idx]]) for i in dataset]
        self.labels = [np.int32(i[label_idx]) for i in dataset]
        
    def __getitem__(self, i):
        return (self.sentences[i] + (self.labels[i], ))

    def __len__(self):
        return (len(self.labels))

tok = nlp.data.BERTSPTokenizer(tokenizer, vocab, lower=False)
data_train = BERTDataset(dataset_train, 0, 1, tok, max_len, True, False)
data_test = BERTDataset(dataset_test, 0, 1, tok, max_len, True, False)

The error occurs in the last part.


TypeError                                 Traceback (most recent call last)
<ipython-input-60-1574bbdbfa0b> in <cell line: 2>()
      1 tok = nlp.data.BERTSPTokenizer(tokenizer, vocab, lower=False)
----> 2 data_train = BERTDataset(dataset_train, 0, 1, tok, max_len, True, False)
      3 data_test = BERTDataset(dataset_test, 0, 1, tok, max_len, True, False)

10 frames
/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py in LoadFromFile(self, arg)
    308 
    309     def LoadFromFile(self, arg):
--> 310         return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
    311 
    312     def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):

TypeError: not a string
## To Reproduce
<!-- If you have code samples, error messages, stack traces, etc., please attach them. -->

Please describe the steps to reproduce the bug.

1. -
2. -
3. -

## Expected behavior
<!-- Describe the result you expected from running the code, before the bug appeared. -->

## Environment
Google Colab (TPU runtime)

## Additional context
<!-- Add any additional information here. -->

ChangZero · 2023-08-27T11:23:30Z

@kotran88
The code below may be a helpful reference.
https://github.com/ChangZero/koBERT-finetuning-demo/blob/main/kobert_colab.ipynb

Jhyunee · 2024-03-29T10:59:03Z

Did you manage to solve this? I haven't been able to fix the same error either.
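
A possible reading of the traceback above: it ends in sentencepiece's `LoadFromFile` with `TypeError: not a string`, which suggests that something other than a file path reached the SentencePiece loader. In gluonnlp 0.10.0, the first argument of `nlp.data.BERTSPTokenizer` is a path to a SentencePiece model file, and the model is loaded lazily on the first tokenization. That would explain why the error appears in the `BERTDataset(...)` line rather than where `tok` is created. The posted code passes the `KoBERTTokenizer` object itself as that argument. The sketch below assumes `tokenizer.vocab_file` is the SentencePiece model path (the posted `BERTVocab.from_sentencepiece(tokenizer.vocab_file, ...)` line already relies on this). The helper `resolve_sp_model_path` is a hypothetical name used only for illustration, not part of gluonnlp or kobert_tokenizer.

```python
from typing import Any


def resolve_sp_model_path(tokenizer_or_path: Any) -> str:
    """Return a string path to a SentencePiece model file.

    Hypothetical helper (not part of gluonnlp or kobert_tokenizer).
    Accepts either a plain string path or an object that exposes a
    string `vocab_file` attribute, as Hugging Face style tokenizers
    backed by SentencePiece usually do.
    """
    # A plain string is assumed to already be the model path.
    if isinstance(tokenizer_or_path, str):
        return tokenizer_or_path

    # Otherwise look for a `vocab_file` attribute holding the path.
    vocab_file = getattr(tokenizer_or_path, "vocab_file", None)
    if isinstance(vocab_file, str):
        return vocab_file

    # Fail early with a clearer message than "not a string".
    raise TypeError(
        "expected a path string or an object with a string 'vocab_file', "
        f"got {type(tokenizer_or_path).__name__}"
    )


# Intended use in the notebook from this issue (assumption, untested here):
#   sp_path = resolve_sp_model_path(tokenizer)
#   tok = nlp.data.BERTSPTokenizer(sp_path, vocab, lower=False)
```

In the notebook itself the shorter form `nlp.data.BERTSPTokenizer(tokenizer.vocab_file, vocab, lower=False)` should be equivalent under the same assumption. Note that `data_list` and `max_len` are not defined in the posted snippet, so they would also need to be set before `BERTDataset` is built.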

kotran88 added the bug Something isn't working label May 30, 2023
