From 5e09af2acde21f232a6ed2ad2972c8f2269dcecf Mon Sep 17 00:00:00 2001 From: Gabriel Yang Date: Tue, 26 Sep 2023 02:24:45 +0900 Subject: [PATCH] =?UTF-8?q?=F0=9F=8C=90=20[i18n-KO]=20Translated=20=20`aud?= =?UTF-8?q?io=5Fclassification.mdx`=20to=20Korean=20(#26200)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * ๐ŸŒ [i18n-KO] Translated to Korean * update translation * fix some sentence editing and fixing punctuation * Update docs/source/ko/_toctree.yml Co-authored-by: Wonhyeong Seo * Apply suggestions from code review Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com> --------- Co-authored-by: Wonhyeong Seo Co-authored-by: Hyeonseo Yun <0525yhs@gmail.com> --- docs/source/ko/_toctree.yml | 8 +- docs/source/ko/tasks/audio_classification.md | 329 +++++++++++++++++++ 2 files changed, 333 insertions(+), 4 deletions(-) create mode 100644 docs/source/ko/tasks/audio_classification.md diff --git a/docs/source/ko/_toctree.yml b/docs/source/ko/_toctree.yml index 631446ff8b..bac1294ea6 100644 --- a/docs/source/ko/_toctree.yml +++ b/docs/source/ko/_toctree.yml @@ -49,11 +49,11 @@ title: ์ž์—ฐ์–ด์ฒ˜๋ฆฌ isExpanded: false - sections: - - local: in_translation - title: (๋ฒˆ์—ญ์ค‘) Audio classification + - local: tasks/audio_classification + title: ์˜ค๋””์˜ค ๋ถ„๋ฅ˜ - local: tasks/asr title: ์ž๋™ ์Œ์„ฑ ์ธ์‹ - title: (๋ฒˆ์—ญ์ค‘) ์˜ค๋””์˜ค + title: ์˜ค๋””์˜ค isExpanded: false - sections: - local: tasks/image_classification @@ -152,7 +152,7 @@ - local: contributing title: ๐Ÿค— Transformers์— ๊ธฐ์—ฌํ•˜๋Š” ๋ฐฉ๋ฒ• - local: add_new_model - title: ๐Ÿค— Transformers์— ์ƒˆ๋กœ์šด ๋ชจ๋ธ์„ ์ถ”๊ฐ€ํ•˜๋Š” ๋ฐฉ๋ฒ• + title: ๐Ÿค— Transformers์— ์ƒˆ๋กœ์šด ๋ชจ๋ธ์„ ์ถ”๊ฐ€ํ•˜๋Š” ๋ฐฉ๋ฒ• - local: add_tensorflow_model title: ์–ด๋–ป๊ฒŒ ๐Ÿค— Transformers ๋ชจ๋ธ์„ TensorFlow๋กœ ๋ณ€ํ™˜ํ•˜๋‚˜์š”? - local: add_new_pipeline diff --git a/docs/source/ko/tasks/audio_classification.md b/docs/source/ko/tasks/audio_classification.md new file mode 100644 index 0000000000..7e1094815f --- /dev/null +++ b/docs/source/ko/tasks/audio_classification.md @@ -0,0 +1,329 @@ + + +# ์˜ค๋””์˜ค ๋ถ„๋ฅ˜[[audio_classification]] + +[[open-in-colab]] + + + +์˜ค๋””์˜ค ๋ถ„๋ฅ˜๋Š” ํ…์ŠคํŠธ์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์— ํด๋ž˜์Šค ๋ ˆ์ด๋ธ” ์ถœ๋ ฅ์„ ํ• ๋‹นํ•ฉ๋‹ˆ๋‹ค. ์œ ์ผํ•œ ์ฐจ์ด์ ์€ ํ…์ŠคํŠธ ์ž…๋ ฅ ๋Œ€์‹  ์›์‹œ ์˜ค๋””์˜ค ํŒŒํ˜•์ด ์žˆ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์˜ค๋””์˜ค ๋ถ„๋ฅ˜์˜ ์‹ค์ œ ์ ์šฉ ๋ถ„์•ผ์—๋Š” ํ™”์ž์˜ ์˜๋„ ํŒŒ์•…, ์–ธ์–ด ๋ถ„๋ฅ˜, ์†Œ๋ฆฌ๋กœ ๋™๋ฌผ ์ข…์„ ์‹๋ณ„ํ•˜๋Š” ๊ฒƒ ๋“ฑ์ด ์žˆ์Šต๋‹ˆ๋‹ค. + +์ด ๋ฌธ์„œ์—์„œ ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค: + +1. [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base)๋กœ ๋ฏธ์„ธ ์กฐ์ •ํ•˜์—ฌ ํ™”์ž์˜ ์˜๋„๋ฅผ ๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค. +2. ์ถ”๋ก ์— ๋ฏธ์„ธ ์กฐ์ •๋œ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์„ธ์š”. + + +์ด ํŠœํ† ๋ฆฌ์–ผ์—์„œ ์„ค๋ช…ํ•˜๋Š” ์ž‘์—…์€ ์•„๋ž˜์˜ ๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜์—์„œ ์ง€์›๋ฉ๋‹ˆ๋‹ค: + + + +[Audio Spectrogram Transformer](../model_doc/audio-spectrogram-transformer), [Data2VecAudio](../model_doc/data2vec-audio), [Hubert](../model_doc/hubert), [SEW](../model_doc/sew), [SEW-D](../model_doc/sew-d), [UniSpeech](../model_doc/unispeech), [UniSpeechSat](../model_doc/unispeech-sat), [Wav2Vec2](../model_doc/wav2vec2), [Wav2Vec2-Conformer](../model_doc/wav2vec2-conformer), [WavLM](../model_doc/wavlm), [Whisper](../model_doc/whisper) + + + + + +์‹œ์ž‘ํ•˜๊ธฐ ์ „์— ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ ๋ชจ๋‘ ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”: + +```bash +pip install transformers datasets evaluate +``` + +๋ชจ๋ธ์„ ์—…๋กœ๋“œํ•˜๊ณ  ์ปค๋ฎค๋‹ˆํ‹ฐ์™€ ๊ณต์œ ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ—ˆ๊น…ํŽ˜์ด์Šค ๊ณ„์ •์— ๋กœ๊ทธ์ธํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค. ๋ฉ”์‹œ์ง€๊ฐ€ ํ‘œ์‹œ๋˜๋ฉด ํ† ํฐ์„ ์ž…๋ ฅํ•˜์—ฌ ๋กœ๊ทธ์ธํ•ฉ๋‹ˆ๋‹ค: + +```py +>>> from huggingface_hub import notebook_login + +>>> notebook_login() +``` + +## MInDS-14 ๋ฐ์ดํ„ฐ์…‹ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ[[load_minds_14_dataset]] + +๋จผ์ € ๐Ÿค— Datasets ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์—์„œ MinDS-14 ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค: + +```py +>>> from datasets import load_dataset, Audio + +>>> minds = load_dataset("PolyAI/minds14", name="en-US", split="train") +``` + +๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ `train` ๋ถ„ํ• ์„ [`~datasets.Dataset.train_test_split`] ๋ฉ”์†Œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋” ์ž‘์€ ํ›ˆ๋ จ ๋ฐ ํ…Œ์ŠคํŠธ ์ง‘ํ•ฉ์œผ๋กœ ๋ถ„ํ• ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ์ „์ฒด ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋” ๋งŽ์€ ์‹œ๊ฐ„์„ ์†Œ๋น„ํ•˜๊ธฐ ์ „์— ๋ชจ๋“  ๊ฒƒ์ด ์ž‘๋™ํ•˜๋Š”์ง€ ์‹คํ—˜ํ•˜๊ณ  ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. + +```py +>>> minds = minds.train_test_split(test_size=0.2) +``` + +์ด์ œ ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์„ ์‚ดํŽด๋ณผ๊ฒŒ์š”: + +```py +>>> minds +DatasetDict({ + train: Dataset({ + features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'], + num_rows: 450 + }) + test: Dataset({ + features: ['path', 'audio', 'transcription', 'english_transcription', 'intent_class', 'lang_id'], + num_rows: 113 + }) +}) +``` + +๋ฐ์ดํ„ฐ ์„ธํŠธ์—๋Š” `lang_id` ๋ฐ `english_transcription`๊ณผ ๊ฐ™์€ ์œ ์šฉํ•œ ์ •๋ณด๊ฐ€ ๋งŽ์ด ํฌํ•จ๋˜์–ด ์žˆ์ง€๋งŒ ์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” `audio` ๋ฐ `intent_class`์— ์ค‘์ ์„ ๋‘˜ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋‹ค๋ฅธ ์—ด์€ [`~datasets.Dataset.remove_columns`] ๋ฉ”์†Œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค: + +```py +>>> minds = minds.remove_columns(["path", "transcription", "english_transcription", "lang_id"]) +``` + +์˜ˆ์‹œ๋ฅผ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค: + +```py +>>> minds["train"][0] +{'audio': {'array': array([ 0. , 0. , 0. , ..., -0.00048828, + -0.00024414, -0.00024414], dtype=float32), + 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602b9a5fbb1e6d0fbce91f52.wav', + 'sampling_rate': 8000}, + 'intent_class': 2} +``` + +๋‘ ๊ฐœ์˜ ํ•„๋“œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค: + +- `audio`: ์˜ค๋””์˜ค ํŒŒ์ผ์„ ๊ฐ€์ ธ์˜ค๊ณ  ๋ฆฌ์ƒ˜ํ”Œ๋งํ•˜๊ธฐ ์œ„ํ•ด ํ˜ธ์ถœํ•ด์•ผ ํ•˜๋Š” ์Œ์„ฑ ์‹ ํ˜ธ์˜ 1์ฐจ์› `๋ฐฐ์—ด`์ž…๋‹ˆ๋‹ค. +- `intent_class`: ํ™”์ž์˜ ์˜๋„์— ๋Œ€ํ•œ ํด๋ž˜์Šค ID๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. + +๋ชจ๋ธ์ด ๋ ˆ์ด๋ธ” ID์—์„œ ๋ ˆ์ด๋ธ” ์ด๋ฆ„์„ ์‰ฝ๊ฒŒ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋„๋ก ๋ ˆ์ด๋ธ” ์ด๋ฆ„์„ ์ •์ˆ˜๋กœ ๋งคํ•‘ํ•˜๋Š” ์‚ฌ์ „์„ ๋งŒ๋“ค๊ฑฐ๋‚˜ ๊ทธ ๋ฐ˜๋Œ€๋กœ ๋งคํ•‘ํ•˜๋Š” ์‚ฌ์ „์„ ๋งŒ๋“ญ๋‹ˆ๋‹ค: + +```py +>>> labels = minds["train"].features["intent_class"].names +>>> label2id, id2label = dict(), dict() +>>> for i, label in enumerate(labels): +... label2id[label] = str(i) +... id2label[str(i)] = label +``` + +์ด์ œ ๋ ˆ์ด๋ธ” ID๋ฅผ ๋ ˆ์ด๋ธ” ์ด๋ฆ„์œผ๋กœ ๋ณ€ํ™˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: + +```py +>>> id2label[str(2)] +'app_error' +``` + +## ์ „์ฒ˜๋ฆฌ[[preprocess]] + +๋‹ค์Œ ๋‹จ๊ณ„๋Š” ์˜ค๋””์˜ค ์‹ ํ˜ธ๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด Wav2Vec2 ํŠน์ง• ์ถ”์ถœ๊ธฐ๋ฅผ ๊ฐ€์ ธ์˜ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค: + +```py +>>> from transformers import AutoFeatureExtractor + +>>> feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base") +``` + +MinDS-14 ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ ์ƒ˜ํ”Œ๋ง ์†๋„๋Š” 8000khz์ด๋ฏ€๋กœ(์ด ์ •๋ณด๋Š” [๋ฐ์ดํ„ฐ์„ธํŠธ ์นด๋“œ](https://huggingface.co/datasets/PolyAI/minds14)์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค), ์‚ฌ์ „ ํ›ˆ๋ จ๋œ Wav2Vec2 ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ 16000kHz๋กœ ๋ฆฌ์ƒ˜ํ”Œ๋งํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค: + +```py +>>> minds = minds.cast_column("audio", Audio(sampling_rate=16_000)) +>>> minds["train"][0] +{'audio': {'array': array([ 2.2098757e-05, 4.6582241e-05, -2.2803260e-05, ..., + -2.8419291e-04, -2.3305941e-04, -1.1425107e-04], dtype=float32), + 'path': '/root/.cache/huggingface/datasets/downloads/extracted/f14948e0e84be638dd7943ac36518a4cf3324e8b7aa331c5ab11541518e9368c/en-US~APP_ERROR/602b9a5fbb1e6d0fbce91f52.wav', + 'sampling_rate': 16000}, + 'intent_class': 2} +``` + +์ด์ œ ์ „์ฒ˜๋ฆฌ ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค: + +1. ๊ฐ€์ ธ์˜ฌ `์˜ค๋””์˜ค` ์—ด์„ ํ˜ธ์ถœํ•˜๊ณ  ํ•„์š”ํ•œ ๊ฒฝ์šฐ ์˜ค๋””์˜ค ํŒŒ์ผ์„ ๋ฆฌ์ƒ˜ํ”Œ๋งํ•ฉ๋‹ˆ๋‹ค. +2. ์˜ค๋””์˜ค ํŒŒ์ผ์˜ ์ƒ˜ํ”Œ๋ง ์†๋„๊ฐ€ ๋ชจ๋ธ์— ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ์˜ค๋””์˜ค ๋ฐ์ดํ„ฐ์˜ ์ƒ˜ํ”Œ๋ง ์†๋„์™€ ์ผ์น˜ํ•˜๋Š”์ง€ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. ์ด ์ •๋ณด๋Š” Wav2Vec2 [๋ชจ๋ธ ์นด๋“œ](https://huggingface.co/facebook/wav2vec2-base)์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. +3. ๊ธด ์ž…๋ ฅ์ด ์ž˜๋ฆฌ์ง€ ์•Š๊ณ  ์ผ๊ด„ ์ฒ˜๋ฆฌ๋˜๋„๋ก ์ตœ๋Œ€ ์ž…๋ ฅ ๊ธธ์ด๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. + +```py +>>> def preprocess_function(examples): +... audio_arrays = [x["array"] for x in examples["audio"]] +... inputs = feature_extractor( +... audio_arrays, sampling_rate=feature_extractor.sampling_rate, max_length=16000, truncation=True +... ) +... return inputs +``` + +์ „์ฒด ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ์ „์ฒ˜๋ฆฌ ๊ธฐ๋Šฅ์„ ์ ์šฉํ•˜๋ ค๋ฉด ๐Ÿค— Datasets [`~datasets.Dataset.map`] ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. `batched=True`๋ฅผ ์„ค์ •ํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์˜ ์—ฌ๋Ÿฌ ์š”์†Œ๋ฅผ ํ•œ ๋ฒˆ์— ์ฒ˜๋ฆฌํ•˜๋ฉด `map`์˜ ์†๋„๋ฅผ ๋†’์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•„์š”ํ•˜์ง€ ์•Š์€ ์—ด์„ ์ œ๊ฑฐํ•˜๊ณ  `intent_class`์˜ ์ด๋ฆ„์„ ๋ชจ๋ธ์ด ์˜ˆ์ƒํ•˜๋Š” ์ด๋ฆ„์ธ `label`๋กœ ๋ณ€๊ฒฝํ•ฉ๋‹ˆ๋‹ค: + +```py +>>> encoded_minds = minds.map(preprocess_function, remove_columns="audio", batched=True) +>>> encoded_minds = encoded_minds.rename_column("intent_class", "label") +``` + +## ํ‰๊ฐ€ํ•˜๊ธฐ[[evaluate]] + +ํ›ˆ๋ จ ์ค‘์— ๋ฉ”ํŠธ๋ฆญ์„ ํฌํ•จํ•˜๋ฉด ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํ‰๊ฐ€ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค. ๐Ÿค— [Evaluate](https://huggingface.co/docs/evaluate/index) ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ‰๊ฐ€ ๋ฐฉ๋ฒ•์„ ๋น ๋ฅด๊ฒŒ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์ž‘์—…์—์„œ๋Š” [accuracy(์ •ํ™•๋„)](https://huggingface.co/spaces/evaluate-metric/accuracy) ๋ฉ”ํŠธ๋ฆญ์„ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค(๋ฉ”ํŠธ๋ฆญ์„ ๊ฐ€์ ธ์˜ค๊ณ  ๊ณ„์‚ฐํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๐Ÿค— Evalutate [๋น ๋ฅธ ๋‘˜๋Ÿฌ๋ณด๊ธฐ](https://huggingface.co/docs/evaluate/a_quick_tour) ์ฐธ์กฐํ•˜์„ธ์š”): + +```py +>>> import evaluate + +>>> accuracy = evaluate.load("accuracy") +``` + +๊ทธ๋Ÿฐ ๋‹ค์Œ ์˜ˆ์ธก๊ณผ ๋ ˆ์ด๋ธ”์„ [`~evaluate.EvaluationModule.compute`]์— ์ „๋‹ฌํ•˜์—ฌ ์ •ํ™•๋„๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค: + +```py +>>> import numpy as np + + +>>> def compute_metrics(eval_pred): +... predictions = np.argmax(eval_pred.predictions, axis=1) +... return accuracy.compute(predictions=predictions, references=eval_pred.label_ids) +``` + +์ด์ œ `compute_metrics` ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•  ์ค€๋น„๊ฐ€ ๋˜์—ˆ์œผ๋ฉฐ, ํŠธ๋ ˆ์ด๋‹์„ ์„ค์ •ํ•  ๋•Œ ์ด ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. + +## ํ›ˆ๋ จ[[train]] + + + + + +[`Trainer`]๋กœ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๋Š” ๋ฐ ์ต์ˆ™ํ•˜์ง€ ์•Š๋‹ค๋ฉด ๊ธฐ๋ณธ ํŠœํ† ๋ฆฌ์–ผ [์—ฌ๊ธฐ](../training#train-with-pytorch-trainer)์„ ์‚ดํŽด๋ณด์„ธ์š”! + + + +์ด์ œ ๋ชจ๋ธ ํ›ˆ๋ จ์„ ์‹œ์ž‘ํ•  ์ค€๋น„๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค! [`AutoModelForAudioClassification`]์„ ์ด์šฉํ•ด์„œ Wav2Vec2๋ฅผ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค. ์˜ˆ์ƒ๋˜๋Š” ๋ ˆ์ด๋ธ” ์ˆ˜์™€ ๋ ˆ์ด๋ธ” ๋งคํ•‘์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค: + +```py +>>> from transformers import AutoModelForAudioClassification, TrainingArguments, Trainer + +>>> num_labels = len(id2label) +>>> model = AutoModelForAudioClassification.from_pretrained( +... "facebook/wav2vec2-base", num_labels=num_labels, label2id=label2id, id2label=id2label +... ) +``` + +์ด์ œ ์„ธ ๋‹จ๊ณ„๋งŒ ๋‚จ์•˜์Šต๋‹ˆ๋‹ค: + +1. ํ›ˆ๋ จ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ [`TrainingArguments`]์— ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. ์œ ์ผํ•œ ํ•„์ˆ˜ ๋งค๊ฐœ๋ณ€์ˆ˜๋Š” ๋ชจ๋ธ์„ ์ €์žฅํ•  ์œ„์น˜๋ฅผ ์ง€์ •ํ•˜๋Š” `output_dir`์ž…๋‹ˆ๋‹ค. `push_to_hub = True`๋ฅผ ์„ค์ •ํ•˜์—ฌ ์ด ๋ชจ๋ธ์„ ํ—ˆ๋ธŒ๋กœ ํ‘ธ์‹œํ•ฉ๋‹ˆ๋‹ค(๋ชจ๋ธ์„ ์—…๋กœ๋“œํ•˜๋ ค๋ฉด ํ—ˆ๊น… ํŽ˜์ด์Šค์— ๋กœ๊ทธ์ธํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค). ๊ฐ ์—ํญ์ด ๋๋‚  ๋•Œ๋งˆ๋‹ค [`Trainer`]๊ฐ€ ์ •ํ™•๋„๋ฅผ ํ‰๊ฐ€ํ•˜๊ณ  ํ›ˆ๋ จ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. +2. ๋ชจ๋ธ, ๋ฐ์ดํ„ฐ ์„ธํŠธ, ํ† ํฌ๋‚˜์ด์ €, ๋ฐ์ดํ„ฐ ์ฝœ๋ ˆ์ดํ„ฐ, `compute_metrics` ํ•จ์ˆ˜์™€ ํ•จ๊ป˜ ํ›ˆ๋ จ ์ธ์ž๋ฅผ [`Trainer`]์— ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค. +3. [`~Trainer.train`]์„ ํ˜ธ์ถœํ•˜์—ฌ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•ฉ๋‹ˆ๋‹ค. + + +```py +>>> training_args = TrainingArguments( +... output_dir="my_awesome_mind_model", +... evaluation_strategy="epoch", +... save_strategy="epoch", +... learning_rate=3e-5, +... per_device_train_batch_size=32, +... gradient_accumulation_steps=4, +... per_device_eval_batch_size=32, +... num_train_epochs=10, +... warmup_ratio=0.1, +... logging_steps=10, +... load_best_model_at_end=True, +... metric_for_best_model="accuracy", +... push_to_hub=True, +... ) + +>>> trainer = Trainer( +... model=model, +... args=training_args, +... train_dataset=encoded_minds["train"], +... eval_dataset=encoded_minds["test"], +... tokenizer=feature_extractor, +... compute_metrics=compute_metrics, +... ) + +>>> trainer.train() +``` + +ํ›ˆ๋ จ์ด ์™„๋ฃŒ๋˜๋ฉด ๋ชจ๋“  ์‚ฌ๋žŒ์ด ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก [`~transformers.Trainer.push_to_hub`] ๋ฉ”์†Œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ํ—ˆ๋ธŒ์— ๊ณต์œ ํ•˜์„ธ์š”: + +```py +>>> trainer.push_to_hub() +``` + + + + + +For a more in-depth example of how to finetune a model for audio classification, take a look at the corresponding [PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/audio_classification.ipynb). + + + +## ์ถ”๋ก [[inference]] + +์ด์ œ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ–ˆ์œผ๋‹ˆ ์ถ”๋ก ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค! + +์ถ”๋ก ์„ ์‹คํ–‰ํ•  ์˜ค๋””์˜ค ํŒŒ์ผ์„ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค. ํ•„์š”ํ•œ ๊ฒฝ์šฐ ์˜ค๋””์˜ค ํŒŒ์ผ์˜ ์ƒ˜ํ”Œ๋ง ์†๋„๋ฅผ ๋ชจ๋ธ์˜ ์ƒ˜ํ”Œ๋ง ์†๋„์™€ ์ผ์น˜ํ•˜๋„๋ก ๋ฆฌ์ƒ˜ํ”Œ๋งํ•˜๋Š” ๊ฒƒ์„ ์žŠ์ง€ ๋งˆ์„ธ์š”! + +```py +>>> from datasets import load_dataset, Audio + +>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train") +>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=16000)) +>>> sampling_rate = dataset.features["audio"].sampling_rate +>>> audio_file = dataset[0]["audio"]["path"] +``` + +์ถ”๋ก ์„ ์œ„ํ•ด ๋ฏธ์„ธ ์กฐ์ •ํ•œ ๋ชจ๋ธ์„ ์‹œํ—˜ํ•ด ๋ณด๋Š” ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋ฐฉ๋ฒ•์€ [`pipeline`]์—์„œ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์˜ค๋””์˜ค ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•œ `pipeline`์„ ์ธ์Šคํ„ด์Šคํ™”ํ•˜๊ณ  ์˜ค๋””์˜ค ํŒŒ์ผ์„ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค: + +```py +>>> from transformers import pipeline + +>>> classifier = pipeline("audio-classification", model="stevhliu/my_awesome_minds_model") +>>> classifier(audio_file) +[ + {'score': 0.09766869246959686, 'label': 'cash_deposit'}, + {'score': 0.07998877018690109, 'label': 'app_error'}, + {'score': 0.0781070664525032, 'label': 'joint_account'}, + {'score': 0.07667109370231628, 'label': 'pay_bill'}, + {'score': 0.0755252093076706, 'label': 'balance'} +] +``` + +์›ํ•˜๋Š” ๊ฒฝ์šฐ `pipeline`์˜ ๊ฒฐ๊ณผ๋ฅผ ์ˆ˜๋™์œผ๋กœ ๋ณต์ œํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค: + + + +ํŠน์ง• ์ถ”์ถœ๊ธฐ๋ฅผ ๊ฐ€์ ธ์™€์„œ ์˜ค๋””์˜ค ํŒŒ์ผ์„ ์ „์ฒ˜๋ฆฌํ•˜๊ณ  `์ž…๋ ฅ`์„ PyTorch ํ…์„œ๋กœ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค: + +```py +>>> from transformers import AutoFeatureExtractor + +>>> feature_extractor = AutoFeatureExtractor.from_pretrained("stevhliu/my_awesome_minds_model") +>>> inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt") +``` + +๋ชจ๋ธ์— ์ž…๋ ฅ์„ ์ „๋‹ฌํ•˜๊ณ  ๋กœ์ง“์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค: + +```py +>>> from transformers import AutoModelForAudioClassification + +>>> model = AutoModelForAudioClassification.from_pretrained("stevhliu/my_awesome_minds_model") +>>> with torch.no_grad(): +... logits = model(**inputs).logits +``` + +ํ™•๋ฅ ์ด ๊ฐ€์žฅ ๋†’์€ ํด๋ž˜์Šค๋ฅผ ๊ฐ€์ ธ์˜จ ๋‹ค์Œ ๋ชจ๋ธ์˜ `id2label` ๋งคํ•‘์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฅผ ๋ ˆ์ด๋ธ”๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค: + +```py +>>> import torch + +>>> predicted_class_ids = torch.argmax(logits).item() +>>> predicted_label = model.config.id2label[predicted_class_ids] +>>> predicted_label +'cash_deposit' +``` + + \ No newline at end of file