Embedding Model Explorer

Embedding model dashboard for the Uzbek language

Skip-gram, CBOW, GloVe, and FastText: an interactive coordinate plane, similarity charts, and projector-style 2D/3D projections for four models.

Models: 4 (Skip-gram · CBOW · GloVe · FastText)
Corpus lines: 835222 (raw input lines)
Kept sentences: 485121 (after normalization and deduplication)
Unique tokens: 334903 (across the prepared corpus)
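
The similarity charts mentioned above compare word vectors. The page does not say which measure it uses; the sketch below assumes cosine similarity, the usual choice for word embeddings. The names `cosine_similarity`, `most_similar`, and the toy vectors are invented for illustration and are not taken from the dashboard.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(query, vectors, top_n=3):
    """Rank every other word by cosine similarity to `query`."""
    q = vectors[query]
    scored = [(w, cosine_similarity(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy 3-dimensional vectors (assumption: real models here use 300 dimensions).
toy = {
    "kitob": [1.0, 0.0, 0.0],
    "daftar": [0.9, 0.1, 0.0],
    "olma": [0.0, 1.0, 0.0],
}
```

Calling `most_similar("kitob", toy, top_n=1)` ranks `daftar` first, because its toy vector points in nearly the same direction.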
Model Cards

A separate workspace page for each embedding

Word2Vec SG

Skip-gram

A Word2Vec variant that captures the context around a word in more depth.

Vocab: 111067 · Vector: 300 · Epochs: 300 · Hours: 0.54
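
To make the Skip-gram idea concrete, the following minimal sketch generates the (center word, context word) training pairs that Skip-gram learns from. The function name `skipgram_pairs` and the window size are assumptions for illustration; the page does not state the window used for training.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Example sentence, invented for illustration.
sentence = ["men", "bugun", "kitob", "o'qidim"]
pairs = skipgram_pairs(sentence, window=1)
```

Each center word yields one pair per neighbor, so the number of training examples grows with the window size.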
Word2Vec CBOW

CBOW

A fast embedding model based on recovering the center word from its context.

Vocab: 111067 · Vector: 300 · Epochs: 300 · Hours: 0.32
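
CBOW reverses the direction of prediction: the whole context window is the input and the center word is the target. The sketch below only builds those (context, target) examples; `cbow_examples` and its window size are hypothetical names and values, not settings reported by the page.

```python
def cbow_examples(tokens, window=2):
    """Return (context_words, target) examples, one per token that has any context."""
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + window + 1]
        context = left + right
        if context:  # skip single-token sentences
            examples.append((context, target))
    return examples
```

Because each position produces a single example rather than one per neighbor, CBOW processes fewer training examples than Skip-gram for the same text.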
GloVe GV

GloVe

Dense vectors trained on global co-occurrence statistics.

Vocab: 40000 · Vector: 300 · Epochs: 100 · Hours: 6.22
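
The global co-occurrence statistics that GloVe is trained on can be sketched as a weighted count over a sliding window. This is an illustration under assumptions: the symmetric window, the window size, and the 1/distance weighting follow the scheme described in the original GloVe paper, and the page does not say which settings were used here. `cooccurrence_counts` is a hypothetical helper name.

```python
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """Symmetric co-occurrence counts; a pair d tokens apart contributes 1/d."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                weight = 1.0 / d
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight  # keep the matrix symmetric
    return dict(counts)
```

GloVe then fits word vectors so that their dot products approximate the logarithm of these counts.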
FastText FT

FastText

A model that uses subword characters to adapt to out-of-vocabulary (OOV) words as well.

Vocab: 111067 · Vector: 300 · Epochs: 150 · Hours: 0.58
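
FastText handles OOV words by representing a word as a set of character n-grams, so an unseen word still shares n-grams with known ones. The sketch below extracts those n-grams with the `<` and `>` boundary markers FastText uses. The default range of 3 to 6 characters matches the library defaults, but whether this dashboard's model used them is an assumption, and `char_ngrams` is a hypothetical helper name.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word`, wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


# Trigrams only, using the example word from the FastText paper.
trigrams = char_ngrams("where", min_n=3, max_n=3)
```

In FastText the vector for a word is built from the vectors of its n-grams (plus the whole word when it is in the vocabulary), which is why rare inflected forms still get usable vectors.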
Comparison

Model comparison

Vocab: vocabulary size

Runtime: training duration

Corpus Snapshot

Corpus preparation statistics

input/Dataset (344 402).txt Kept: 330611 Duplicate: 8650 Short drop: 5137
input/Dataset (109 605).txt Kept: 106047 Duplicate: 313 Short drop: 3244
input/OAV (35000).txt Kept: 35891 Duplicate: 552 Short drop: 1
input/Rasmiy/Lex.uz (30000).txt Kept: 2643 Duplicate: 27361 Short drop: 6
input/Ilmiy/ta'lim (30000).txt Kept: 1939 Duplicate: 29882 Short drop: 148
input/Ilmiy/Adabiyot (15000 satr).txt Kept: 1545 Duplicate: 13542 Short drop: 0
input/Ekologiya (20000 satr).txt Kept: 1161 Duplicate: 18970 Short drop: 0
input/Ilmiy/Tibbiyot (20000).txt Kept: 1133 Duplicate: 18763 Short drop: 124
input/Internet manba/internet (30000).txt Kept: 688 Duplicate: 29047 Short drop: 297
input/Ta'lim (maktab)/Ta'lim dasrliklar (40000).txt Kept: 676 Duplicate: 39172 Short drop: 176
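
The Kept, Duplicate, and Short drop counters above could come from a pass like the one sketched here. Everything specific in it is an assumption: the normalization (whitespace collapsing and lowercasing), the minimum length of three tokens, and the order of the checks are not documented on the page, and `prepare_corpus` is a hypothetical name.

```python
import re


def prepare_corpus(lines, min_tokens=3):
    """Normalize lines, drop short ones and exact duplicates, and count each outcome."""
    seen = set()
    kept = []
    stats = {"kept": 0, "duplicate": 0, "short_drop": 0}
    for raw in lines:
        # Assumed normalization: collapse whitespace, trim, lowercase.
        norm = re.sub(r"\s+", " ", raw).strip().lower()
        if len(norm.split()) < min_tokens:
            stats["short_drop"] += 1
            continue
        if norm in seen:
            stats["duplicate"] += 1
            continue
        seen.add(norm)
        kept.append(norm)
        stats["kept"] += 1
    return kept, stats
```

If the `seen` set is shared across files, later files lose every line already contributed by earlier ones, which would be one possible explanation for the very high Duplicate counts in the smaller sources.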