Embedding Model Explorer
Embedding model dashboard for the Uzbek language
An interactive coordinate plane, similarity charts, and projector-style 2D/3D projections for four models: Skip-gram, CBOW, GloVe, and FastText.
Number of models: 4 (Skip-gram · CBOW · GloVe · FastText)
Corpus lines: 835222 (raw input lines)
Kept sentences: 485121 (after normalization and deduplication)
Unique tokens: 334903 (across the prepared corpus)
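As a quick check on the corpus figures above, the share of raw lines that survive preparation can be computed directly. This is only arithmetic on the numbers the panel reports; the variable names are illustrative.

```python
# Counts copied from the corpus summary above.
raw_lines = 835222       # raw input lines
kept_sentences = 485121  # sentences left after normalization and deduplication

removed = raw_lines - kept_sentences
retention = kept_sentences / raw_lines

print(f"removed: {removed}")
print(f"retention: {retention:.2%}")
```

Roughly 58% of the raw lines are kept, so a little over two fifths of the input is dropped during preparation.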
Model Cards
A separate workspace page for each embedding
Word2Vec
SG
Skip-gram
A Word2Vec variant that captures the context around a word in greater depth.
- Vocab: 111067
- Vector: 300
- Epoch: 300
- Hours: 0.54
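To illustrate what Skip-gram training consumes, the sketch below generates (center, context) pairs from a tokenized sentence. The window size of 2, the function name `skipgram_pairs`, and the sample sentence are assumptions for illustration; the panel does not state the window size or the training code actually used.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical sample sentence, already tokenized.
sentence = ["men", "kitob", "o'qishni", "yaxshi", "ko'raman"]
for center, context in skipgram_pairs(sentence):
    print(center, "->", context)
```

Each center word is paired with every neighbor separately, which is why Skip-gram sees more training examples per sentence than CBOW.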
Word2Vec
CBOW
CBOW
A fast embedding model based on reconstructing the center word from its context.
- Vocab: 111067
- Vector: 300
- Epoch: 300
- Hours: 0.32
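For contrast with Skip-gram, a CBOW training example groups all context words around a position and asks the model to recover the center word. The sketch below only builds those examples; the window size, the function name `cbow_examples`, and the sample sentence are assumptions, not details taken from the panel.

```python
def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples, one per token position."""
    examples = []
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + window + 1]
        context = left + right
        if context:  # skip one-word sentences, which have no context
            examples.append((context, center))
    return examples


# Hypothetical sample sentence, already tokenized.
sentence = ["men", "kitob", "o'qishni", "yaxshi", "ko'raman"]
for context, center in cbow_examples(sentence):
    print(context, "->", center)
```

CBOW produces one example per position rather than one per (center, context) pair, which is consistent with the shorter training time shown on its card.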
GloVe
GV
GloVe
Dense vectors trained on global co-occurrence statistics.
- Vocab: 40000
- Vector: 300
- Epoch: 100
- Hours: 6.22
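The "global co-occurrence statistics" that GloVe is trained on can be sketched as a weighted pair count over the whole corpus. The version below weights each pair by 1/distance, as in the original GloVe formulation; the window size of 5, the function name `cooccurrence`, and the sample sentence are assumptions, since the panel does not describe the settings actually used.

```python
from collections import defaultdict


def cooccurrence(sentences, window=5):
    """Accumulate symmetric, distance-weighted co-occurrence counts."""
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look only to the right, then mirror, so each pair is visited once.
            for j in range(i + 1, min(len(tokens), i + window + 1)):
                weight = 1.0 / (j - i)  # closer words contribute more
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight
    return counts


# Hypothetical one-sentence corpus.
corpus = [["men", "kitob", "o'qishni", "yaxshi", "ko'raman"]]
counts = cooccurrence(corpus, window=2)
print(counts[("men", "kitob")], counts[("men", "o'qishni")])
```

Unlike the two Word2Vec variants, this table is built once over the full corpus before any vectors are trained.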
Comparison
Model comparison
Vocab
Vocabulary size
Runtime
Training duration
Corpus Snapshot
Corpus preparation statistics
input/Dataset (344 402).txt: Kept 330611 · Duplicate 8650 · Short drop 5137
input/Dataset (109 605).txt: Kept 106047 · Duplicate 313 · Short drop 3244
input/OAV (35000).txt: Kept 35891 · Duplicate 552 · Short drop 1
input/Rasmiy/Lex.uz (30000).txt: Kept 2643 · Duplicate 27361 · Short drop 6
input/Ilmiy/ta'lim (30000).txt: Kept 1939 · Duplicate 29882 · Short drop 148
input/Ilmiy/Adabiyot (15000 satr).txt: Kept 1545 · Duplicate 13542 · Short drop 0
input/Ekologiya (20000 satr).txt: Kept 1161 · Duplicate 18970 · Short drop 0
input/Ilmiy/Tibbiyot (20000).txt: Kept 1133 · Duplicate 18763 · Short drop 124
input/Internet manba/internet (30000).txt: Kept 688 · Duplicate 29047 · Short drop 297
input/Ta'lim (maktab)/Ta'lim dasrliklar (40000).txt: Kept 676 · Duplicate 39172 · Short drop 176
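The Kept, Duplicate, and Short drop counts above suggest a per-line filter along the following lines. This is a minimal sketch under stated assumptions, not the panel's actual pipeline: the normalization (lowercasing and whitespace collapsing), the `min_tokens=3` threshold, the order of the checks, and the name `prepare` are all hypothetical.

```python
import re
from collections import Counter


def prepare(lines, min_tokens=3):
    """Return (kept sentences, per-category counts) for an iterable of raw lines."""
    seen = set()
    kept = []
    stats = Counter()
    for line in lines:
        # Assumed normalization: collapse whitespace runs and lowercase.
        norm = re.sub(r"\s+", " ", line).strip().lower()
        if len(norm.split()) < min_tokens:
            stats["short_drop"] += 1   # too short to be a useful sentence
        elif norm in seen:
            stats["duplicate"] += 1    # already kept an identical sentence
        else:
            seen.add(norm)
            kept.append(norm)
            stats["kept"] += 1
    return kept, stats


# Hypothetical raw lines.
raw = ["Salom  dunyo bu yerda", "salom dunyo bu yerda", "Qisqa gap", "Yana bir to'liq gap"]
kept, stats = prepare(raw)
print(stats)
```

Sharing one `seen` set across all input files would mean later files lose any sentence already kept from an earlier one. That would be one possible explanation for the very high duplicate counts in the smaller files, though the panel does not say whether deduplication is per file or global.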