Notes on Utterance-level Aggregation For Speaker Recognition In The Wild - 白红宇

Published: 2019-05-25

Paper link:

Open-source code:

Network architecture

Input: a 257-dimensional vector per frame, consisting of 256 frequency components plus 1 DC component.
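As a rough illustration of where the number 257 comes from, the sketch below assumes a 512-point FFT per frame (the FFT size is an assumption; these notes do not state it). A real-valued frame of 512 samples has 512 // 2 + 1 = 257 non-redundant bins, and bin 0 is the DC component. The name `frame_magnitudes` is hypothetical, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

N_FFT = 512  # assumed FFT size; gives N_FFT // 2 + 1 = 257 bins


def frame_magnitudes(frame):
    """Magnitude spectrum of one real-valued frame via a naive DFT.

    Returns len(frame) // 2 + 1 values: bin 0 is the DC component,
    the remaining bins are the positive frequencies.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


# A constant (pure DC) frame puts all of its energy into bin 0.
dc_frame = [1.0] * N_FFT
spectrum = frame_magnitudes(dc_frame)
print(len(spectrum))  # 257
```

Stacking one such 257-dimensional vector per frame gives the spectrogram that is fed to the backbone network.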

Backbone network: Thin-ResNet, which extracts frame-level features.

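NetVLAD or GhostVLAD layer: converts the frame-level features into an utterance-level feature. Most methods apply an average pooling layer that simply averages over the frame dimension. The drawback is that every frame gets the same weight, whereas in practice frames contribute unequally to the result: a frame containing speech, for example, should contribute more than a silent one. The approach in this paper effectively learns to assign a different weight to each frame automatically.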
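The aggregation step can be sketched as follows. This is a minimal pure-Python illustration of the general NetVLAD/GhostVLAD idea, not the authors' implementation: each frame is softly assigned to clusters, the assignment-weighted residuals to the cluster centres are summed over time, and ghost clusters take part in the assignment but are dropped from the output. The function name `ghostvlad`, the parameter layout, and the toy values are all assumptions made for illustration; in a real model the weights, biases, and centres are learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps avoids division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def ghostvlad(frames, weights, biases, centers, num_ghost=0):
    """Aggregate T frame-level vectors into one utterance-level vector.

    frames:    T vectors of dimension D
    weights:   K + G assignment weight vectors of dimension D
    biases:    K + G assignment biases
    centers:   K + G cluster centres of dimension D
    num_ghost: G ghost clusters (the last G entries); G = 0 is plain NetVLAD
    """
    total = len(centers)
    dim = len(frames[0])
    residual_sums = [[0.0] * dim for _ in range(total)]
    for x in frames:
        scores = [sum(wi * xi for wi, xi in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        assign = softmax(scores)  # per-frame soft assignment
        for k in range(total):
            for d in range(dim):
                residual_sums[k][d] += assign[k] * (x[d] - centers[k][d])
    kept = residual_sums[: total - num_ghost]  # drop ghost clusters
    flat = [v for row in kept for v in l2_normalize(row)]
    return l2_normalize(flat)  # fixed size K * D, independent of T


# Toy example: T = 3 frames, D = 2, K = 2 real clusters, G = 1 ghost cluster.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
centers = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
embedding = ghostvlad(frames, weights, biases, centers, num_ghost=1)
print(len(embedding))  # 4
```

Unlike average pooling, the contribution of each frame depends on its learned soft assignment, and frames that fall mostly into ghost clusters (noise or silence, for instance) contribute little to the final vector.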

Training loss: standard softmax loss and additive margin softmax (AM-Softmax).
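The difference between the two losses can be shown with a small sketch. AM-Softmax works on cosine similarities between the embedding and each class weight, subtracts a margin from the target-class cosine, and scales everything before the usual softmax cross-entropy. The function name `am_softmax_loss` and the default values `scale=30.0` and `margin=0.35` are assumptions chosen for illustration, not values taken from these notes.

```python
import math


def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """AM-Softmax loss for one sample.

    cosines: cosine similarity between the embedding and each class weight
    target:  index of the correct class
    scale, margin: hyperparameters (illustrative defaults, assumed here)
    """
    logits = [scale * (c - margin) if j == target else scale * c
              for j, c in enumerate(cosines)]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]  # equals -log softmax(logits)[target]


cosines = [0.8, 0.1, -0.2]  # toy similarities; class 0 is correct
with_margin = am_softmax_loss(cosines, target=0)
without_margin = am_softmax_loss(cosines, target=0, margin=0.0)
print(with_margin > without_margin)  # True
```

With `margin=0.0` the function reduces to a scaled softmax loss on cosine similarities. A positive margin makes the same prediction cost more, which pushes the target-class cosine to exceed the others by at least the margin.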

Reposted from: http://lgjti.baihongyu.com/
