Guadzilla Strikes Back

RecBole Pitfall Guide

Posted on 2022-03-28, filed under Code Reading
Word count: 4.1k, reading time ≈ 10 min

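RecBole is a very good open-source library. I used it for evaluation over the past few days, but given my limited ability I ran into a lot of bugs (possibly caused by my own actions), so here is a brief record. See also: RecBole Beginner Series (Turinger_2000's blog on CSDN)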

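Usage: download RUCAIBox/RecBole (github.com) and unzip it, or clone it, onto your machine. Then write a test.yaml file in the RecBole root directory to hold the configuration, and run run_recbole.py. test.yaml needs roughly four groups of settings: dataset setting, model setting, train setting, evaluate setting.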
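As a rough illustration of those four groups, the sketch below renders a minimal test.yaml from Python. The option names (USER_ID_FIELD, ITEM_ID_FIELD, embedding_size, epochs, train_batch_size, learning_rate, metrics, topk, valid_metric) are assumptions based on RecBole's documented configuration and may differ between versions. The values are placeholders, and render_config is a hypothetical helper, not part of RecBole.

```python
# Hypothetical helper: groups RecBole-style options under the four headings
# from the text and renders them as YAML. Option names are assumptions based
# on RecBole's documented config; all values are placeholders.
CONFIG = {
    "dataset setting": {"USER_ID_FIELD": "user_id", "ITEM_ID_FIELD": "item_id"},
    "model setting": {"embedding_size": 64},
    "train setting": {"epochs": 50, "train_batch_size": 2048, "learning_rate": 0.001},
    "evaluate setting": {
        "metrics": ["Recall", "NDCG"],
        "topk": [10],
        "valid_metric": "NDCG@10",
    },
}


def render_config(sections):
    """Render {section: {option: value}} as flat YAML, one comment per section."""
    lines = []
    for title, options in sections.items():
        lines.append(f"# {title}")
        for name, value in options.items():
            # Python's list repr ("['Recall', 'NDCG']") is also a valid YAML flow list
            lines.append(f"{name}: {value}")
        lines.append("")
    return "\n".join(lines)


yaml_text = render_config(CONFIG)
print(yaml_text)
```

Saving the printed text as test.yaml in the RecBole root and passing it to run_recbole.py (for example through a --config_files argument, assuming your version of the script accepts one) should be enough for a first run.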

The overall project layout is as follows; the more important folders and files are marked and will come up again later.

Read more »

Paper Notes: "GAG: Global Attributed Graph Neural Network for Streaming Session-based Recommendation"

Posted on 2022-03-13, updated on 2022-03-28, filed under Paper Notes
Word count: 7k, reading time ≈ 18 min

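Original paper: GAG: Global Attributed Graph Neural Network for Streaming Session-Based Recommendation

Source code walkthrough: (coming soon)

Chinese translation of the title: 基于流会话推荐的全局属性图神经网络

Summary: Replaces the encoder of SSRM with a graph neural network and keeps the attention mechanism used by NARM, SRGNN and others. User information is fed into the GNN as global information, which solves the problem of preserving users' long-term interests. It also improves the reservoir sampling strategy: the Wasserstein distance between the recommendation results and the actual interactions is used as a measure of informativeness, from which the sampling probabilities are computed.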
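To make the sampling idea concrete, here is a minimal sketch of turning a distance into a sampling probability. This excerpt does not say how GAG computes the distance or maps it to a probability, so everything here is an assumption: both distributions are taken over the same fixed item ordering, the 1-D Wasserstein distance is computed from the gap between their cumulative sums, and the probability is simply proportional to the distance. The function names are hypothetical.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """W1 distance between two distributions on the same ordered support 0..n-1."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # On a unit-spaced 1-D support, W1 is the summed gap between the two CDFs.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def sampling_probabilities(predicted, actual):
    """Assumed rule: probability proportional to each session's distance."""
    distances = [wasserstein_1d(p, q) for p, q in zip(predicted, actual)]
    total = sum(distances)
    if total == 0:
        # All predictions match exactly: fall back to uniform sampling.
        return [1.0 / len(distances)] * len(distances)
    return [d / total for d in distances]


# Two sessions whose true next item is item 0 (one-hot target).
predicted = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
actual = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
probs = sampling_probabilities(predicted, actual)
print(probs)
```

Under this assumed rule, the second session, which the model predicts poorly, gets the larger sampling probability.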

Outlook: how to bring cross-session information into the SSR problem is well worth studying.

Read more »

Paper Notes: "Streaming Session-based Recommendation"

Posted on 2022-03-07, updated on 2022-03-28, filed under Paper Notes
Word count: 4.4k, reading time ≈ 11 min

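Original paper: Streaming Session-based Recommendation | Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Source code walkthrough: not open-sourced

Chinese translation of the title: 流会话推荐

Summary: The first paper to combine streaming recommendation with session-based recommendation (I plan to start working on this topic). It addresses two main problems: MF attention + GRU handles the uncertainty of user behavior, and a storage technique plus an active sampling strategy meets the demands of "high-velocity, massive, continuous streaming data" that are closer to real-time scenarios. Places where I think further work is possible: the session encoder (use a newer model), the storage technique, and the sampling technique.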
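For readers unfamiliar with the storage side, the sketch below shows classic uniform reservoir sampling (Algorithm R), which keeps a fixed-size sample of an unbounded stream. It assumes that the "storage technique" is a reservoir, as the GAG note above suggests. It is the textbook baseline, not the paper's active sampling strategy, and reservoir_sample is a hypothetical name.

```python
import random


def reservoir_sample(stream, capacity, seed=None):
    """Keep a uniform random sample of `capacity` items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for t, item in enumerate(stream):
        if t < capacity:
            # Fill phase: the first `capacity` items are always stored.
            reservoir.append(item)
        else:
            # Item t replaces a stored one with probability capacity / (t + 1).
            j = rng.randint(0, t)  # inclusive on both ends
            if j < capacity:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), capacity=5, seed=42)
print(sample)
```

An active strategy would replace the uniform draw with one weighted by how informative each session is.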

Read more »

(To be updated) Recommender Systems: A Classic Algorithm, Collaborative Filtering

Posted on 2021-11-24, filed under Recommender Systems in Practice
Word count: 2.4k, reading time ≈ 6 min

Dataset

The classic MovieLens dataset

All ratings are contained in the file "ratings.dat" and are in the
following format:

UserID::MovieID::Rating::Timestamp

- UserIDs range between 1 and 6040 
- MovieIDs range between 1 and 3952
- Ratings are made on a 5-star scale (whole-star ratings only)
- Timestamp is represented in seconds since the epoch as returned by time(2)
- Each user has at least 20 ratings

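Experiment design

K-fold cross-validation is used: the user behavior data is split evenly into K parts, with one part serving as the test set and the other K-1 parts as the training set (the code below calls the number of folds M and the index of the test fold k). Collaborative filtering only considers item/user co-occurrence, so each user's sequence is represented as a set.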

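# Split the data into a training set and a test set
import random
from collections import defaultdict


def SplitData(data, M, k, seed):
    """Hold out fold k of M as the test set; return {user: set(items)} dicts."""
    test = []
    train = []
    random.seed(seed)
    for user, item in data:
        # randint is inclusive on both ends, so use M - 1 to get exactly M folds
        if random.randint(0, M - 1) == k:
            test.append([user, item])
        else:
            train.append([user, item])
    # Collapse the (user, item) pairs into one item set per user
    train_ = defaultdict(set)
    test_ = defaultdict(set)
    for user, item in train:
        train_[user].add(item)
    for user, item in test:
        test_[user].add(item)
    return train_, test_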

Evaluation metrics

Recall, Precision, Coverage, and novelty (measured by Popularity).
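A minimal sketch of how these four metrics might be computed is given below. The excerpt does not define them, so the definitions are assumptions: recall and precision are hit counts divided by the number of test items and recommended items respectively, coverage is the share of training items that get recommended at least once, and popularity is the mean of log(1 + count) over recommended items, where count is how many training users have the item. evaluate is a hypothetical helper and expects non-empty inputs.

```python
import math
from collections import Counter


def evaluate(train, test, recommendations):
    """Metrics from {user: set(items)} train/test dicts and {user: [items]} lists."""
    # Item popularity = number of training users who interacted with the item
    item_popularity = Counter(item for items in train.values() for item in items)
    all_items = set(item_popularity)

    hits = 0
    n_recommended = 0
    log_popularity = 0.0
    recommended_items = set()
    for user, rec_items in recommendations.items():
        hits += len(set(rec_items) & test.get(user, set()))
        n_recommended += len(rec_items)
        recommended_items.update(rec_items)
        log_popularity += sum(math.log(1 + item_popularity[i]) for i in rec_items)

    n_test = sum(len(items) for items in test.values())
    return {
        "recall": hits / n_test,
        "precision": hits / n_recommended,
        "coverage": len(recommended_items & all_items) / len(all_items),
        "popularity": log_popularity / n_recommended,
    }


train = {"a": {1, 2}, "b": {2, 3}, "c": {1, 3, 4}}
test = {"a": {3}, "b": {1}}
recommendations = {"a": [3, 4], "b": [4, 1]}
metrics = evaluate(train, test, recommendations)
print(metrics)
```

A lower popularity score means the recommendations lean toward less popular items, which is why it can stand in for novelty.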

Read more »

TiSASRec Code Notes

Posted on 2021-11-22, filed under Code Reading
Word count: 5.3k, reading time ≈ 13 min

Fully annotated code: https://github.com/Guadzilla/Paper_notebook/tree/main/TiSASRec

Paper notes: https://guadzilla.github.io/2021/11/18/TiSASRec/

squeeze, unsqueeze, repeat, expand

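torch.squeeze(input, dim=None, *, out=None) -> Tensor

squeeze: to press or pinch

The opposite of unsqueeze: it removes the dimension at the given dim if that dimension has size 1 (otherwise the tensor is left unchanged); if dim is not given, all dimensions of size 1 are removed.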

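torch.unsqueeze(input, dim) -> Tensor

unsqueeze: the opposite of squeeze, i.e. to expand

The opposite of squeeze: it returns a new tensor with a dimension of size 1 inserted at the given dim of the original tensor.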
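The four operations in the heading can be compared side by side on one small tensor. This is only a sketch, assuming PyTorch is installed; the variable names are illustrative, and the shapes in the comments follow from the definitions above.

```python
import torch

x = torch.zeros(2, 1, 3)

squeezed_all = x.squeeze()     # every size-1 dim removed          -> (2, 3)
squeezed_dim1 = x.squeeze(1)   # dim 1 has size 1, so it is removed -> (2, 3)
squeezed_dim0 = x.squeeze(0)   # dim 0 has size 2, nothing changes  -> (2, 1, 3)
unsqueezed = x.unsqueeze(0)    # size-1 dim inserted at position 0  -> (1, 2, 1, 3)

expanded = x.expand(2, 4, 3)   # broadcasts the size-1 dim as a view, no data copied
repeated = x.repeat(1, 4, 1)   # tiles the data 4 times along dim 1, allocates memory

for name, t in [
    ("squeezed_all", squeezed_all),
    ("squeezed_dim1", squeezed_dim1),
    ("squeezed_dim0", squeezed_dim0),
    ("unsqueezed", unsqueezed),
    ("expanded", expanded),
    ("repeated", repeated),
]:
    print(name, tuple(t.shape))
```

expand and repeat end up with the same shape here; the difference is that expand only works on size-1 dimensions and shares memory with x, while repeat copies.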

Read more »

Paper Notes: "Time Interval Aware Self-Attention for Sequential Recommendation"

Posted on 2021-11-18, updated on 2022-03-28, filed under Paper Notes
Word count: 5k, reading time ≈ 13 min

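Original paper: https://dl.acm.org/doi/10.1145/3336191.3371786

Source code walkthrough: https://github.com/Guadzilla/Paper_notebook/tree/main/TiSASRec

Chinese translation of the title: 时间间隔感知的自注意力序列推荐

Summary: A continuation of the SASRec work. On top of self-attention it adds absolute position information and relative time-interval information (added into Q and K) and achieves better performance. It also finds that the sequential pattern in the Beauty dataset is not pronounced.

Read more »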

PyTorch Geometric Study Notes

Posted on 2021-11-14, updated on 2022-03-28, filed under Study Notes
Word count: 4.1k, reading time ≈ 10 min

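The official site is always the best learning resource: https://pytorch-geometric.readthedocs.io/en/latest/

Follow the accompanying Colaboratory tutorials. The five tutorials take about a day to finish, and working through them gives you a basic grounding in pytorch-geometric.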

1. Introduction.ipynb - Colaboratory (google.com)

This concludes the first introduction into the world of GNNs and PyTorch Geometric. In the follow-up sessions, you will learn how to achieve state-of-the-art classification results on a number of real-world graph datasets.
Overview: introduces the basic structure of a graph and how to use GCN.

2. Node Classification.ipynb - Colaboratory (google.com)

In this chapter, you have seen how to apply GNNs to real-world problems, and, in particular, how they can effectively be used for boosting a model’s performance. In the next section, we will look into how GNNs can be used for the task of graph classification.
Overview: uses GNNs for some real-world node classification tasks, with better results than an MLP.

Read more »

SASRec Code Notes

Posted on 2021-11-06, updated on 2021-11-29, filed under Code Reading
Word count: 8.6k, reading time ≈ 22 min

Fully annotated code: https://github.com/Guadzilla/Paper_notebook/tree/main/SASRec

Paper notes: https://guadzilla.github.io/2021/11/03/SASRec/

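collections.defaultdict(list)

class collections.defaultdict(default_factory=None, /[, ...])

Returns a new dictionary-like object. defaultdict is a subclass of the built-in dict class. It overrides one method and adds one writable instance variable.

The object has an attribute named default_factory. At construction, the first argument provides the initial value of this attribute, which defaults to None. All remaining arguments (including keyword arguments) are treated as if they were passed to the dict constructor.

When an object is instantiated with defaultdict(list), default_factory=list, which makes it easy to turn a sequence of key-value pairs into a dictionary of lists: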

from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)

sorted(d.items())
# Output: [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

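When a key that is not yet in the dictionary is encountered for the first time, default_factory is called to create an empty list for it, and list.append() adds the value to that new list. When the same key is encountered again, the lookup returns the existing list and list.append() adds the further value to it.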
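The behavior described above can be checked directly. The sketch below only uses the standard library; the variable names are illustrative. It shows that default_factory is an ordinary writable attribute, that only indexing with d[key] triggers it, and that setting it to None makes missing keys raise KeyError as in a plain dict.

```python
from collections import defaultdict

d = defaultdict(list)
factory = d.default_factory      # the writable attribute holds the `list` type

probe = d.get("missing")         # .get() does not call the factory
keys_after_get = list(d)         # so no entry has been created yet

d["yellow"].append(1)            # indexing a missing key creates [] first
keys_after_index = list(d)

d.default_factory = None         # without a factory, missing keys are errors
try:
    d["blue"]
    raised = False
except KeyError:
    raised = True

print(factory, probe, keys_after_get, keys_after_index, raised)
```

This is why membership tests and .get() are safe on a defaultdict, while a stray d[key] lookup silently adds an entry.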

Read more »

Paper Notes: "Self-Attentive Sequential Recommendation"

Posted on 2021-11-03, updated on 2022-03-28, filed under Paper Notes
Word count: 2.4k, reading time ≈ 6 min

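Original paper: https://ieeexplore.ieee.org/document/8594844

Source code walkthrough: https://github.com/Guadzilla/Paper_notebook/tree/main/SASRec

Chinese translation of the title: 自注意序列推荐

Summary: one of the earlier sequential recommendation models to use self-attention.

Read more »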

Paper Notes: "Graph-Enhanced Multi-Task Learning of Multi-Level Transition Dynamics for Session-based Recommendation"

Posted on 2021-10-20, updated on 2022-03-28, filed under Paper Notes
Word count: 1.3k, reading time ≈ 3 min

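Original paper: https://ojs.aaai.org/index.php/AAAI/article/view/16534

Open-source code: https://github.com/sessionRec/MTD

Motivation:

Most existing session-based recommendation techniques are not well designed to capture complex transition dynamics, which take the form of temporally ordered, multi-level, interdependent relational structures.

The "complex" in complex transition dynamics refers to multi-level relations (intra- and inter-session item relations). Within a session: short-term and long-term item transitions. Across sessions: long-range cross-session dependencies. See Figure 1 for an example of such complex dependencies.

Read more »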