Ask HN: Best way to represent HTML for deep learning its structure and content?

2 points by legel, over 8 years ago
This question starts from the observation that deep learning can learn high-level semantic vector representations of raw data such as natural language, images, and graphs. Web pages seem to combine all of these, and more, so in principle there should be a way to build a joint model that faithfully captures the joint distributions across the kinds of structure found in raw HTML. This is an open-ended question, and any feedback or thoughts are helpful. The plan is to research and develop an open source model that represents the features of any web page (like the high-level activations of a CNN), trained without supervision on Common Crawl data.
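
To make the question a bit more concrete, here is a minimal sketch of one possible starting point: flattening a page into a list of nodes that keeps both the tree structure (parent index, depth) and the content (tag names, text). It is only an illustration, not the planned model. The names `DomFlattener` and `flatten_html` are hypothetical, it uses only Python's standard `html.parser`, and it ignores attributes, styling, and images entirely.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomFlattener(HTMLParser):
    """Hypothetical helper: turns HTML into a flat node list with parent links."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.nodes = []  # one dict per element or text node, in document order
        self.stack = []  # indices (into self.nodes) of currently open elements

    def _add(self, kind, label):
        parent = self.stack[-1] if self.stack else -1  # -1 marks a root node
        self.nodes.append({"kind": kind, "label": label,
                           "parent": parent, "depth": len(self.stack)})
        return len(self.nodes) - 1

    def handle_starttag(self, tag, attrs):
        idx = self._add("element", tag)
        if tag not in VOID_TAGS:
            self.stack.append(idx)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: record the node, open nothing.
        self._add("element", tag)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        for pos in range(len(self.stack) - 1, -1, -1):
            if self.nodes[self.stack[pos]]["label"] == tag:
                del self.stack[pos:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, skip empty runs
        if text:
            self._add("text", text)


def flatten_html(html):
    """Return the node list for an HTML string (assumed small enough for memory)."""
    parser = DomFlattener()
    parser.feed(html)
    parser.close()
    return parser.nodes


if __name__ == "__main__":
    page = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
    nodes = flatten_html(page)
    # Parent-child edges, usable as the edge list of a graph model.
    edges = [(n["parent"], i) for i, n in enumerate(nodes) if n["parent"] >= 0]
    for i, n in enumerate(nodes):
        print(i, "  " * n["depth"] + n["label"])
    print(edges)
```

A node list like this could feed a sequence model (labels in document order), a tree or graph model (via the edge list), or a text encoder (text nodes only). Which of these, or which combination, works best is exactly the open question.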

No comments yet.