Ask HN: Data Science portfolio projects ideas

76 points by koots, almost 6 years ago
Hello,

I'm looking for a couple of medium-size project ideas that are not just following tutorials.

11 comments

natalyarostova, almost 6 years ago
My two cents, as someone who interviews tons of data scientists, is that most portfolio projects are way too easy and amount to getting generally clean data, then just calling some API from sklearn or tensorflow.

I'd like to see either more nontrivial software/coding skills in getting the data and setting up a good data infrastructure, or more depth on an innovative science solution.
minimaxir, almost 6 years ago
My personal data science blog (https://minimaxir.com/) was designed over the last few years specifically as a portfolio piece to get a data science job by creating advanced data visualization/analysis projects w/ code, and I was eventually successful. (It has done well on HN when it pops up, too.)
fundamental, almost 6 years ago
Consider making tutorials for publications that you're interested in but that are nontrivial to read through and understand. Building a tutorial will give you a deeper understanding of a problem, help you communicate that understanding, and benefit the larger community.
avebear, almost 6 years ago
I’ve had fun and built interesting projects based on harvesting tweets. As some other comments suggest, data collection is an important and hard skill that most tutorial projects ignore. If you can show 0) you came up with an interesting question, 1) had the idea to get this data, 2) harvested the data successfully, 3) formatted and cleaned it, and 4) ran appropriate algorithms to look for the answer to your question, you have everything you need from a portfolio project (even if the data doesn’t support your original hypothesis!).
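Steps 3 and 4 above (format and clean, then analyse) might look roughly like the sketch below. It assumes the tweets have already been harvested into plain dicts with a `text` field; that field name and the `clean_tweet` / `top_terms` helpers are hypothetical, and no Twitter API call is shown.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_tweet(text):
    """Drop URLs and @mentions, lowercase, and collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.lower().split())


def top_terms(tweets, n=3, stopwords=frozenset({"the", "a", "is", "to", "and", "of"})):
    """Return the n most common tokens across cleaned, de-duplicated tweets.

    `tweets` is assumed to be an iterable of dicts with a "text" key
    (a hypothetical layout, not any particular API's schema).
    """
    seen = set()
    counts = Counter()
    for tweet in tweets:
        cleaned = clean_tweet(tweet["text"])
        if not cleaned or cleaned in seen:  # skip empties and exact repeats
            continue
        seen.add(cleaned)
        counts.update(t for t in TOKEN_RE.findall(cleaned) if t not in stopwords)
    return counts.most_common(n)
```

Keeping cleaning separate from counting makes it easier to inspect what the cleaning step throws away before trusting any result built on it.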
pella, almost 6 years ago
OpenStreetMap (GIS, OpenData, Humanitarian, Visualisation)

You can import and analyse the OpenStreetMap data, and create some nice QA reports for the community.

Arxiv: https://arxiv.org/search/?query=openstreetmap&searchtype=all&source=header
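One kind of QA report could be sketched as follows: list the ways tagged `highway` that carry no `name` tag. This is only an illustration using Python's standard `xml.etree.ElementTree` on an inline, made-up OSM XML fragment; the `unnamed_highways` helper and the sample data are assumptions, and a real extract would be far larger and is usually read with a streaming parser.

```python
import xml.etree.ElementTree as ET

# Made-up OSM XML fragment, for illustration only.
SAMPLE_OSM = """<osm version="0.6">
  <way id="1"><nd ref="10"/><nd ref="11"/>
    <tag k="highway" v="residential"/><tag k="name" v="Main Street"/>
  </way>
  <way id="2"><nd ref="11"/><nd ref="12"/>
    <tag k="highway" v="service"/>
  </way>
  <way id="3"><nd ref="13"/><nd ref="14"/>
    <tag k="building" v="yes"/>
  </way>
</osm>"""


def unnamed_highways(osm_xml):
    """Return the ids of ways that have a highway tag but no name tag."""
    root = ET.fromstring(osm_xml)
    missing = []
    for way in root.iter("way"):
        # Each <tag k="..." v="..."/> child is one key/value pair on the way.
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if "highway" in tags and "name" not in tags:
            missing.append(way.get("id"))
    return missing


print(unnamed_highways(SAMPLE_OSM))
```

Not every unnamed road is an error (service roads often have no name), so a report like this is a list of candidates to review rather than a list of mistakes.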
TBF-RnD, almost 6 years ago
Perhaps give me a hand researching text input. I'm starting to gather a rather large source of ideas to be implemented. It's fun work that really allows you to think outside of the box. It spans all the way from UI design down to system calls into the OS via C, so there are a lot of areas to cover. Let me know by commenting if you are interested.
tompazourek, almost 6 years ago
I heard that a good Kaggle profile is the data science equivalent of a good GitHub profile.

I haven’t tried it myself, and it looks more like smaller projects, but someone might find it interesting.

https://www.kaggle.com/
usgroup, almost 6 years ago
What's your background? And what sort of job are you looking to get from it?

If you want to be an actual scientist, then do something that's actually scientific: elaborate an experiment design, collect your own data, analyse it, and draw conclusions from it.

For example, what’s the relationship between crime in San Francisco and Starbucks locations? How is the relationship conditioned on the weather? Does the size of the parking lot adjacent to a Starbucks meaningfully affect crime independent of location?

I’m a little biased, but there are too many script kiddies: “scientists” who copy/paste scripts and “analyse” by calling APIs, and don’t know how anything works. Data science à la Kaggle.
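The conditioning question above can be sketched as computing a correlation separately within each weather stratum and comparing it with the pooled figure. The rows below are invented purely so that the snippet runs and say nothing about San Francisco; the record layout and the `pearson` / `correlation_by_group` helpers are assumptions, and a correlation is of course not evidence of causation.

```python
from collections import defaultdict
from math import sqrt

# SYNTHETIC rows, invented only so the example runs; not real observations.
ROWS = [
    {"weather": "dry", "cafes_nearby": 1, "incidents": 2},
    {"weather": "dry", "cafes_nearby": 2, "incidents": 4},
    {"weather": "dry", "cafes_nearby": 3, "incidents": 6},
    {"weather": "rain", "cafes_nearby": 1, "incidents": 5},
    {"weather": "rain", "cafes_nearby": 2, "incidents": 3},
    {"weather": "rain", "cafes_nearby": 3, "incidents": 1},
]


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_group(rows, group_key, x_key, y_key):
    """Compute the x/y correlation separately for each value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append((row[x_key], row[y_key]))
    return {
        group: pearson([x for x, _ in points], [y for _, y in points])
        for group, points in groups.items()
    }


pooled = pearson([r["cafes_nearby"] for r in ROWS], [r["incidents"] for r in ROWS])
by_weather = correlation_by_group(ROWS, "weather", "cafes_nearby", "incidents")
```

The toy rows are deliberately constructed so that the pooled correlation is zero while each stratum is perfectly correlated in opposite directions, which is the kind of effect a pooled analysis alone would hide.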
Jugurtha, almost 6 years ago
You want your portfolio to communicate that you are fluent in *data*, *resourceful*, and think broadly about what data *is*. More generally, you want to communicate that you are valuable to the employer.

What is valuable is often rare. Some skills are common or are just the baseline.

Peculiarities in people are less commoditized, and when these peculiarities intersect with the activity domain of an organization, they become valuable. When these peculiarities are deep enough and span a broad range, and the intersection of that range with the organization's interests is quite large, they become extremely valuable.

These peculiarities are often a result of lifestyle, interests, musing, and wandering. They are often acquired over the years in the person's free time and are not taught in class.

This reads like something new-agey, like the saying that goes "Instead of trying to paint a perfect picture, become perfect and just paint."

Now for more practical and less "general" speak... I'll have to bring personal anecdotes which, by definition, are about my specific experience. As a consequence, the pronoun "I" will be used more often than in a regular post. This serves as an example of what I mean by the above.

The first project I was involved with when I joined my current company as an engineer was related to heart data. It was convenient that I had worked on heart data before, read a lot of medical papers on the question, worked on anomaly detection, and was familiar with PhysioNet data and format, but had also worked on *local* hospital data filled with chest-hair-sweat-and-motion noise and gone through the challenges it represented.
I could give the team pointers to good resources on the question, knew health professionals and faculty I was still in contact with, and had personal friends who are medical doctors and surgeons I could get insights from (thinking broadly about "data" not just as digital formats and CSV, but as network, friends, domain experts, and insights gleaned while socializing).

Another project the company did was telecom subscriber churn prediction. I was invited to a brainstorming session with the team to discuss data and interesting features. One of them was standard of living and financial situation. I insisted on getting USSD data from the telecom company in addition to CRM data and surveys. When I was asked what it would tell us, I asked colleagues how frequently they checked their phone balances as employees (with a source of revenue) vs. how often they did as students. They all got the point: as students, it wasn't obvious that you even had enough airtime to make a call or send a text, so you sent a USSD request (free of charge) to see how much airtime you had left (thinking about data from a "human moves" perspective and not forgetting the experience of being a broke student when doing feature engineering). It helped the project that I had gone through some books on GSM and CDMA networks (out of curiosity) and was more fluent in the data the telco sent and in their jargon. I could help the team with that by recommending reading sources curated over a long time and sharing insights from personal acquaintances in different roles in the telecom domain (engineering, sales, marketing, etc.).

Another project the company did was a reservoir characterization project for oil and gas. It happened that I had interned for the biggest oil services company in that exact position and read several books on reservoir characterization.
I also had exposure to the hardware, the process, the different players and their incentives, and went to actual reservoir characterization jobs (it paid to know about oil-based muds, boreholes, deviation, cuttings, etc.). It helped to share context with the team: knowing what to look for, whom to ask and what, where to get data, and what things were called in the domain. I also had friends working in that domain in different geographic locations with different companies.

Another project I was in involved sound. My training was in EE, so I had more training in signal processing than the team and had also taken courses on acoustics. I was able to help with pure signal processing and acoustics, resources to bring someone up to speed, explanations, etc. I had interest and knowledge in the source that was producing the sound. It helped in meetings with the client because the sound source was very *peculiar*. The client was impressed because they felt I knew more than an outsider should, given regulations and the nature of the source. I was able to handle it safely and use it very accurately, to their surprise and to my employer's, because I had never talked about it. I also had access to people with *much more domain expertise than the client organization*, giving extremely valuable insights on real-world conditions and more interesting and frequent access to more diverse data sources. When we had to build custom hardware and mics, it helped that I was comfortable with a soldering iron, too.

When we did a project for a retail organization, it helped that I was already primed: I had gone through their site, read their pages' source, knew they were using the schema.org ontology, knew how their site was structured, had already parsed their sitemap, and had built a scraper for that site, all *before joining my employer*.
Plus I had the code.

Another project was in banking, where I also had some experience because I had gotten interested, in earlier years, in how banks work, had written some code for parsing transaction data, and had friends in different banks and financial institutions explaining things (again, data of another nature and from other sources).

Another project was related to data from Programmable Logic Controllers, and it helped that I had read a bunch on the question, tinkered with Siemens PLCs, etc. (It also helped when one of our new hires, a student working on a project relating to communication protocols for PLCs, found out during the interview that there was someone in the company who was also familiar with the topic and could give pointers and add value to his work. It helped bring him to work here.)

Other anecdotes involve visiting sites in Russian that were not translatable (images instead of text content), and being unfazed and able to sort of get around because I had tried learning Russian earlier. It wasn't much, but it saved time, and just the spirit of "whatever it takes" can be contagious. This was a startup, and just the boost in morale or *anything* that removes or tames obstacles helps.

Serendipity at its finest.

And last but not least, and at the risk of being tacky: being able to communicate with people in writing, face to face, and on the phone is enormously helpful. Having a certain "lifestyle", for lack of a better word, that kept that sword sharp helped a lot. Having worked in sales as a college student didn't hurt either.

The underlying message is: I think you can build a portfolio based on your interests, and I think it helps to cultivate your interests. I think it's nice to be able to work on a Kaggle dataset with clean data in CSV format and nicely labeled images, but it helps to think about data in more ways and to keep in mind that it's important to get things done and help others get them done, in any way you can.
Data is much more than CSV files and annotated images. The questions to ask are:

- How often do you think you get that kind of data (clean, ready, nicely formatted, with the client being responsive and supporting you)?
- In which ways can you bring more value to your employer by helping get things done, often drawing on your previous experience, work, and code in a domain of interest?
- How can you act as a lever for other team members?
- How can you act as a bridge between stakeholders and do impedance matching to increase the effectiveness of the whole *system*?
- How do you feel about "business" topics (basic econ, ops management, marketing, accounting, etc.)? They help transduce features/bug fixes/refactoring into business terms stakeholders understand.
- How can you move obstacles out of the way, however small they may be?

Some things I have found useful:

- Maintaining a network of interesting and smart people in different domains (physicians and physicists, chemists, poets, painters, musicians, engineers, teachers, bankers).
- Reading a lot about a lot.
- Implementing stuff. Getting HTTP 429 and knowing what to do about it. Experimenting. Documenting.
- Sharing.
- Helping others be better at what they do, and do it better and more profitably. Connecting people and wanting them to succeed.

Now, if I see that a candidate can *hustle*, I'd be *very* interested. I can count such candidates on one finger, and the kid was snatched faster than I could get to him (and was snatched by an acquaintance working at a top institution, with a sorry-not-sorry).
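As one reading of "getting HTTP 429 and knowing what to do about it": back off and retry, honouring the server's `Retry-After` header when it gives a delay in seconds. The sketch below is an assumption about what that might look like, not the commenter's code; `fetch_with_backoff` and its parameters are hypothetical, and the `fetch` callable is injected so the logic can be exercised without a network.

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) and retry while the server answers HTTP 429.

    `fetch` is a hypothetical callable returning (status, headers, body),
    where headers is a dict. Only the integer-seconds form of Retry-After
    is honoured; anything else falls back to exponential backoff.
    """
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts, do not sleep again
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = base_delay * 2 ** attempt  # 1x, 2x, 4x, ... base_delay
        sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")
```

Passing `sleep` as a parameter is a small design choice that keeps the retry policy testable: a test can record the requested delays instead of actually waiting.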
gajju3588, almost 6 years ago
If you are interested in NLP, entity detection/classification in news articles could be an interesting place to start.

Training data: Wikipedia
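A very rough baseline for this idea is a dictionary (gazetteer) lookup rather than a trained model: collect entity names with a type, for instance from Wikipedia article titles and categories, and match them in article text. The sketch below assumes such a table has already been built; the tiny `GAZETTEER` is hand-written for illustration, and `tag_entities` is a hypothetical helper.

```python
import re

# Hand-written stand-in for a table that would be derived from Wikipedia.
GAZETTEER = {
    "OpenStreetMap": "ORG",
    "San Francisco": "LOC",
    "Kaggle": "ORG",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return (surface form, entity type, start offset) for each match.

    This is exact, case-sensitive string matching on word boundaries,
    not a statistical or neural entity recognizer.
    """
    if not gazetteer:
        return []
    # Try longer names first so multi-word entries win over shorter ones.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(0), gazetteer[m.group(0)], m.start()) for m in pattern.finditer(text)]
```

A lookup like this cannot handle ambiguity or unseen names, which is where a model trained on labeled text would be expected to do better; it is mainly useful as a baseline to compare against.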
edoceo, almost 6 years ago
I've got some semi-clean data that needs crunching, part of a soon-to-be GPL project. It's actually pretty plain, but it can give you stuff to blog about and post on your GitHub. My handle at gmail.