Ask HN: How do you handle large Python projects?

69 points by nthompson, almost 9 years ago

In C++, I follow a playbook for keeping all hell from breaking loose:

1) Write a googletest
2) Write a googlebenchmark
3) Run all unit tests under AddressSanitizer, ThreadSanitizer, and the g++ UB sanitizer
4) Tidy up with clang-format
5) Run cppcheck

So I feel pretty confident I'm not doing something braindead if I can get this stuff through CI.

But for Python, I don't really have a good idea of when I'm doing something that'll cause me agonizing pain in the future. The only tool I use is flake8, which is awesome, but I can't see memory leaks or performance profiles.

What strategies do you adopt (and what tools do you use) to keep all hell from breaking loose in large Python projects?

19 comments

justinsaccount, almost 9 years ago

Personally I focus on not creating large projects. I make as many small, individually tested libraries as I can and assemble them into smaller projects.

Not exactly the 'microservices' approach, but similar ideas.

One of the most useful things related to this is to focus on interface design. It's easy to scrap a bad implementation and re-implement the same interface later; it's harder to fix a bad interface that you're using all over the place. Making some implementations pluggable up front will also make it easier to swap things out later.

Another thing that ended up causing the most pain in the long run was building in too much functionality directly instead of leaving things up to plugins. Plugins can more easily be enabled or disabled to compose specific functionality. The alternative is tons of code and tons of configuration options to handle every little corner case.

As for lower-level tools, the 'coverage' tool integrated with your test suite is a must-have.

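
A minimal sketch of the "pluggable implementations behind a fixed interface" idea from the comment above. The `Storage` protocol, both implementations, and the `PLUGINS` registry are hypothetical names invented for illustration; they are not from the comment, and a real project might use entry points or a plugin framework instead.

```python
from typing import Protocol


class Storage(Protocol):
    """Hypothetical interface: the only thing callers are allowed to rely on."""

    def save(self, key: str, value: str) -> None: ...

    def load(self, key: str) -> str: ...


class MemoryStorage:
    """One implementation; it can be scrapped later without touching callers."""

    def __init__(self) -> None:
        self._data = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def load(self, key: str) -> str:
        return self._data[key]


class UppercaseStorage(MemoryStorage):
    """A second, swappable implementation with different behaviour."""

    def save(self, key: str, value: str) -> None:
        super().save(key, value.upper())


# A tiny plugin registry: implementations are selected by name.
PLUGINS = {"memory": MemoryStorage, "upper": UppercaseStorage}


def make_storage(name: str) -> Storage:
    """Build the implementation registered under `name`."""
    return PLUGINS[name]()


def remember_greeting(storage: Storage) -> str:
    """Application code depends only on the Storage interface."""
    storage.save("greeting", "hello")
    return storage.load("greeting")
```

Because `remember_greeting` only touches `save` and `load`, swapping `make_storage("memory")` for `make_storage("upper")` changes behaviour without changing the caller.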
YZF, almost 9 years ago

I worked on a million+ line Python project. Here are some observations:

* No duck typing.
* Document all input/output parameters to functions.
* Avoid fancy metaprogramming.
* Try to break it up into smaller pieces with well-defined interfaces that are ideally not Pythonic. Think about the interfaces as something you'd potentially want other programming languages to be able to talk to.
* Don't use eval/exec or, more generally, don't pass Python code around.

In general, as the scope of a module/unit gets larger you want to stick with simpler stuff. If the module/unit is small, the interface is small, and there are good tests around it, you can do anything you want inside that smaller bit.

[EDIT: These comments are mostly Python-specific. On top of that you'd apply what you would in any other language: organize your code properly, keep a consistent style throughout (which in Python includes following the relevant standards such as PEP 8), do code reviews, write tests, etc.]

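
One way to read "interfaces that other languages could talk to" is a boundary that accepts and returns plain data, with every parameter documented and checked explicitly. The sketch below assumes a hypothetical `summarize_orders` function and a made-up JSON shape; neither comes from the comment.

```python
import json


def summarize_orders(request_json: str) -> str:
    """Summarize a batch of orders.

    Args:
        request_json: JSON text of the form
            {"orders": [{"id": <str>, "amount_cents": <int>}, ...]}.

    Returns:
        JSON text of the form {"count": <int>, "total_cents": <int>}.

    Raises:
        ValueError: If the payload does not have the documented shape.
    """
    payload = json.loads(request_json)
    orders = payload.get("orders") if isinstance(payload, dict) else None
    if not isinstance(orders, list):
        raise ValueError("'orders' must be a list")

    total = 0
    for order in orders:
        # Check concrete types explicitly instead of relying on duck typing.
        if not isinstance(order, dict) or not isinstance(order.get("amount_cents"), int):
            raise ValueError("each order needs an integer 'amount_cents'")
        total += order["amount_cents"]

    return json.dumps({"count": len(orders), "total_cents": total})
```

Nothing Python-specific crosses the boundary here: a caller written in any language that can produce and parse JSON could use the same contract.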
dagss, almost 9 years ago

I work on a 100 KLOC Python project and have written a fair bit of C++ too.

We must have different definitions of "all hell from breaking loose", since none of those tools would help me avoid that in C++ at all. Hell has broken loose when you have an unmaintainable ball of mud, which IMO has little to do with what those tools help with.

But for those tools:

Many of them you just don't need in Python due to language differences. You need to make sure all branches of your code are somehow tested, but memory management is easier with (mandatory) reference counting, you don't have pointers that are not valid, and so on.

As for performance profiles, well, since you're thinking of using Python, you already decided IO is your bottleneck and not CPU. The moment you find yourself thinking "I should profile this code and speed it up" is the moment you consider using another language for the job.

sametmax, almost 9 years ago

First, large Python projects are much more manageable than C++ projects, even without any tools. Python is way easier to debug and much less verbose, you have a hundred fewer possible errors, and things are easy to refactor.

Starting from here, unit tests will take you a long way. Tox + pytest + coverage.py is the de facto standard for tests now, and will give you peace of mind when editing your code. Tox can run flake8 as well, so that's often done.

After that everything is a luxury. You can use mypy to get static typing, and you can make sure to have a very good editor checking stuff for you, such as PyCharm or Sublime Text + anaconda. You can use CI with something like Travis or buildbot.

I usually make sure to have a .editorconfig file and a clear style convention to ease teamwork. And I like to use sphinx to write the docs of the project, which you really, really need to do. This includes docstrings for modules, classes and functions (Google style for me) and comments, but also some manual rst files.

Last, but not least, experience matters a lot. You learn how to organize stuff in your dir tree. I like to split any file bigger than 500 lines in Python because it's such an expressive language. Having one module for exceptions. Having proper unicode handling from the start. Etc.

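
A small sketch combining three of the habits mentioned above: type hints that a checker such as mypy can read, a Google-style docstring, and a single exception hierarchy. The names `ProjectError`, `ConfigError`, and `parse_port` are hypothetical; in a real project the two exception classes would presumably live in their own module.

```python
class ProjectError(Exception):
    """Base class for every error this (hypothetical) project raises."""


class ConfigError(ProjectError):
    """Raised when a configuration value is missing or malformed."""


def parse_port(raw: str) -> int:
    """Parse a TCP port number from text.

    Args:
        raw: The port as text, for example "8080".

    Returns:
        The port as an integer between 1 and 65535.

    Raises:
        ConfigError: If `raw` is not a valid port number.
    """
    try:
        port = int(raw)
    except ValueError as exc:
        # Re-raise under the project's own hierarchy, keeping the cause.
        raise ConfigError(f"not a number: {raw!r}") from exc
    if not 1 <= port <= 65535:
        raise ConfigError(f"port out of range: {port}")
    return port
```

Callers can catch `ProjectError` to handle anything the project raises, and a static type checker should flag a call such as `parse_port(8080)` because the annotation asks for `str`.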
econner, almost 9 years ago

The Google Python style guide might be a good start: https://google.github.io/styleguide/pyguide.html

I've also heard that at Google they require assert isinstance for every parameter of every function.

New Relic has very good tools for profiling Python code if you're running a service.

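
For illustration only, this is what an `assert isinstance` check on every parameter could look like. The `scale` function is a made-up example, and whether any company actually mandates this pattern is hearsay in the comment above, not something this sketch confirms.

```python
def scale(values: list, factor: float) -> list:
    """Multiply every number in `values` by `factor`."""
    # Fail fast at the call boundary rather than deep inside the function.
    assert isinstance(values, list), f"values must be a list, got {type(values).__name__}"
    assert isinstance(factor, (int, float)), f"factor must be a number, got {type(factor).__name__}"
    return [v * factor for v in values]
```

One caveat with this design: `assert` statements are removed when Python runs with the `-O` flag, so they work as a development-time safety net rather than as input validation.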
metakermit, almost 9 years ago

I guess it's worth taking a look at some more complex open source Python projects. I think pandas [1] is a pretty good example, with a relatively large amount of Python and Cython code.

They start off with a pretty decent amount of unit tests (84% coverage) and make sure it's visible to developers using:

- Travis [2] (has to pass on pull requests too before contributions are accepted)
- Coverage [3]

There's also Code Climate [4] for some more introspection.

[1]: https://github.com/pydata/pandas
[2]: https://travis-ci.org
[3]: https://codecov.io/
[4]: https://codeclimate.com/

ludwigvan, almost 9 years ago

In the words of Robert Love, "Man, I cannot imagine writing let alone maintaining a large software stack in Python." [0]

Unfortunately, it is very hard and brittle.

You definitely need rigorous testing to keep it all in one place, but I would steer away from Python for a large project.

Also take a look at mypy or some other sort of typing to see if it can aid in a large project. [1]

[0] https://www.quora.com/Why-does-Google-prefer-the-Java-stack-for-its-products-instead-of-Python
[1] http://code.tutsplus.com/tutorials/python-3-type-hints-and-static-analysis--cms-25731

sitkack, almost 9 years ago

1) don't use objects, use bare methods and don't mutate input data
2) `from collections import namedtuple` reinforces 1
3) nice lightweight tests, py.test or nose
4) integration tests so you can actually refactor w/o having to recode 100 unit tests
5) write as little code as possible

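
Points 1 and 2 above can be sketched as follows: an immutable `namedtuple` record plus plain functions that return new values instead of mutating their input. The `Account` record and both functions are hypothetical examples.

```python
from collections import namedtuple

# Immutable record: fields cannot be reassigned after creation.
Account = namedtuple("Account", ["owner", "balance"])


def deposit(account, amount):
    """Return a new Account; the input is left untouched."""
    return account._replace(balance=account.balance + amount)


def total_balance(accounts):
    """Pure function: the result depends only on the argument."""
    return sum(a.balance for a in accounts)
```

Since `deposit` never changes its argument, a caller holding the original `Account` cannot be surprised by a change made somewhere else, which is the non-mutation property the list asks for.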
INTPenis, almost 9 years ago

I've only ever taken over a large Python project, never written one from scratch, and I don't have many years under my belt as a Python coder.

But my $.02: documentation. Python code itself is easy to read, but if you have a large project, broken into many small code bases for libraries, services and front ends, then you need solid documentation. Not only text but also diagrams showing how all the parts work together. Not sure if they're called diagrams in English, but it's the stuff you make in MS Visio or Draw.io.

hacker_9, almost 9 years ago

I've recently written a Blender addon in Python that came to just over 1000 LOC. Not big at all, and yet I was already running into problems that are just nonexistent in static languages:

- I couldn't refactor variable names and had to manually replace the text, which was very error-prone.
- I had no help from the language about whether I had referenced a variable by the right name, as it would just create a new one.
- No 'go to definition'; instead I was reduced to scrolling or Ctrl+F.
- No braces to tell me where scope starts and ends (which you wouldn't think mattered, but relying only on indentation actually gets quite messy).
- No contextual knowledge of the Blender APIs unless I knew what to Google, and it often came down to someone asking the same question on stackexchange.

The only way I could manage it was to write tiny functions, so I could literally eyeball the scope and keep all the details in my short-term memory. I would not recommend using this language for larger projects.

kmike84, almost 9 years ago

* Write tests and run them on CI. Try to figure out how to write tests with the least amount of pain; you'll need lots of them. Use py.test or a similar framework. Doctests are great, but it takes some time to learn how to use them efficiently and when they're not appropriate.
* Measure test coverage to be aware of what is not tested, but don't just pursue an exact coverage % number; doing that leads to many integration tests and few unit tests. Both kinds of tests are important.
* Extract libraries from the main code to make the main project smaller; write docs and tests for these libraries. Docs are important for these libraries. Try hard to maintain boundaries: a library should have a single purpose, and it shouldn't be tied to the rest of the code. If you find writing docs complicated, then maybe the library does too much, or maybe its API is too hard to use. Fix that.
* Don't write all the code yourself; consider using open-source libraries. But don't use open-source libraries if you're not comfortable with contributing to them; there will be issues (like in any code). If the library you're going to use is not an industry standard, read its source. If your reaction is "ah, yeah, this is almost how I'd have written that", use it; otherwise try to find another library or write your own.

I'd say the trick to handling large Python projects is to resist making them large. Don't be sloppy in code organization, be pedantic about which part "knows" about which part, and extract non-specific utilities to libraries. Often projects can be kept under 20-50K lines of code after a few years of development by a small team if the team tries to maintain code quality and moves non-specific features to external libraries.

flake8 and similar linters may help with consistency; that is important, but it is not the main problem by far. The main problem to fight is non-locality: if one can reason about a piece of code just by looking at it, without checking lots of other components, the overall project size doesn't matter much.

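
As a small illustration of the doctests mentioned in the first bullet above, the examples in the docstring below double as tests. The `chunk` function and the `run_doctests` helper are hypothetical; the helper uses the standard library's `doctest.DocTestParser` and `doctest.DocTestRunner` only to show the examples being executed.

```python
import doctest


def chunk(items, size):
    """Split `items` into lists of at most `size` elements.

    >>> chunk([1, 2, 3, 4, 5], 2)
    [[1, 2], [3, 4], [5]]
    >>> chunk([], 3)
    []
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_doctests():
    """Run the examples in chunk's docstring; return the (failed, attempted) result."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(chunk.__doc__, {"chunk": chunk}, "chunk", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)
```

In day-to-day use the helper is usually unnecessary: `python -m doctest yourfile.py` or `pytest --doctest-modules` will collect docstring examples like these.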
devnonymous, almost 9 years ago

* Write unit tests (I find nosetests or pytest most useful as test runners, and mock incredibly helpful).
* Run unit tests and integration tests in an automated manner for every commit (a.k.a. use tox, jenkins or somesuch...).
* Depending on the software you are creating, deploy a chaos monkey[1] kind of approach for disaster/HA testing.
* Read up on good Python practices:

http://python-guide.readthedocs.io/en/latest/

http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html

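
A minimal sketch of a unit test that uses `unittest.mock` from the standard library, along the lines of the first bullet above. The `fetch_status` function and its `client` argument are hypothetical; the point is only that the mock stands in for a network client so the test needs no network.

```python
import unittest
from unittest import mock


def fetch_status(client, url):
    """Return 'up' if the (hypothetical) client reports HTTP 200, else 'down'."""
    response = client.get(url)
    return "up" if response.status_code == 200 else "down"


class FetchStatusTest(unittest.TestCase):
    def test_up(self):
        client = mock.Mock()
        client.get.return_value.status_code = 200
        self.assertEqual(fetch_status(client, "http://example.test"), "up")
        # The mock also records how it was called.
        client.get.assert_called_once_with("http://example.test")

    def test_down(self):
        client = mock.Mock()
        client.get.return_value.status_code = 503
        self.assertEqual(fetch_status(client, "http://example.test"), "down")
```

Tests written as `unittest.TestCase` classes can be run with `python -m unittest`, and pytest collects them as well.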
shuzchen, almost 9 years ago

It seems you're looking for tools to manage code quality. In that case I highly recommend prospector (http://prospector.landscape.io/en/master/). It wraps a whole bunch of tools in one interface. I suggest paying close attention to any code that scores high on McCabe code complexity.

sciurus, almost 9 years ago

You can find an overview of some tools in https://www.slideshare.net/mobile/jamdatadude/python-static-analysis-tools

syngrog66, almost 9 years ago

Doing mental analysis of the code you write goes a long way. Know the impact of each change. Know your language. Have a correct model in your head of how software behaves at runtime, especially on a given OS/hardware combination. This has the advantage of being language-agnostic and doesn't require tools. If you ALSO have tools, great. But in many cases you don't need them, at least if the right person is doing the right kind of thinking at design/code/test time.

nice_byte, almost 9 years ago

My advice would be to avoid developing large projects in Python, or any other language without static typing.

dochtman, almost 9 years ago

Lots (and I mean, lots!) of automated tests, until you have 95%+ coverage.

dukoid, almost 9 years ago

Simple: Don't use Python (or any other untyped language) :)

carapace, almost 9 years ago

Twenty or more years of experience.

-----

Am I wrong?