Seq – A programming language for computational genomics and bioinformatics

116 points by tdido, over 3 years ago

17 comments

I am a CS person who works with bioinformaticians every day as part of my job.

I really like that Seq seems to have some parallelization ability built in. I spend no small amount of time in my day job doing that manually in R with RcppParallel for loops that are totally independent across each iteration.

Bioinformaticians are often educated to use a specific programming language and environment. They aren't usually looking to try other languages. For example, I support our bioinformatics group and they are basically 100% R and RStudio users. We have a single user of Python and that user is doing "typical" TensorFlow stuff with images.

I've noticed this same bias towards a single language in some other academic niches, such as the SAS and Stata camps in public health or psychology. I think of these languages as basically the same, but for non-CS folks the perception seems to be more like English vs Russian.

Even more complicated, researchers may be extremely committed to a specific library in a language and suspicious of languages that don't have their favorite library available.

Any shift to new tooling for these highly committed users will almost certainly require large and obvious benefits to gain traction.

encode, over 3 years ago

Also see this comparison between Julia's BioSequences and Seq by Jakob Nissen and Ben Ward: <a href="https://biojulia.net/post/seq-lang/" rel="nofollow">https://biojulia.net/post/seq-lang/</a>

bscphil, over 3 years ago

> Seq is a Python-compatible language, and the vast majority of Python programs should work without any modifications

> Seq is able to outperform Python code by up to 160x.

So ... a reimplementation of Python that can outperform CPython by over 100 times? I know literally nothing about this project, but I have to say that rings pretty false for me. Hell, even PyPy has trouble with many applications. (Plus they're claiming to outperform "equivalent" C code by 2x.)

Even if the performance claims are overblown, it's always nice to see new work on compiled languages with easy-to-read syntax. It's hard to beat Python for an education / prototyping language, so I will definitely be giving this a look.

arshajii, over 3 years ago

Hi everyone, I’m one of the developers on the Seq project, and I was delighted to see it posted here! We started this project with a focus on bioinformatics, but since then we’ve added a lot of language features/libraries that have closed the gap with Python by a decent margin, and Seq today can be useful in other areas or even for general Python programs (although there are still limitations of course). We’re in the process of creating an extensible / pluggable Python compiler based on Seq that allows for other domain extensions. The upcoming release also has some neat features like OpenMP integration (e.g. “@par(num_threads=10) for i in range(N): …” will run the loop with 10 threads). Happy to answer any questions!
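
As a rough illustration of the loop-level parallelism described above, here is a plain-Python sketch that fans independent iterations out over 10 worker threads. It is an analogue only, not Seq's `@par` implementation: `gc_count`, `parallel_map`, and the sample reads are hypothetical names introduced for the example, and CPython threads will generally not give the speedup that compiled OpenMP threads would for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_count(read: str) -> int:
    """Count G and C bases in one read (hypothetical per-iteration work)."""
    return sum(1 for base in read if base in "GC")


def parallel_map(func, items, num_threads=10):
    """Apply func to each item on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(func, items))


reads = ["ACGT", "GGCC", "ATAT", "CGCG"]  # made-up sample data
counts = parallel_map(gc_count, reads, num_threads=10)
print(counts)
```

The pattern only applies when each iteration is independent of the others, which is the same condition mentioned for the OpenMP-style loop.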

fuzzythinker, over 3 years ago

Used it for coding Coursera/Stepik's Bioinformatics course [1] when it was first announced 2 years ago.

Not claiming it as any sort of reference, but you can see how it [2] may be used to solve some basic genome sequencing.

[1] <a href="https://www.coursera.org/specializations/bioinformatics" rel="nofollow">https://www.coursera.org/specializations/bioinformatics</a>

[2] <a href="https://github.com/fuzzthink/seq-genomics" rel="nofollow">https://github.com/fuzzthink/seq-genomics</a>

fwip, over 3 years ago

It's an impressive project, but I'm not sure the niche is big enough. It's certainly come a long way since the last time I looked at it!

My biggest concern is that Seq sucks users into a sort of local maximum. While piping syntax is nice, and the built-in routines are handy, it's a lot less flexible than a "mainstream" programming language, simply because of the smaller community and relative paucity of libraries. BioPython[1] has been around a long, long time, and I think a lot of potential users of Seq would be better served by using a regular bioinformatics library in the language they know best.

E.g., the example of reading FASTA files in Seq:

<pre><code>
# iterate over everything
for r in FASTA('genome.fa'):
    print r.name
    print r.seq
</code></pre>

versus BioPython:

<pre><code>
from Bio import SeqIO
for r in SeqIO.parse("genome.fa", "fasta"):
    print(r.id)
    print(r.seq)
</code></pre>

It might be pretty useful as a teaching tool, but I'm skeptical of its long-term benefit to professionals. I'm not sure the ecosystem of Seq users will be large enough, y'know? Again, it's pretty impressive work, and it's come a long way. I wish the devs all the best. :)

1. <a href="https://biopython.org/" rel="nofollow">https://biopython.org/</a>
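
For context on what both snippets above are doing, here is a minimal, dependency-free sketch of FASTA iteration in Python 3. It is not how Seq or BioPython implement their readers: `read_fasta` and the in-memory sample are hypothetical, and the sketch assumes the simple layout where a header line starts with `>` and the sequence lines follow it.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an open FASTA text stream."""
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                # A new header closes the previous record.
                yield name, "".join(chunks)
            # The record name is the first whitespace-delimited token.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Made-up two-record sample standing in for 'genome.fa'.
sample = io.StringIO(">chr1 test record\nACGT\nTTGA\n>chr2\nGGCC\n")
records = list(read_fasta(sample))
for rec_name, seq in records:
    print(rec_name)
    print(seq)
```

To read a real file, the same generator could be called as `read_fasta(open("genome.fa"))`; a production reader would also need to handle compressed input, line-ending quirks, and validation, which is where an established library earns its keep.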

totalperspectiv, over 3 years ago

It’s odd that they didn’t include Nim in the benchmarks in their paper: <a href="https://dl.acm.org/doi/pdf/10.1145/3360551" rel="nofollow">https://dl.acm.org/doi/pdf/10.1145/3360551</a>

dekhn, over 3 years ago

Typically, any high-performance (low-latency or high-throughput) genomics/bioinformatics application is not going to be written in plain Python, except possibly for prototyping. Instead, nearly all codes today are written in C++ or Java, with some sort of command and control in Python or a DAG-based workflow scheduler.

I don't expect the community will adopt other languages at a large scale. My hope, though, is that more of these algorithms move to real distributed processing systems like Spark, to take advantage of all the great ideas in systems like that. But genomics will continue to trail the leading edge by about 20 years for the foreseeable future.

f6v, over 3 years ago

> Think of Seq as a strongly-typed and statically-compiled Python: all the bells and whistles of Python, boosted with a strong type system, without any performance overhead.

A pitch most people doing applied bioinformatics won’t understand/appreciate.

car, over 3 years ago

Looks great, will definitely give this a try since it does sequence manipulations that I otherwise have to write myself.

Will this be available via conda? And how would Seq integrate with Snakemake, since that is also based on Python?

haihaibye, over 3 years ago

I'm in the target market but can't use this unless it supports all of my Python libraries like Django and NumPy.

It seems to me there is a huge demand for making Python faster, whether it be via making a more optimisation-friendly subset, or ideally throwing engineering talent into improving the interpreter.

V8 shows this can be done with highly dynamic JavaScript. I guess we need a big corporate sponsor or the community to fund some positions.

It's kind of crazy how few developers are working on optimising CPython; it may even be worth it for environmental reasons.

Bostonian, over 3 years ago

The code examples look like Python 2 rather than Python 3. Print does not have parentheses. Why was this decision made?

kasperset, over 3 years ago

I like this idea. However, to me it is similar to using à la carte tools/programs along with a bash script or a DSL such as Nextflow. More often than not, these stand-alone programs are already written in compiled languages. I am sure Seq will allow building customized programs, as opposed to scripting or gluing programs together.

tdido, over 3 years ago

See also:

<a href="https://dl.acm.org/doi/pdf/10.1145/3360551" rel="nofollow">https://dl.acm.org/doi/pdf/10.1145/3360551</a>

<a href="https://www.nature.com/articles/s41587-021-00985-6" rel="nofollow">https://www.nature.com/articles/s41587-021-00985-6</a> (paywalled)

gandalfgeek, over 3 years ago

Quick explainer video: <a href="https://youtu.be/5bk4Wc5Op2M" rel="nofollow">https://youtu.be/5bk4Wc5Op2M</a>

chmaynard, over 3 years ago

I'm wondering if Seq can also serve as a general-purpose replacement for Python whenever a fast executable is needed.

jack_riminton, over 3 years ago

How do you pronounce Seq?
