The question is simple: I wanted to get a general consensus on whether people actually audit the code of the FOSS/open-source software and apps they use.
Do you blindly trust the FOSS community? I’m just trying to get a rough idea here. Do you sometimes audit the code? Only for mission-critical apps? Not at all?
Let’s hear it!
I know Lemmy hates AI, but auditing open-source code seems like something it could be pretty good at. Maybe that’s something that will start happening more.
This is one of the few things that AI could potentially actually be good at. Aside from the few people on Lemmy who are entirely anti-AI, most people just don’t want AI jammed willy-nilly into places where it doesn’t belong, doing things poorly that it isn’t equipped to do.
Those are silly folks lmao
Exactly, fuck corporate greed!
Eh, I kind of get it. OpenAI’s malfeasance with regard to energy usage, data theft, and the aforementioned rampant shoe-horning (maybe “misapplication” is a better word) of the technology has sort of poisoned the entire AI well for them, and it doesn’t feel (and honestly isn’t) necessary enough that it’s worth considering ways that it might be done ethically.
I don’t agree with them entirely, but I do get where they’re coming from. Personally, I think once the hype dies down enough and the corporate money (and VC money) gets out of it, it can finally settle into a more reasonable steady state, and the money can actually go into truly useful implementations of it.
I mean, that’s why I call them silly folks: that’s all still attributable to the corporate greed we all hate. But I’ve also seen them shit on research work and papers just because “AI”. So yeah lol
I don’t hate AI; I hate how it was created, how it’s foisted on us, the promises that it can do things it really can’t, and the corporate governance of it.
But I acknowledge these tools exist, and I do use them because they genuinely help and I can’t undo all the stuff I hate about them.
If I had millions of dollars to spend, sure I would try and improve things, but I don’t.
Daniel Stenberg claims that the curl bug-reporting system is effectively being DDoSed by AI tools wrongly reporting various issues. That doesn’t seem like a good feature in a code auditor.
I’ve been on the receiving end of these. It’s such a monumental time waster. All the reports look legit until you get into the details and realize it’s complete bullshit.
But if you don’t look into it, maybe you’re ignoring a real report…
I’m writing a paper on this, actually. Basically, it’s okay-ish at it, but has definite blind spots. The most promising route is to have AI use a traditional static analysis tool, rather than evaluate the code directly.
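To make that concrete, here’s a rough sketch of the kind of pipeline I mean, using bandit as the traditional analyzer (purely as an example) and a made-up ask_llm() placeholder for whatever model you’d actually call; the point is that the model triages analyzer findings instead of reading raw code cold:

```python
# Rough sketch: a static analyzer finds candidate issues, and the LLM only
# triages/explains them rather than evaluating the code directly.
# bandit is a real tool (and must be installed); ask_llm() is a hypothetical
# placeholder for whatever model or API you actually use.
import json
import subprocess

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own LLM client here")

def run_bandit(path: str) -> list[dict]:
    """Run bandit over a source tree and return its JSON findings."""
    result = subprocess.run(
        ["bandit", "-r", path, "-f", "json"],
        capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("results", [])

def triage(path: str) -> str:
    """Summarise the analyzer findings and ask the model to triage them."""
    findings = run_bandit(path)
    summary = "\n".join(
        f"- {f['filename']}:{f['line_number']} [{f['test_id']}] {f['issue_text']}"
        for f in findings
    )
    prompt = (
        "These findings came from a static analyzer. For each one, say "
        "whether it looks like a real vulnerability or a false positive, "
        "and why:\n" + summary
    )
    return ask_llm(prompt)
```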
That seems to be the direction the industry is headed in. GHAzDO and competitors all seem to be converging on using AI as a force-multiplier on top of the existing solutions, and it works surprisingly well.
‘AI’, as we currently know it, is terrible at this sort of task. It’s not capable of understanding the flow of the code in any meaningful way, and it tends to raise entirely spurious issues (see the problems the curl author has with being overwhelmed, for example). It also won’t spot actually malicious code that’s been included with any sort of care, nor would it find intentional behaviour that would be harmful or counterproductive in the particular scenario you want to use the program in.
Having actually worked with AI in this context alongside github/azure devops advanced security, I can tell you that this is wrong. As much as we hate AI, and as much as people like to (validly) point out issues with hallucinations, overall it’s been very on-point.
Could you let me know what sort of models you’re using? Everything I’ve tried has basically been so bad that it was quicker and more reliable to do the job myself. Most of the models can barely write boilerplate code accurately and securely, let alone anything even moderately complex.
I’ve tried to get them to analyse code too, and that’s hit and miss at best, even with small programs. I’d have no faith at all that they could handle anything larger; the answers they give would be confident and wrong, which is easy to spot with something small, but much harder to catch with a large, multi-process system spread over a network. It’s hard enough for humans, who have actual context, understanding and domain knowledge, to do it well, and I’ve personally not seen any evidence that an LLM (which is what I’m assuming you’re referring to) could do anywhere near as well. I don’t doubt that they flag some issues, but without a comprehensive human review of the system architecture, implementation and code, you can’t be sure what they’ve missed, and if you’re going to do that anyway, you’ve done the job yourself!
Having said that, I’ve no doubt that things will improve: programming languages have well-defined syntaxes, so they should be some of the easiest types of text for an LLM to parse and build a context from. If that can be combined with enough domain knowledge, a description of the deployment environment, and a model that’s actually trained and tuned for code analysis and security auditing, it might be possible to get similar results to humans.
It’s just whatever is built into Copilot.
You can do a quick-and-dirty test by opening Copilot chat and asking it something like “Outline the vulnerabilities found in the following code, with the vulnerabilities listed underneath it. Outline any other issues you notice that are not listed here.” and then pasting the code and the discovered vulns.
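If you’d rather script that than paste by hand, here’s a rough sketch of building the same prompt; the file path and vuln list below are made-up placeholders, and you’d hand the output to whatever chat interface or client you actually use:

```python
# Quick-and-dirty: build the same prompt from a source file plus the
# already-discovered vulns, then paste (or send) it to the chat.
from pathlib import Path

def build_audit_prompt(code_path: str, known_vulns: list[str]) -> str:
    code = Path(code_path).read_text()
    vuln_list = "\n".join(f"- {v}" for v in known_vulns)
    return (
        "Outline the vulnerabilities found in the following code, with the "
        "vulnerabilities listed underneath it. Outline any other issues you "
        "notice that are not listed here.\n\n"
        "--- CODE ---\n" + code + "\n\n"
        "--- KNOWN VULNERABILITIES ---\n" + vuln_list
    )

if __name__ == "__main__":
    # Example inputs are made up; substitute your own file and findings.
    print(build_audit_prompt("app.py", ["SQL injection in the /search handler"]))
```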
Lots of things seem like they would work until you try them.
I’m actually planning to do an evaluation of an AI code review tool to see what it can do. I’m somewhat optimistic that it could do this better than it can code.
I really want to sic it on this one junior programmer who doesn’t understand that you can’t just commit AI-generated slop and expect it to work. On this last code review, after over 60 pieces of feedback, I gave up on the rest and left it at “he needs to understand when AI-generated slop needs help.”
AI is usually pretty good at unit tests, but this was so bad. It randomly started using a different mocking framework, it actually mocked entire classes and somehow thought that was a valid way to test them, it wasted tests on non-existent constructors, there were no negative tests, and there were tests that didn’t verify anything. Worst of all, there were so many compile errors, yet he thought that was fine.
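For anyone who hasn’t seen this pattern, here’s a rough illustration (in Python, not the actual code from that review) of what I mean by mocking the thing under test and tests that don’t verify anything:

```python
# Illustration only: a useless "AI slop" style test next to a real one.
from unittest.mock import MagicMock
import pytest

def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)

def test_discount_slop():
    # Mocks the unit under test itself and asserts nothing, so it
    # "passes" no matter what apply_discount actually does.
    calc = MagicMock()
    calc.apply_discount(100, 10)

def test_discount_real():
    # Exercises the real function and includes a negative case.
    assert apply_discount(100, 10) == 90
    with pytest.raises(ValueError):
        apply_discount(100, 150)
```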
It wouldn’t be good at it; at most it would be a little patch over non-audited code.
In the end it would just be an AI-powered antivirus.