
ericol's comments | Hacker News


If "AI" were doing anything more than repeating content from the web without attribution, I might agree with you.


I regularly (say, once a month) compare results across Claude, Gemini, and ChatGPT. Just out of curiosity, not because I'm looking to see whether there's any benefit in switching.

It's not a "fair" comparison, in that I pay for Claude [1] and not for the others, so I only have full model availability with Claude.

While what I've liked has varied over time depending on how answers were presented, I came to really like Sonnet's "voice" over the others.

Note that Opus doesn't have the same voice, and I don't like it as much.

[1] I pay for the lower tier of their Max offering.


> The long-term effect is less clear. If we generate more code, faster, does that reduce cost or just increase the surface area we need to maintain, test, secure, and reason about later?

My take is that the focus is mostly oriented towards code, but in my experience everything around code got cheaper too. In my particular case, I do coding, I do DevOps, I do second level support, I do data analysis. Every single task I have to do is now seriously augmented by AI.

In my last performance review, my manager was actually surprised when I told him that I am now more a manager of my own work than actually doing the work.

This also means my productivity is now probably around 2.5x what it was a couple of years ago.


> In my last performance review, my manager was actually surprised when I told him that I am now more a manager of my own work than actually doing the work.

I think this is very telling. Unless you have a good manager who is paying attention, you're stuck with the many who are clueless: they just see the hype of 10x-ing your developers and don't care about the nuance of (as they say) all the surrounding bits to writing code. And unfortunately, they repeat this to the people above them, who have also read the hype and just see the $$ of reducing headcount. (Sorry, venting a little.)


He definitely was paying attention.

He had to pause for a second there, arrested by the realization, and was one of the reasons I got an "Exceeds expectations" in one of my KRAs.


It is interesting though that he evidently didn't notice this 2.5X productivity increase until you pointed it out to him.


This has been my experience, too. In dealing with hardware, I'm particularly pleased with how vision models are shaping up; they're able to identify what I've photographed, put it in a simple text list, and link me to appropriate datasheets. Yesterday, one even figured out how I wanted to reverse engineer a remote display board for a just-released inverter, and correctly identified which pin of which unfamiliar Chinese chip was spitting out the serial data I was interested in; all I actually asked for was chip IDs, with a quick vague note on what I was doing. It doesn't help me solder faster, but it gets me to soldering faster.

A bit OT, but I would love to see some different methods of calculating economic productivity. After looking into how the BLS calculates software productivity, I quit giving weight to the number altogether, and it left me feeling a bit blue.

They apply a deflator in part by considering the value of features (which they claim to be able to estimate by comparing feature sets and prices across a select basket of items in a category, applying coefficients based on the differences). It will likely never actually capture what's going on in AI unless Adobe decides to add a hundred new buttons "because it's so quick and easy to do."

Their methodology also requires ignoring FOSS (except for certain corporate own-account cases); if everyone switched from Microsoft 365 to LibreOffice, US productivity as measured by the BLS would crash.

The BLS lays its methodology out in a FAQ page on "Hedonic Quality Adjustment" [1], which covers hardware rather than software, but software relies even more heavily on these "what does the consumer pay" guesses at value (what is the value of S-Video input on your TV? Significantly more than support for picture-in-picture, at least in 2020).

[1] https://www.bls.gov/cpi/quality-adjustment/questions-and-ans...


> Having the LLM write down a skill representing the lessons from the struggle you just had to get something done is more typical (I hope) and quite different from what they're referring to

Just last week I had Claude build me a skill for when I ask it to help me troubleshoot issues, and it came out quite good.

It did have some issues (Claude tends to over-specify from anecdotal data), but it's a strong step in the right direction.

Also, "skills" are too broad in my opinion. I have one (that Claude wrote) with my personal data that I have available when I analyze my workouts.

I think there's ample room for self-generated skills when you've had a rather long exchange in a domain you plan to revisit, _especially_ when it comes to telling Claude what not to do.
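For readers who haven't seen one: a skill is essentially a folder containing a SKILL.md file, with YAML frontmatter (name, description) followed by free-form instructions. A minimal invented sketch of the kind of personal-data skill described above (the frontmatter fields follow Anthropic's published skills format; the filename, contents, and training details are all made up for illustration):

```markdown
---
name: workout-analysis
description: Personal context for analyzing my workout logs. Use when the user asks about training load, progress, or workout history.
---

# Workout analysis

- My data lives in `workouts.csv` (one row per session: date, type, duration, RPE).
- I train 4x/week; treat a missed week as a deload, not a regression.

## What NOT to do

- Do not draw health conclusions from fewer than 4 weeks of data.
- Do not suggest changing my program unless I explicitly ask.
```

Note the "What NOT to do" section: that's where distilling a long exchange pays off most, since it captures the corrections you'd otherwise have to repeat every session.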


I recently had to create a MySQL shim for upgrading a large PHP codebase that is currently running on PHP 5.6 (don't ask).

The way I approached it (yes, I know shims already exist, but I felt more comfortable vibe coding one than using something that might not cover all my use cases) was to:

1. Extract the already existing test suite [1] from the original PHP extension's repo (all the .phpt files)

2. Get Claude to iterate over the results of the tests while building the code

3. Extract the complete list of functions called in my codebase and fill the gaps

4. Profit?

When I finally got to test the shim, the fact that it worked on the first run was rather emotional.

[1] My shim fails quite a lot of tests, but all of the failures are cosmetic (e.g., no deprecation warning) rather than functional.
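For readers unfamiliar with what such a shim looks like: the removed ext/mysql functions get redefined on top of mysqli. A minimal sketch of the idea (not the commenter's actual code; the `$__shim_last_link` bookkeeping is one way to mimic ext/mysql's implicit "last opened connection" behavior):

```php
<?php
// Minimal mysql_* shim over mysqli, for PHP 7+ where ext/mysql is gone.
// ext/mysql allowed omitting the link argument, falling back to the most
// recently opened connection; a global tracks that here.

$__shim_last_link = null;

if (!function_exists('mysql_connect')) {
    function mysql_connect($host, $user, $pass) {
        global $__shim_last_link;
        $__shim_last_link = mysqli_connect($host, $user, $pass);
        return $__shim_last_link;
    }

    function mysql_query($query, $link = null) {
        global $__shim_last_link;
        return mysqli_query($link ?? $__shim_last_link, $query);
    }

    function mysql_fetch_assoc($result) {
        return mysqli_fetch_assoc($result);
    }

    function mysql_error($link = null) {
        global $__shim_last_link;
        return mysqli_error($link ?? $__shim_last_link);
    }
}
```

A real shim needs dozens more functions (which is exactly what step 3 above, diffing the codebase's call list against what's implemented, is for), but each one is a thin delegation like these.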


This applies to many different things, depending on the pair of languages you are using.

In Spanish the closest approximation would be "ni mal ni bien" (neither bad nor good), but I understand the Chinese expression leans strongly toward "not being wrong".

Not so long ago (I'm 50+, a Spanish native speaker, and I've spoken English almost daily for the past 30 years) I learnt about "accountability".

Now, before I get a barrage of WTFs: the situation is that in Spanish we only have "responsabilidad", which covers both responsibility and accountability, with a strong lean toward responsibility.

So basically we recognise what it is to be responsible for something, but the notion of being accountable is seriously diluted.

The implications of this are enormous, and that particular thought exercise I'll leave to people who spend more time thinking about these things than I do.


> Ok, so does anyone remember 'Watson'? It was the chatgpt before chatgpt. they built it in house

I do. I remember going to a talk once where they wanted to get people on board with using it. It was 90 minutes of hot air. They "showed" how Watson worked and how to implement things, and I think every single person in the room knew they were full of it. Consider: we were all engineers, and there were no questions at the end.

Comparing Watson to LLMs is like comparing a rock to an AIM-9 Sidewinder.


Watson was nothing like ChatGPT. The first iteration was a system specifically built to play Jeopardy. It did some neat stuff with NLP and information retrieval, but it was all still last generation AI/ML technology. It then evolved into a brand that IBM used to sell its consulting services. The product itself was a massive failure because it had no real applications and was too weak as a general purpose chat bot.


I had no idea about what Watson was initially meant to solve.

I do remember they tried to sell it, at least in the meeting I went to, as a general purpose chatbot.

I did briefly try to understand how to use it, but the documentation was horrendous (as in, "totally devoid of any technical information").


Watson was intended to solve fuzzy optimization problems.

Unfortunately, the way it solved fuzzy was 'engineer the problem to fit Watson, then engineer the output to be usable.'

Which required every project to be a huge custom implementation lift. Similar to early Palantir.


> Watson was intended to solve fuzzy optimization problems.

> Unfortunately, the way it solved fuzzy was 'engineer the problem to fit Watson, then engineer the output to be usable.'

I'm going to review my understanding of fuzzy optimization, because that last line doesn't fit my picture of it.


The reason LLMs are viable for use cases that Watson wasn't is their natural language and universal parsing strengths.

In the Watson era, all the front- and back-ends had to be custom engineered per use case. Read: huge IBM services implementation projects that the company bungled more often than not.

Which is where the Palantir comparison is apt (and differs). Palantir understood their product was the product, and implementation was a necessary evil, to be engineered away ASAP.

To IBM, implementation revenue was the only reason to have a product.


> Read, huge IBM services implementation projects that the company bungled more often than not

Well this is _not_ what they wanted to sell in that talk.

But the implementation they showed was über vanilla, and once I got home the documentation turned out to be close to nonexistent (or, at least, not even trying to be what the docs for such a technology should be).


I had to reason with my brain before it would accept it as a triangle. It has three sides and three corners, so...


It's one of those things where it's technically correct but the headline is misleading. When you say "a triangle" without any qualification as the headline does, people are going to interpret that as a good old fashioned triangle. Using the term without clarification that you mean spherical geometry is kind of underhanded writing, imo.


The title attribute of the article is <title>A hyperbolic triangle with three cusps</title>


I think it's just a normal ages-old pattern for writing headlines that pique people's curiosity. It's super common in popular math in particular, because math is always about generalizing. There's a fine line between that and actual clickbait meant to actively mislead.

