
I’ve been beta testing this for several months. It’s OK. The notes it generates are too verbose for most medical notes even with all the customization enabled. Most medical interviews jump around chronologically and Dragon Copilot does a poor job of organizing that, which means I had to go back and edit my note which kind of defeated the purpose of the app in the first place.

It does a really good job with recognizing medications though, which most patients butcher the names of.

Hallucinations are present, but usually they’re pretty minor (screwing up gender, years).

It doesn't really seem to understand what the most important part of the conversation is; it treats all the information as equally important when that's not really the case. So you end up with long passages of useless information that the patient thought was important but that isn't at all relevant to their current presentation. That's where having an actual physician is useful - to parse through what is important or not.

At baseline it doesn't take me long to write a note, so it really wasn't saving me much time.

What I do use it for is recording the conversation and then referencing back to it when I’m writing the note. Useful to “jog my memory” in a structured format.

I have to put a disclaimer in my note saying that I was using it. I also have to let the patient know upfront that the conversation is getting recorded and that I'm testing something for Microsoft, etc. You can tell who the programmer patients are because they immediately ask if it's "Copilot" lol



I've been helping test it as well - your experience sounds identical to mine. I was initially very excited for it, but nowadays I don't really bother turning it on unless I feel the conversation will be a long one. Although I am very much looking forward to them rolling out the automated pending of orders based on what was said during the conversation.

LLMs have so much potential in medicine, and I think one of the most important applications they will have is the ability to ingest a patient's medical chart within their context window and present key information to clinicians that would've otherwise been overlooked in the bloated mess that most EMRs are nowadays (including Epic).

There have been so many times where I've found critically important details hidden away as a side note in some lab/path note, overlooked for years, that very likely could've been picked up by an LLM. Just a recent example - a patient with repeated admissions over the years due to severe anemia would usually be scoped and/or given a transfusion without much further workup, then discharged once Hgb >7. A blood bank path note from 10 years ago mentions the presence of warm autoantibodies as a side note; for some reason the diagnosis of AIHA is never mentioned nor carried forward in their chart. A few missed words which would've saved millions of dollars in prolonged admissions and diagnostic costs over the years.


Given everything I hear about LLMs for similar summarization purposes, including your description and the one given above, it seems unlikely that the LLM would actually “notice” a side note in a huge chart. I agree that'd be great, but I'm curious why you think it would necessarily pick up on that sort of thing.


> it seems unlikely that the LLM would actually “notice” a side note in a huge chart

I respectfully disagree - I think LLMs have already made significant advances in this area, as shown in the various "needle in a haystack" demonstrations we've seen over the past couple of years. I've already been impressed by the minute but relevant details they can "recall" after being fed very dense journal articles, and the technology is only getting better. Also keep in mind that the raw text / "data" found in many patients' charts is not always that expansive (though it certainly can be for patients with recurrent admissions). It's more an issue of finding the actual information, given that EMRs are a nightmare to navigate effectively.

Hallucinations are always a consideration too, but any implementation of the sort I mentioned would certainly contain in-text backlinks to the actual notes in the EMR. Epic already does this with its basic text search function. So I don't think hallucinations would be too problematic, since clinicians should always be verifying this type of information at the source as good practice.
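
To sketch what I'm imagining (assuming an OpenAI-style client; the note IDs and snippets here are made up, and a real implementation would pull notes from the EMR and handle charts too big for one context window):

    # Toy sketch: scan chart notes for buried findings, citing note IDs
    # so each claim links back to a real note. All data here is invented.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    notes = {
        "path-2014-0312": "Blood bank workup. DAT positive. Warm autoantibodies present.",
        "gi-2019-1107": "EGD unremarkable. Transfused 2u pRBC, discharged with Hgb 7.4.",
    }

    chart = "\n\n".join(f"[{note_id}]\n{text}" for note_id, text in notes.items())

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are reviewing a patient chart for a clinician. Flag any "
                "finding that suggests an undocumented diagnosis, and cite the "
                "note ID in brackets so it can be verified at the source."
            )},
            {"role": "user", "content": chart},
        ],
    )
    print(resp.choices[0].message.content)

The bracketed note IDs are what would become the in-text backlinks.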


> A few missed words which would've saved millions of dollars in prolonged admissions and diagnostic costs over the years.

I don't mean to come off antagonistic here. But surely the more important benefit is the patient who would've avoided years of sickness and repeated hospital visits?


> But surely the more important benefit is the patient who would've avoided years of sickness and repeated hospital visits?

The patient experience is always important, and maybe I should've been more explicit in what I wrote. I think I was focusing more on the collective/societal impact this would have, which I felt would resonate more with the readers here.


As a patient with an under-served condition, I quite often focus on the financial rather than the human cost of not having a better system of care when talking about it.

If someone's going to object to improving the system, it's most likely going to be on grounds of cost.


I don't know - if it was really millions of dollars for a single patient, I wouldn't pay a few million dollars to avoid a few bouts of illness for a random member of my health insurance group scheme cohort, and that seems like the correct comparison to make. Increase the costs by another order of magnitude and I'd rather let them die.


I think that is what “prolonged admissions” was meant to cover.


But it's "millions of dollars in prolonged admissions", not just "prolonged admissions". The point is the financial cost, not the wellness of the patient.


I just wanted to jump in and say: don't give them too much credit on transcribing medications. I'm guessing this is Deepgram behind the scenes, and their medication transcription works pretty well out of the box in my experience.
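
For reference, hitting Deepgram's REST API directly is roughly this simple (the medical model name and the response shape are from memory of their docs, so verify both; the file name and key are placeholders):

    # Rough sketch of transcription via Deepgram's REST API. The model
    # name and the response JSON path are assumptions to check against
    # current documentation.
    import requests

    DEEPGRAM_API_KEY = "..."  # placeholder

    with open("visit_recording.wav", "rb") as f:
        audio = f.read()

    resp = requests.post(
        "https://api.deepgram.com/v1/listen",
        params={"model": "nova-2-medical", "smart_format": "true"},
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "audio/wav",
        },
        data=audio,
    )
    resp.raise_for_status()
    print(resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"])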


Screwing up gender and years sounds pretty serious to me?


Maybe they mean that it either doesn’t matter in context or it’s easy to catch and correct. Either way it seems reasonable to trust the judgement of the professional reporting on their experience with a new tool.


I worry that we'll get complacent and not check details like that when they are important, not just the medical field but everywhere.


Yes, I think this is likely

The OP says there aren't many hallucinations, but I think that observation is almost impossible to verify. It relies on the person making it having a very strong ability to notice hallucinations when they happen.

Most people do not have the attention to detail to spot inaccuracies consistently. Even when someone is very good at this normally, all it takes is being overtired or stressed or distracted and the rate of misses will go way up.

I trust coworkers to write good code more than I trust them to do good code review, because review is arguably a harder skill

Similarly, I think reviewing ML output is harder than creating your own things, and I think relying on them in this way is going to be disastrous until they are more trustworthy


> I worry that we'll get complacent and not check details like that when they are important, not just the medical field but everywhere.

The performance level goes up all the time; it won't be this bad for long.



If it is easy to catch and correct, why can't Copilot do it? It sounds like something it should know.


Like with every Microsoft product, the testing is done by the user.


It's more in scenarios where I enter the room and ask the patient whether this is their wife/husband, etc. It's not like I'm going into the room and saying "hello patient, you appear to be a human female". The model has difficulty figuring out who the actors are if there are multiple different people talking. Not a big issue if all you're doing is rewriting information, but if multi-modal context is required, it's not the best.
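
For the curious, this "who is speaking" problem is usually handled by a separate speaker diarization pass before the LLM ever sees the transcript. A minimal sketch with pyannote.audio (the checkpoint name follows their README; the audio file and token are placeholders):

    # Sketch: label speech turns by speaker before the transcript is
    # summarized. Checkpoint name per the pyannote.audio README; the
    # file and token are placeholders.
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="HF_TOKEN",  # gated model; needs a Hugging Face token
    )

    diarization = pipeline("exam_room.wav")
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        # Labels come out as SPEAKER_00, SPEAKER_01, ... and still have to
        # be mapped to "physician", "patient", "spouse" downstream, which
        # is exactly the part that seems to go wrong.
        print(f"{turn.start:.1f}s-{turn.end:.1f}s: {speaker}")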


> The notes it generates are too verbose for most medical notes even with all the customization enabled.

I've noticed that seems to be a common trend for any AI-generated text in general.


I think this might be because of what GP said later:

> it treats all the information as equally important when that's not really the case

In the general case (and, I imagine, in the specific case of GP), the model doesn't have any prior with which to weight the content - people usually just prompt it with "summarize this for me please <pasted link or text>"[0], without telling it what to focus on. And, more importantly, you probably have some extra preferences that aren't consciously expressed - the overall situational context, your particular ideas, etc. translate to a different weighting than the model's, and you can't communicate that via the prompt.

Without a more specific prior, the model has to treat every piece of information equally, and this also means erring on the side of verbosity, so as not to omit anything the user may care about.

--

[0] - Or such prompt is hidden in the "AI summarizer" feature of some tool.
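
To make this concrete, here's the difference in prompt terms (a toy sketch assuming the OpenAI Python client; the "prior" wording is invented for illustration):

    # Same call, two prompts: one with no prior, one that states what to
    # weight. The specific instructions are invented for illustration.
    from openai import OpenAI

    client = OpenAI()
    transcript = open("visit_transcript.txt").read()

    generic = "Summarize this for me please:\n\n" + transcript

    with_prior = (
        "Summarize this clinic visit. Weight everything by relevance to the "
        "chief complaint (chest pain), drop small talk and tangents entirely, "
        "and give one line per pertinent negative.\n\n" + transcript
    )

    for prompt in (generic, with_prior):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        print(resp.choices[0].message.content, "\n---")

The second prompt supplies exactly the prior the model otherwise lacks; the preferences you can't consciously write down remain the harder problem.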


Are they charging per token?


Same for AI coding assistants; most tools generate way too much unnecessary code. The scary part is that the code seems to run OK.


Yes, the biggest problem with healthcare AI assistants right now is that there is no way to "prompt" the AI on what a physician needs in a given scenario - e.g. "only include medically relevant information in the HPI", "don't give me a layman explanation of radiographic reports", "include direct patient quotes when a neurological symptom is being described", etc.

And the prompt landscape in the field is vast. And fascinating. Every specialist has their own preference for what is important to include in a note vs what should be excluded, and this preference changes by disease - what a neurologist wants in an epilepsy note is very different from what they need in a dementia note, for example.
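
To illustrate the shape of the problem, a toy sketch of preferences becoming per-physician, per-disease prompt fragments (all field names and rule text here are invented for illustration, not our actual implementation):

    # Toy sketch: assemble a note-generation prompt from stored
    # per-specialty, per-disease preferences. Everything here is invented.
    NOTE_PREFS = {
        ("neurology", "epilepsy"): [
            "Include direct patient quotes when a neurological symptom is described.",
            "Only include medically relevant information in the HPI.",
        ],
        ("neurology", "dementia"): [
            "Summarize collateral history from family members separately.",
            "Do not give a layman explanation of radiographic reports.",
        ],
    }

    def build_system_prompt(specialty: str, disease: str) -> str:
        rules = NOTE_PREFS.get((specialty, disease), [])
        return (
            "Draft a clinic note from the transcript below. Follow these "
            "physician preferences exactly:\n"
            + "\n".join(f"- {r}" for r in rules)
        )

    print(build_system_prompt("neurology", "epilepsy"))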

Note preferences also change widely between physicians, even in the same practice and the same specialty! I'm the founder of Marvix AI (www.marvixapp.ai), an AI assistant for specialty care, and we work with several small specialty care practices where every physician has their own preferences on which details they want to retain in their note.

But if you can get the prompts to really align with a physician's preferences, this tech is magical - physicians regularly confess to us that this tech saves them ~2 hours every day. We have now had half a dozen physicians tell us in their feedback calls that their wives asked them to communicate their 'thanks' to us for getting their husbands back home for dinner on an important occasion!

[Edit: typo and phrasing]


> there is no way to "prompt" the AI on what a physician needs in a given scenario - e.g. "only include medically relevant information in the HPI", "don't give me a layman explanation of radiographic reports", "include direct patient quotes when a neurological symptom is being described", etc.

There is; it's called RLHF.


We tried it at my job; I got us into the beta. Go try Nudge AI and tell me what you think. Our providers found Nudge to be a far better product at a fifth of the price.


> Hallucinations are present, but usually they’re pretty minor (screwing up gender, years).

And if all hospitals were doing was having doctors treat patients, this would be OK. But healthcare is fueled by these "minor" details, and they will result in delays in payment and reimbursement, trouble with patient identification, corruption of clinical coding, etc.


Did you encounter any instances of hallucinations or omissions?

One would imagine those to be the biggest dangers.


Hallucinations are pretty minimal but present. Some lazy physicians are gonna get burned by thinking they can just zone out during the interview and let this do all the work.

I edited my original post. Omissions are less worrisome; it's more about too much information being captured which isn't relevant. So you get these super long notes and it's hard to separate the wheat from the chaff.


> Some lazy physicians are gonna get burned by thinking they can just zone out during the interview and let this do all the work.

And however many patients are seeing that physician, they're going to get burned too.


Seems like capturing too much irrelevant detail would be preferable to potentially missing important details, though?


When people are nervous/scared (as they are with doctors) they start rambling about all sorts of things. I had somebody recently rambling about the Lion King musical versus the Lion King cartoon instead of telling me about their recent heart attack. The 'art' of medicine is redirecting the patient to provide the most important/relevant information.

In addition, doctors have finite attention span/hours of the day. Nobody wants to read paragraphs upon paragraphs of information.


It's not minor when they screw up a dosage.



