> "super case insensitive" lets say someone would make a plugin for their favori...

skrebbel · on Sept 3, 2024

Hm I'd go even simpler than that. Notably, I'd not do this:

> So the user would enter "first name" into the plugin's search field.

Why wouldn't the user just enter "first_name" or "firstName" or something like that? I'm thinking about situations like, you're looking at backend code that's snake_cased, but you also want it to catch frontend code that's camelCased. So when you search for "first_name" you automagically also match "firstName" (and "FirstName" and "first-name" and so on). I wouldn't personally introduce some convention that adds spaces into the mix, I'd simply convert anything that looks snake/kebab/pascal/camel-cased into a regex that matches all 4 forms.

Could even be as stupid as converting "first_name", "firstName", "FirstName", etc. into "first_name|firstname|first-name", no character classes needed. That catches pretty much every naming convention, right? (Assuming the search itself is case-insensitive.)
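
A rough sketch of that conversion in Python, in case it helps. The function names `split_words` and `to_alternation` are made up for illustration, and it assumes the query is a single identifier made of letters, digits, underscores and hyphens:

```python
import re

def split_words(identifier: str) -> list[str]:
    """Split a snake/kebab/camel/Pascal-cased identifier into lowercase words."""
    # Mark lower-to-upper boundaries first: firstName -> first_Name
    marked = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", identifier)
    # Then split on the separators and drop empty pieces
    return [w.lower() for w in re.split(r"[-_]+", marked) if w]

def to_alternation(identifier: str) -> str:
    """Build a pattern such as 'first_name|firstname|first-name'."""
    words = [re.escape(w) for w in split_words(identifier)]
    # One alternative per separator: underscore, nothing, hyphen
    return "|".join(sep.join(words) for sep in ("_", "", "-"))

pattern = to_alternation("firstName")
# The alternation only covers camelCase and PascalCase when the
# search is run case-insensitively, as the comment above assumes.
found = re.search(pattern, "const FirstName = user.first_name;", re.IGNORECASE)
```

The empty separator plus `re.IGNORECASE` is what covers both camelCase and PascalCase without any character classes.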

specialist · on Sept 3, 2024

> "first_name" or "firstName"

Ya. Query tokenizer would emit "first" and "name" for both. That'd be neat.
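
Something like this, as a sketch only. `tokenize` is a hypothetical name, and the acronym handling (treating "HTTP" in "HTTPServer" as one token) is my own assumption, not something anyone asked for above:

```python
import re

# An uppercase run not followed by a lowercase letter (acronym),
# or an optionally capitalized lowercase word, or a run of digits.
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def tokenize(query: str) -> list[str]:
    """Emit the lowercase words that make up an identifier-like query."""
    return [t.lower() for t in TOKEN.findall(query)]

# Both spellings yield the same token list, so they can be compared directly.
same = tokenize("first_name") == tokenize("firstName")
```

Separators never match the token pattern, so underscores and hyphens simply fall away.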

__MatrixMan__ · on Sept 3, 2024

Shame on me for jumping past the simple solutions, but...

If you're going that far, and you're in a context which probably has a parser for the underlying language ready at hand, you might as well just convert all tokens to a common format and do the same with the queries. So searches for foo-bar find strings like FooBar because they both normalize to foo_bar.

Then you can index by more than just line number. For instance you might find "foo" and "bar" even when "foo = 6" shows up in a file called "bar.py" or when they show up on separate lines but still in the same function.
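
To make that concrete, here is a minimal sketch. It cheats in two labeled ways: a regex stands in for the real language parser, and the "scope" is the whole file (name plus contents) rather than a function. `normalize`, `build_index` and `search` are hypothetical names:

```python
import re
from collections import defaultdict

# Assumption: a crude regex stands in for the language's real tokenizer.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

def normalize(identifier: str) -> str:
    """Convert FooBar, fooBar and foo-bar to the common form foo_bar."""
    marked = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", identifier)
    return marked.replace("-", "_").lower()

def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized word to the files whose name or contents contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in files.items():
        for ident in IDENT.findall(path + "\n" + text):
            for word in normalize(ident).split("_"):
                if word:
                    index[word].add(path)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the files in which every word of the normalized query occurs."""
    words = [w for w in normalize(query).split("_") if w]
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))

files = {"bar.py": "foo = 6\n", "other.py": "FooBar = 1\n", "x.py": "baz = 2\n"}
hits = search(build_index(files), "foo-bar")
```

Here `hits` contains both `other.py` (because of `FooBar`) and `bar.py` (because `foo` appears in a file whose name contains `bar`).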

inanutshellus · on Sept 3, 2024

IIUC, you're not missing anything, though your interpretation differs from mine*. He wasn't saying it'd be hard, he was saying it should be done.

* my understanding was simply that the regex would (A) recognize `[a-z][A-Z]` and inject optional _'s and -'s between... and (B) notice mid-word hyphens or underscores and switch them to search for both.
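
Roughly like this, as a sketch of my reading rather than anyone's actual plugin. `flexible_pattern` is a made-up name:

```python
import re

def flexible_pattern(query: str) -> re.Pattern:
    """Build a case-insensitive regex with an optional - or _ at each word boundary."""
    # (A) mark lower-to-upper transitions: firstName -> first_Name
    marked = re.sub(r"([a-z])([A-Z])", r"\1_\2", query)
    # (B) treat existing hyphens and underscores as boundaries too
    parts = [re.escape(p) for p in re.split(r"[-_]", marked) if p]
    # Rejoin with an optional separator between the words
    return re.compile("[-_]?".join(parts), re.IGNORECASE)

pattern = flexible_pattern("firstName")
variants = ["first_name", "first-name", "firstName", "FirstName", "FIRST_NAME"]
all_match = all(pattern.fullmatch(v) for v in variants)
```

For `firstName` the generated pattern is `first[-_]?Name`, and the ignore-case flag takes care of the rest.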

marcosdumay · on Sept 3, 2024

The best way would be to make an escape code that matches zero or one punctuation.

So you'd search for "/first\_name/i".

Izkata · on Sept 3, 2024

That already exists as "?" and was used in their example:

  /first[-_]?name/i

Or to use your example, just checking for underscores and not also dashes:

  /first_?name/i

Backslash is already used to change special characters like "?" from these meanings into just "use this character without interpreting it" (or the reverse, in some dialects).
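
For anyone who wants to poke at it, the same patterns written with Python's `re` module (the sample strings are just mine, and other regex dialects may differ in details):

```python
import re

either = re.compile(r"first[-_]?name", re.IGNORECASE)    # optional - or _
underscore_only = re.compile(r"first_?name", re.IGNORECASE)  # optional _ only
literal_q = re.compile(r"first\?name")  # backslash makes "?" a literal character

samples = ["first_name", "first-name", "firstName", "FIRSTNAME", "first name"]

# Everything except the version with a space matches the first pattern.
matches_either = [s for s in samples if either.fullmatch(s)]

# The second pattern additionally rejects the hyphenated form.
matches_underscore = [s for s in samples if underscore_only.fullmatch(s)]
```

The escaped version matches only the literal text `first?name`, which is why `\_` would need a new meaning to work as a "zero or one punctuation" escape.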

kiitos · on Sept 3, 2024

It would be a mistake to try to solve this problem with regexes.