Hacker News | leppert's comments

Thanks Andy. We're also in the midst of releasing the Collaborative Open Legal Dataset ("COLD cases") specifically aimed at AI/ML work. More to come. https://huggingface.co/datasets/harvard-lil/cold-cases


Any idea how complete the COLD dataset is compared to the Caselaw Access Project?

Unfortunately Caselaw limits access to the full text bulk data of most jurisdictions without a research account and I’m trying to find an alternative.


COLD is bigger than Caselaw Access Project. It's over 8 million cases vs 6.9 million on case.law.

Please let me know how you find using the data and if you'd like to see any additions or changes!


> Please let me know how you find using the data and if you'd like to see any additions or changes!

I only just started yesterday but so far so good!

Having it in Parquet files on Git LFS makes a huge difference. It only took a few lines to add the entire dataset to our CI/CD cache, which is an improvement over the ingestion scripts we normally have to write with change detection and all that. It took less than an hour to start running the cases through our pipeline - I wish all of the GovInfo bulk data were available this way!


I'm really glad you appreciate that!


That's rad. I'll have to dig into it.


> Unfortunately Caselaw limits access to the full text bulk data of most jurisdictions without a research account

That's only true until Feb of 2024! It should be totally unrestricted in a few months.


Great to know, thank you!

Any idea why they're time-limited? I assumed that was a license restriction from reporters or Fastcase, et al., which would have been permanent.


To be clear, they are all in the public domain, Fastcase updates included. All of the proprietary info was redacted by hand and the opinions themselves are not copyrightable. The throttling is a contractual obligation to a project partner that limits Harvard's distribution of the cases until Feb of 2024, but that's it. There are also exceptions: cases where the publication is no longer in copyright, and jurisdictions that already publish their opinions online... There are 3 or 4. Those are accessible without throttling through the API and through bulk downloads right now.

This should have more up-to-date and accurate information than I do: https://case.law/about


Rad!


Harvard's Library Innovation Lab | Senior Product Designer | Full-time & remote friendly | Cambridge, MA | https://lil.law.harvard.edu/jobs/#uxd

Harvard's Library Innovation Lab is a product studio building open-source tools and services for open knowledge, available to everyone and in the public interest.

We're looking for a Senior Product Designer to join our product team and help lead the brand and visual development of our lab and its products. Promising candidates will have a strong visual design portfolio and experience in user research, brainstorming, and wireframing products from inception to high fidelity. This is an opportunity for a design lead to utilize their craft toward helping positive ideas travel further and faster.

Apply at the link above or feel free to send us questions at lil@law.harvard.edu

