Determining the sets you have in a box of random parts

Posted by Huw, 25 Apr 2021 09:30

Ben Nicholson (bnic99) is currently in his 3rd year at Newcastle University studying Computer Science. For his dissertation project he has created a piece of software to solve a problem many of us have encountered at one time or another, and he’d appreciate your feedback:

Picture this: you buy a box of random LEGO parts from a second-hand seller and want to know which LEGO sets you now own. Where would you start? What would you do? A large box of parts can be very daunting, especially if you don’t know what you are looking for…

This was the situation I was in one day when looking through a box of LEGO at my grandma’s. After a short while scavenging through various parts I found part 2626 Boat, Bow Brick 6×6×1 in old light grey, which was only ever available in 2 sets.

After a bit more searching I found 6104 Wing 8×8 in yellow and 30356 Wing 6×12, Left in old light grey, and I was reasonably confident that the set they had come from was 7141 Naboo Fighter.


This gave me an idea: would it be possible to write a program in which you enter the parts you find and it calculates the chances of which sets you might have?

When it was time to decide what to do for my university dissertation, that is what I did. I have made a system where the user can enter parts using drop-down menus to search by type. For example, for part 2626 the user could search by the tags “Wedge”, “Brick”, and “Slope”.
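As a rough illustration of tag-based part search, here is a minimal Python sketch. It is not the dissertation code: the `PART_TAGS` table and the `search_parts` function are hypothetical names, and only the tags for part 2626 come from the example above; the tags for the other two parts are made up for contrast.

```python
from typing import Dict, Iterable, List, Set

# Hand-written table of parts and their search tags (stored in lower case).
PART_TAGS: Dict[str, Set[str]] = {
    "2626": {"wedge", "brick", "slope"},  # tags from the example above
    "6104": {"wing", "plate"},            # hypothetical tags
    "30356": {"wing", "plate"},           # hypothetical tags
}


def search_parts(selected_tags: Iterable[str]) -> List[str]:
    """Return the part numbers that carry every selected tag."""
    wanted = {tag.lower() for tag in selected_tags}
    # A part matches when the wanted tags are a subset of its own tags.
    return sorted(num for num, tags in PART_TAGS.items() if wanted <= tags)


wedge_bricks = search_parts(["Wedge", "Brick"])
wings = search_parts(["Wing"])
```

Each extra tag narrows the list, which is the behaviour a chain of drop-down menus would give.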

After selecting the correct part, the user also has to select its colour before adding it.

Once a number of parts have been added, users can click the “show probability” button to get the system’s analysis of which sets are in the box, based on the parts they have added.

From the list of percentages, users can view a picture of each suggested set and confirm that it is in the box. Once a set is confirmed, the system assumes that all found parts belonging to that set will be used to rebuild it, and takes them out of consideration when making new calculations.
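The confirmation step can be sketched in a few lines of Python. This is an assumption about how it might work, not the actual implementation: the function name `confirm_set` is hypothetical, the found parts are assumed to be tracked with quantities, and the inventory and quantities shown are made up for illustration.

```python
from collections import Counter
from typing import Dict, Tuple

Part = Tuple[str, str]  # (part number, colour name)


def confirm_set(found: Counter, set_inventory: Dict[Part, int]) -> Counter:
    """Return the found parts left over once a confirmed set has claimed its own.

    Counter subtraction keeps only positive counts, so parts the set needs
    but that have not been found yet simply drop out of the result.
    """
    return found - Counter(set_inventory)


# Parts entered so far, with made-up quantities.
found = Counter({
    ("2626", "old light grey"): 1,
    ("6104", "yellow"): 1,
    ("3001", "red"): 4,
})

# Hypothetical, partial inventory for the confirmed set.
inventory_7141 = {("2626", "old light grey"): 1, ("6104", "yellow"): 1}

remaining = confirm_set(found, inventory_7141)
```

Only the parts in `remaining` would then feed into the next round of probability calculations.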

I have been testing this system by getting others to try to identify which 10 sets I have broken up and put into a box. I have found that the system is commonly able to identify 80%+ of the test sets in just over an hour. The sets not found were small polybag-type sets containing just a few parts, none of which are rare.

The system works by communicating with the Rebrickable database: once the user has entered a part, it gathers the list of sets that contain that part. It then checks whether any of these sets have already had one of their parts found and, if so, updates the chance of that set.
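The bookkeeping described here might look something like the following Python sketch. Everything in it is illustrative: `CandidateTracker`, `fake_lookup`, and the placeholder set names `set-B` and `set-C` are hypothetical, and `fake_lookup` is an in-memory stand-in for the real Rebrickable query so that the example runs without network access.

```python
from typing import Callable, Dict, List, Set, Tuple

Part = Tuple[str, str]  # (part number, colour name)

# Stand-in data; the real system would ask Rebrickable instead.
FAKE_DATABASE: Dict[Part, List[str]] = {
    ("2626", "old light grey"): ["7141-1", "set-B"],
    ("6104", "yellow"): ["7141-1", "set-C"],
}


def fake_lookup(part_num: str, colour: str) -> List[str]:
    """Return the sets containing this part in this colour (stand-in data)."""
    return FAKE_DATABASE.get((part_num, colour), [])


class CandidateTracker:
    """Remember which found parts point at which candidate sets."""

    def __init__(self, lookup_sets: Callable[[str, str], List[str]]) -> None:
        self.lookup_sets = lookup_sets
        self.parts_by_set: Dict[str, Set[Part]] = {}

    def add_part(self, part_num: str, colour: str) -> List[str]:
        """Record a part; return the sets that had a part recorded before this call."""
        seen_before = []
        for set_num in self.lookup_sets(part_num, colour):
            parts = self.parts_by_set.setdefault(set_num, set())
            if parts:
                seen_before.append(set_num)
            parts.add((part_num, colour))
        return seen_before


tracker = CandidateTracker(fake_lookup)
first = tracker.add_part("2626", "old light grey")
second = tracker.add_part("6104", "yellow")
```

The sets returned by `add_part` are the ones whose chance would need updating, because a second piece of evidence has just arrived for them.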

The program then calculates the probability of each set being owned in proportion to the rarity of the found parts it contains. For example, a unique part gives the set it is from a 100% probability of being owned, as the part could not have come from any other set, whereas common bricks raise a set’s probability by a smaller amount.
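The exact formula is not given here, so the following Python sketch is only one plausible way to get this behaviour. It assumes that a part found in n sets counts as 1/n evidence for each of them, and that evidence from several parts is combined as 1 minus the product of (1 - 1/n). The function name `set_probabilities` and the placeholder set names are hypothetical.

```python
from typing import Dict, List, Tuple

Part = Tuple[str, str]  # (part number, colour name)


def set_probabilities(found: Dict[Part, List[str]]) -> Dict[str, float]:
    """Score each set from the rarity of the found parts that it contains.

    `found` maps every entered part to the list of sets containing it.
    """
    miss: Dict[str, float] = {}  # chance that no found part came from the set
    for sets in found.values():
        if not sets:
            continue  # part not known to be in any set: no evidence
        weight = 1.0 / len(sets)  # rarer part -> larger weight
        for set_num in sets:
            miss[set_num] = miss.get(set_num, 1.0) * (1.0 - weight)
    return {set_num: 1.0 - m for set_num, m in miss.items()}


scores = set_probabilities({
    ("2626", "old light grey"): ["7141-1", "set-B"],                # in 2 sets
    ("6104", "yellow"): ["7141-1", "set-C", "set-D", "set-E"],      # in 4 sets
})
```

With this rule a part that appears in only one set gives that set a score of 1.0, matching the 100% case above, while a part that appears in hundreds of sets barely moves any of them.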

At present the program only has a small subset of parts to search and add from. This is because I currently have to write them out manually, along with a list of tags that the user can search by. In the future I plan to expand this subset so that many more parts are available.

I also plan to rework the user interface to make it more user-friendly, since the current one is just a prototype: I was mainly interested in whether the maths would work and the sets could be identified. Another reason for the current interface is that I initially planned to use image recognition as the input method; however, I was not able to get that working and had to resort to manual input as a back-up.

I am also considering creating a website so that people who wish to use it can access it easily.

So, my questions to you are:

  • Is this an analyser that you would find useful and would use?
  • In what ways would you like to see this system develop if I continue working on it and make it available for everyone to use?

Any feedback given will help in my dissertation. Thank you!
